quantvec
Data-oblivious, zero-training vector quantization & search for TypeScript. Clean-room TurboQuant + RaBitQ. Runs in Node, browsers, Bun, and edge runtimes.
Why quantvec
- Zero training, instant ingest: A data-independent random rotation makes every coordinate follow a known distribution, so an optimal scalar quantizer works with no fit step. Add vectors and search immediately.
- 2–4 bit compression: Bit-pack to 2, 3, or 4 bits per coordinate, about 8× smaller than float32 at 4-bit (16× at 2-bit) for d=1536, with the RaBitQ unbiased inner-product estimate preserving recall.
- Isomorphic: One package for Node, browsers, Bun, Cloudflare Workers, and React Native. Standard ESM/CJS builds with TypeScript types.
- Filtered search: Restrict results by an allowlist mask or a per-id predicate. A Qdrant-style filter DSL will land with the ergonomic layer.

Quick start
```sh
npm install quantvec # or: bun add quantvec / pnpm add quantvec
```

```ts
import { TurboQuantIndex } from 'quantvec';

// No training, no fit step: construct and add.
const index = new TurboQuantIndex({ dim: 768, bits: 4 });
index.add(vectors); // Float32Array (flat) | number[][] | Float32Array[]
const { indices, scores } = index.search(query, 10); // top-10, best-first
```

Need stable external ids, payload filtering, and persistence? Reach for the id-keyed index:
```ts
import { IdMapIndex } from 'quantvec';

const index = new IdMapIndex({ dim: 768, bits: 4, metric: 'cosine' });
index.addWithIds(['a', 'b', 'c'], vectors);
const { ids, scores } = index.search(query, 10, {
  filter: id => id !== 'b', // optional allowlist predicate
});
const bytes = index.toBytes(); // versioned, self-describing binary
const restored = IdMapIndex.fromBytes<string>(bytes);
```

How it works
- Normalize each vector and store its norm.
- Apply a data-independent random rotation so that each coordinate follows a known Beta distribution.
- Quantize each coordinate with a Lloyd-Max scalar quantizer, the MSE-optimal codebook for that known distribution.
- Bit-pack the codes to 2, 3, or 4 bits per coordinate.
- Store a RaBitQ length-renormalization scale per vector, which gives an unbiased inner-product estimate at query time.
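As a rough illustration of the steps above, the sketch below normalizes, rotates, quantizes, and rescales a single vector. It is not quantvec's implementation, and it makes several assumptions: the random rotation is replaced by a randomized Hadamard transform (sign flips plus a fast Walsh-Hadamard transform, which needs a power-of-two dimension), the Lloyd-Max codebook is fitted to Gaussian samples with variance 1/dim instead of the exact Beta density, and the per-vector scale is assumed to be the norm divided by the inner product of the rotated unit vector with its dequantized version. All helper names (`makeRng`, `rotateVec`, `nearestLevel`, `lloydMaxCodebook`, `encodeVec`, `estimateDot`) are hypothetical, and codes are left one per byte.

```typescript
// Illustrative sketch: every helper below is hypothetical, not part of quantvec.
// Park-Miller LCG: a tiny seeded PRNG so the "random" parts are reproducible.
function makeRng(seed: number): () => number {
  let s = (seed % 2147483646) + 1; // state stays in [1, 2147483646]
  return () => {
    s = (s * 48271) % 2147483647;
    return s / 2147483647; // strictly between 0 and 1
  };
}

// Sign flips, then a normalized fast Walsh-Hadamard transform. The map is
// orthogonal, so norms and inner products are preserved. Needs power-of-two length.
function rotateVec(x: ArrayLike<number>, signs: Float64Array): Float64Array {
  const d = signs.length;
  const y = new Float64Array(d);
  for (let i = 0; i < d; i++) y[i] = x[i] * signs[i];
  for (let h = 1; h < d; h *= 2) {
    for (let i = 0; i < d; i += 2 * h) {
      for (let j = i; j < i + h; j++) {
        const a = y[j], b = y[j + h];
        y[j] = a + b;
        y[j + h] = a - b;
      }
    }
  }
  return y.map((v) => v / Math.sqrt(d));
}

// Index of the codebook level closest to v.
function nearestLevel(codebook: Float64Array, v: number): number {
  let best = 0;
  for (let k = 1; k < codebook.length; k++) {
    if (Math.abs(codebook[k] - v) < Math.abs(codebook[best] - v)) best = k;
  }
  return best;
}

// 1-D Lloyd-Max iterations on samples from N(0, 1/dim), a stand-in for the
// rotated coordinate's distribution. Depends only on dim and bits, never on data.
function lloydMaxCodebook(dim: number, bits: number, seed: number): Float64Array {
  const rng = makeRng(seed);
  const samples = new Float64Array(20000);
  for (let i = 0; i < samples.length; i++) {
    // Box-Muller transform, scaled to variance 1/dim.
    samples[i] = Math.sqrt((-2 * Math.log(rng())) / dim) * Math.cos(2 * Math.PI * rng());
  }
  samples.sort(); // typed arrays sort numerically by default
  const levels = 1 << bits;
  const codebook = new Float64Array(levels);
  for (let k = 0; k < levels; k++) {
    codebook[k] = samples[Math.floor(((k + 0.5) / levels) * samples.length)]; // quantile init
  }
  for (let iter = 0; iter < 30; iter++) {
    const sum = new Float64Array(levels), count = new Float64Array(levels);
    for (let i = 0; i < samples.length; i++) {
      const k = nearestLevel(codebook, samples[i]);
      sum[k] += samples[i];
      count[k] += 1;
    }
    for (let k = 0; k < levels; k++) if (count[k] > 0) codebook[k] = sum[k] / count[k];
  }
  return codebook;
}

// Rotate, normalize, quantize, and compute the per-vector scale (assumed form:
// norm / <u, uHat>, with uHat the dequantized vector). Zero vectors are not handled.
function encodeVec(x: ArrayLike<number>, signs: Float64Array, codebook: Float64Array) {
  const r = rotateVec(x, signs);
  let norm = 0;
  for (let i = 0; i < r.length; i++) norm += r[i] * r[i];
  norm = Math.sqrt(norm); // the rotation preserves the norm of x
  const codes = new Uint8Array(r.length); // one code per coordinate, left unpacked here
  let dot = 0;
  for (let i = 0; i < r.length; i++) {
    const ui = r[i] / norm; // coordinate of the rotated unit vector
    codes[i] = nearestLevel(codebook, ui);
    dot += ui * codebook[codes[i]];
  }
  return { codes, scale: norm / dot };
}

// Estimate <x, q>: rotate the query, then rescale the dot product taken in code space.
function estimateDot(enc: { codes: Uint8Array; scale: number }, q: ArrayLike<number>, signs: Float64Array, codebook: Float64Array): number {
  const qr = rotateVec(q, signs);
  let acc = 0;
  for (let i = 0; i < qr.length; i++) acc += codebook[enc.codes[i]] * qr[i];
  return enc.scale * acc;
}

const dim = 64;
const rng = makeRng(7);
const signs = new Float64Array(dim).map(() => (rng() < 0.5 ? -1 : 1));
const codebook = lloydMaxCodebook(dim, 4, 11);
const x = new Float64Array(dim).map(() => rng() * 2 - 1);
const q = new Float64Array(dim).map(() => rng() * 2 - 1);
const enc = encodeVec(x, signs, codebook);
console.log('estimated <x, q>:', estimateDot(enc, q, signs, codebook));
```

With this choice of scale, the estimate of a vector's inner product with itself equals its squared norm up to floating-point error, which is the renormalization at work. Neither the codebook nor the rotation looks at the data, so vectors can be encoded as soon as they arrive.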
Read the Architecture page for the full pipeline, or jump to the API Reference.
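For the bit-packing step listed under How it works, a minimal sketch of packing fixed-width codes into bytes follows. `packCodes` and `unpackCodes` are hypothetical helpers, and the least-significant-bit-first layout is an assumption; quantvec's actual binary layout may differ. The loop at the end compares packed sizes with float32 storage for one d=1536 vector, ignoring per-vector metadata such as the norm and scale.

```typescript
// Hypothetical helpers; quantvec's actual bit layout may differ.

// Pack fixed-width codes (each below 2^bits) into bytes, least-significant bit first.
function packCodes(codes: Uint8Array, bits: number): Uint8Array {
  const out = new Uint8Array(Math.ceil((codes.length * bits) / 8));
  let bitPos = 0;
  for (let i = 0; i < codes.length; i++) {
    for (let b = 0; b < bits; b++, bitPos++) {
      if ((codes[i] >> b) & 1) out[bitPos >> 3] |= 1 << (bitPos & 7);
    }
  }
  return out;
}

// Inverse of packCodes: read `count` codes of width `bits` back out.
function unpackCodes(packed: Uint8Array, bits: number, count: number): Uint8Array {
  const codes = new Uint8Array(count);
  let bitPos = 0;
  for (let i = 0; i < count; i++) {
    for (let b = 0; b < bits; b++, bitPos++) {
      codes[i] |= ((packed[bitPos >> 3] >> (bitPos & 7)) & 1) << b;
    }
  }
  return codes;
}

// Storage for one d=1536 vector at each supported width versus float32.
const vecDim = 1536;
const float32Bytes = vecDim * 4;
for (const width of [2, 3, 4]) {
  const packedBytes = packCodes(new Uint8Array(vecDim), width).length;
  console.log(`${width}-bit: ${packedBytes} bytes vs ${float32Bytes} bytes as float32`);
}
```

A 3-bit code can straddle a byte boundary, which is why the sketch walks bit by bit instead of special-casing each width. A production packer would more likely process whole words at a time.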