Benchmarks

Three harnesses in benchmarks/: synthetic (no download), SIFT-small (~5 MB), and GloVe-200 (~918 MB).

SIFT-small (real dataset)

npm run bench:real — 10k × 128-d vectors, 100 queries, 100-NN L2 ground truth from the dataset. dim=128 is a power of two → FWHT rotation + WASM kernel active:

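| bits | recall@1 | recall@10 | recall@100 | encode (vec/s) | QPS | fastScan QPS | compression |
|------|----------|-----------|------------|----------------|-----|--------------|-------------|
| 2 | 0.620 | 0.670 | 0.744 | ~269k | ~1050 | n/a | 12.8× |
| 3 | 0.720 | 0.801 | 0.863 | ~197k | ~1084 | n/a | 9.1× |
| 4 | 0.860 | 0.888 | 0.928 | ~177k | ~1152 | ~2055 | 7.1× |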

GloVe-200 (real text embeddings)

npm run bench:glove — 100k of 1.18M × 200-d GloVe word vectors, 1000 queries, brute-force cosine ground truth within the sub-sample. dim=200 is not a power of two → dense Householder rotation:

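| bits | recall@1 | recall@10 | recall@100 | encode (vec/s) | QPS | fastScan QPS | compression |
|------|----------|-----------|------------|----------------|-----|--------------|-------------|
| 2 | 0.550 | 0.610 | 0.653 | ~27k | ~69 | n/a | 13.8× |
| 3 | 0.730 | 0.781 | 0.814 | ~20k | ~72 | n/a | 9.6× |
| 4 | 0.845 | 0.880 | 0.901 | ~19k | ~71 | ~456 | 7.4× |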

Encode throughput is lower than SIFT-small because dim=200 uses the O(d²) dense rotation; SIFT-small uses the O(d·log d) FWHT.
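To make the cost difference concrete, here is a minimal sketch of an in-place fast Walsh-Hadamard transform. It is a textbook butterfly, not the library's kernel: the names `fwht` and `fwhtOrthonormal` are hypothetical, and the random sign flips that usually accompany a Hadamard-based rotation are omitted. Each of the log2(d) passes touches every element once, which is where the O(d·log d) cost comes from, whereas a dense d×d rotation needs d² multiply-adds.

```typescript
// In-place, unnormalized fast Walsh-Hadamard transform (illustrative only).
// Requires a power-of-two length, which is why dim=200 cannot use it directly.
function fwht(x: Float64Array): void {
  const d = x.length;
  if (d === 0 || (d & (d - 1)) !== 0) {
    throw new Error(`FWHT needs a power-of-two length, got ${d}`);
  }
  // log2(d) passes; each pass performs d/2 butterflies.
  for (let h = 1; h < d; h *= 2) {
    for (let i = 0; i < d; i += 2 * h) {
      for (let j = i; j < i + h; j++) {
        const a = x[j];
        const b = x[j + h];
        x[j] = a + b;
        x[j + h] = a - b;
      }
    }
  }
}

// Scaling by 1/sqrt(d) makes the transform orthonormal, so it preserves
// vector norms the way a rotation must.
function fwhtOrthonormal(x: Float64Array): void {
  fwht(x);
  const scale = 1 / Math.sqrt(x.length);
  for (let i = 0; i < x.length; i++) x[i] *= scale;
}

const sample = Float64Array.from([3, -1, 2, 5, 0, 4, -2, 1]);
const normBefore = Math.hypot(...sample);
fwhtOrthonormal(sample);
const normAfter = Math.hypot(...sample);
// The two norms should agree up to floating-point error.
console.log(normBefore.toFixed(6), normAfter.toFixed(6));
```

Calling `fwht` on a length-200 array throws, which mirrors why the GloVe-200 run falls back to the dense rotation.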

FastScan speedup

FastScan (fastscan: true, 4-bit only) speedup scales with corpus size:

| corpus | exact WASM | v128 FastScan | speedup |
|--------|------------|---------------|---------|
| 10k vecs | ~1152 QPS | ~2055 QPS | 1.8× |
| 50k vecs | ~240 QPS | ~1350 QPS | 5.7× |

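Both paths scan the corpus in O(n), but the SIMD scan has a much lower per-vector cost and the rescore-pool overhead is constant. That fixed cost is amortized over more vectors as the corpus grows, so the gain grows with n.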
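As a rough illustration of that reasoning, the sketch below fits a simple latency model to the two rows of the speedup table. The model itself is an assumption, not something taken from the library: the exact path is treated as purely linear in n, and the FastScan path as linear plus a fixed term standing in for the rescore pool. All names are hypothetical.

```typescript
// QPS values copied from the speedup table; they are approximate.
const small = { n: 10_000, exactQps: 1152, fastQps: 2055 };
const large = { n: 50_000, exactQps: 240, fastQps: 1350 };

// Convert throughput to per-query latency in milliseconds.
const latencyMs = (qps: number): number => 1000 / qps;

// ASSUMED model (illustrative only):
//   exact(n) = exactPerVec * n
//   fast(n)  = fastPerVec * n + fastFixed   (fastFixed ~ rescore-pool cost)
const fastPerVec =
  (latencyMs(large.fastQps) - latencyMs(small.fastQps)) / (large.n - small.n);
const fastFixed = latencyMs(small.fastQps) - fastPerVec * small.n;
// Average the per-vector cost of the exact path over both rows.
const exactPerVec =
  (latencyMs(small.exactQps) / small.n + latencyMs(large.exactQps) / large.n) / 2;

function modelSpeedup(n: number): number {
  return (exactPerVec * n) / (fastPerVec * n + fastFixed);
}

for (const n of [small.n, large.n]) {
  console.log(`${n} vecs: modelled speedup ${modelSpeedup(n).toFixed(2)}x`);
}
```

With only two data points the FastScan fit is exact by construction, so the constants are illustrative rather than measured. The useful part is the shape: in this model the speedup rises with n and levels off near `exactPerVec / fastPerVec` once the fixed term becomes negligible.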

Synthetic (dataset-free)

npx tsx benchmarks/flat.ts — seeded PRNG, deterministic, no download. dim=768, n=5000, queries=500, anisotropy=0.3, cosine:

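| bits | recall@10 | fastScan QPS | compression |
|------|-----------|--------------|-------------|
| 2 | 0.625 | n/a | 15.4× |
| 3 | 0.794 | n/a | 10.4× |
| 4 | 0.887 | ~528 | 7.8× |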
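The compression figures in the tables above can be sanity-checked with simple arithmetic. The sketch below assumes `bits` per dimension packed tightly, plus a fixed per-vector overhead. The 8-byte overhead is an inference that happens to reproduce the reported ratios; it is not a documented property of the toBytes() format, and `compressionRatio` is a hypothetical helper.

```typescript
// Bytes needed to bit-pack one vector at `bits` per dimension.
function packedBytes(dim: number, bits: number): number {
  return Math.ceil((dim * bits) / 8);
}

// float32 bytes divided by serialized bytes, for a single vector.
// overheadBytes is an ASSUMED per-vector cost (e.g. a stored norm or scale).
function compressionRatio(dim: number, bits: number, overheadBytes: number): number {
  const float32Bytes = dim * 4;
  return float32Bytes / (packedBytes(dim, bits) + overheadBytes);
}

const ASSUMED_OVERHEAD = 8; // inferred from the tables, not documented

for (const dim of [128, 200, 768]) {
  for (const bits of [2, 3, 4]) {
    const ratio = compressionRatio(dim, bits, ASSUMED_OVERHEAD).toFixed(1);
    console.log(`dim=${dim} bits=${bits} -> ${ratio}x`);
  }
}
// dim=128: 12.8x, 9.1x, 7.1x
// dim=200: 13.8x, 9.6x, 7.4x
// dim=768: 15.4x, 10.4x, 7.8x
```

The fixed overhead also explains why the ratio at a given bit width improves with dimension: the same few bytes are spread over a larger payload.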

What’s measured

  • recall@{1,10,100}: fraction of the true top-k neighbors returned, averaged over queries.
  • encode (vec/s): encode throughput in vectors per second, single-threaded.
  • QPS: queries per second on the exact WASM kernel path, single-threaded.
  • fastScan QPS (4-bit only): separate measurement with fastscan: true; not reported for 2- and 3-bit rows.
  • compression: float32 bytes ÷ serialized toBytes() bytes (true bit-packing).
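The recall definition above can be sketched in a few lines. `recallAtK` is a hypothetical helper, not part of the library; it assumes both the search results and the ground truth are arrays of integer ids, ordered best first, with one row per query.

```typescript
// Fraction of the true top-k ids that appear in the returned top-k,
// averaged over all queries.
function recallAtK(results: number[][], groundTruth: number[][], k: number): number {
  let total = 0;
  for (let q = 0; q < groundTruth.length; q++) {
    const trueTopK = new Set(groundTruth[q].slice(0, k));
    let hits = 0;
    for (const id of results[q].slice(0, k)) {
      if (trueTopK.has(id)) hits++;
    }
    // Divide by the number of true neighbors actually available.
    total += hits / trueTopK.size;
  }
  return total / groundTruth.length;
}

// Two toy queries: the first finds 2 of 3 true neighbors, the second all 3.
const truthIds = [[1, 2, 3], [4, 5, 6]];
const returnedIds = [[1, 3, 9], [6, 4, 5]];
console.log(recallAtK(returnedIds, truthIds, 3).toFixed(3)); // 0.833
console.log(recallAtK(returnedIds, truthIds, 1).toFixed(3)); // 0.500
```

Note that order inside the top-k does not matter for this metric: the second query scores full recall@3 even though its ids are permuted, but zero at k=1.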

Honest accounting

  • Recall regime. Synthetic isotropic Gaussian data (ANISOTROPY=1) is a worst case — neighbors are near-tied and a data-oblivious quantizer can’t perfectly order them. Real embeddings have a power-law spectrum that lifts recall, consistent with the TurboQuant paper’s >90% recall@10 at 2–4 bits on real DBpedia/OpenAI/GloVe data.
  • GloVe sub-sample. Ground truth is computed by brute-force cosine within the 100k sub-sample. The ann-benchmarks pre-computed indices reference the full 1.18M corpus; using them for a sub-sample yields misleadingly low recall@100 (only ~8% of true 1.18M neighbors land in 100k).
  • TQ+ calibration (calibrate: true) is opt-in and off in these runs. It can lift recall on real embeddings but is neutral to negative on well-conditioned synthetic data.
  • FWHT is used automatically for power-of-two dims (128, 256, 512, 1024, …). It costs O(d·log d) versus O(d²) for the dense rotation, which gives ~25× faster encode at no recall cost.
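The GloVe sub-sample point above comes down to computing ground truth over exactly the vectors that were indexed. A minimal sketch of brute-force cosine ground truth follows; `cosine` and `bruteForceTopK` are hypothetical helpers written for illustration, assuming plain number arrays and non-zero vectors, and they are not the benchmark's actual code.

```typescript
// Cosine similarity between two equal-length, non-zero vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / Math.sqrt(normA * normB);
}

// Ids of the k most similar vectors, best first. The ids index into
// `corpus`, so passing the sub-sample keeps ground truth and index aligned.
function bruteForceTopK(corpus: number[][], query: number[], k: number): number[] {
  return corpus
    .map((vec, id) => ({ id, score: cosine(vec, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.id);
}

// Toy sub-sample of four 2-d vectors.
const subSample = [[1, 0], [0, 1], [1, 1], [-1, 0]];
console.log(bruteForceTopK(subSample, [1, 0.1], 2)); // [ 0, 2 ]
```

This is O(n·d) per query, which is affordable at 100k vectors and 1000 queries and avoids the mismatch that arises when neighbor ids refer to a larger corpus than the one being searched.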