📖 The AI Tool Bible

TurboVec

Rust-powered vector index with 2-4 bit TurboQuant compression for SIMD-accelerated RAG search.

Free · MIT licensed · RAG
Best for

Pick TurboVec if you need a lightweight in-process ANN index for RAG and want to fit millions of embeddings in a few GB of RAM.

Skip if

Skip it if you need a managed, multi-tenant vector database with replication, persistence guarantees, and a mature ops story.

TurboVec is an open-source vector index library written in Rust with Python bindings, implementing Google Research's TurboQuant quantization algorithm. It aggressively compresses high-dimensional embeddings down to 2-4 bits per dimension while keeping similarity search fast through hand-written SIMD kernels (ARM NEON, x86 AVX-512BW). The author claims a 10M-document corpus that would normally need 31GB of RAM fits in roughly 4GB with TurboVec.
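The memory claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes 768-dimensional embeddings (the source does not state the dimension) and counts only the raw vector codes, ignoring any per-vector metadata or index overhead. `index_size_gb` is a hypothetical helper, not part of TurboVec.

```python
def index_size_gb(num_vectors: int, dim: int, bits_per_dim: float) -> float:
    """Raw storage for the vector codes alone, in decimal gigabytes."""
    total_bits = num_vectors * dim * bits_per_dim
    return total_bits / 8 / 1e9

NUM_VECTORS = 10_000_000  # corpus size quoted in the claim above
DIM = 768                 # ASSUMPTION: embedding width, not stated by the source

float32_gb = round(index_size_gb(NUM_VECTORS, DIM, 32), 2)
four_bit_gb = round(index_size_gb(NUM_VECTORS, DIM, 4), 2)
two_bit_gb = round(index_size_gb(NUM_VECTORS, DIM, 2), 2)

print(f"float32: {float32_gb} GB")  # 30.72 GB
print(f"4-bit:   {four_bit_gb} GB")  # 3.84 GB
print(f"2-bit:   {two_bit_gb} GB")  # 1.92 GB
```

Under that assumption the quoted 31GB and 4GB figures line up with float32 and 4-bit storage respectively, which works out to 8x; 2-bit codes would halve the footprint again.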

The project is aimed at engineers building RAG pipelines or embedding-search backends who don't want to run a full vector database like Qdrant or Weaviate, but who also can't afford the memory cost of naive in-memory cosine search at scale. Unlike FAISS or HNSWlib, TurboVec emphasizes online ingestion with no training phase or hyperparameter tuning, plus filtered search via ID allowlists. It ships integrations for LangChain, LlamaIndex, Haystack, and Agno, so it can drop into existing retrieval stacks. It's free under MIT, requires Python 3.9+, and works on macOS, Linux, and Windows. The current release is 0.8.0 (June 2026), authored by Ryan Codrai; it is still pre-1.0 and relatively niche compared to mainstream ANN libraries.
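To illustrate what online ingestion and allowlist-filtered search mean in practice, here is a minimal toy sketch in plain Python. It is not TurboVec's API: `TinyIndex`, `add`, and `search` are hypothetical names, and the search is exact brute-force cosine similarity with no compression.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

class TinyIndex:
    """Hypothetical toy index: exact search, no quantization."""

    def __init__(self):
        self.vectors = {}

    def add(self, doc_id, vector):
        # Online ingestion: a vector is searchable as soon as it is added,
        # with no separate training or build step.
        self.vectors[doc_id] = list(vector)

    def search(self, query, k, allowlist=None):
        # Filtered search: only IDs in the allowlist are scored.
        if allowlist is None:
            ids = list(self.vectors)
        else:
            ids = [i for i in allowlist if i in self.vectors]
        scored = sorted(((cosine(self.vectors[i], query), i) for i in ids),
                        reverse=True)
        return [doc_id for _, doc_id in scored[:k]]

index = TinyIndex()
index.add(1, [1.0, 0.0])
index.add(2, [0.9, 0.1])
index.add(3, [0.0, 1.0])
index.add(4, [0.7, 0.7])

top_all = index.search([1.0, 0.0], k=2)                          # [1, 2]
top_allowed = index.search([1.0, 0.0], k=2, allowlist={3, 4})    # [4, 3]
```

A typical use of the allowlist is restricting retrieval to documents a given user or tenant is permitted to see, without maintaining a separate index per group.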

Editor's take

TurboVec is a genuinely interesting bet: a Rust implementation of TurboQuant exposed to Python, aimed squarely at RAG engineers who find FAISS heavy and hosted vector DBs overkill. It's early-stage and single-maintainer, so we'd reach for it on side projects and prototypes before trusting it in production.

— The AI Tool Bible editorial team

Pros

  • Aggressive 2-4 bit quantization shrinks RAM cost roughly 8x vs float32 (about 31GB down to 4GB for 10M documents, per the author)
  • Hand-tuned SIMD kernels for ARM NEON and x86 AVX-512BW
  • Online ingestion, no training step or hyperparameter tuning
  • Drop-in integrations for LangChain, LlamaIndex, Haystack, Agno
  • MIT licensed and cross-platform
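To make the quantization point concrete, the sketch below shows the general idea of storing each dimension in 4 bits. It uses a plain uniform scalar quantizer over an assumed value range of [-1, 1], which is a deliberately simple stand-in and not the TurboQuant algorithm; the function names are hypothetical.

```python
LEVELS = 15  # 4 bits give 16 codes, numbered 0..15

def quantize_4bit(vec, lo=-1.0, hi=1.0):
    """Map each float to one of 16 levels and pack two codes per byte."""
    codes = []
    for x in vec:
        clipped = min(max(x, lo), hi)  # values outside [lo, hi] saturate
        codes.append(round((clipped - lo) / (hi - lo) * LEVELS))
    if len(codes) % 2:
        codes.append(0)  # pad to an even count so codes pair up into bytes
    return bytes((codes[i] << 4) | codes[i + 1]
                 for i in range(0, len(codes), 2))

def dequantize_4bit(packed, dim, lo=-1.0, hi=1.0):
    """Unpack the nibbles and map codes back to approximate floats."""
    codes = []
    for byte in packed:
        codes.append(byte >> 4)
        codes.append(byte & 0x0F)
    return [lo + c / LEVELS * (hi - lo) for c in codes[:dim]]

original = [0.5, -0.25, 1.0, -1.0]
packed = quantize_4bit(original)
restored = dequantize_4bit(packed, dim=len(original))
max_error = max(abs(a - b) for a, b in zip(original, restored))
```

Four floats that would occupy 16 bytes as float32 fit in 2 bytes here, at the cost of a reconstruction error of at most half a quantization step (1/15 for this range).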

Cons

  • ⚠️ Pre-1.0 (0.8.0) and authored by a single developer
  • ⚠️ Niche compared to FAISS, HNSWlib, or hosted vector DBs
  • ⚠️ Limited ecosystem, docs, and production track record

Use cases

vector-search · rag · embedding-compression · ann-index · filtered-search
