TurboVec
Rust-powered vector index with 2-4 bit TurboQuant compression for SIMD-accelerated RAG search.
Pick TurboVec if you need a lightweight in-process ANN index for RAG and want to fit millions of embeddings in a few GB of RAM.
Skip it if you need a managed, multi-tenant vector database with replication, persistence guarantees, and a mature ops story.
TurboVec is an open-source vector index library written in Rust with Python bindings, implementing Google Research's TurboQuant quantization algorithm. It compresses high-dimensional embeddings down to 2-4 bits per dimension while keeping similarity search fast through hand-written SIMD kernels (ARM NEON, x86 AVX-512BW). The author claims that a 10M-document corpus that would normally need 31 GB of RAM as float32 vectors fits in roughly 4 GB with TurboVec.
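The claimed memory figures line up with simple arithmetic. The sketch below is a rough check, not a measurement: it assumes 768-dimensional embeddings (our assumption; the source does not state a dimension), counts only the raw vector payload, and ignores IDs, metadata, and any per-vector overhead. The helper name `index_size_gb` is ours.

```python
def index_size_gb(num_vectors: int, dim: int, bits_per_dim: int) -> float:
    """Raw size of the vector payload alone, in decimal gigabytes."""
    total_bits = num_vectors * dim * bits_per_dim
    return total_bits / 8 / 1e9


NUM_VECTORS = 10_000_000
DIM = 768  # assumed embedding dimension; not stated in the source

# Compare float32 storage with 4-bit and 2-bit codes.
for bits in (32, 4, 2):
    size = index_size_gb(NUM_VECTORS, DIM, bits)
    print(f"{bits:>2} bits/dim: {size:.2f} GB")
```

Under these assumptions the payload comes to about 30.72 GB at 32 bits, 3.84 GB at 4 bits, and 1.92 GB at 2 bits, which matches the quoted "31GB down to roughly 4GB" if the claim refers to 4-bit codes.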
The project is aimed at engineers building RAG pipelines or embedding-search backends who don't want to run a full vector database like Qdrant or Weaviate, but who also can't afford the memory cost of naive in-memory cosine search at scale. Where FAISS's quantized indexes need a training pass and HNSWlib exposes graph parameters to tune, TurboVec emphasizes online ingestion with no training phase or hyperparameter tuning, plus filtered search via ID allowlists. It ships integrations for LangChain, LlamaIndex, Haystack, and Agno, so it can drop into existing retrieval stacks. It's free under the MIT license, requires Python 3.9+, and runs on macOS, Linux, and Windows. The current release is 0.8.0 (June 2026), authored by Ryan Codrai; it is still pre-1.0 and relatively niche compared to mainstream ANN libraries.
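To make "online ingestion" and "filtered search via ID allowlists" concrete, here is a minimal pure-Python sketch. It is not TurboVec's API and not the TurboQuant algorithm: it swaps in plain uniform 4-bit scalar quantization with brute-force scoring, and every name in it (`quantize`, `dequantize`, `TinyQuantIndex`) is hypothetical. It also assumes vector components lie in [-1, 1].

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

LEVELS = 16  # 4 bits per dimension


def quantize(vec: Sequence[float], lo: float = -1.0, hi: float = 1.0) -> bytes:
    """Map each component to a 4-bit code and pack two codes per byte."""
    codes = []
    for x in vec:
        clipped = min(max(x, lo), hi)
        codes.append(round((clipped - lo) / (hi - lo) * (LEVELS - 1)))
    if len(codes) % 2:
        codes.append(0)  # pad to an even number of codes
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


def dequantize(packed: bytes, dim: int, lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Unpack 4-bit codes and map them back to approximate floats."""
    codes: List[int] = []
    for byte in packed:
        codes.extend((byte >> 4, byte & 0x0F))
    step = (hi - lo) / (LEVELS - 1)
    return [lo + c * step for c in codes[:dim]]


class TinyQuantIndex:
    """Toy index: vectors are encoded on arrival, with no training pass."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._rows: Dict[int, bytes] = {}

    def add(self, doc_id: int, vec: Sequence[float]) -> None:
        if len(vec) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vec)}")
        self._rows[doc_id] = quantize(vec)

    def search(
        self, query: Sequence[float], k: int, allowlist: Optional[Set[int]] = None
    ) -> List[Tuple[int, float]]:
        """Return up to k (doc_id, score) pairs, optionally restricted to allowlist."""
        ids = [i for i in self._rows if allowlist is None or i in allowlist]
        scored = []
        for i in ids:
            approx = dequantize(self._rows[i], self.dim)
            scored.append((sum(q * x for q, x in zip(query, approx)), i))
        scored.sort(reverse=True)
        return [(i, s) for s, i in scored[:k]]


index = TinyQuantIndex(dim=3)
index.add(1, [1.0, 0.0, 0.0])
index.add(2, [0.0, 1.0, 0.0])
index.add(3, [0.9, 0.1, 0.0])
print(index.search([1.0, 0.0, 0.0], k=2))
print(index.search([1.0, 0.0, 0.0], k=1, allowlist={2, 3}))
```

The allowlist is applied before scoring, so excluded IDs cost nothing at query time. A real implementation would score the packed codes directly with SIMD instead of dequantizing each vector in Python.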
TurboVec is a genuinely interesting bet: a Rust implementation of TurboQuant exposed to Python, aimed squarely at RAG engineers who find FAISS heavy and hosted vector DBs overkill. It's early-stage and single-maintainer, so we'd reach for it on side projects and prototypes before trusting it in production.
— The AI Tool Bible editorial team
Pros
- ✅ Aggressive 2-4 bit quantization shrinks RAM cost ~8x vs float32 at 4 bits, and further at lower bit widths
- ✅ Hand-tuned SIMD kernels for ARM NEON and x86 AVX-512BW
- ✅ Online ingestion, no training step or hyperparameter tuning
- ✅ Drop-in integrations for LangChain, LlamaIndex, Haystack, Agno
- ✅ MIT licensed and cross-platform
Cons
- ⚠️ Pre-1.0 (0.8.0) and authored by a single developer
- ⚠️ Niche compared to FAISS, HNSWlib, or hosted vector DBs
- ⚠️ Limited ecosystem, docs, and production track record
Compare with similar tools
Pinecone
Managed vector database for production-scale similarity search.
LlamaIndex
Data framework for connecting LLMs to your data.
Weaviate
Open-source vector DB with hybrid search and modules.
LangChain
The broad LLM application framework — chains, agents, retrievers.
Vespa
Yahoo's open-source search engine with vector + sparse retrieval.
Chroma
Embedded, developer-friendly vector store for Python.