The RAG stack that actually scales
Thirteen tools across embedding, vector storage, retrieval, evaluation, and frameworks that hold up past a prototype.
The RAG hello-world looks easy. The production RAG stack is a different beast: you need an embedding model, a vector store, a retrieval framework, a reranker, and an eval harness that catches regressions before users do. These thirteen tools cover every layer. Pick one per layer, and don't pick two from the same layer.
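To make the layers concrete, here is a minimal sketch of how they fit together, using only the Python standard library. Everything in it is a hypothetical stand-in and not the API of any tool listed here: `embed` is a toy bag-of-words function in place of a real embedding model, `InMemoryStore` stands in for a vector store, `rerank` is a simple term-overlap scorer, and `recall_at_k` is a small eval harness over hand-labeled queries.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (assumption: stands in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical vector store: keeps (doc_id, text, vector) rows in a list."""

    def __init__(self):
        self.rows = []

    def add(self, doc_id, text):
        self.rows.append((doc_id, text, embed(text)))

    def search(self, query, k):
        """Retrieval layer: return the k rows most similar to the query."""
        qv = embed(query)
        scored = sorted(self.rows, key=lambda r: cosine(qv, r[2]), reverse=True)
        return [(doc_id, text) for doc_id, text, _ in scored[:k]]


def rerank(query, candidates, k):
    """Toy reranker: order candidates by number of distinct shared query terms."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        candidates,
        key=lambda c: len(q_terms & set(c[1].lower().split())),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(store, labeled_queries, k):
    """Eval harness: fraction of queries whose expected doc is in the top k."""
    hits = 0
    for query, expected_id in labeled_queries:
        candidates = store.search(query, k=k * 2)  # over-fetch, then rerank
        top = rerank(query, candidates, k)
        hits += any(doc_id == expected_id for doc_id, _ in top)
    return hits / len(labeled_queries)


store = InMemoryStore()
store.add("a", "pinecone is a managed vector database")
store.add("b", "langchain is a framework for chains and agents")
store.add("c", "a reranker reorders retrieved passages")

labeled = [("managed vector database", "a"), ("framework for agents", "b")]
score = recall_at_k(store, labeled, k=1)
print(score)
```

In a real stack each function would be replaced by one tool from the matching layer. Re-running a check like `recall_at_k` after every swap is one way to catch retrieval regressions before users do.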
Pinecone
Featured. Managed vector database for production-scale similarity search.
Weaviate
Open-source vector DB with hybrid search and modules.
Chroma
Embedded, developer-friendly vector store for Python.
Vespa
Open-source search engine, originally built at Yahoo, with vector and sparse retrieval.
LlamaIndex
Featured. Data framework for connecting LLMs to your data.
LangChain
The broad LLM application framework — chains, agents, retrievers.
Feast
Open-source feature store that serves consistent features to ML training and online inference, with RAG vector search built in.
RAGFlow
Open-source RAG engine with deep document parsing, hybrid search, and visual agent orchestration.
Humata.ai
Chat-with-your-documents RAG tool with citation-backed answers across uploaded PDFs and files.
Exa
Web search API built for AI agents, with structured outputs and token-efficient highlights.
NotebookLM
Google's source-grounded research notebook that turns your documents into chats, briefs, and AI-hosted podcasts.
Scite
AI research assistant that grades citations as supporting, contrasting, or mentioning across 1.6B citation statements.
Cohere
Enterprise-grade LLM platform built for private, secure, and customizable deployment.