📖 The AI Tool Bible

Vanna.ai

✓ Editorially verified

Open-source text-to-SQL agent that learns your schema and writes queries against your real warehouse.

Freemium · Open-source free; paid cloud tier for hosted admin features · RAG · Multi-model (Anthropic, OpenAI, Gemini, Ollama)
Best for

Pick Vanna.ai if you want a self-hostable, model-agnostic text-to-SQL layer you can train on your own warehouse without shipping schemas to a closed SaaS.

Skip if

Skip it if you want a no-code BI dashboard out of the box or have no appetite to curate training examples for accuracy.

Vanna is a Python framework (and hosted cloud product) that turns natural-language questions into executable SQL against your own database. It connects to SQLite, Postgres, MySQL, Snowflake, BigQuery and other common engines, and runs a RAG layer over your DDL, documentation, and example queries. The retrieved context goes to the LLM of your choice, which writes the query; Vanna then executes it and returns the results plus a chart. The 2.0 release adds multi-turn conversations and an admin layer with access control, audit logs, and observability.
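To make the retrieve-then-generate flow concrete, here is a minimal sketch using only the Python standard library. It does not use Vanna's API: `TrainingItem`, `retrieve`, `build_prompt`, and `fake_llm` are hypothetical names, word overlap stands in for a real embedding search, and the "LLM" returns canned SQL. Treat it as an illustration of the idea rather than how Vanna is implemented.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class TrainingItem:
    kind: str  # "ddl", "doc", or "sql" (hypothetical labels)
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase and split on non-alphanumerics (a crude stand-in for embeddings)."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return set(cleaned.split())


def retrieve(question: str, store: list[TrainingItem], k: int) -> list[TrainingItem]:
    """Rank stored items by word overlap with the question and keep the top k."""
    q = tokenize(question)
    ranked = sorted(store, key=lambda item: len(q & tokenize(item.text)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[TrainingItem]) -> str:
    """Put the retrieved schema, docs, and examples ahead of the question."""
    lines = [f"[{item.kind}] {item.text}" for item in context]
    return "\n".join(lines) + f"\n\nQuestion: {question}\nSQL:"


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; always returns the same SQL."""
    return "SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY region"


# A tiny in-memory database to run the generated query against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("EU", 10.0), ("US", 25.0), ("EU", 5.0)],
)

# The "training data": DDL, a documentation note, and one question/SQL pair.
store = [
    TrainingItem("ddl", "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"),
    TrainingItem("doc", "amount is the order value in USD"),
    TrainingItem("sql", "Q: total sales by region? A: SELECT region, SUM(amount) FROM orders GROUP BY region"),
    TrainingItem("ddl", "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)"),
]

question = "What is the total amount by region?"
context = retrieve(question, store, k=3)  # the unrelated employees table is left out
prompt = build_prompt(question, context)
sql = fake_llm(prompt)
rows = conn.execute(sql).fetchall()
print(rows)
```

The point of the sketch is the ordering: retrieval narrows the schema to what is relevant before the model sees the question, and the application, not the model, runs the SQL.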

The differentiator is honesty about how text-to-SQL actually works: instead of pretending one zero-shot prompt is enough, Vanna leans on a trainable vector store of your schema and prior good queries. It is also model-agnostic across Anthropic, OpenAI, Gemini, and local Ollama. The core framework is MIT-licensed and self-hostable for free; the cloud tier is for teams that want a managed vector store, governance, and a hosted UI rather than wiring up Streamlit or Flask themselves. It's aimed at data teams who want analyst-style self-serve querying without handing the warehouse to a black-box SaaS.

Because it's a library first, integrations are flexible: bring your own LLM, your own vector DB (Chroma, pgvector, Pinecone, etc.), and your own front-end. The trade-off is that quality scales with how much training data (DDL + curated Q/SQL pairs) you feed it, and it inherits whatever the underlying LLM gets wrong about joins on messy schemas.
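The "bring your own LLM and vector DB" design can be pictured as two small interfaces that a thin orchestrator depends on. The sketch below is an assumption about the general shape of such a design, not Vanna's actual classes: `LLM`, `VectorStore`, `InMemoryStore`, `RecordingLLM`, `TextToSQL`, `add_example`, and `question_to_sql` are all hypothetical names chosen for illustration.

```python
from typing import Protocol


class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...


class VectorStore(Protocol):
    def add(self, text: str) -> None: ...
    def search(self, query: str, k: int) -> list[str]: ...


class InMemoryStore:
    """Toy store that ranks by shared words; a real one would use embeddings."""

    def __init__(self) -> None:
        self._items: list[str] = []

    def add(self, text: str) -> None:
        self._items.append(text)

    def search(self, query: str, k: int) -> list[str]:
        words = set(query.lower().split())
        ranked = sorted(
            self._items,
            key=lambda t: len(words & set(t.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class RecordingLLM:
    """Hypothetical stub: returns a fixed reply and remembers the last prompt."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.last_prompt = ""

    def complete(self, prompt: str) -> str:
        self.last_prompt = prompt
        return self.reply


class TextToSQL:
    """Orchestrator that only knows the two interfaces, not the vendors behind them."""

    def __init__(self, llm: LLM, store: VectorStore) -> None:
        self.llm = llm
        self.store = store

    def add_example(self, text: str) -> None:
        self.store.add(text)

    def question_to_sql(self, question: str, k: int = 3) -> str:
        context = "\n".join(self.store.search(question, k))
        return self.llm.complete(f"Context:\n{context}\n\nQuestion: {question}\nSQL:")


llm = RecordingLLM("SELECT COUNT(*) FROM users")
engine = TextToSQL(llm, InMemoryStore())
engine.add_example("CREATE TABLE users (id INTEGER, email TEXT)")
engine.add_example("CREATE TABLE invoices (id INTEGER, total REAL)")
sql = engine.question_to_sql("how many users signed up", k=1)
print(sql)
```

Swapping the model or the vector store then means passing in a different object that satisfies the same interface. It also shows why accuracy tracks the training data: the model only sees what the store returns.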

Editor's take

Vanna is the most credible open-source take on text-to-SQL because it treats schema as a retrieval problem, not a prompting trick. The framework is genuinely useful even if you never touch the cloud tier, and being LLM-agnostic future-proofs it. Just budget time to feed it good examples; that's where the accuracy actually comes from.

— The AI Tool Bible editorial team

Pros

  • MIT-licensed core; fully self-hostable with your own LLM and vector store
  • Model-agnostic across Anthropic, OpenAI, Gemini, and local Ollama
  • Trainable on your schema, docs, and prior queries via RAG (not zero-shot)
  • Connects directly to Snowflake, BigQuery, Postgres, MySQL, SQLite and more
  • Cloud tier adds access control, audit logs, and observability for teams

Cons

  • ⚠️ Quality depends heavily on how much training data you curate
  • ⚠️ Self-hosted setup requires Python and some glue work
  • ⚠️ Inherits LLM hallucinations on complex joins or messy schemas

Use cases

text-to-sql · natural-language-bi · data-analytics · warehouse-querying · rag-over-schema
