Vanna AI is an open-source Python framework that transforms natural language questions into SQL queries using retrieval-augmented generation (RAG). With 12,000+ GitHub stars and an MIT license, it has become the most popular open-source text-to-SQL solution, enabling anyone who can type a question to query a database without writing SQL. Version 2.0, released in 2026, is a complete rewrite focused on a production-ready, agent-based architecture with user-aware execution, streaming UI components, and enterprise security features.
The technical approach is straightforward but effective. Vanna stores question-SQL pairs and schema documentation in a vector store, building a RAG application on top. When a user submits a natural language question, Vanna searches the vector store for similar examples, retrieves relevant schema context, and sends everything to an LLM to generate SQL. The generated query is executed against the database, results are returned as data tables, and the LLM generates Plotly chart code for visualization. The system learns from each successful interaction, improving accuracy over time.
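To make the flow concrete, here is a minimal sketch of the retrieve-then-prompt step, not Vanna's actual internals: a toy "vector store" ranks stored question-SQL pairs by word overlap (standing in for embedding similarity), and the best matches plus schema context are assembled into an LLM prompt. All names here (`Example`, `retrieve`, `build_prompt`) are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    sql: str

def retrieve(store: list[Example], question: str, k: int = 2) -> list[Example]:
    # Toy similarity search: rank stored examples by word overlap with the
    # question. A real system would use embedding-vector similarity instead.
    q_words = set(question.lower().split())
    scored = sorted(store, key=lambda ex: -len(q_words & set(ex.question.lower().split())))
    return scored[:k]

def build_prompt(question: str, schema: str, examples: list[Example]) -> str:
    # Few-shot prompt: schema context plus retrieved question-SQL pairs.
    shots = "\n".join(f"Q: {ex.question}\nSQL: {ex.sql}" for ex in examples)
    return f"Schema:\n{schema}\n\nExamples:\n{shots}\n\nQ: {question}\nSQL:"

store = [
    Example("total revenue by month",
            "SELECT month, SUM(revenue) FROM sales GROUP BY month"),
    Example("top customers by spend",
            "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY 2 DESC"),
]
prompt = build_prompt(
    "what is revenue by month?",
    "sales(month, revenue); orders(customer, amount)",
    retrieve(store, "what is revenue by month?"),
)
print(prompt)
```

The "learning over time" behavior falls out naturally: each successful question-SQL pair is appended to the store, so future retrievals have more relevant examples to draw on.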
Database and LLM support is comprehensive. Vanna works with PostgreSQL, MySQL, Snowflake, BigQuery, Redshift, SQLite, Oracle, SQL Server, DuckDB, ClickHouse, Apache Druid, and more. For LLMs, it supports OpenAI, Anthropic Claude, Google Gemini, Azure OpenAI, Ollama for local models, and NVIDIA NIM for accelerated inference. Vector store options include ChromaDB, Milvus, Pinecone, and others. This BYOM (bring your own model) flexibility means you can run entirely on-premises with Ollama and a local vector store if data privacy requires it.
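The BYOM idea can be sketched as composition: pick one vector-store backend and one LLM backend, and combine them into a single assistant. The classes below (`MockVectorStore`, `MockLLM`, `LocalAssistant`) are stand-ins invented for illustration, not Vanna's actual classes; they show the pattern of swapping either backend independently, including a fully local setup.

```python
class MockVectorStore:
    # Stand-in for a local vector store backend (e.g. the role ChromaDB plays).
    def __init__(self, config=None):
        self.docs = []

    def add(self, text):
        self.docs.append(text)

class MockLLM:
    # Stand-in for an LLM backend (e.g. the role a local Ollama model plays).
    def __init__(self, config=None):
        self.model = (config or {}).get("model", "local-model")

    def generate(self, prompt):
        return f"[{self.model}] SELECT 1  -- generated for: {prompt}"

class LocalAssistant(MockVectorStore, MockLLM):
    # Composing one store with one LLM yields a working assistant; swapping
    # either base class swaps the backend, all on-premises if needed.
    def __init__(self, config=None):
        MockVectorStore.__init__(self, config)
        MockLLM.__init__(self, config)

vn = LocalAssistant(config={"model": "llama3"})
vn.add("DDL: CREATE TABLE sales (month TEXT, revenue REAL)")
print(vn.generate("revenue by month"))
```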
Version 2.0 introduced fundamental architecture changes. The agent-based API replaces the legacy VannaBase class methods, making every component user-aware. User identity flows through system prompts, tool execution, and SQL filtering, enabling row-level security where queries are automatically filtered according to each user's permissions. Audit logs track every query per user for compliance. Rate limiting via lifecycle hooks provides per-user quotas. The pre-built vanna-chat web component drops into any existing webpage with a single script tag and works with React, Vue, or plain HTML.
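A hypothetical illustration of what row-level filtering means in practice, not Vanna's implementation: the generated query is wrapped with a predicate derived from the requesting user's identity, so every result set is scoped to rows that user may see. The `apply_row_filter` helper and the region-based policy are invented for this sketch.

```python
import sqlite3

def apply_row_filter(sql: str, user_region: str) -> str:
    # Naive rewrite for illustration only: wrap the generated query and
    # filter by the user's region. Production code would use parameterized
    # queries or database-native row-level security, never string formatting.
    return f"SELECT * FROM ({sql}) WHERE region = '{user_region}'"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 100.0), ("APAC", 250.0), ("EMEA", 50.0)])

base_sql = "SELECT region, revenue FROM sales"   # what the LLM generated
scoped = apply_row_filter(base_sql, "EMEA")      # what actually runs
rows = conn.execute(scoped).fetchall()
print(rows)  # only EMEA rows survive the filter
```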
The production-readiness story is where Vanna 2.0 sets itself apart from earlier versions and competitors. User-scoped execution means the agent knows who is asking and respects their permissions. Lifecycle hooks enable quota checking, custom logging, and content filtering without modifying core code. The streaming architecture delivers rich UI components (tables, charts, and interactive elements) in real time rather than waiting for complete responses. The tool registry system allows custom tools to be built and registered, extending agent capabilities beyond SQL generation.
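The registry-plus-hooks combination can be sketched in a few lines of plain Python. Everything here (`tool`, `hook`, `dispatch`, the quota policy) is a hypothetical pattern illustrating the idea, not Vanna's API: tools register under a name, and every dispatch runs the pre-execution hooks first, so quota checks and logging stay out of the dispatch code itself.

```python
from collections import defaultdict

TOOLS = {}
HOOKS = []

def tool(name):
    # Decorator: register a callable in the tool registry under `name`.
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

def hook(fn):
    # Decorator: add a pre-execution lifecycle hook.
    HOOKS.append(fn)
    return fn

@tool("run_sql")
def run_sql(user, query):
    return f"rows for {query!r} (requested by {user})"

calls = defaultdict(int)

@hook
def quota_check(user, tool_name):
    # Per-user quota enforced as a hook, without touching dispatch logic.
    calls[user] += 1
    if calls[user] > 3:
        raise RuntimeError(f"quota exceeded for {user}")

def dispatch(user, tool_name, *args):
    for h in HOOKS:
        h(user, tool_name)
    return TOOLS[tool_name](user, *args)

print(dispatch("alice", "run_sql", "SELECT 1"))
```

Registering a new capability is just another `@tool("...")` function, which matches the extension story described above.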
Delivery mechanisms are flexible. You can use Vanna through Jupyter notebooks for data exploration, Streamlit for quick dashboards, Flask for custom web applications, Slack for team-wide access, or the built-in web server with the vanna-chat component. A LangChain integration allows Vanna to be used as a tool within larger agent workflows, enabling multi-tool routing where the LLM decides whether a question needs database access or other capabilities.
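Multi-tool routing can be sketched as follows. In a real LangChain setup the LLM itself decides which tool a question needs; the keyword heuristic below merely stands in for that decision, and both tools (`sql_tool`, `docs_tool`) are hypothetical.

```python
def sql_tool(question: str) -> str:
    # Stand-in for the text-to-SQL tool (database access path).
    return f"SQL answer for: {question}"

def docs_tool(question: str) -> str:
    # Stand-in for some other capability (no database needed).
    return f"Docs answer for: {question}"

DB_HINTS = {"revenue", "orders", "customers", "table", "rows"}

def route(question: str) -> str:
    # Toy router standing in for the LLM's tool-selection step.
    words = set(question.lower().replace("?", "").split())
    chosen = sql_tool if words & DB_HINTS else docs_tool
    return chosen(question)

print(route("How many orders shipped last week?"))  # routed to sql_tool
print(route("How do I reset my password?"))         # routed to docs_tool
```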