Text2SQL (or Chat2SQL) tools convert natural-language questions into SQL queries. Imagine having ChatGPT write beautiful, correct, and useful SQL queries for you!
These tools started out as a way to bridge the gap between non-technical users and databases: by letting people interact with a database in natural language, they lower the barrier to accessing and analyzing data. With the advance of AI models, they now support more advanced features, such as handling complex queries, joining multiple tables, and even holding natural-language conversations.
They can also help improve productivity by automating the process of generating SQL queries, thereby saving time and effort.
In this edition of Star History monthly, we have compiled a collection of open-source Text2SQL tools.
Chat2DB
Chat2DB aims to be a general-purpose SQL client and reporting tool that incorporates AI capabilities from the start. It connects to a range of databases, including MySQL, Postgres, Oracle, SQL Server, SQLite, ClickHouse, and more.
There was a bit of drama involving Chat2DB a while ago. We won't get into the details here, but we're curious to know what you think.
SQL Chat
SQL Chat is a chat-based SQL client: you use natural language to communicate with your database and run operations such as querying, modifying, adding, and deleting (!) data.
It currently supports MySQL, Postgres, SQL Server, and TiDB Serverless.
It's open-sourced by Bytebase, a database migration tool for teams.
Vanna
Vanna is a Python framework that lets you train a RAG model on queries, DDL, and documentation from your database.
You can use Vanna as is, or build your own custom UI with an existing tool (e.g. Streamlit, Slack).
It was open-sourced in July 2023, and got really popular this January.
DuckDB-NSQL
DuckDB-NSQL is a Text2SQL LLM built for local DuckDB SQL analytics tasks by MotherDuck and Numbers Station. It can help users leverage the full power of DuckDB and its analytic potential without having to go back and forth between the DuckDB documentation and the SQL shell.
LangChain
With LangChain, you can build a Q&A chain and agent over a SQL database yourself.
LangChain also has an SQL Agent that you can add onto the chain. It can not only answer questions based on the databases’ schema and content, but also recover from errors by running a generated query, catching the traceback and regenerating it correctly.
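The error-recovery idea can be illustrated without any framework. The following is a minimal sketch, not LangChain's actual implementation: it assumes SQLite as the database, and both `run_with_retry` and the `fake_model` stand-in for an LLM are hypothetical names made up for this example. The loop runs the generated query, and if the database rejects it, feeds the error message back so the next attempt can correct it.

```python
import sqlite3
from typing import Callable, Optional


def run_with_retry(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> list:
    """Run model-generated SQL, feeding any database error back to the model."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        # The generator sees the previous error (None on the first attempt).
        sql = generate_sql(question, error)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            error = f"{sql!r} failed: {exc}"
    raise RuntimeError(f"gave up after {max_attempts} attempts; last error: {error}")


def fake_model(question: str, error: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: misspells a column, then fixes it."""
    if error is None:
        return "SELECT nam FROM users"  # wrong column name on purpose
    return "SELECT name FROM users"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('ada')")

rows = run_with_retry(conn, "List all user names", fake_model)
print(rows)
```

In a real setup, `generate_sql` would call your model with the question, the schema, and the previous error text. Capping the number of attempts keeps a confused model from looping forever.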
Awesome Text2SQL
Awesome Text2SQL is a curated collection of tutorials and resources for LLMs, Text2SQL, Text2DSL, Text2API, Text2Vis, and more. Most of the models are LLM+Text2SQL, and for each model there are links to papers, code, and datasets. If you want to dive deep into Text2SQL, take a look.
To Wrap Up
LLM or not, you should still be extra careful when executing model-generated SQL queries. Some ways to minimize risk include describing your database schema and data to the model, constraining the size of the output, and validating and reviewing the generated SQL queries before executing them.
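As a rough illustration of the last two points, here is a minimal sketch of a guard that sits between the model and the database. It assumes SQLite, and the helper names `check_read_only` and `run_guarded` are hypothetical. The keyword check is deliberately crude and conservative (it will also reject a harmless query that merely mentions a word like "delete" in a string), so treat it as a complement to read-only database credentials and human review, not a replacement.

```python
import re
import sqlite3

# Keywords that indicate a write or schema change; matching is conservative.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|replace|truncate|attach|pragma)\b",
    re.IGNORECASE,
)


def check_read_only(sql: str) -> str:
    """Return the cleaned statement, or raise ValueError if it looks unsafe."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(select|with)\b", statement, re.IGNORECASE):
        raise ValueError("only SELECT queries are allowed")
    if FORBIDDEN.search(statement):
        raise ValueError("statement contains a write or DDL keyword")
    return statement


def run_guarded(conn: sqlite3.Connection, sql: str, max_rows: int = 100) -> list:
    """Validate the generated SQL, then fetch at most max_rows rows."""
    statement = check_read_only(sql)
    return conn.execute(statement).fetchmany(max_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 6)])

print(run_guarded(conn, "SELECT n FROM t ORDER BY n;", max_rows=2))
```

Passing something like `"SELECT 1; DROP TABLE t"` or a bare `"DROP TABLE t"` to `run_guarded` raises `ValueError` before anything reaches the database.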
Lastly
If you want more AI content, check out earlier editions of the Star History open-source monthly: