Spaces:
Sleeping
π§© NL2SQL Copilot
A modular Text-to-SQL Copilot that converts natural-language questions into safe, verified SQL queries. Built with FastAPI, LangGraph, and SQLAlchemy, designed for read-only databases and benchmarked on Spider and Dr.Spider datasets.
π‘ Why it matters In real analytics teams, analysts often need quick insights without writing SQL. NL2SQL Copilot bridges that gap by translating plain-English questions into validated, read-only SQL β reducing query errors and saving hours of analyst time.
𧬠Evolution Note This repository is the next-generation version of NL2SQL Copilot Prototype. It refactors the original prototype into a production-grade, modular architecture β adding configuration-driven pipelines, safety layers, benchmarks, and a Streamlit UI for evaluation.
π Quick Start
1οΈβ£ Clone the repo
git clone https://github.com/melika-kheirieh/nl2sql-copilot.git
cd nl2sql-copilot
2οΈβ£ Build and run with Docker
docker build -t nl2sql-copilot .
docker run --rm -p 8000:8000 nl2sql-copilot
Then open http://localhost:8000/docs π Or launch the Streamlit Demo to test it interactively.
π§ Demo
π― Live Demo: Try it on Hugging Face Spaces β
You can ask a question in plain English β the Copilot plans, generates, verifies, and safely executes an SQL query.
User Query
show top 5 albums by total sales
Generated SQL
SELECT albums.Title, SUM(invoice_items.UnitPrice * invoice_items.Quantity) AS total_sales
FROM albums
JOIN tracks ON albums.AlbumId = tracks.AlbumId
JOIN invoice_items ON invoice_items.TrackId = tracks.TrackId
GROUP BY albums.Title
ORDER BY total_sales DESC
LIMIT 5;
Execution Result (preview)
| Album | Total Sales |
|---|---|
| Greatest Hits | 155.34 |
| Let There Be Rock | 133.09 |
| Big Ones | 128.44 |
Trace
[
{"stage": "planner", "duration_ms": 38, "summary": "Identified SQL intent"},
{"stage": "generator", "duration_ms": 201, "summary": "LLM generated SQL"},
{"stage": "safety", "duration_ms": 6, "summary": "Validated SELECT-only"},
{"stage": "executor", "duration_ms": 92, "summary": "Executed successfully"}
]
π§± Project Structure
nl2sql-copilot/
β
βββ app/ # FastAPI app, routers, schemas
βββ nl2sql/ # Core pipeline (Planner β Generator β Safety β Executor β Verifier)
βββ adapters/ # Database and LLM adapters
βββ benchmarks/ # Evaluation scripts and results
βββ ui/ # Streamlit dashboard (demo + benchmark)
βββ configs/ # Pipeline configs (YAML-based)
β
βββ Dockerfile
βββ requirements.in / .txt
βββ README.md
βοΈ How It Works
The Copilot runs a multi-stage pipeline ensuring every SQL query is both correct and safe:
Natural Language
β
[ Planner ] β [ Generator (LLM) ] β [ Safety ] β [ Executor ] β [ Verifier ] β [ Repair ]
Each stage is modular and configurable via configs/pipeline.yaml.
All queries execute inside a read-only sandbox.
π Safety Layer
Before execution, every SQL statement is validated:
| Rule | Example Blocked |
|---|---|
| DML not allowed | DELETE FROM users |
| Multi-statement | SELECT *; DROP TABLE users |
| Forbidden keywords | ALTER, TRUNCATE, UPDATE |
β
Only safe, single-statement SELECT queries are executed.
π Benchmark (sample)
Evaluated on a subset of the Spider dataset using gpt-4o-mini:
| Query | Type | Correct | Latency (ms) | Model |
|---|---|---|---|---|
| list all artists | simple select | β | 118 | gpt-4o-mini |
| total invoices per country | aggregation | β | 127 | gpt-4o-mini |
| top 3 customers by spending | aggregation | β | 141 | gpt-4o-mini |
| albums released before 2000 | filter | β | 122 | gpt-4o-mini |
| top 5 sales by genre | join | β | 149 | gpt-4o-mini |
(see benchmarks/results.csv for detailed results)
π§© Key Features
- β Modular pipeline (Planner β Generator β Safety β Executor β Verifier β Repair)
- π‘οΈ SQL safety filters (SELECT-only, blacklist, AST validation)
- π Self-repair loop for failed executions
- π§ LLM-driven generator (OpenAI / Ollama / Anthropic)
- π Evaluation toolkit for latency / accuracy / cost
- βοΈ Config-driven architecture (
Pipeline.from_config("configs/pipeline.yaml")) - π§° PostgreSQL + SQLite adapters
- ποΈ Streamlit UI for interactive demo & benchmark
- π§© Built with FastAPI, LangGraph, Pydantic-AI, SQLAlchemy
π§° Tech Stack
| Layer | Tools / Libraries |
|---|---|
| Backend API | FastAPI, Uvicorn |
| Pipeline Core | Python 3.12, Pydantic, SQLGlot |
| LLM Interface | Pydantic-AI (OpenAI / Anthropic / Ollama) |
| Database | SQLite (default), PostgreSQL |
| Evaluation | Spider / Dr.Spider |
| UI | Streamlit + Plotly |
| CI/CD | GitHub Actions, Makefile, Docker |
π§ͺ Development
pip install -r requirements.txt
pytest -q
ruff check .
mypy .
π§ Roadmap
- Add multilingual query support (Persian / English)
- Improve self-repair accuracy
- Add cost tracking per query
- Integrate Prometheus metrics
π License
MIT Β© 2025 Melika Kheirieh
