nl2sql-copilot / README.md
Melika Kheirieh
docs(readme): add evolution note and improve intro narrative
602cae0
|
raw
history blame
6.35 kB

🧩 NL2SQL Copilot

A modular Text-to-SQL Copilot that converts natural-language questions into safe, verified SQL queries. Built with FastAPI, LangGraph, and SQLAlchemy, designed for read-only databases and benchmarked on Spider and Dr.Spider datasets.


πŸ’‘ Why it matters In real analytics teams, analysts often need quick insights without writing SQL. NL2SQL Copilot bridges that gap by translating plain-English questions into validated, read-only SQL β€” reducing query errors and saving hours of analyst time.

🧬 Evolution Note This repository is the next-generation version of NL2SQL Copilot Prototype. It refactors the original prototype into a production-grade, modular architecture β€” adding configuration-driven pipelines, safety layers, benchmarks, and a Streamlit UI for evaluation.


πŸš€ Quick Start

1️⃣ Clone the repo

git clone https://github.com/melika-kheirieh/nl2sql-copilot.git
cd nl2sql-copilot

2️⃣ Build and run with Docker

docker build -t nl2sql-copilot .
docker run --rm -p 8000:8000 nl2sql-copilot

Then open http://localhost:8000/docs πŸš€ Or launch the Streamlit Demo to test it interactively.


🧠 Demo

🎯 Live Demo: Try it on Hugging Face Spaces β†’

You can ask a question in plain English β€” the Copilot plans, generates, verifies, and safely executes an SQL query.

User Query

show top 5 albums by total sales

Generated SQL

SELECT albums.Title, SUM(invoice_items.UnitPrice * invoice_items.Quantity) AS total_sales
FROM albums
JOIN tracks ON albums.AlbumId = tracks.AlbumId
JOIN invoice_items ON invoice_items.TrackId = tracks.TrackId
GROUP BY albums.Title
ORDER BY total_sales DESC
LIMIT 5;

Execution Result (preview)

Album Total Sales
Greatest Hits 155.34
Let There Be Rock 133.09
Big Ones 128.44

Trace

[
  {"stage": "planner", "duration_ms": 38, "summary": "Identified SQL intent"},
  {"stage": "generator", "duration_ms": 201, "summary": "LLM generated SQL"},
  {"stage": "safety", "duration_ms": 6, "summary": "Validated SELECT-only"},
  {"stage": "executor", "duration_ms": 92, "summary": "Executed successfully"}
]

Demo Screenshot


🧱 Project Structure

nl2sql-copilot/
β”‚
β”œβ”€β”€ app/                 # FastAPI app, routers, schemas
β”œβ”€β”€ nl2sql/              # Core pipeline (Planner β†’ Generator β†’ Safety β†’ Executor β†’ Verifier)
β”œβ”€β”€ adapters/            # Database and LLM adapters
β”œβ”€β”€ benchmarks/          # Evaluation scripts and results
β”œβ”€β”€ ui/                  # Streamlit dashboard (demo + benchmark)
β”œβ”€β”€ configs/             # Pipeline configs (YAML-based)
β”‚
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ requirements.in / .txt
└── README.md

βš™οΈ How It Works

The Copilot runs a multi-stage pipeline ensuring every SQL query is both correct and safe:

Natural Language
   ↓
[ Planner ] β†’ [ Generator (LLM) ] β†’ [ Safety ] β†’ [ Executor ] β†’ [ Verifier ] β†’ [ Repair ]

Each stage is modular and configurable via configs/pipeline.yaml. All queries execute inside a read-only sandbox.


πŸ”’ Safety Layer

Before execution, every SQL statement is validated:

Rule Example Blocked
DML not allowed DELETE FROM users
Multi-statement SELECT *; DROP TABLE users
Forbidden keywords ALTER, TRUNCATE, UPDATE

βœ… Only safe, single-statement SELECT queries are executed.


πŸ“Š Benchmark (sample)

Evaluated on a subset of the Spider dataset using gpt-4o-mini:

Query Type Correct Latency (ms) Model
list all artists simple select βœ… 118 gpt-4o-mini
total invoices per country aggregation βœ… 127 gpt-4o-mini
top 3 customers by spending aggregation βœ… 141 gpt-4o-mini
albums released before 2000 filter βœ… 122 gpt-4o-mini
top 5 sales by genre join βœ… 149 gpt-4o-mini

(see benchmarks/results.csv for detailed results)


🧩 Key Features

  • βœ… Modular pipeline (Planner β†’ Generator β†’ Safety β†’ Executor β†’ Verifier β†’ Repair)
  • πŸ›‘οΈ SQL safety filters (SELECT-only, blacklist, AST validation)
  • πŸ” Self-repair loop for failed executions
  • 🧠 LLM-driven generator (OpenAI / Ollama / Anthropic)
  • πŸ“Š Evaluation toolkit for latency / accuracy / cost
  • βš™οΈ Config-driven architecture (Pipeline.from_config("configs/pipeline.yaml"))
  • 🧰 PostgreSQL + SQLite adapters
  • πŸŽ›οΈ Streamlit UI for interactive demo & benchmark
  • 🧩 Built with FastAPI, LangGraph, Pydantic-AI, SQLAlchemy

🧰 Tech Stack

Layer Tools / Libraries
Backend API FastAPI, Uvicorn
Pipeline Core Python 3.12, Pydantic, SQLGlot
LLM Interface Pydantic-AI (OpenAI / Anthropic / Ollama)
Database SQLite (default), PostgreSQL
Evaluation Spider / Dr.Spider
UI Streamlit + Plotly
CI/CD GitHub Actions, Makefile, Docker

πŸ§ͺ Development

pip install -r requirements.txt
pytest -q
ruff check .
mypy .

🧭 Roadmap

  • Add multilingual query support (Persian / English)
  • Improve self-repair accuracy
  • Add cost tracking per query
  • Integrate Prometheus metrics

πŸ“„ License

MIT Β© 2025 Melika Kheirieh