---
title: NL2SQL Copilot - Full Stack Demo
emoji: 🧩
colorFrom: indigo
colorTo: blue
sdk: docker
python_version: "3.12"
pinned: false
---

# 🧩 NL2SQL Copilot

[CI](https://github.com/melika-kheirieh/nl2sql-copilot/actions/workflows/ci.yml) · [License](LICENSE)

A production-grade **Text-to-SQL Copilot** that converts natural-language questions into **safe, verified SQL**.
Built for analytics engineers who need accuracy, transparency, and control, and powered by **FastAPI**, **LangGraph**, and **Pydantic-AI**.

---

## Overview

`NL2SQL Copilot` is an **agentic, modular pipeline** that plans, generates, verifies, and repairs SQL queries.
It targets correctness and safety through dedicated pipeline stages, evaluation on the **Spider** dataset, and built-in observability.

> Designed for **read-only production databases**, with **self-repair**, **metrics**, and **CI/CD** built in.
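
Read-only behavior can be enforced by the database connection as well as by inspecting the SQL text. The sketch below shows one standard-library way to do this for SQLite: open the file with `mode=ro` and install an authorizer (`sqlite3.Connection.set_authorizer`) that denies everything except reads and function calls. It is an illustration under that assumption, not the project's Executor; `run_read_only` is a hypothetical name.

```python
import sqlite3
from pathlib import Path

# Authorizer action codes a plain SELECT needs; everything else is denied.
_ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def _read_only_authorizer(action, arg1, arg2, db_name, trigger_or_view):
    """Allow reads and function calls; deny writes, DDL, PRAGMA, ATTACH, etc."""
    return sqlite3.SQLITE_OK if action in _ALLOWED_ACTIONS else sqlite3.SQLITE_DENY


def run_read_only(db_path: str, sql: str) -> list[tuple]:
    """Hypothetical executor: run `sql` against `db_path` without permitting writes."""
    # mode=ro makes SQLite itself refuse writes at the file level as well.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        conn.set_authorizer(_read_only_authorizer)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

A denied statement fails at prepare time with `sqlite3.DatabaseError` ("not authorized"), so nothing reaches the file; the `mode=ro` flag acts as a second, independent layer.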

---

## Agentic Architecture

```
Natural Language
       ↓
[ Detector ]
       ↓
[ Planner ]
       ↓
[ Generator (LLM) ]
       ↓
[ Safety ]
       ↓
[ Executor ]
       ↓
[ Verifier ]
       ↓
[ Repair ]
```

Each stage is isolated, configurable via YAML, and observable through structured traces and Prometheus metrics.

| Stage | Responsibility |
|-------|----------------|
| **Detector** | Determine whether the input is a Text-to-SQL request |
| **Planner** | Extract user intent and draft a SQL plan |
| **Generator** | Call the LLM to synthesize SQL |
| **Safety** | Block unsafe or non-SELECT queries |
| **Executor** | Execute the query in a read-only sandbox |
| **Verifier** | Compare results and detect mismatches |
| **Repair** | Self-healing loop triggered on failure |
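
To give a rough idea of what the Safety stage checks, here is a conservative, token-based sketch: accept exactly one statement that starts with `SELECT` (or `WITH`) and contains no write or admin keywords. It is *not* the AST validation the project describes; the keyword deny-list and the name `is_select_only` are assumptions.

```python
import re

# Keywords that should never appear in a read-only query (assumed deny-list).
_FORBIDDEN = {"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE", "REPLACE",
              "ATTACH", "DETACH", "PRAGMA", "VACUUM", "TRUNCATE", "GRANT", "REVOKE"}


def _strip_literals_and_comments(sql: str) -> str:
    """Blank out string literals and comments so keywords inside them are ignored."""
    sql = re.sub(r"'(?:[^']|'')*'", "''", sql)         # single-quoted strings
    sql = re.sub(r"--[^\n]*", " ", sql)                 # line comments
    return re.sub(r"/\*.*?\*/", " ", sql, flags=re.S)   # block comments


def is_select_only(sql: str) -> bool:
    """One statement, starting with SELECT/WITH, with no forbidden keywords."""
    cleaned = _strip_literals_and_comments(sql).strip().rstrip(";").strip()
    if not cleaned or ";" in cleaned:  # empty input or multiple statements
        return False
    words = re.findall(r"[A-Za-z_]+", cleaned.upper())
    if not words:
        return False
    return words[0] in {"SELECT", "WITH"} and not (_FORBIDDEN & set(words))
```

The check fails closed: an identifier that happens to be a keyword (for example a column named `replace`) is rejected rather than allowed, which is the safer error for a read-only guardrail. A real AST-based validator avoids those false positives.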

---

## Benchmark (Spider dataset)

Dataset: [Spider](https://yale-lily.github.io/spider) by the Yale LILY Lab.
Evaluated on a **20-sample subset of the Spider dev split** using the repository's reproducible evaluation scripts (`benchmarks/`).

| Metric | Value |
|--------|-------|
| EM (Exact Match) | 0.15 |
| SM (Structural Match) | 0.70 |
| ExecAcc (Execution Accuracy) | 0.73 |
| Avg latency | 8.11 s |
| p50 latency | 9.42 s |
| p95 latency | 13.88 s |

> The gap between EM and the much higher **Structural Match** and **Execution Accuracy** suggests that most generated queries are semantically correct and differ from the gold SQL mainly in surface form.
> With only 20 samples, treat these figures as indicative rather than definitive.
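
Here is why EM can stay low while execution accuracy is high: two queries that differ in aliases or predicate form fail a string comparison yet return the same rows. The sketch compares a naive, whitespace- and case-normalized string EM with an order-insensitive result comparison on a toy in-memory table. Both definitions are simplifying assumptions; the official Spider evaluation uses a component-level exact set match rather than raw strings.

```python
import sqlite3
from collections import Counter


def _normalize(sql: str) -> str:
    """Lower-case, drop a trailing semicolon, and collapse whitespace."""
    return " ".join(sql.strip().rstrip(";").lower().split())


def exact_match(pred: str, gold: str) -> bool:
    """Naive string-level EM on normalized SQL text."""
    return _normalize(pred) == _normalize(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """ExecAcc-style check: same multiset of result rows, ignoring row order."""
    return Counter(conn.execute(pred).fetchall()) == Counter(conn.execute(gold).fetchall())


# Toy table standing in for a Spider database (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 31), ("Bo", 25), ("Cy", 40)])

gold = "SELECT name FROM singer WHERE age > 30"
pred = "SELECT s.name FROM singer AS s WHERE s.age >= 31"

print(exact_match(pred, gold))            # False: different surface form
print(execution_match(conn, pred, gold))  # True: same rows on this data
```

There is a flip side: `>= 31` and `> 30` agree only because `age` holds integers here, so an execution match on a single database can also overcount correctness.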

Run reproducible benchmarks:

```bash
# Point the evaluator at a local copy of the Spider dataset
export SPIDER_ROOT="$PWD/data/spider"
# Evaluate 20 dev-split samples, then plot the results
PYTHONPATH="$PWD" python benchmarks/evaluate_spider_pro.py --spider --split dev --limit 20
PYTHONPATH="$PWD" python benchmarks/plot_results.py
```

Results and plots for the run reported above: `benchmarks/results_pro/20251109-171247/`
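
The latency rows summarize per-query wall-clock times. One common way to derive p50 and p95 from raw samples is the nearest-rank method, sketched below. Whether `evaluate_spider_pro.py` uses nearest-rank or interpolated percentiles is not stated here, so treat the method as an assumption; on 20 samples the two can differ noticeably.

```python
import math
import statistics


def nearest_rank(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need at least one sample and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_s: list[float]) -> dict[str, float]:
    """Mean, p50 and p95 in seconds, rounded to two decimals like the table."""
    return {
        "avg": round(statistics.fmean(latencies_s), 2),
        "p50": round(nearest_rank(latencies_s, 50), 2),
        "p95": round(nearest_rank(latencies_s, 95), 2),
    }
```

With nearest-rank on 20 samples, p95 is simply the 19th-smallest latency, so a single slow query can move it a lot.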

---

## Key Features

* **Agentic architecture**: multi-stage pipeline with a feedback loop
* **Safety layer**: SELECT-only guardrails and AST validation
* **Self-repair**: automatic retry when verification fails
* **Reproducible evaluation**: integrated Spider / Dr.Spider benchmarking
* **Config-driven design**: YAML-based pipeline factory
* **Plug-and-play adapters**: SQLite / PostgreSQL / OpenAI / Anthropic / Ollama
* **FastAPI service + Streamlit UI**: run as a demo or as an API
* **CI/CD ready**: Makefile, Ruff, Mypy, Pytest, Docker, GitHub Actions
* **Observability stack**: Prometheus & Grafana metrics for latency and errors
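
A config-driven factory typically maps stage names from a config file to registered implementations. The sketch assumes a config shaped like `{"stages": [...]}` (what a YAML file would parse into) and a hypothetical `register` decorator. It uses a plain dict instead of a YAML parser to stay standard-library-only, the two stages are toy stand-ins, and none of these names come from the project.

```python
from collections.abc import Callable

StageFn = Callable[[dict], dict]
_REGISTRY: dict[str, StageFn] = {}


def register(name: str) -> Callable[[StageFn], StageFn]:
    """Decorator that makes a stage available to the factory under `name`."""
    def wrap(fn: StageFn) -> StageFn:
        _REGISTRY[name] = fn
        return fn
    return wrap


@register("detector")
def detect(state: dict) -> dict:
    # Toy heuristic standing in for a real Text-to-SQL detector.
    return {**state, "is_sql_question": "?" in state["question"]}


@register("planner")
def plan(state: dict) -> dict:
    return {**state, "plan": f"answer: {state['question']}"}


def build_pipeline(config: dict) -> Callable[[dict], dict]:
    """Compose the stages listed in config['stages'], failing fast on unknown names."""
    missing = [s for s in config["stages"] if s not in _REGISTRY]
    if missing:
        raise KeyError(f"unknown stages: {missing}")
    stages = [_REGISTRY[s] for s in config["stages"]]

    def run(state: dict) -> dict:
        for stage in stages:
            state = stage(state)
        return state
    return run


# Equivalent YAML:  stages: [detector, planner]
pipeline = build_pipeline({"stages": ["detector", "planner"]})
print(pipeline({"question": "How many singers are there?"}))
```

Validating stage names when the pipeline is built, rather than at request time, surfaces config typos at startup.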

---

## 🧩 Observability & GenAIOps

Monitor every stage of the pipeline in real time:

* `/metrics` endpoint exposed via FastAPI
* Prometheus + Grafana stack via `make obs-up`
* Metrics tracked:
  * `nl2sql_stage_latency_ms`
  * `nl2sql_stage_error_total`
  * `nl2sql_query_exec_count`
  * `nl2sql_repair_success_rate`

```bash
make obs-up    # start Prometheus + Grafana
make obs-down  # stop the stack
```
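
While the service is running, `/metrics` returns Prometheus' plain-text exposition format. To inspect only the pipeline metrics without Grafana, you can fetch that text and keep the `nl2sql_` samples. This sketch handles only simple `name{labels} value` lines (no timestamps, no `}` inside label values). The base URL and helper names are assumptions.

```python
import re
import urllib.request

# metric name, optional {labels}, value; anything after the value is ignored.
_SAMPLE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_metrics(text: str, prefix: str = "nl2sql_") -> dict[str, float]:
    """Map 'name{labels}' -> value for samples whose metric name starts with prefix."""
    out: dict[str, float] = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        m = _SAMPLE.match(line.strip())
        if m and m.group(1).startswith(prefix):
            out[m.group(1) + (m.group(2) or "")] = float(m.group(3))
    return out


def fetch_metrics(base_url: str = "http://localhost:8000") -> dict[str, float]:
    """Fetch and parse /metrics from a running service (assumed default port 8000)."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Histogram metrics such as a latency histogram appear as `_bucket`, `_sum`, and `_count` series, which all share the prefix and are therefore kept.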

---

## 🧪 Quick Start

### 1️⃣ Clone & Run

```bash
git clone https://github.com/melika-kheirieh/nl2sql-copilot.git
cd nl2sql-copilot
make run
```

Or build and run with Docker:

```bash
docker build -t nl2sql-copilot .
docker run --rm -p 8000:8000 nl2sql-copilot
```

* Interactive API docs (FastAPI Swagger UI): [http://localhost:8000/docs](http://localhost:8000/docs)
* Streamlit demo: [http://localhost:7860](http://localhost:7860)

The `docker run` command above publishes only port 8000; add `-p 7860:7860` if the container also serves the Streamlit UI.
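
By default, FastAPI also serves the machine-readable schema behind `/docs` at `/openapi.json`, which lets you discover the API's endpoints without hard-coding them. A minimal standard-library sketch, assuming default FastAPI settings and the port used above; the helper names are hypothetical.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def endpoints_from_openapi(spec: dict) -> list[str]:
    """Return sorted 'METHOD /path' strings from an OpenAPI document."""
    return sorted(
        f"{method.upper()} {path}"
        for path, ops in spec.get("paths", {}).items()
        for method in ops
        if method in HTTP_METHODS  # skip path-level keys such as "parameters"
    )


def list_endpoints(base_url: str = "http://localhost:8000") -> list[str]:
    """Fetch /openapi.json from a running service and list its endpoints."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return endpoints_from_openapi(json.load(resp))
```

With the service running, `print("\n".join(list_endpoints()))` prints one endpoint per line.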

---

## For Developers & CI/CD

```bash
make lint       # Ruff
make typecheck  # Mypy
make test       # Pytest
make bench      # Run benchmark suite
```

### CI/CD Highlights

* The GitHub Actions workflow runs `make check`
* Enforces formatting, typing, tests, and a Docker build
* Publishes the Docker image to GHCR (GitHub Container Registry) after a successful merge

---

## Why it matters

* Bridges **natural language and databases** with measurable reliability
* Provides **reproducible evaluation** for continuous model tracking
* Delivers **production-level resilience** via self-repair and observability
* Demonstrates **AI software engineering** beyond prompt design

---

## Author

**Melika Kheirieh**
AI Engineer & Researcher in Natural Language Interfaces for Databases
[GitHub](https://github.com/melika-kheirieh) · [LinkedIn](https://www.linkedin.com/in/melika-kheirieh-03a7b5176/)

> This project evolved from [NL2SQL Copilot Prototype](https://github.com/melika-kheirieh/nl2sql-copilot-prototype), refactored into a production-grade, modular agent.

---

## License

MIT © 2025 Melika Kheirieh