---
title: NL2SQL Copilot - Full Stack Demo
emoji: 🧩
colorFrom: indigo
colorTo: blue
sdk: docker
python_version: "3.12"
pinned: false
---
# 🧩 NL2SQL Copilot
[CI](https://github.com/melika-kheirieh/nl2sql-copilot/actions/workflows/ci.yml) · [License](LICENSE)

A production-grade **Text-to-SQL Copilot** that converts natural-language questions into **safe, verified SQL**.
Built for analytics engineers who need accuracy, transparency, and control, and powered by **FastAPI**, **LangGraph**, and **Pydantic-AI**.

---
## Overview
`NL2SQL Copilot` is an **agentic, modular pipeline** that plans, generates, verifies, and repairs SQL queries.
It targets correctness and safety through structured stages, is evaluated on the **Spider** dataset, and exposes traces and metrics for observability.
> 💡 Designed for **read-only production databases** with **self-repair**, **metrics**, and **CI/CD** baked in.
---
## Agentic Architecture
```
Natural Language
↓
[ Detector ]
↓
[ Planner ]
↓
[ Generator (LLM) ]
↓
[ Safety ]
↓
[ Executor ]
↓
[ Verifier ]
↓
[ Repair ]
```
Each stage is isolated, configurable via YAML, and observable through structured traces and Prometheus metrics.
| Stage | Responsibility |
|--------|----------------|
| **Detector** | Identify whether a query is Text-to-SQL |
| **Planner** | Extract user intent and SQL plan |
| **Generator** | Call LLM to synthesize SQL |
| **Safety** | Block unsafe or non-SELECT queries |
| **Executor** | Execute query in read-only sandbox |
| **Verifier** | Compare results, detect mismatch |
| **Repair** | Self-healing loop triggered on failure |
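
To make the flow in the table concrete, here is a minimal sketch of a generate, check, execute, repair loop. It is not the project's implementation: `run_pipeline`, `is_safe`, and `fake_generate` are hypothetical names, the Detector and Planner stages are omitted, verification is reduced to "did the query execute", and a stub stands in for the LLM call.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical Generator signature: (question, feedback) -> SQL string.
# `feedback` is the error message from the previous attempt, if any.
GenerateFn = Callable[[str, Optional[str]], str]


def is_safe(sql: str) -> bool:
    """Toy Safety stage: accept a single SELECT statement only."""
    body = sql.strip().rstrip(";")
    return body.lower().startswith("select") and ";" not in body


def run_pipeline(question: str, generate: GenerateFn,
                 conn: sqlite3.Connection, max_repairs: int = 2) -> dict:
    feedback: Optional[str] = None
    for attempt in range(max_repairs + 1):
        sql = generate(question, feedback)           # Generator
        if not is_safe(sql):                         # Safety
            feedback = "only a single SELECT statement is allowed"
            continue
        try:
            rows = conn.execute(sql).fetchall()      # Executor
        except sqlite3.Error as exc:                 # Verifier (simplified)
            feedback = str(exc)                      # Repair: retry with the error
            continue
        return {"sql": sql, "rows": rows, "attempts": attempt + 1}
    raise RuntimeError(f"gave up after {max_repairs + 1} attempts: {feedback}")


def fake_generate(question: str, feedback: Optional[str]) -> str:
    # Stub for the LLM: the first attempt misspells a column,
    # the retry (which receives the error as feedback) fixes it.
    if feedback is None:
        return "SELECT nme FROM users"
    return "SELECT name FROM users"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")

result = run_pipeline("List all user names", fake_generate, conn)
print(result)
```

In this toy run the first query fails with a SQLite error, the error text is fed back to the generator, and the second attempt succeeds. A real Safety stage would parse the SQL rather than inspect its prefix.
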
---
## Benchmark (Spider dataset)
Dataset: [Spider](https://yale-lily.github.io/spider) by Yale LILY Lab.
Evaluated on a **20-sample subset of the Spider dev split** using the repository's evaluation scripts.
| Metric | Value |
|--------|--------|
| EM (Exact Match) | 0.15 |
| SM (Structural Match) | 0.70 |
| ExecAcc (Execution Accuracy) | 0.73 |
| Avg Latency | 8.11 s |
| p50 Latency | 9.42 s |
| p95 Latency | 13.88 s |
> On this 20-sample subset, high **Structural Match** and **Execution Accuracy** suggest that most generated queries are semantically correct;
> the lower EM largely reflects formatting differences that do not change the result.
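
The gap between EM and ExecAcc can be illustrated with a small sketch. It assumes a naive string comparison for exact match (the official Spider metric compares parsed SQL components, so this is a deliberate simplification) and uses a made-up `singer` table; the function names are hypothetical.

```python
import sqlite3

gold = "SELECT name FROM singer WHERE age > 30"
pred = "select s.name from singer as s where s.age > 30;"


def exact_match(a: str, b: str) -> bool:
    # Naive EM: raw string equality after trimming whitespace.
    return a.strip() == b.strip()


def execution_match(conn: sqlite3.Connection, a: str, b: str) -> bool:
    # Execution accuracy: run both queries and compare result sets,
    # ignoring row order.
    rows_a = sorted(conn.execute(a).fetchall())
    rows_b = sorted(conn.execute(b).fetchall())
    return rows_a == rows_b


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Ann", 25), ("Bo", 41), ("Cy", 37)],
)

em = exact_match(gold, pred)
ex = execution_match(conn, gold, pred)
print(f"exact match: {em}, execution match: {ex}")
```

The two queries differ in casing, aliasing, and a trailing semicolon, so the string comparison fails while both return the same rows.
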
Run reproducible benchmarks:
```bash
export SPIDER_ROOT="$PWD/data/spider"
PYTHONPATH=$PWD python benchmarks/evaluate_spider_pro.py --spider --split dev --limit 20
PYTHONPATH=$PWD python benchmarks/plot_results.py
```
Results & plots → `benchmarks/results_pro/20251109-171247/`

---
## ⚙️ Key Features
* ✅ **Agentic architecture**: multi-stage pipeline with feedback loop
* 🛡️ **Safety layer**: SELECT-only guardrails and AST validation
* **Self-repair**: automatic retry when verification fails
* **Reproducible evaluation**: integrated Spider / Dr.Spider benchmarking
* 📦 **Config-driven design**: YAML pipeline factory
* 🧩 **Plug-and-play adapters**: SQLite / PostgreSQL / OpenAI / Anthropic / Ollama
* **FastAPI service + Streamlit UI**: demo or API mode
* 🧰 **CI/CD ready**: Makefile, Ruff, Mypy, Pytest, Docker, GitHub Actions
* **Observability stack**: Prometheus & Grafana metrics for latency and errors
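
For the SQLite adapter, one way to back the SELECT-only guardrail at the database level is to open the connection in read-only mode, so that a write is rejected even if it slips past query validation. The sketch below uses only the standard library and a throwaway database; it illustrates the idea and is not taken from the project's code.

```python
import sqlite3
import tempfile
from pathlib import Path

db_path = Path(tempfile.mkdtemp()) / "demo.db"

# Seed a throwaway database through a normal read-write connection.
rw = sqlite3.connect(db_path)
rw.execute("CREATE TABLE t (x INTEGER)")
rw.execute("INSERT INTO t VALUES (1)")
rw.commit()
rw.close()

# Re-open the same file read-only via SQLite's URI syntax (mode=ro).
ro = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)

rows = ro.execute("SELECT x FROM t").fetchall()  # reads work as usual

try:
    ro.execute("DELETE FROM t")                  # any write is rejected
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True

ro.close()
print(rows, write_blocked)
```

For PostgreSQL, a comparable effect is usually achieved with a role that only has `SELECT` privileges.
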
---
## 🧩 Observability & GenAIOps
Monitor every stage of the pipeline in real time:
* `/metrics` endpoint exposed via FastAPI
* Prometheus + Grafana stack with `make obs-up`
* Metrics tracked:
  * `nl2sql_stage_latency_ms`
  * `nl2sql_stage_error_total`
  * `nl2sql_query_exec_count`
  * `nl2sql_repair_success_rate`
```bash
make obs-up # start Prometheus + Grafana
make obs-down # stop the stack
```
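
For a quick check without Grafana, the text served at `/metrics` can be parsed directly. The sketch below assumes the endpoint emits the standard Prometheus text exposition format; the `stage` label and all sample values are invented for illustration, and the parser ignores optional timestamps.

```python
# Made-up sample of what the /metrics endpoint might return.
SAMPLE = """\
# HELP nl2sql_stage_error_total Errors per stage
# TYPE nl2sql_stage_error_total counter
nl2sql_stage_error_total{stage="generator"} 3
nl2sql_stage_error_total{stage="safety"} 1
nl2sql_query_exec_count 42
"""


def parse_metrics(text: str) -> dict[str, float]:
    """Map each series (name plus labels) to its numeric value."""
    values: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks, HELP, and TYPE
            continue
        series, _, value = line.rpartition(" ")
        values[series] = float(value)
    return values


metrics = parse_metrics(SAMPLE)
total_errors = sum(
    value for series, value in metrics.items()
    if series.startswith("nl2sql_stage_error_total")
)
print(metrics["nl2sql_query_exec_count"], total_errors)
```

Against a running service, `SAMPLE` could be replaced with the response body of `http://localhost:8000/metrics`.
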
---
## 🧪 Quick Start
### 1️⃣ Clone & Run
```bash
git clone https://github.com/melika-kheirieh/nl2sql-copilot.git
cd nl2sql-copilot
make run
```
Or build with Docker:
```bash
docker build -t nl2sql-copilot .
docker run --rm -p 8000:8000 nl2sql-copilot
```
* API docs: [http://localhost:8000/docs](http://localhost:8000/docs)
* Streamlit demo: [http://localhost:7860](http://localhost:7860)
---
## For Developers & CI/CD
```bash
make lint # Ruff
make typecheck # Mypy
make test # Pytest
make bench # Run benchmark suite
```
### CI/CD Highlights
* Runs on GitHub Actions (`make check`)
* Enforces formatting, typing, tests, and Docker build
* Publishes Docker image to GHCR on successful merge
---
## 🎯 Why it matters
* Bridges **natural language and databases** with measurable reliability
* Provides **reproducible evaluation** for continuous model tracking
* Delivers **production-level resilience** via self-repair and observability
* Demonstrates **AI software engineering** beyond prompt design
---
## 👤 Author
**Melika Kheirieh**
AI Engineer & Researcher in Natural Language Interfaces for Databases
[GitHub](https://github.com/melika-kheirieh) · [LinkedIn](https://www.linkedin.com/in/melika-kheirieh-03a7b5176/)
> This project evolved from [NL2SQL Copilot Prototype](https://github.com/melika-kheirieh/nl2sql-copilot-prototype), refactored into a production-grade, modular agent.
---
## License
MIT © 2025 Melika Kheirieh