Spaces:
Sleeping
Sleeping
File size: 8,085 Bytes
80c6154 3c2f1c5 80c6154 3c2f1c5 80c6154 cc371b0 8f50117 602cae0 3ef53b4 cc371b0 3ef53b4 cc371b0 3ef53b4 df092a2 cc371b0 752cafc cc371b0 3ef53b4 cc371b0 df092a2 cc371b0 3ef53b4 602cae0 cc371b0 602cae0 3ef53b4 602cae0 3ef53b4 cc371b0 3ef53b4 602cae0 3ef53b4 cc371b0 3ef53b4 602cae0 cc371b0 3ef53b4 cc371b0 3ef53b4 cc371b0 3ef53b4 602cae0 3ef53b4 cc371b0 3ef53b4 602cae0 8f50117 602cae0 3ef53b4 602cae0 3ef53b4 602cae0 cc371b0 602cae0 3ef53b4 cc371b0 602cae0 3ef53b4 cc371b0 3ef53b4 6430910 3ef53b4 cc371b0 3ef53b4 cc371b0 6430910 3ef53b4 6430910 cc371b0 3ef53b4 cc371b0 3ef53b4 6430910 3ef53b4 df092a2 cc371b0 df092a2 3ef53b4 df092a2 3ef53b4 df092a2 3ef53b4 602cae0 cc371b0 602cae0 3ef53b4 8f50117 3ef53b4 602cae0 cc371b0 3ef53b4 cc371b0 3ef53b4 cc371b0 3ef53b4 cc371b0 3ef53b4 cc371b0 3ef53b4 cc371b0 3ef53b4 cc371b0 3ef53b4 8f50117 cc371b0 602cae0 3ef53b4 602cae0 3ef53b4 cc371b0 3ef53b4 602cae0 8f50117 cc371b0 3ef53b4 cc371b0 3ef53b4 cc371b0 3ef53b4 cc371b0 8f50117 df092a2 3ef53b4 df092a2 cc371b0 df092a2 8f50117 cc371b0 8f50117 df092a2 3ef53b4 cc371b0 3ef53b4 cc371b0 3ef53b4 cc371b0 3ef53b4 cc371b0 3ef53b4 df092a2 8f50117 cc371b0 8f50117 602cae0 cc371b0 602cae0 cc371b0 602cae0 cc371b0 8f50117 3ef53b4 cc371b0 3ef53b4 8f50117 3ef53b4 8f50117 3ef53b4 8f50117 3ef53b4 cc371b0 3ef53b4 cc371b0 3ef53b4 602cae0 3ef53b4 df092a2 3ef53b4 df092a2 cc371b0 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 |
---
title: NL2SQL Copilot โ Full Stack Demo
emoji: ๐งฉ
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
---
# ๐งฉ **NL2SQL Copilot โ Natural-Language โ Safe SQL**
[](https://github.com/melika-kheirieh/nl2sql-copilot/actions/workflows/ci.yml)
[](#)
[](LICENSE)
**Modular Text-to-SQL Copilot built with FastAPI & Pydantic-AI.**
Generates *safe, verified, executable SQL* through a multi-stage agentic pipeline.
Includes: schema introspection, self-repair, Spider benchmarks, Prometheus metrics, and a full demo UI.
๐ **Live Demo:**
๐ **[https://huggingface.co/spaces/melika-kheirieh/nl2sql-copilot](https://huggingface.co/spaces/melika-kheirieh/nl2sql-copilot)**
---
# **1) Quick Start**
```bash
git clone https://github.com/melika-kheirieh/nl2sql-copilot
cd nl2sql-copilot
make setup # install dependencies
make run # start API + Gradio UI
```
Open:
* [http://localhost:8000](http://localhost:8000) (FastAPI Swagger UI)
* [http://localhost:7860](http://localhost:7860) (Gradio Demo)
---
# **2) Demo (Gradio UI)**
The demo supports:
* Uploading any SQLite database
* Asking natural-language questions
* Viewing generated SQL
* Viewing query results
* Full multi-stage trace (detector โ planner โ generator โ safety โ executor โ verifier โ repair)
* Per-stage timings
* Example queries
* And a default demo DB (no upload required)
Everything runs on the same backend as the API.
---
# **3) Agentic Architecture**
```
user query
โ
detector (ambiguity, missing info)
planner (schema reasoning + task decomposition)
generator (SQL generation)
safety (SELECT-only validation)
executor (sandboxed DB execution)
verifier (semantic + execution checks)
repair (minimal-diff SQL repair loop)
โ
final SQL + result + traces
```
### โ๏ธ Tech Stack
* FastAPI
* Pydantic-AI
* SQLiteAdapter
* Prometheus + Grafana
* pytest + mypy + Makefile
* Gradio UI
The pipeline is fully modular: each stage has a clean, swappable interface.
---
# **4) Evolution (Prototype โ Copilot)**
This project is the **second-generation, production-grade** version of an earlier prototype:
๐ [https://github.com/melika-kheirieh/nl2sql-copilot-prototype](https://github.com/melika-kheirieh/nl2sql-copilot-prototype)
The prototype explored single-step, prompt-based SQL generation.
The current version is a **complete architectural redesign**, adding:
* multi-stage agentic pipeline
* schema introspection
* safety guardrails
* self-repair loop
* caching
* observability
* Spider benchmarks
* multi-DB support with upload + TTL handling
This repository is the first **end-to-end, production-oriented** version.
---
# **5) Key Features**
### โ Agentic Pipeline
Planner โ Generator โ Safety โ Executor โ Verifier โ Repair.
### โ Schema-Aware
Automatic schema preview for any uploaded SQLite database.
### โ Safety by Design
* SELECT-only
* Column/table validation
* No multi-statement SQL
* Prevents schema hallucination
### โ Self-Repair
Automatic minimal-diff correction when SQL fails.
### โ Caching
TTL-based, with key = (db_id, normalized_query, schema_hash).
Hit/miss metrics included.
### โ Observability
* Per-stage latency
* Pipeline success ratio
* Repair success rate
* Cache hit ratio
* p95 latency
* Full Grafana dashboard
### โ Benchmarks
Reproducible Spider evaluation with plots + summary.
---
# **6) Benchmarks (Spider dev, 20 samples)**
[](#benchmarks)
Evaluated on a curated 20-sample subset of the Spider **dev** split
(focused on `concert_singer`), using the full production pipeline.
### ๐งฎ Summary
* **Total samples:** 20
* **Successful runs:** 20/20 (**100%**)
* **Exact Match (EM):** **0.10**
* **Structural Match (SM):** **0.70**
* **Execution Accuracy:** **0.725**
This reflects a *production-oriented* NL2SQL system:
the model optimizes for **executable SQL**, not literal gold-string alignment.
---
### โฑ Latency
* **Avg latency:** ~**8066 ms**
* **p50:** ~**9229 ms**
* **p95:** ~**14936 ms**
Latency is **bimodal**:
simple queries โ fast, reasoning-heavy queries โ planner-dominated.
---
### โ๏ธ Per-Stage Latency
| Stage | Avg latency (ms) |
| --------- | ---------------- |
| detector | ~1 |
| planner | ~8360 |
| generator | ~1645 |
| safety | ~2 |
| executor | ~1 |
| verifier | ~1 |
| repair | ~1200 |
Planner is the main bottleneck (expected for schema-level reasoning).
Safety/executor/verifier stay **single-digit ms**.
---
### โ Failure Modes (Why EM is low)
Even when EM = 0, **SM and ExecAcc are often 1.0**.
Typical causes:
* Capitalization differences (`Age` vs `age`)
* Different column ordering
* LIMIT differences
* Alias mismatch
* Gold SQL is `EMPTY` but the model infers a valid SQL
In real-world systems, **execution correctness matters more than exact string match**.
---
### ๐ Reproducing the Benchmark
```bash
export SPIDER_ROOT="$PWD/data/spider"
PYTHONPATH=$PWD \
python benchmarks/evaluate_spider_pro.py --spider --split dev --limit 20 --debug
PYTHONPATH=$PWD \
python benchmarks/plot_results.py
```
Artifacts saved under:
```
benchmarks/results_pro/<timestamp>/
summary.json
eval.jsonl
metrics_overview.png
latency_histogram.png
latency_per_stage.png
errors_overview.png
```
---
# **7) API Usage**
## ๐ NL โ SQL
```bash
curl -X POST "http://localhost:8000/api/v1/nl2sql" \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-key" \
-d '{
"query": "Top 5 customers by total invoice amount",
"db_id": null
}'
```
### โ Sample Response (accurate)
```json
{
"ambiguous": false,
"sql": "SELECT ...",
"rationale": "Explanation of why this SQL was generated.",
"result": {
"rows": 5,
"columns": ["CustomerId", "Total"],
"rows_data": [
[1, 39.6],
[2, 38.7],
[3, 35.4]
]
},
"traces": [
{"stage": "detector", "duration_ms": 1},
{"stage": "planner", "duration_ms": 8943},
{"stage": "generator","duration_ms": 1722},
{"stage": "safety", "duration_ms": 2},
{"stage": "executor", "duration_ms": 1},
{"stage": "verifier", "duration_ms": 1},
{"stage": "repair", "duration_ms": 522}
]
}
```
---
## ๐ Upload a SQLite DB
```bash
curl -X POST "http://localhost:8000/api/v1/nl2sql/upload_db" \
-H "X-API-Key: dev-key" \
-F "file=@/path/to/db.sqlite"
```
---
## ๐ Schema Preview
```bash
curl "http://localhost:8000/api/v1/nl2sql/schema?db_id=<uuid>" \
-H "X-API-Key: dev-key"
```
---
# **8) Environment Variables**
| Variable | Purpose |
| ---------------------- | ---------------------------------------- |
| `API_KEYS` | Comma-separated list of backend API keys |
| `API_KEY` | Used by Gradio UI to call the backend |
| `DEV_MODE` | Enables strict ambiguity detection |
| `NL2SQL_CACHE_TTL_SEC` | Cache TTL |
| `NL2SQL_CACHE_MAX` | Max cache entries |
| `SPIDER_ROOT` | Path to Spider dataset |
| `USE_MOCK` | Skip execution (for testing) |
> Gradio uses `API_KEY` โ backend expects it as `X-API-Key`.
> Backend accepts multiple keys via `API_KEYS`.
---
# **9) Future Work**
### 1) Streaming SQL Generation (SSE)
### 2) Redis Distributed Cache
### 3) Multi-Model Planner/Generator
### 4) A/B Testing Framework
### 5) Schema Embeddings
### 6) Nightly CI Benchmarks
### 7) Advanced Repair (diff-based)
### 8) Helm / Compose Deployment Template
---
# **10) License**
MIT License.
|