File size: 8,085 Bytes
80c6154
3c2f1c5
80c6154
 
 
3c2f1c5
80c6154
 
cc371b0
8f50117
 
 
602cae0
3ef53b4
cc371b0
3ef53b4
 
cc371b0
3ef53b4
 
df092a2
 
 
cc371b0
752cafc
cc371b0
 
 
3ef53b4
 
cc371b0
df092a2
cc371b0
3ef53b4
 
 
602cae0
 
 
cc371b0
602cae0
3ef53b4
602cae0
3ef53b4
cc371b0
 
3ef53b4
 
 
 
 
602cae0
3ef53b4
cc371b0
 
 
3ef53b4
602cae0
cc371b0
 
 
3ef53b4
 
cc371b0
3ef53b4
 
cc371b0
 
 
 
 
 
3ef53b4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
602cae0
3ef53b4
 
 
 
 
 
 
 
cc371b0
3ef53b4
602cae0
8f50117
602cae0
3ef53b4
602cae0
3ef53b4
602cae0
cc371b0
602cae0
3ef53b4
 
 
 
cc371b0
602cae0
3ef53b4
cc371b0
3ef53b4
 
6430910
3ef53b4
cc371b0
3ef53b4
cc371b0
 
6430910
3ef53b4
 
6430910
cc371b0
 
 
 
 
 
3ef53b4
cc371b0
 
3ef53b4
6430910
3ef53b4
df092a2
cc371b0
df092a2
3ef53b4
df092a2
3ef53b4
df092a2
3ef53b4
 
602cae0
cc371b0
602cae0
3ef53b4
 
 
 
 
8f50117
3ef53b4
 
602cae0
 
 
cc371b0
 
3ef53b4
 
 
cc371b0
3ef53b4
 
cc371b0
 
 
3ef53b4
cc371b0
3ef53b4
 
 
 
 
 
 
 
 
cc371b0
3ef53b4
 
cc371b0
 
 
3ef53b4
cc371b0
3ef53b4
8f50117
cc371b0
602cae0
3ef53b4
 
 
 
 
602cae0
3ef53b4
cc371b0
 
 
3ef53b4
602cae0
8f50117
cc371b0
 
 
 
 
 
 
3ef53b4
cc371b0
3ef53b4
cc371b0
 
3ef53b4
cc371b0
 
 
 
 
 
8f50117
df092a2
 
 
3ef53b4
df092a2
cc371b0
df092a2
8f50117
cc371b0
 
 
 
 
 
 
8f50117
df092a2
3ef53b4
cc371b0
 
 
 
3ef53b4
 
 
 
 
 
 
 
 
 
 
cc371b0
 
3ef53b4
 
 
cc371b0
 
3ef53b4
cc371b0
 
 
 
 
 
3ef53b4
df092a2
8f50117
cc371b0
 
 
8f50117
 
602cae0
 
cc371b0
602cae0
 
cc371b0
 
602cae0
 
cc371b0
8f50117
3ef53b4
cc371b0
3ef53b4
 
 
 
 
 
 
 
 
8f50117
3ef53b4
 
8f50117
3ef53b4
8f50117
3ef53b4
cc371b0
3ef53b4
cc371b0
 
 
 
 
 
 
 
 
 
 
3ef53b4
602cae0
3ef53b4
df092a2
 
 
3ef53b4
df092a2
cc371b0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
---
title: NL2SQL Copilot โ€” Full Stack Demo
emoji: ๐Ÿงฉ
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
---
# ๐Ÿงฉ **NL2SQL Copilot โ€” Natural-Language โ†’ Safe SQL**
[![CI](https://github.com/melika-kheirieh/nl2sql-copilot/actions/workflows/ci.yml/badge.svg)](https://github.com/melika-kheirieh/nl2sql-copilot/actions/workflows/ci.yml)
[![Docker](https://img.shields.io/badge/docker-ready-blue?logo=docker)](#)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)


**Modular Text-to-SQL Copilot built with FastAPI & Pydantic-AI.**
Generates *safe, verified, executable SQL* through a multi-stage agentic pipeline.
Includes: schema introspection, self-repair, Spider benchmarks, Prometheus metrics, and a full demo UI.

๐Ÿš€ **Live Demo:**
๐Ÿ‘‰ **[https://huggingface.co/spaces/melika-kheirieh/nl2sql-copilot](https://huggingface.co/spaces/melika-kheirieh/nl2sql-copilot)**

---

# **1) Quick Start**

```bash
git clone https://github.com/melika-kheirieh/nl2sql-copilot
cd nl2sql-copilot
make setup       # install dependencies
make run         # start API + Gradio UI
```

Open:

* [http://localhost:8000](http://localhost:8000) (FastAPI Swagger UI)
* [http://localhost:7860](http://localhost:7860) (Gradio Demo)

---

# **2) Demo (Gradio UI)**

The demo supports:

* Uploading any SQLite database
* Asking natural-language questions
* Viewing generated SQL
* Viewing query results
* Full multi-stage trace (detector โ†’ planner โ†’ generator โ†’ safety โ†’ executor โ†’ verifier โ†’ repair)
* Per-stage timings
* Example queries
* And a default demo DB (no upload required)

Everything runs on the same backend as the API.

---

# **3) Agentic Architecture**

```
user query
    โ†“
detector      (ambiguity, missing info)
planner       (schema reasoning + task decomposition)
generator     (SQL generation)
safety        (SELECT-only validation)
executor      (sandboxed DB execution)
verifier      (semantic + execution checks)
repair        (minimal-diff SQL repair loop)
    โ†“
final SQL + result + traces
```

### โš™๏ธ Tech Stack

* FastAPI
* Pydantic-AI
* SQLiteAdapter
* Prometheus + Grafana
* pytest + mypy + Makefile
* Gradio UI

The pipeline is fully modular: each stage has a clean, swappable interface.

---

# **4) Evolution (Prototype โ†’ Copilot)**

This project is the **second-generation, production-grade** version of an earlier prototype:
๐Ÿ‘‰ [https://github.com/melika-kheirieh/nl2sql-copilot-prototype](https://github.com/melika-kheirieh/nl2sql-copilot-prototype)

The prototype explored single-step, prompt-based SQL generation.
The current version is a **complete architectural redesign**, adding:

* multi-stage agentic pipeline
* schema introspection
* safety guardrails
* self-repair loop
* caching
* observability
* Spider benchmarks
* multi-DB support with upload + TTL handling

This repository is the first **end-to-end, production-oriented** version.

---

# **5) Key Features**

### โœ” Agentic Pipeline

Planner โ†’ Generator โ†’ Safety โ†’ Executor โ†’ Verifier โ†’ Repair.

### โœ” Schema-Aware

Automatic schema preview for any uploaded SQLite database.

### โœ” Safety by Design

* SELECT-only
* Column/table validation
* No multi-statement SQL
* Prevents schema hallucination

### โœ” Self-Repair

Automatic minimal-diff correction when SQL fails.

### โœ” Caching

TTL-based, with key = (db_id, normalized_query, schema_hash).
Hit/miss metrics included.

### โœ” Observability

* Per-stage latency
* Pipeline success ratio
* Repair success rate
* Cache hit ratio
* p95 latency
* Full Grafana dashboard

### โœ” Benchmarks

Reproducible Spider evaluation with plots + summary.

---

# **6) Benchmarks (Spider dev, 20 samples)**

[![Benchmarks](https://img.shields.io/badge/Benchmarks-Spider%20dev-blue)](#benchmarks)

Evaluated on a curated 20-sample subset of the Spider **dev** split
(focused on `concert_singer`), using the full production pipeline.

### ๐Ÿงฎ Summary

* **Total samples:** 20
* **Successful runs:** 20/20 (**100%**)
* **Exact Match (EM):** **0.10**
* **Structural Match (SM):** **0.70**
* **Execution Accuracy:** **0.725**

This reflects a *production-oriented* NL2SQL system:
the model optimizes for **executable SQL**, not literal gold-string alignment.

---

### โฑ Latency

* **Avg latency:** ~**8066 ms**
* **p50:** ~**9229 ms**
* **p95:** ~**14936 ms**

Latency is **bimodal**:
simple queries โ†’ fast, reasoning-heavy queries โ†’ planner-dominated.

---

### โš™๏ธ Per-Stage Latency

| Stage     | Avg latency (ms) |
| --------- | ---------------- |
| detector  | ~1               |
| planner   | ~8360            |
| generator | ~1645            |
| safety    | ~2               |
| executor  | ~1               |
| verifier  | ~1               |
| repair    | ~1200            |

Planner is the main bottleneck (expected for schema-level reasoning).
Safety/executor/verifier stay **single-digit ms**.

---

### โŒ Failure Modes (Why EM is low)

Even when EM = 0, **SM and ExecAcc are often 1.0**.

Typical causes:

* Capitalization differences (`Age` vs `age`)
* Different column ordering
* LIMIT differences
* Alias mismatch
* Gold SQL is `EMPTY` but the model infers a valid SQL

In real-world systems, **execution correctness matters more than exact string match**.

---

### ๐Ÿ“‚ Reproducing the Benchmark

```bash
export SPIDER_ROOT="$PWD/data/spider"

PYTHONPATH=$PWD \
  python benchmarks/evaluate_spider_pro.py --spider --split dev --limit 20 --debug

PYTHONPATH=$PWD \
  python benchmarks/plot_results.py
```

Artifacts saved under:

```
benchmarks/results_pro/<timestamp>/
    summary.json
    eval.jsonl
    metrics_overview.png
    latency_histogram.png
    latency_per_stage.png
    errors_overview.png
```

---

# **7) API Usage**

## ๐Ÿ” NL โ†’ SQL

```bash
curl -X POST "http://localhost:8000/api/v1/nl2sql" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{
        "query": "Top 5 customers by total invoice amount",
        "db_id": null
      }'
```

### โœ” Sample Response (accurate)

```json
{
  "ambiguous": false,
  "sql": "SELECT ...",
  "rationale": "Explanation of why this SQL was generated.",
  "result": {
    "rows": 5,
    "columns": ["CustomerId", "Total"],
    "rows_data": [
      [1, 39.6],
      [2, 38.7],
      [3, 35.4]
    ]
  },
  "traces": [
    {"stage": "detector", "duration_ms": 1},
    {"stage": "planner",  "duration_ms": 8943},
    {"stage": "generator","duration_ms": 1722},
    {"stage": "safety",   "duration_ms": 2},
    {"stage": "executor", "duration_ms": 1},
    {"stage": "verifier", "duration_ms": 1},
    {"stage": "repair",   "duration_ms": 522}
  ]
}
```

---

## ๐Ÿ“ Upload a SQLite DB

```bash
curl -X POST "http://localhost:8000/api/v1/nl2sql/upload_db" \
  -H "X-API-Key: dev-key" \
  -F "file=@/path/to/db.sqlite"
```

---

## ๐Ÿ“‘ Schema Preview

```bash
curl "http://localhost:8000/api/v1/nl2sql/schema?db_id=<uuid>" \
  -H "X-API-Key: dev-key"
```

---

# **8) Environment Variables**

| Variable               | Purpose                                  |
| ---------------------- | ---------------------------------------- |
| `API_KEYS`             | Comma-separated list of backend API keys |
| `API_KEY`              | Used by Gradio UI to call the backend    |
| `DEV_MODE`             | Enables strict ambiguity detection       |
| `NL2SQL_CACHE_TTL_SEC` | Cache TTL                                |
| `NL2SQL_CACHE_MAX`     | Max cache entries                        |
| `SPIDER_ROOT`          | Path to Spider dataset                   |
| `USE_MOCK`             | Skip execution (for testing)             |

> Gradio uses `API_KEY` โ†’ backend expects it as `X-API-Key`.
> Backend accepts multiple keys via `API_KEYS`.

---

# **9) Future Work**

### 1) Streaming SQL Generation (SSE)

### 2) Redis Distributed Cache

### 3) Multi-Model Planner/Generator

### 4) A/B Testing Framework

### 5) Schema Embeddings

### 6) Nightly CI Benchmarks

### 7) Advanced Repair (diff-based)

### 8) Helm / Compose Deployment Template

---

# **10) License**

MIT License.