---
title: NL2SQL Copilot – Full Stack Demo
emoji: 🧩
colorFrom: indigo
colorTo: blue
sdk: docker
python_version: "3.12"
pinned: false
---
# 🧩 NL2SQL Copilot

[![CI](https://github.com/melika-kheirieh/nl2sql-copilot/actions/workflows/ci.yml/badge.svg)](https://github.com/melika-kheirieh/nl2sql-copilot/actions/workflows/ci.yml)
[![Docker](https://img.shields.io/badge/docker-ready-blue?logo=docker)](#)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

A production-grade **Text-to-SQL Copilot** that converts natural-language questions into **safe, verified SQL**.
Built for analytics engineers who need accuracy, transparency, and control, and powered by **FastAPI**, **LangGraph**, and **Pydantic-AI**.

---

## 🚀 Overview

`NL2SQL Copilot` is an **agentic, modular pipeline** that plans, generates, verifies, and repairs SQL queries.
It ensures correctness and safety through structured stages, evaluation on the **Spider** dataset, and full observability support.

> 💡 Designed for **read-only production databases** with **self-repair**, **metrics**, and **CI/CD** baked in.

---

## 🧠 Agentic Architecture

```

Natural Language
↓
[ Detector ]
↓
[ Planner ]
↓
[ Generator (LLM) ]
↓
[ Safety ]
↓
[ Executor ]
↓
[ Verifier ]
↓
[ Repair ]

```

Each stage is isolated, configurable via YAML, and observable through structured traces and Prometheus metrics.

| Stage | Responsibility |
|--------|----------------|
| **Detector** | Decide whether the input is a Text-to-SQL request |
| **Planner** | Extract user intent and SQL plan |
| **Generator** | Call LLM to synthesize SQL |
| **Safety** | Block unsafe or non-SELECT queries |
| **Executor** | Execute query in read-only sandbox |
| **Verifier** | Compare results, detect mismatch |
| **Repair** | Self-healing loop triggered on failure |
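
The control flow in the table can be illustrated with a minimal sketch. It is not the project's implementation: the `State` dataclass, the `run` function, and the stub `generate` and `repair` functions are hypothetical names, the LLM calls are replaced by hard-coded strings, and the Detector, Planner, and Verifier stages are omitted for brevity. It only shows how a failed stage can trigger a bounded repair loop.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class State:
    """Data handed from stage to stage (hypothetical shape)."""
    question: str
    sql: str = ""
    rows: list = field(default_factory=list)
    error: str = ""
    trace: list[str] = field(default_factory=list)


Stage = Callable[[State], State]


def generate(state: State) -> State:
    # Stand-in for the LLM call; the column name is misspelled on purpose.
    state.sql = "SELECT nme FROM users"
    return state


def safety(state: State) -> State:
    # Simplified guard: anything that does not start with SELECT is rejected.
    if not state.sql.lstrip().upper().startswith("SELECT"):
        state.error = "only SELECT statements are allowed"
    return state


def make_executor(conn: sqlite3.Connection) -> Stage:
    def execute(state: State) -> State:
        try:
            state.rows = conn.execute(state.sql).fetchall()
        except sqlite3.Error as exc:
            state.error = str(exc)
        return state
    return execute


def repair(state: State) -> State:
    # Stand-in for a repair prompt that would receive state.error.
    state.sql = state.sql.replace("nme", "name")
    return state


def run(state: State, checks: list[Stage], max_repairs: int = 1) -> State:
    """Generate once, then run the checks; on failure, repair and retry."""
    state = generate(state)
    state.trace.append("generate")
    for attempt in range(max_repairs + 1):
        state.error = ""
        for stage in checks:
            state = stage(state)
            state.trace.append(stage.__name__)
            if state.error:
                break
        if not state.error or attempt == max_repairs:
            break
        state = repair(state)
        state.trace.append("repair")
    return state


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('Ada')")

result = run(State("List all user names"), [safety, make_executor(conn)])
print(result.rows)
print(result.trace)
```

The first execution fails on the misspelled column, the repair stub rewrites the SQL, and the second pass succeeds. Capping the loop with `max_repairs` keeps a persistently failing query from retrying forever.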

---

## 📊 Benchmark (Spider dataset)

Dataset: [Spider](https://yale-lily.github.io/spider) by Yale LILY Lab.
Evaluated on the **Spider dev subset (20 samples)** using the reproducible evaluation toolkit.

| Metric | Value |
|--------|--------|
| EM (Exact Match) | 0.15 |
| SM (Structural Match) | 0.70 |
| ExecAcc (Execution Accuracy) | 0.73 |
| Avg Latency | 8.11 s |
| p50 Latency | 9.42 s |
| p95 Latency | 13.88 s |

> On this 20-sample subset, **Structural Match** (0.70) and **Execution Accuracy** (0.73) are far higher than EM (0.15),
> which suggests that most EM misses come from harmless formatting differences rather than wrong results.
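
To make the gap between EM and execution accuracy concrete, here is a small sketch using only the standard library. It is a simplification under stated assumptions: `exact_match` compares normalized strings, whereas the evaluation toolkit's actual metric definitions are not shown in this README, and `execution_match` ignores row order. The `singer` table and both queries are made up for illustration.

```python
import sqlite3


def normalize(sql: str) -> str:
    """Lowercase, drop trailing semicolons, and collapse whitespace."""
    return " ".join(sql.strip().rstrip(";").lower().split())


def exact_match(pred: str, gold: str) -> bool:
    # String-level comparison: any alias or formatting change breaks it.
    return normalize(pred) == normalize(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    # Compare result sets, ignoring row order (a simplification).
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False
    gold_rows = conn.execute(gold).fetchall()
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ana", 25), ("Bo", 41)])

gold = "SELECT name FROM singer WHERE age > 30"
pred = "SELECT s.name FROM singer AS s WHERE s.age > 30"

print(exact_match(pred, gold), execution_match(conn, pred, gold))  # prints: False True
```

The predicted query differs from the gold query only by a table alias, so it fails the string comparison while returning identical rows.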

Run reproducible benchmarks:

```bash
export SPIDER_ROOT="$PWD/data/spider"
PYTHONPATH=$PWD python benchmarks/evaluate_spider_pro.py --spider --split dev --limit 20
PYTHONPATH=$PWD python benchmarks/plot_results.py
```

Results & plots → `benchmarks/results_pro/20251109-171247/`

![Metrics Overview](benchmarks/results_pro/20251109-171247/metrics_overview.png)

---

## ⚙️ Key Features

✅ **Agentic architecture** – multi-stage pipeline with feedback loop

🛡️ **Safety layer** – SELECT-only guardrails and AST validation

🔁 **Self-repair** – automatic retry when verification fails

📊 **Reproducible evaluation** – integrated Spider / Dr.Spider benchmarking

📦 **Config-driven design** – YAML pipeline factory

🧩 **Plug-and-play adapters** – SQLite / PostgreSQL / OpenAI / Anthropic / Ollama

🧠 **FastAPI service + Streamlit UI** – demo or API mode

🧰 **CI/CD ready** – Makefile, Ruff, Mypy, Pytest, Docker, GitHub Actions

📈 **Observability stack** – Prometheus & Grafana metrics for latency and errors
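
As an illustration of the SELECT-only guardrail idea, the sketch below uses a conservative keyword check built on the standard library. It is not the project's safety layer and it is not AST validation: the function name `is_safe_select` and the `FORBIDDEN` keyword list are assumptions made for this example, and a real deployment would likely rely on a proper SQL parser.

```python
import re

# Keywords that should never appear in a read-only query (illustrative list).
FORBIDDEN = {
    "insert", "update", "delete", "drop", "alter", "create",
    "truncate", "attach", "detach", "pragma", "vacuum", "into",
}

# Scanned left to right in one pass, so a quote inside a comment
# (or a "--" inside a string) cannot hide later statements.
_SKIP = re.compile(
    r"""
    '(?:[^']|'')*'   # single-quoted string literal ('' is an escaped quote)
    | --[^\n]*       # line comment
    | /\*.*?\*/      # block comment
    """,
    re.S | re.X,
)


def is_safe_select(sql: str) -> bool:
    """Return True only for a single SELECT (or WITH ... SELECT) statement."""
    cleaned = _SKIP.sub(" ", sql).strip().rstrip(";").strip()
    if not cleaned or ";" in cleaned:
        return False  # empty input or more than one statement
    words = re.findall(r"[a-z_]+", cleaned.lower())
    if not words or words[0] not in {"select", "with"}:
        return False
    return not FORBIDDEN.intersection(words)


for candidate in [
    "SELECT name FROM users WHERE note = 'drop it';",
    "WITH c AS (SELECT 1) SELECT * FROM c",
    "SELECT 1; DELETE FROM users",
    "SELECT '--'; DROP TABLE t",
    "DROP TABLE users",
]:
    print(is_safe_select(candidate), "|", candidate)
```

The check errs on the side of rejecting: a column that happens to be named like a forbidden keyword is refused, which is usually the preferable failure mode for a guard in front of a production database.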

---

## 🧩 Observability & GenAIOps

Monitor every stage of the pipeline in real-time:

* `/metrics` endpoint exposed via FastAPI
* Prometheus + Grafana stack with `make obs-up`
* Metrics tracked:

  * `nl2sql_stage_latency_ms`
  * `nl2sql_stage_error_total`
  * `nl2sql_query_exec_count`
  * `nl2sql_repair_success_rate`

```bash
make obs-up      # start Prometheus + Grafana
make obs-down    # stop the stack
```
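
The per-stage metrics listed above can be pictured with a small standard-library sketch. It is an assumption-laden stand-in: the project presumably records these values through a Prometheus client, while this example keeps them in plain dictionaries. The `track_stage` helper is a hypothetical name, and the dictionaries merely borrow the names of the first two metrics.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# In-memory stand-ins for the latency and error metrics.
stage_latency_ms: dict[str, list[float]] = defaultdict(list)
stage_error_total: dict[str, int] = defaultdict(int)


@contextmanager
def track_stage(name: str):
    """Record wall-clock latency for a stage and count raised exceptions."""
    start = time.perf_counter()
    try:
        yield
    except Exception:
        stage_error_total[name] += 1
        raise
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        stage_latency_ms[name].append(elapsed_ms)


with track_stage("planner"):
    time.sleep(0.01)  # pretend to do some work

try:
    with track_stage("executor"):
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(dict(stage_error_total))
print({name: len(samples) for name, samples in stage_latency_ms.items()})
```

Wrapping each stage in the same context manager keeps instrumentation out of the stage logic, and latency is recorded even when the stage raises.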

---

## 🧪 Quick Start

### 1️⃣ Clone & Run

```bash
git clone https://github.com/melika-kheirieh/nl2sql-copilot.git
cd nl2sql-copilot
make run
```

Or build with Docker:

```bash
docker build -t nl2sql-copilot .
docker run --rm -p 8000:8000 nl2sql-copilot
```

* API available at [http://localhost:8000/docs](http://localhost:8000/docs)
* Streamlit demo at [http://localhost:7860](http://localhost:7860)

---

## 🧭 For Developers & CI/CD

```bash
make lint          # Ruff
make typecheck     # Mypy
make test          # Pytest
make bench         # Run benchmark suite
```

### CI/CD Highlights

* Runs on GitHub Actions (`make check`)
* Enforces formatting, typing, tests, and Docker build
* Publishes Docker image to GHCR on successful merge

---

## 🎯 Why it matters

* Bridges **natural language and databases** with measurable reliability
* Provides **reproducible evaluation** for continuous model tracking
* Delivers **production-level resilience** via self-repair and observability
* Demonstrates **AI software engineering** beyond prompt design

---

## 👤 Author

**Melika Kheirieh**
AI Engineer & Researcher in Natural Language Interfaces for Databases
[GitHub](https://github.com/melika-kheirieh) · [LinkedIn](https://www.linkedin.com/in/melika-kheirieh-03a7b5176/)

> This project evolved from [NL2SQL Copilot Prototype](https://github.com/melika-kheirieh/nl2sql-copilot-prototype), refactored into a production-grade, modular agent.

---

## 📄 License

MIT © 2025 Melika Kheirieh