Melika Kheirieh committed on
Commit 8e92467 · 1 Parent(s): 570f7bd

docs: update README with project overview

Files changed (1)
  1. README.md +58 -157
README.md CHANGED
@@ -1,198 +1,99 @@
- ---
- title: NL2SQL Copilot
- emoji: 🧠
- colorFrom: indigo
- colorTo: purple
- sdk: gradio
- python_version: "3.11"
- app_file: app.py
- pinned: false
- ---
- # 🧠 NL2SQL Copilot – Prototype
-
- A minimal **Text-to-SQL Copilot** built with **LangChain + Gradio**, designed to translate natural language questions into **safe SQL** and run them on a **read-only SQLite** database.
-
- 👉 [Live Demo on Hugging Face Spaces](https://huggingface.co/spaces/melikakheirieh/nl2sql-copilot-prototype)
-

- > **Status:** Prototype (v0.1). This demonstrates structure and UX; advanced safety/verification pipelines are planned.

  ---

- ## ✨ Features (v0.1)
- - Gradio UI for quick interactions
- - Config-driven environment (dotenv)
- - Pluggable LLM endpoint (proxy or direct OpenAI)
- - SQLite **read-only** connection (no data mutation)

- **Planned next:**
- - Query planning and verification
- - Safer SQL guardrails (AST / blocklist / dialect checks)
- - Self-repair on failed queries
- - Semantic cache and telemetry

- ---
-
- ## 📂 Project Structure
- ```
- nl2sql-copilot-prototype/
- ├─ app.py
- ├─ config.py
- ├─ requirements.txt
- ├─ .env.example
- ├─ .gitignore
- └─ README.md

  ```
- ## 🧩 Database Samples

- Two example SQLite databases are included in the `db/` folder for quick testing:
-
- | File | Description | Download |
- |------|--------------|-----------|
- | **Chinook_Sqlite.sqlite** | Classic sample DB with artists, albums, and tracks (music store example). | [⬇️ Download](https://github.com/melika-kheirieh/nl2sql-copilot-prototype/raw/main/db/Chinook_Sqlite.sqlite) |
- | **WMSales.sqlite** | Simple sales database (for demoing aggregate and filter queries). | [⬇️ Download](https://github.com/melika-kheirieh/nl2sql-copilot-prototype/raw/main/db/WMSales.sqlite) |
-
- You can use them directly in the Gradio UI by uploading one of these files, or reference them in code for local runs.

  ---

- ### 🧠 Sample Questions for *Chinook_Sqlite.sqlite*
- Try asking your copilot questions like:
-
- 1. “List the top 5 artists by total track count.”
- 2. “Which album has the most tracks?”
- 3. “Show all tracks longer than 6 minutes.”
- 4. “Find the average track length by genre.”
- 5. “Show total invoice amount by billing country.”
- 6. “Top 10 most popular genres by number of tracks.”
- 7. “How many customers have purchased Jazz albums?”
- 8. “Show the total revenue by employee (sales support).”
- 9. “List customers who spent more than $100.”
- 10. “Which customers are from Canada?”


  ---

- ### 📊 Sample Questions for *WMSales.sqlite*
- You can try:
-
- 1. “Show total sales per month in 2024.”
- 2. “List the top 10 customers by revenue.”
- 3. “Which product category had the highest sales this year?”
- 4. “Find the average unit price per product.”
- 5. “Show all orders placed in the last 30 days.”
- 6. “List total sales by region and salesperson.”
- 7. “What is the best-selling product overall?”
- 8. “Show total discount given per month.”
- 9. “Find customers who made more than 5 purchases.”
- 10. “What’s the total revenue by payment method?”
- ---
-
- ## ⚙️ Requirements
- - Python 3.10+
- - A proxy/provider API key (OpenAI / custom proxy)
- - SQLite DB file (uploaded via UI)
-
- ---

- ## 🔐 Environment Variables

- Copy the example and fill your own values:

  ```bash
- cp .env.example .env
  ```

- `.env.example` (proxy-agnostic):
- ```bash
- # ---- LLM provider or proxy (preferred) ----
- PROXY_API_KEY="your-proxy-or-provider-api-key"
- PROXY_BASE_URL="https://your-proxy-or-provider-base-url/v1"

- # ---- Optional direct OpenAI fallback ----
- #OPENAI_API_KEY="your-openai-api-key"
- #OPENAI_BASE_URL="https://api.openai.com/v1"
  ```

- `config.py` should select `PROXY_*` first; if empty, it falls back to `OPENAI_*`.
-
- ---
-
- ## 🧪 Local Quickstart

  ```bash
- python -m venv .venv
- source .venv/bin/activate # Windows: .venv\Scripts\activate
- pip install -r requirements.txt
- cp .env.example .env # then edit .env and add your keys
- python app.py # open the Gradio link in browser
  ```

- Upload a SQLite file and try a prompt like:
- > “Top 5 customers by total orders in 2024.”
-
  ---

- ## 🧰 Safety Notes (Prototype)
- - DB is opened in **read-only** mode, but you should still block multi-statement payloads and dangerous tokens (e.g., `ATTACH`, `PRAGMA`, `sqlite_master`, DDL/INSERT/UPDATE/DELETE).
- - Consider an AST approach (e.g., `sqlglot`) for a stricter parse/allow-list.
-
- ---

- ## ☁️ Deploy to Hugging Face Spaces (Gradio)
-
- ### 1) Create a new Space
- - Go to Hugging Face → Spaces → **New Space**
- - **Name:** `nl2sql-copilot-prototype`
- - **Space SDK:** Gradio
- - **Hardware:** CPU Basic
- - **Visibility:** Public (or Private)
-
- ### 2) Add project files
- Commit/push these files to the Space repo:
- - `app.py`, `config.py`, `requirements.txt`, `.env.example`, `README.md`, `.gitignore`
-
- ### 3) Set Secrets (Variables and secrets)
- In Space → **Settings → Variables and secrets**:
- - `PROXY_API_KEY`: your real key
- - `PROXY_BASE_URL`: e.g., `https://.../v1`
- - (Optional) `OPENAI_API_KEY` and `OPENAI_BASE_URL`
-
- > Do **not** commit a real `.env`. Use Space **Secrets**.
-
- ### 4) Build & Run
- - Spaces auto-install from `requirements.txt`.
- - If not auto-started, set **App file: main.py**, SDK: **Gradio**, Python: **3.10+**.
-
- ### 5) Test
- - Open Space URL
- - Upload a small sample SQLite DB
- - Check **Logs** tab for errors
-
- **Persistence note:** Uploads are ephemeral; include a tiny demo DB in the repo if needed.

  ---

- ## 🧭 Usage Tips
- - Prefer concise prompts (e.g., “Show avg price by category for 2023”).
- - If a query fails, rephrase or reduce columns.
- - For bigger DBs, add a schema introspection step or a “Describe tables” helper.
-
- ---

- ## 🛡️ Security & Privacy
- - Never log raw API keys.
- - Keep `.env` out of Git; commit only `.env.example`.
- - Enforce read-only and block multi-statement SQL.

  ---

- ## 🗺️ Roadmap
- - [ ] Planner → Generator → Safety → Executor → Verifier loop
- - [ ] AST-based guardrails (sqlglot)
- - [ ] Self-repair on DB/SQL errors
- - [ ] Semantic cache + telemetry
- - [ ] Streamlit / FastAPI variants

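The removed README states that `config.py` should select `PROXY_*` first and fall back to `OPENAI_*` when the proxy values are empty. A minimal Python sketch of that precedence rule, assuming `.env` has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`); `LLMSettings` and `load_llm_settings` are hypothetical names, since the actual `config.py` is not part of this diff:

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    # Hypothetical settings container.
    # repr=False keeps the raw key out of logs, tracebacks and debug prints.
    api_key: str = field(repr=False)
    base_url: Optional[str]
    source: str  # "proxy" or "openai"


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Pick PROXY_* settings first; fall back to OPENAI_* only if PROXY_API_KEY is empty."""
    proxy_key = env.get("PROXY_API_KEY", "").strip()
    if proxy_key:
        # An empty base URL becomes None, leaving the client library to use its own default.
        return LLMSettings(proxy_key, env.get("PROXY_BASE_URL") or None, "proxy")

    openai_key = env.get("OPENAI_API_KEY", "").strip()
    if openai_key:
        return LLMSettings(openai_key, env.get("OPENAI_BASE_URL") or None, "openai")

    raise RuntimeError("Set PROXY_API_KEY (preferred) or OPENAI_API_KEY")
```

Taking the environment as a plain mapping keeps the rule easy to unit-test, and `repr=False` on `api_key` is one way to honour the README's "Never log raw API keys" note.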
 
 
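The removed Safety Notes recommend opening the database read-only while still blocking multi-statement payloads and tokens such as `ATTACH`, `PRAGMA`, `sqlite_master`, and DDL/INSERT/UPDATE/DELETE; the new README advertises SELECT-only and forbidden-keyword filters. A standard-library sketch of both layers, under stated assumptions: `open_read_only` relies on SQLite's `mode=ro` URI parameter, while `check_sql` and its `FORBIDDEN` list are illustrative, not the project's actual filter. A regex blocklist is coarse by design (it also rejects a harmless literal such as `'Drop Kick'`), which is the gap the suggested `sqlglot` AST approach would close.

```python
import re
import sqlite3
from pathlib import Path

# Hypothetical blocklist covering the tokens named in the Safety Notes.
FORBIDDEN = re.compile(
    r"\b(ATTACH|DETACH|PRAGMA|INSERT|UPDATE|DELETE|CREATE|DROP|ALTER|VACUUM|REINDEX|sqlite_master)\b",
    re.IGNORECASE,
)


def check_sql(sql: str) -> str:
    """Return the single statement if it looks like a read-only SELECT, else raise ValueError."""
    stmt = sql.strip().rstrip(";").strip()
    if not stmt:
        raise ValueError("empty query")
    # Any remaining ';' means a second statement (or a ';' inside a literal: a safe false positive).
    if ";" in stmt:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(SELECT|WITH)\b", stmt, re.IGNORECASE):
        raise ValueError("only SELECT queries are allowed")
    match = FORBIDDEN.search(stmt)
    if match:
        raise ValueError(f"forbidden token: {match.group(0)}")
    return stmt


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write fails inside the SQLite engine."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

In use, `check_sql` runs on the generated query before `conn.execute`, and the read-only connection remains a second line of defence if the filter misses something.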
+ # 🧩 NL2SQL Copilot

+ A modular **Text-to-SQL Copilot** that converts natural language questions into safe and verified SQL queries.
+ Built with **FastAPI**, **LangGraph**, and **SQLAlchemy**, designed for read-only databases and evaluation on Spider/Dr.Spider benchmarks.

  ---

+ ## 🚀 Quick Start

+ ### 1️⃣ Clone the repo
+ ```bash
+ git clone https://github.com/melika-kheirieh/nl2sql-copilot.git
+ cd nl2sql-copilot
+ ```

+ ### 2️⃣ Build and run with Docker

+ ```bash
+ docker build -t nl2sql-copilot .
+ docker run --rm -p 8000:8000 nl2sql-copilot
  ```

+ Then open [http://localhost:8000/docs](http://localhost:8000/docs) 🚀

  ---

+ ## 🧱 Project Structure

+ ```
+ nl2sql-copilot/
+ │
+ ├── app/ # FastAPI app, routers, schemas
+ ├── nl2sql/ # Core pipeline (planner → generator → safety → executor → verifier)
+ ├── adapters/ # Database and LLM adapters
+ ├── benchmarks/ # Evaluation scripts and results
+ ├── ui/ # Streamlit dashboard
+ │
+ ├── Dockerfile
+ ├── requirements.in
+ ├── requirements.txt
+ └── README.md
+ ```

  ---

+ ## 🧪 Development

+ ### Install dependencies

+ (Recommended: Python 3.12+ and virtualenv)

  ```bash
+ pip install -r requirements.txt
  ```

+ ### Run tests

+ ```bash
+ pytest -q
  ```

+ ### Lint and type-check

  ```bash
+ ruff check .
+ mypy .
  ```

  ---

+ ## 🧠 Features

+ * ✅ Modular multi-stage pipeline (Planner → Generator → Safety → Executor → Verifier → Repair)
+ * 🛡️ SQL safety filters (SELECT-only, forbidden keywords)
+ * 🔁 Self-repair loop on failed executions
+ * 📊 Streamlit benchmark dashboard (latency, accuracy, cost)
+ * 🧩 PostgreSQL + SQLite adapters
+ * 🧠 Powered by `pydantic-ai` and `LangGraph`

  ---

+ ## 🧰 Tech Stack

+ | Layer | Tools |
+ | ---------------- | --------------------------------------- |
+ | Backend API | FastAPI, Uvicorn |
+ | Pipeline Core | Python 3.12, Pydantic, SQLGlot |
+ | LLM Interface | pydantic-ai (OpenAI, Anthropic, Ollama) |
+ | Database | SQLite (default), PostgreSQL |
+ | Evaluation | Spider / Dr.Spider |
+ | UI | Streamlit + Plotly |
+ | Containerization | Docker / Docker Compose |

  ---

+ ## 📄 License

+ MIT © 2025 Melika Kheirieh
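The new README lists a Planner → Generator → Safety → Executor → Verifier → Repair pipeline with a self-repair loop on failed executions. The sketch below covers only the execute-and-repair step against SQLite. `generate` and `repair` stand in for LLM calls, and `run_with_repair` and `max_attempts` are hypothetical names; the README says the pipeline is powered by `pydantic-ai` and `LangGraph`, neither of which this sketch uses.

```python
import sqlite3
from typing import Any, Callable, List, Tuple

# Hypothetical stage signatures standing in for LLM-backed steps.
Generate = Callable[[str], str]          # question -> SQL
Repair = Callable[[str, str, str], str]  # (question, failed SQL, error message) -> new SQL


def run_with_repair(
    conn: sqlite3.Connection,
    question: str,
    generate: Generate,
    repair: Repair,
    max_attempts: int = 3,
) -> Tuple[str, List[Any]]:
    """Execute generated SQL; on a database error, ask `repair` for a new query and retry."""
    sql = generate(question)
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            last_error = str(exc)
            if attempt < max_attempts:
                # Feed the failed query and the error message back to the repair step.
                sql = repair(question, sql, last_error)
    raise RuntimeError(f"query still failing after {max_attempts} attempts: {last_error}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artists (name TEXT)")
    conn.execute("INSERT INTO artists VALUES ('AC/DC')")

    sql, rows = run_with_repair(
        conn,
        "List all artists",
        generate=lambda q: "SELECT nme FROM artists",  # deliberate column typo
        repair=lambda q, bad_sql, err: bad_sql.replace("nme", "name"),
    )
    print(sql, rows)  # prints: SELECT name FROM artists [('AC/DC',)]
```

Bounding the loop with `max_attempts` stops a model that keeps producing broken SQL from retrying forever; a fuller version would also pass each repaired query back through the safety stage before executing it.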