Melika Kheirieh
docs: add README front-matter for HF repo card (title, license, tags)
c50d16d

A newer version of the Gradio SDK is available: 6.5.1

Upgrade
metadata
title: NL2SQL Copilot
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: gradio
python_version: '3.11'
app_file: app.py
pinned: false

🧠 NL2SQL Copilot — Prototype

A minimal Text-to-SQL Copilot built with LangChain + Gradio, designed to translate natural language questions into safe SQL and run them on a read-only SQLite database.

👉 Live Demo on Hugging Face Spaces

Status: Prototype (v0.1). This demonstrates structure and UX; advanced safety/verification pipelines are planned.


✨ Features (v0.1)

  • Gradio UI for quick interactions
  • Config-driven environment (dotenv)
  • Pluggable LLM endpoint (proxy or direct OpenAI)
  • SQLite read-only connection (no data mutation)

Planned next:

  • Query planning and verification
  • Safer SQL guardrails (AST / blocklist / dialect checks)
  • Self-repair on failed queries
  • Semantic cache and telemetry

📂 Project Structure

nl2sql-copilot-prototype/
├─ app.py
├─ config.py
├─ requirements.txt
├─ .env.example
├─ .gitignore
└─ README.md

🧩 Database Samples

Two example SQLite databases are included in the db/ folder for quick testing:

File Description Download
Chinook_Sqlite.sqlite Classic sample DB with artists, albums, and tracks (music store example). ⬇️ Download
WMSales.sqlite Simple sales database (for demoing aggregate and filter queries). ⬇️ Download

You can use them directly in the Gradio UI by uploading one of these files, or reference them in code for local runs.


🧠 Sample Questions for Chinook_Sqlite.sqlite

Try asking your copilot questions like:

  1. “List the top 5 artists by total track count.”
  2. “Which album has the most tracks?”
  3. “Show all tracks longer than 6 minutes.”
  4. “Find the average track length by genre.”
  5. “Show total invoice amount by billing country.”
  6. “Top 10 most popular genres by number of tracks.”
  7. “How many customers have purchased Jazz albums?”
  8. “Show the total revenue by employee (sales support).”
  9. “List customers who spent more than $100.”
  10. “Which customers are from Canada?”

📊 Sample Questions for WMSales.sqlite

You can try:

  1. “Show total sales per month in 2024.”
  2. “List the top 10 customers by revenue.”
  3. “Which product category had the highest sales this year?”
  4. “Find the average unit price per product.”
  5. “Show all orders placed in the last 30 days.”
  6. “List total sales by region and salesperson.”
  7. “What is the best-selling product overall?”
  8. “Show total discount given per month.”
  9. “Find customers who made more than 5 purchases.”
  10. “What’s the total revenue by payment method?”

⚙️ Requirements

  • Python 3.10+
  • A proxy/provider API key (OpenAI / custom proxy)
  • SQLite DB file (uploaded via UI)

🔐 Environment Variables

Copy the example and fill your own values:

cp .env.example .env

.env.example (proxy-agnostic):

# ---- LLM provider or proxy (preferred) ----
PROXY_API_KEY="your-proxy-or-provider-api-key"
PROXY_BASE_URL="https://your-proxy-or-provider-base-url/v1"

# ---- Optional direct OpenAI fallback ----
#OPENAI_API_KEY="your-openai-api-key"
#OPENAI_BASE_URL="https://api.openai.com/v1"

config.py should select PROXY_* first; if empty, it falls back to OPENAI_*.


🧪 Local Quickstart

python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env           # then edit .env and add your keys
python app.py                 # open the Gradio link in browser

Upload a SQLite file and try a prompt like:

“Top 5 customers by total orders in 2024.”


🧰 Safety Notes (Prototype)

  • DB is opened in read-only mode, but you should still block multi-statement payloads and dangerous tokens (e.g., ATTACH, PRAGMA, sqlite_master, DDL/INSERT/UPDATE/DELETE).
  • Consider an AST approach (e.g., sqlglot) for a stricter parse/allow-list.

☁️ Deploy to Hugging Face Spaces (Gradio)

1) Create a new Space

  • Go to Hugging Face → Spaces → New Space
  • Name: nl2sql-copilot-prototype
  • Space SDK: Gradio
  • Hardware: CPU Basic
  • Visibility: Public (or Private)

2) Add project files

Commit/push these files to the Space repo:

  • app.py, config.py, requirements.txt, .env.example, README.md, .gitignore

3) Set Secrets (Variables and secrets)

In Space → Settings → Variables and secrets:

  • PROXY_API_KEY: your real key
  • PROXY_BASE_URL: e.g., https://.../v1
  • (Optional) OPENAI_API_KEY and OPENAI_BASE_URL

Do not commit a real .env. Use Space Secrets.

4) Build & Run

  • Spaces auto-install from requirements.txt.
  • If not auto-started, set App file: main.py, SDK: Gradio, Python: 3.10+.

5) Test

  • Open Space URL
  • Upload a small sample SQLite DB
  • Check Logs tab for errors

Persistence note: Uploads are ephemeral; include a tiny demo DB in the repo if needed.


🧭 Usage Tips

  • Prefer concise prompts (e.g., “Show avg price by category for 2023”).
  • If a query fails, rephrase or reduce columns.
  • For bigger DBs, add a schema introspection step or a “Describe tables” helper.

🛡️ Security & Privacy

  • Never log raw API keys.
  • Keep .env out of Git; commit only .env.example.
  • Enforce read-only and block multi-statement SQL.

🗺️ Roadmap

  • Planner → Generator → Safety → Executor → Verifier loop
  • AST-based guardrails (sqlglot)
  • Self-repair on DB/SQL errors
  • Semantic cache + telemetry
  • Streamlit / FastAPI variants