Spaces:
Sleeping
Sleeping
File size: 7,142 Bytes
c293dc3 cb9cfe8 9f7dd14 cb9cfe8 9f7dd14 24ef2cf 9f7dd14 24ef2cf 9f7dd14 24ef2cf 9f7dd14 24ef2cf 9f7dd14 24ef2cf 9f7dd14 24ef2cf 9f7dd14 24ef2cf 9f7dd14 cb9cfe8 9f7dd14 cb9cfe8 9f7dd14 cb9cfe8 9f7dd14 24ef2cf b799708 24ef2cf 9f7dd14 24ef2cf 9f7dd14 24ef2cf 9f7dd14 cb9cfe8 9f7dd14 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 | ---
title: Self-Improving SQL Agent
emoji: π§
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
tags:
- sql
- reinforcement-learning
- contextual-bandit
- linucb
- gepa
---
# Self-Improving SQL Agent
> **Live demo:** [huggingface.co/spaces/ar9av/sql-agent-openenv](https://huggingface.co/spaces/ar9av/sql-agent-openenv)
> **GitHub:** [Ar9av/sql-agent-openenv](https://github.com/Ar9av/sql-agent-openenv)
A SQL agent that gets better the more you use it. Ask questions in plain English β the agent writes SQL, executes it, and repairs its own mistakes using reinforcement learning. Every failure feeds back into a prompt evolution cycle (GEPA) that makes the next attempt smarter.
---
## What it does
1. **Natural language β SQL** β type a question, get a query
2. **Self-repair loop** β if the SQL fails, the agent diagnoses the error and retries with a different strategy (up to 5 attempts). Each retry sees the full history of previous failures so it doesn't repeat the same mistake
3. **Reinforcement learning** β a LinUCB contextual bandit learns which of 8 repair strategies works best for each error class (wrong column, bad JOIN, syntax error, wrong dialect, etc.)
4. **Prompt evolution (GEPA)** β every N queries the system reflects on its failure patterns and rewrites its own system prompt to be more accurate going forward
5. **Connect your own DB** β drop in any SQLite file or PostgreSQL connection string; the agent introspects the schema and generates relevant example questions automatically
---
## Quickstart
### Run locally
```bash
# 1. Clone
git clone https://github.com/Ar9av/sql-agent-openenv
cd sql-agent-openenv
# 2. Install backend dependencies
cd backend
pip install -r requirements.txt
# 3. Set environment variables
export HF_TOKEN=your_huggingface_token # required β no default
export API_BASE_URL=https://router.huggingface.co/v1 # optional
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct # optional
# 4. Build the frontend
cd ../frontend
npm install && npm run build
# 5. Start the server
cd ../backend
uvicorn main:app --host 0.0.0.0 --port 8000
```
Open [http://localhost:8000](http://localhost:8000).
### Run with Docker
```bash
docker build -t self-improving-sql-agent .
docker run -p 7860:7860 \
-e HF_TOKEN=your_token \
self-improving-sql-agent
```
### Environment variables
| Variable | Default | Required |
|---|---|---|
| `HF_TOKEN` | β | **Yes** |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | No |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | No |
| `GEPA_OPTIMIZE_EVERY` | `4` | No |
| `DATA_DIR` | `./data` | No |
---
## Using the UI
### Chat tab
Type any question about your data. The agent streams SQL token-by-token, executes it, and shows results in a table. If it fails, watch it diagnose the error and retry with a new strategy.
- **Correct / Wrong buttons** β rate the result. Wrong answers open a remark field; your feedback is fed directly into the next GEPA optimization cycle
- **Retry differently** β re-runs the query with the previous bad SQL as context so the agent avoids repeating the same approach
### ER Diagram tab
Visual schema explorer showing all tables, columns, and foreign key relationships.
### Benchmark tab *(built-in DB only)*
Run the agent against a fixed set of easy / medium / hard questions and get an overall accuracy score.
### Right sidebar β System Prompt & GEPA
See the live system prompt the agent is using. A progress bar shows how far through the current optimization cycle you are (e.g. `2/4 Β· optimizes every 4 queries`). After each cycle the prompt is rewritten and the generation badge updates.
### Connect your own database
Click **Connect DB** in the top-right:
- **SQLite:** `/path/to/database.db` or `:memory:`
- **PostgreSQL:** `postgresql://user:password@host:5432/dbname`
The agent auto-detects the dialect (SQLite vs PostgreSQL), adjusts its prompt, introspects the schema, and uses the LLM to generate 5 example questions specific to your data. The Benchmark tab and difficulty controls are hidden for custom databases.
---
## OpenEnv API
The environment exposes a standard OpenEnv interface for agent training:
| Endpoint | Method | Description |
|---|---|---|
| `POST /reset` | β | Start a new episode, returns `Observation` |
| `POST /step` | β | Execute one repair action, returns `{observation, reward}` |
| `GET /state` | β | Current episode state |
| `GET /env/tasks` | β | List all tasks and questions |
| `GET /env/info` | β | Environment metadata (action/observation space) |
**Stdout** emits structured logs for each episode:
```
[START] {"task_id": "...", "question": "...", "max_attempts": 5}
[STEP] {"attempt": 1, "action": "generate", "reward": 0.8, "success": true, "done": true}
[END] {"success": true, "attempts": 1, "total_reward": 0.8}
```
**Action space** β 8 discrete repair strategies:
`generate`, `rewrite_full`, `fix_column`, `fix_table`, `add_groupby`, `rewrite_cte`, `fix_syntax`, `change_dialect`, `relax_filter`
---
## Architecture
```
frontend/ React + Vite (served as static files by FastAPI)
backend/
main.py FastAPI entry point
api/
demo.py SSE streaming endpoints (chat, benchmark, GEPA events)
openenv.py OpenEnv spec routes (/reset, /step, /state)
env/
sql_env.py SQLAgentEnv β episode management, LLM calls
database.py SQLite + PostgreSQL abstraction
tasks.py Benchmark task definitions and grader
rl/
types.py RepairAction enum, RLState, featurize()
bandit.py LinUCB contextual bandit
repair_strategies.py 8 repair prompt templates
grader.py Shaped reward function
gepa/
optimizer.py GEPA: reflect β mutate β score β pareto front
```
---
## Background
> **Origin:** This is a port of [gepa-tuned-sql-agent](https://github.com/Ar9av/gepa-tuned-sql-agent) initally was trying to make it in typescript later to realise we need submission in specific format
The original explored three research ideas in a Next.js stack, started ~1 week before the submission deadline. When it became clear the submission required a Python OpenEnv environment, the whole stack was migrated.
1. **Self-debug loop** β the agent critiques and fixes its own SQL errors without human intervention
2. **GEPA prompt evolution** β after user feedback, an LLM reflects on failures and evolves the system prompt
3. **Mini-RL environment** β a LinUCB contextual bandit learns which repair strategy works best for each error class
### Key differences from the original
| | gepa-tuned-sql-agent | Self-Improving SQL Agent (this repo) |
|---|---|---|
| Backend | Next.js API routes (TypeScript) | FastAPI (Python) |
| Frontend | Next.js pages | React + Vite (static, served by FastAPI) |
| LLM | Azure OpenAI | HF Router (Qwen 2.5-72B) |
| Deployment | Vercel / local | Hugging Face Spaces (Docker) |
| DB support | SQLite, PostgreSQL, MySQL | SQLite file + PostgreSQL DSN |
| Repair context | Single-shot per attempt | Multi-turn β full failure history passed to each retry |
|