sql-agent-openenv / README.md
ar9avg's picture
fix
c293dc3
---
title: Self-Improving SQL Agent
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
tags:
- sql
- reinforcement-learning
- contextual-bandit
- linucb
- gepa
---
# Self-Improving SQL Agent
> **Live demo:** [huggingface.co/spaces/ar9av/sql-agent-openenv](https://huggingface.co/spaces/ar9av/sql-agent-openenv)
> **GitHub:** [Ar9av/sql-agent-openenv](https://github.com/Ar9av/sql-agent-openenv)
A SQL agent that gets better the more you use it. Ask questions in plain English β€” the agent writes SQL, executes it, and repairs its own mistakes using reinforcement learning. Every failure feeds back into a prompt evolution cycle (GEPA) that makes the next attempt smarter.
---
## What it does
1. **Natural language β†’ SQL** β€” type a question, get a query
2. **Self-repair loop** β€” if the SQL fails, the agent diagnoses the error and retries with a different strategy (up to 5 attempts). Each retry sees the full history of previous failures so it doesn't repeat the same mistake
3. **Reinforcement learning** β€” a LinUCB contextual bandit learns which of 8 repair strategies works best for each error class (wrong column, bad JOIN, syntax error, wrong dialect, etc.)
4. **Prompt evolution (GEPA)** β€” every N queries the system reflects on its failure patterns and rewrites its own system prompt to be more accurate going forward
5. **Connect your own DB** β€” drop in any SQLite file or PostgreSQL connection string; the agent introspects the schema and generates relevant example questions automatically
---
## Quickstart
### Run locally
```bash
# 1. Clone
git clone https://github.com/Ar9av/sql-agent-openenv
cd sql-agent-openenv
# 2. Install backend dependencies
cd backend
pip install -r requirements.txt
# 3. Set environment variables
export HF_TOKEN=your_huggingface_token # required β€” no default
export API_BASE_URL=https://router.huggingface.co/v1 # optional
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct # optional
# 4. Build the frontend
cd ../frontend
npm install && npm run build
# 5. Start the server
cd ../backend
uvicorn main:app --host 0.0.0.0 --port 8000
```
Open [http://localhost:8000](http://localhost:8000).
### Run with Docker
```bash
docker build -t self-improving-sql-agent .
docker run -p 7860:7860 \
-e HF_TOKEN=your_token \
self-improving-sql-agent
```
### Environment variables
| Variable | Default | Required |
|---|---|---|
| `HF_TOKEN` | β€” | **Yes** |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | No |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | No |
| `GEPA_OPTIMIZE_EVERY` | `4` | No |
| `DATA_DIR` | `./data` | No |
---
## Using the UI
### Chat tab
Type any question about your data. The agent streams SQL token-by-token, executes it, and shows results in a table. If it fails, watch it diagnose the error and retry with a new strategy.
- **Correct / Wrong buttons** β€” rate the result. Wrong answers open a remark field; your feedback is fed directly into the next GEPA optimization cycle
- **Retry differently** β€” re-runs the query with the previous bad SQL as context so the agent avoids repeating the same approach
### ER Diagram tab
Visual schema explorer showing all tables, columns, and foreign key relationships.
### Benchmark tab *(built-in DB only)*
Run the agent against a fixed set of easy / medium / hard questions and get an overall accuracy score.
### Right sidebar β€” System Prompt & GEPA
See the live system prompt the agent is using. A progress bar shows how far through the current optimization cycle you are (e.g. `2/4 Β· optimizes every 4 queries`). After each cycle the prompt is rewritten and the generation badge updates.
### Connect your own database
Click **Connect DB** in the top-right:
- **SQLite:** `/path/to/database.db` or `:memory:`
- **PostgreSQL:** `postgresql://user:password@host:5432/dbname`
The agent auto-detects the dialect (SQLite vs PostgreSQL), adjusts its prompt, introspects the schema, and uses the LLM to generate 5 example questions specific to your data. The Benchmark tab and difficulty controls are hidden for custom databases.
---
## OpenEnv API
The environment exposes a standard OpenEnv interface for agent training:
| Endpoint | Method | Description |
|---|---|---|
| `POST /reset` | β€” | Start a new episode, returns `Observation` |
| `POST /step` | β€” | Execute one repair action, returns `{observation, reward}` |
| `GET /state` | β€” | Current episode state |
| `GET /env/tasks` | β€” | List all tasks and questions |
| `GET /env/info` | β€” | Environment metadata (action/observation space) |
**Stdout** emits structured logs for each episode:
```
[START] {"task_id": "...", "question": "...", "max_attempts": 5}
[STEP] {"attempt": 1, "action": "generate", "reward": 0.8, "success": true, "done": true}
[END] {"success": true, "attempts": 1, "total_reward": 0.8}
```
**Action space** β€” 8 discrete repair strategies:
`generate`, `rewrite_full`, `fix_column`, `fix_table`, `add_groupby`, `rewrite_cte`, `fix_syntax`, `change_dialect`, `relax_filter`
---
## Architecture
```
frontend/ React + Vite (served as static files by FastAPI)
backend/
main.py FastAPI entry point
api/
demo.py SSE streaming endpoints (chat, benchmark, GEPA events)
openenv.py OpenEnv spec routes (/reset, /step, /state)
env/
sql_env.py SQLAgentEnv β€” episode management, LLM calls
database.py SQLite + PostgreSQL abstraction
tasks.py Benchmark task definitions and grader
rl/
types.py RepairAction enum, RLState, featurize()
bandit.py LinUCB contextual bandit
repair_strategies.py 8 repair prompt templates
grader.py Shaped reward function
gepa/
optimizer.py GEPA: reflect β†’ mutate β†’ score β†’ pareto front
```
---
## Background
> **Origin:** This is a port of [gepa-tuned-sql-agent](https://github.com/Ar9av/gepa-tuned-sql-agent) initally was trying to make it in typescript later to realise we need submission in specific format
The original explored three research ideas in a Next.js stack, started ~1 week before the submission deadline. When it became clear the submission required a Python OpenEnv environment, the whole stack was migrated.
1. **Self-debug loop** β€” the agent critiques and fixes its own SQL errors without human intervention
2. **GEPA prompt evolution** β€” after user feedback, an LLM reflects on failures and evolves the system prompt
3. **Mini-RL environment** β€” a LinUCB contextual bandit learns which repair strategy works best for each error class
### Key differences from the original
| | gepa-tuned-sql-agent | Self-Improving SQL Agent (this repo) |
|---|---|---|
| Backend | Next.js API routes (TypeScript) | FastAPI (Python) |
| Frontend | Next.js pages | React + Vite (static, served by FastAPI) |
| LLM | Azure OpenAI | HF Router (Qwen 2.5-72B) |
| Deployment | Vercel / local | Hugging Face Spaces (Docker) |
| DB support | SQLite, PostgreSQL, MySQL | SQLite file + PostgreSQL DSN |
| Repair context | Single-shot per attempt | Multi-turn β€” full failure history passed to each retry |