File size: 7,142 Bytes
c293dc3
 
 
 
 
 
 
 
 
 
 
 
 
 
cb9cfe8
9f7dd14
cb9cfe8
9f7dd14
 
24ef2cf
9f7dd14
24ef2cf
9f7dd14
24ef2cf
9f7dd14
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
24ef2cf
9f7dd14
 
 
24ef2cf
9f7dd14
 
 
 
24ef2cf
9f7dd14
 
 
 
 
 
 
 
 
 
 
 
 
 
24ef2cf
9f7dd14
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
cb9cfe8
 
 
9f7dd14
 
cb9cfe8
 
9f7dd14
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
cb9cfe8
9f7dd14
24ef2cf
b799708
24ef2cf
9f7dd14
24ef2cf
9f7dd14
 
 
24ef2cf
9f7dd14
cb9cfe8
9f7dd14
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
---
title: Self-Improving SQL Agent
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
tags:
  - sql
  - reinforcement-learning
  - contextual-bandit
  - linucb
  - gepa
---

# Self-Improving SQL Agent

> **Live demo:** [huggingface.co/spaces/ar9av/sql-agent-openenv](https://huggingface.co/spaces/ar9av/sql-agent-openenv)
> **GitHub:** [Ar9av/sql-agent-openenv](https://github.com/Ar9av/sql-agent-openenv)

A SQL agent that gets better the more you use it. Ask questions in plain English β€” the agent writes SQL, executes it, and repairs its own mistakes using reinforcement learning. Every failure feeds back into a prompt evolution cycle (GEPA) that makes the next attempt smarter.

---

## What it does

1. **Natural language β†’ SQL** β€” type a question, get a query
2. **Self-repair loop** β€” if the SQL fails, the agent diagnoses the error and retries with a different strategy (up to 5 attempts). Each retry sees the full history of previous failures so it doesn't repeat the same mistake
3. **Reinforcement learning** β€” a LinUCB contextual bandit learns which of 8 repair strategies works best for each error class (wrong column, bad JOIN, syntax error, wrong dialect, etc.)
4. **Prompt evolution (GEPA)** β€” every N queries the system reflects on its failure patterns and rewrites its own system prompt to be more accurate going forward
5. **Connect your own DB** β€” drop in any SQLite file or PostgreSQL connection string; the agent introspects the schema and generates relevant example questions automatically

---

## Quickstart

### Run locally

```bash
# 1. Clone
git clone https://github.com/Ar9av/sql-agent-openenv
cd sql-agent-openenv

# 2. Install backend dependencies
cd backend
pip install -r requirements.txt

# 3. Set environment variables
export HF_TOKEN=your_huggingface_token        # required β€” no default
export API_BASE_URL=https://router.huggingface.co/v1   # optional
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct            # optional

# 4. Build the frontend
cd ../frontend
npm install && npm run build

# 5. Start the server
cd ../backend
uvicorn main:app --host 0.0.0.0 --port 8000
```

Open [http://localhost:8000](http://localhost:8000).

### Run with Docker

```bash
docker build -t self-improving-sql-agent .
docker run -p 7860:7860 \
  -e HF_TOKEN=your_token \
  self-improving-sql-agent
```

### Environment variables

| Variable | Default | Required |
|---|---|---|
| `HF_TOKEN` | β€” | **Yes** |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | No |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | No |
| `GEPA_OPTIMIZE_EVERY` | `4` | No |
| `DATA_DIR` | `./data` | No |

---

## Using the UI

### Chat tab
Type any question about your data. The agent streams SQL token-by-token, executes it, and shows results in a table. If it fails, watch it diagnose the error and retry with a new strategy.

- **Correct / Wrong buttons** β€” rate the result. Wrong answers open a remark field; your feedback is fed directly into the next GEPA optimization cycle
- **Retry differently** β€” re-runs the query with the previous bad SQL as context so the agent avoids repeating the same approach

### ER Diagram tab
Visual schema explorer showing all tables, columns, and foreign key relationships.

### Benchmark tab *(built-in DB only)*
Run the agent against a fixed set of easy / medium / hard questions and get an overall accuracy score.

### Right sidebar β€” System Prompt & GEPA
See the live system prompt the agent is using. A progress bar shows how far through the current optimization cycle you are (e.g. `2/4 Β· optimizes every 4 queries`). After each cycle the prompt is rewritten and the generation badge updates.

### Connect your own database
Click **Connect DB** in the top-right:

- **SQLite:** `/path/to/database.db` or `:memory:`
- **PostgreSQL:** `postgresql://user:password@host:5432/dbname`

The agent auto-detects the dialect (SQLite vs PostgreSQL), adjusts its prompt, introspects the schema, and uses the LLM to generate 5 example questions specific to your data. The Benchmark tab and difficulty controls are hidden for custom databases.

---

## OpenEnv API

The environment exposes a standard OpenEnv interface for agent training:

| Endpoint | Method | Description |
|---|---|---|
| `POST /reset` | β€” | Start a new episode, returns `Observation` |
| `POST /step` | β€” | Execute one repair action, returns `{observation, reward}` |
| `GET /state` | β€” | Current episode state |
| `GET /env/tasks` | β€” | List all tasks and questions |
| `GET /env/info` | β€” | Environment metadata (action/observation space) |

**Stdout** emits structured logs for each episode:
```
[START] {"task_id": "...", "question": "...", "max_attempts": 5}
[STEP]  {"attempt": 1, "action": "generate", "reward": 0.8, "success": true, "done": true}
[END]   {"success": true, "attempts": 1, "total_reward": 0.8}
```

**Action space** β€” 8 discrete repair strategies:
`generate`, `rewrite_full`, `fix_column`, `fix_table`, `add_groupby`, `rewrite_cte`, `fix_syntax`, `change_dialect`, `relax_filter`

---

## Architecture

```
frontend/          React + Vite (served as static files by FastAPI)
backend/
  main.py          FastAPI entry point
  api/
    demo.py        SSE streaming endpoints (chat, benchmark, GEPA events)
    openenv.py     OpenEnv spec routes (/reset, /step, /state)
  env/
    sql_env.py     SQLAgentEnv β€” episode management, LLM calls
    database.py    SQLite + PostgreSQL abstraction
    tasks.py       Benchmark task definitions and grader
  rl/
    types.py       RepairAction enum, RLState, featurize()
    bandit.py      LinUCB contextual bandit
    repair_strategies.py  8 repair prompt templates
    grader.py      Shaped reward function
  gepa/
    optimizer.py   GEPA: reflect β†’ mutate β†’ score β†’ pareto front
```

---

## Background

> **Origin:** This is a port of [gepa-tuned-sql-agent](https://github.com/Ar9av/gepa-tuned-sql-agent) initally was trying to make it in typescript later to realise we need submission in specific format

The original explored three research ideas in a Next.js stack, started ~1 week before the submission deadline. When it became clear the submission required a Python OpenEnv environment, the whole stack was migrated.

1. **Self-debug loop** β€” the agent critiques and fixes its own SQL errors without human intervention
2. **GEPA prompt evolution** β€” after user feedback, an LLM reflects on failures and evolves the system prompt
3. **Mini-RL environment** β€” a LinUCB contextual bandit learns which repair strategy works best for each error class

### Key differences from the original

| | gepa-tuned-sql-agent | Self-Improving SQL Agent (this repo) |
|---|---|---|
| Backend | Next.js API routes (TypeScript) | FastAPI (Python) |
| Frontend | Next.js pages | React + Vite (static, served by FastAPI) |
| LLM | Azure OpenAI | HF Router (Qwen 2.5-72B) |
| Deployment | Vercel / local | Hugging Face Spaces (Docker) |
| DB support | SQLite, PostgreSQL, MySQL | SQLite file + PostgreSQL DSN |
| Repair context | Single-shot per attempt | Multi-turn β€” full failure history passed to each retry |