---
title: Self-Improving SQL Agent
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
tags:
- sql
- reinforcement-learning
- contextual-bandit
- linucb
- gepa
---

# Self-Improving SQL Agent

> Live demo: [huggingface.co/spaces/ar9av/sql-agent-openenv](https://huggingface.co/spaces/ar9av/sql-agent-openenv)
>
> GitHub: [Ar9av/sql-agent-openenv](https://github.com/Ar9av/sql-agent-openenv)

A SQL agent that gets better the more you use it. Ask a question in plain English: the agent writes SQL, executes it, and repairs its own mistakes, using reinforcement learning to choose how to repair them. Every failure feeds a prompt-evolution cycle (GEPA) that rewrites the system prompt used for later queries.

---

## What it does

1. Natural language → SQL: type a question, get a query.
2. Self-repair loop: if the SQL fails, the agent diagnoses the error and retries with a different strategy (up to 5 attempts). Each retry sees the full history of previous failures, so it does not repeat the same mistake.
3. Reinforcement learning: a LinUCB contextual bandit learns which of 8 repair strategies works best for each error class (wrong column, bad JOIN, syntax error, wrong dialect, etc.).
4. Prompt evolution (GEPA): every N queries (`GEPA_OPTIMIZE_EVERY`, default 4), the system reflects on its failure patterns and rewrites its own system prompt.
5. Connect your own DB: point it at any SQLite file or PostgreSQL connection string; the agent introspects the schema and automatically generates relevant example questions.
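
The self-repair loop in item 2 can be sketched as follows. This is a minimal illustration, not the repo's code: `propose_sql(question, history)` is a hypothetical stand-in for the LLM call (and for the bandit's choice of repair strategy), and only SQLite is used so the example runs on its own. The loop stops at the first query that executes, allows at most 5 attempts, and passes every earlier failure to the next attempt.

```python
import sqlite3
from typing import Callable

MAX_ATTEMPTS = 5  # the "up to 5 attempts" limit described above

# Hypothetical signature: (question, [(failed_sql, error_message), ...]) -> new SQL
ProposeFn = Callable[[str, list[tuple[str, str]]], str]


def self_repair(conn: sqlite3.Connection, question: str, propose_sql: ProposeFn):
    """Return (rows, sql, attempts_used); rows and sql are None if every attempt failed."""
    history: list[tuple[str, str]] = []  # full failure history, shown to every retry
    for attempt in range(1, MAX_ATTEMPTS + 1):
        sql = propose_sql(question, history)
        try:
            rows = conn.execute(sql).fetchall()
            return rows, sql, attempt
        except sqlite3.Error as exc:
            # Keep both the failing SQL and the database error for the next attempt
            history.append((sql, str(exc)))
    return None, None, MAX_ATTEMPTS


# Usage: a stub "model" that fixes a misspelled column once it has seen the error
def stub(question, history):
    return "SELECT nme FROM users" if not history else "SELECT name FROM users"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('ada')")
rows, sql, attempts = self_repair(conn, "List user names", stub)
print(rows, attempts)  # [('ada',)] 2
```

Passing the whole `history` list, rather than only the last error, is what lets a retry avoid approaches that already failed two or three attempts earlier.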

---

## Quickstart

### Run locally

```bash
# 1. Clone
git clone https://github.com/Ar9av/sql-agent-openenv
cd sql-agent-openenv

# 2. Install backend dependencies
cd backend
pip install -r requirements.txt

# 3. Set environment variables
export HF_TOKEN=your_huggingface_token                # required, no default
export API_BASE_URL=https://router.huggingface.co/v1  # optional
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct           # optional

# 4. Build the frontend
cd ../frontend
npm install && npm run build

# 5. Start the server
cd ../backend
uvicorn main:app --host 0.0.0.0 --port 8000
```

Open [http://localhost:8000](http://localhost:8000).

### Run with Docker

```bash
docker build -t self-improving-sql-agent .
docker run -p 7860:7860 \
  -e HF_TOKEN=your_token \
  self-improving-sql-agent
```

The container serves the app on port 7860, so open [http://localhost:7860](http://localhost:7860).

### Environment variables

| Variable | Default | Required |
|---|---|---|
| `HF_TOKEN` | (none) | Yes |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | No |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | No |
| `GEPA_OPTIMIZE_EVERY` | `4` | No |
| `DATA_DIR` | `./data` | No |
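
A sketch of how a backend could read these variables using the defaults from the table. The `Settings` dataclass and `load_settings` helper are hypothetical names for illustration, not the repo's actual configuration code:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:  # hypothetical container for the variables in the table above
    hf_token: str
    api_base_url: str
    model_name: str
    gepa_optimize_every: int
    data_dir: str


def load_settings() -> Settings:
    token = os.environ.get("HF_TOKEN")
    if not token:
        # HF_TOKEN has no default, so fail fast with a clear message
        raise RuntimeError("HF_TOKEN is required")
    return Settings(
        hf_token=token,
        api_base_url=os.environ.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        model_name=os.environ.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        # Environment values are strings, so convert the cycle length explicitly
        gepa_optimize_every=int(os.environ.get("GEPA_OPTIMIZE_EVERY", "4")),
        data_dir=os.environ.get("DATA_DIR", "./data"),
    )
```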

---

## Using the UI

### Chat tab

Type any question about your data. The agent streams the SQL token by token, executes it, and shows the results in a table. If the query fails, you can watch it diagnose the error and retry with a new strategy.

- Correct / Wrong buttons: rate the result. Marking an answer Wrong opens a remark field; your feedback goes directly into the next GEPA optimization cycle.
- Retry differently: re-runs the query with the previous bad SQL as context, so the agent avoids repeating the same approach.

### ER Diagram tab

Visual schema explorer showing all tables, columns, and foreign-key relationships.

### Benchmark tab (built-in DB only)

Runs the agent against a fixed set of easy, medium, and hard questions and reports an overall accuracy score.

### Right sidebar: System Prompt & GEPA

Shows the live system prompt the agent is using. A progress bar shows how far through the current optimization cycle you are (e.g. `2/4 · optimizes every 4 queries`). After each cycle the prompt is rewritten and the generation badge updates.
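
One plausible model of the progress counter, assuming a per-cycle query count that resets to zero whenever an optimization cycle runs. `GepaCycle` is a hypothetical name, and the actual GEPA call is left out:

```python
class GepaCycle:
    """Hypothetical counter behind the sidebar progress bar."""

    def __init__(self, every: int = 4):
        self.every = every    # cycle length, i.e. GEPA_OPTIMIZE_EVERY
        self.count = 0        # queries seen in the current cycle
        self.generation = 0   # bumped each time the prompt is rewritten

    def label(self) -> str:
        return f"{self.count}/{self.every} · optimizes every {self.every} queries"

    def record_query(self) -> bool:
        """Count one query; return True when it completes a cycle."""
        self.count += 1
        if self.count >= self.every:
            self.count = 0
            self.generation += 1  # a real system would run GEPA optimization here
            return True
        return False


cycle = GepaCycle(every=4)
cycle.record_query()
cycle.record_query()
print(cycle.label())  # 2/4 · optimizes every 4 queries
```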

### Connect your own database

Click Connect DB in the top-right corner and enter one of:

- SQLite: `/path/to/database.db` or `:memory:`
- PostgreSQL: `postgresql://user:password@host:5432/dbname`

The agent auto-detects the dialect (SQLite vs PostgreSQL), adjusts its prompt, introspects the schema, and uses the LLM to generate 5 example questions specific to your data. The Benchmark tab and difficulty controls are hidden for custom databases.
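
A rough sketch of the first two connection steps for SQLite, assuming the dialect is inferred from the URL scheme. `detect_dialect` and `sqlite_schema` are hypothetical helpers, and PostgreSQL introspection (for example via `information_schema.columns`) is not shown:

```python
import sqlite3


def detect_dialect(target: str) -> str:
    """Classify a connection target as 'postgresql' or 'sqlite' (hypothetical helper)."""
    if target.startswith(("postgresql://", "postgres://")):
        return "postgresql"
    # Anything else is treated as a SQLite file path or ':memory:'
    return "sqlite"


def sqlite_schema(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each user table to its column names using SQLite's catalog."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    # Table names come from sqlite_master, so quoting them is enough for typical names.
    return {t: [col[1] for col in conn.execute(f'PRAGMA table_info("{t}")')] for t in tables}
```

The resulting table-to-columns map is the kind of compact schema summary that can be placed in the prompt before asking the LLM for example questions.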

---

## OpenEnv API

The environment exposes a standard OpenEnv interface for agent training:

| Endpoint | Method | Description |
|---|---|---|
| `/reset` | `POST` | Start a new episode; returns an `Observation` |
| `/step` | `POST` | Execute one repair action; returns `{observation, reward}` |
| `/state` | `GET` | Current episode state |
| `/env/tasks` | `GET` | List all tasks and questions |
| `/env/info` | `GET` | Environment metadata (action and observation spaces) |

Each episode writes structured logs to stdout:

```
[START] {"task_id": "...", "question": "...", "max_attempts": 5}
[STEP] {"attempt": 1, "action": "generate", "reward": 0.8, "success": true, "done": true}
[END] {"success": true, "attempts": 1, "total_reward": 0.8}
```
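
Because each log line is a tag followed by a JSON object, the logs are easy to consume from a training or evaluation script. A minimal parser sketch, assuming one record per line in exactly the format shown above; `parse_episode_log` is a hypothetical helper:

```python
import json

TAGS = {"START", "STEP", "END"}


def parse_episode_log(lines):
    """Turn [START]/[STEP]/[END] stdout lines into (tag, payload) tuples."""
    events = []
    for line in lines:
        line = line.strip()
        if not line.startswith("["):
            continue  # ignore unrelated stdout output such as server logs
        tag, sep, payload = line[1:].partition("] ")
        if not sep or tag not in TAGS:
            continue
        try:
            events.append((tag, json.loads(payload)))
        except json.JSONDecodeError:
            continue  # skip a malformed record instead of aborting the whole parse
    return events


sample = [
    '[START] {"task_id": "...", "question": "...", "max_attempts": 5}',
    '[STEP] {"attempt": 1, "action": "generate", "reward": 0.8, "success": true, "done": true}',
    '[END] {"success": true, "attempts": 1, "total_reward": 0.8}',
]
for tag, payload in parse_episode_log(sample):
    print(tag, payload)
```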

Action space: 9 discrete actions, namely `generate` (used for the first attempt, as in the log above) plus 8 repair strategies:
`rewrite_full`, `fix_column`, `fix_table`, `add_groupby`, `rewrite_cte`, `fix_syntax`, `change_dialect`, `relax_filter`
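
To make the bandit concrete, here is a minimal LinUCB sketch in its standard disjoint form (one linear model per arm) over the 8 repair strategies. The context vector is an assumption: a bias term plus a one-hot encoding of a hypothetical `ERROR_CLASSES` list, standing in for the repo's own `featurize()`, whose exact features are not documented here. Each arm's `A⁻¹` is updated with the Sherman-Morrison formula, so no matrix is ever inverted:

```python
import math

ACTIONS = ["rewrite_full", "fix_column", "fix_table", "add_groupby",
           "rewrite_cte", "fix_syntax", "change_dialect", "relax_filter"]
ERROR_CLASSES = ["wrong_column", "bad_join", "syntax_error", "wrong_dialect"]  # hypothetical


def featurize(error_class: str) -> list[float]:
    """Hypothetical context: a bias term plus a one-hot error class."""
    return [1.0] + [1.0 if error_class == e else 0.0 for e in ERROR_CLASSES]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


class LinUCB:
    def __init__(self, actions, dim, alpha=1.0):
        self.alpha = alpha
        # Per arm: A^-1 starts as the identity, b starts at zero
        self.a_inv = {a: [[float(i == j) for j in range(dim)] for i in range(dim)] for a in actions}
        self.b = {a: [0.0] * dim for a in actions}

    def select(self, x):
        """Pick the arm maximizing theta.x + alpha * sqrt(x^T A^-1 x); ties go to the first arm."""
        best, best_ucb = None, -math.inf
        for arm, a_inv in self.a_inv.items():
            theta = matvec(a_inv, self.b[arm])  # ridge-regression estimate A^-1 b
            ax = matvec(a_inv, x)
            mean = sum(t * xi for t, xi in zip(theta, x))
            width = math.sqrt(sum(xi * axi for xi, axi in zip(x, ax)))
            ucb = mean + self.alpha * width
            if ucb > best_ucb:
                best, best_ucb = arm, ucb
        return best

    def update(self, arm, x, reward):
        """A <- A + x x^T (via Sherman-Morrison on A^-1) and b <- b + reward * x."""
        a_inv = self.a_inv[arm]
        ax = matvec(a_inv, x)  # A^-1 x; A^-1 stays symmetric, so x^T A^-1 = (A^-1 x)^T
        denom = 1.0 + sum(xi * axi for xi, axi in zip(x, ax))
        n = len(x)
        self.a_inv[arm] = [[a_inv[i][j] - ax[i] * ax[j] / denom for j in range(n)] for i in range(n)]
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]


# Usage: for "wrong_column" errors, only fix_column earns a reward
bandit = LinUCB(ACTIONS, dim=len(ERROR_CLASSES) + 1, alpha=1.0)
x = featurize("wrong_column")
for _ in range(30):
    arm = bandit.select(x)
    bandit.update(arm, x, 1.0 if arm == "fix_column" else 0.0)
print(bandit.select(x))  # fix_column
```

`alpha` trades exploration against exploitation: a larger value widens the confidence bound, so strategies that have been tried less often for a given error class get picked more often.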

---

## Architecture

```
frontend/                 React + Vite (served as static files by FastAPI)
backend/
  main.py                 FastAPI entry point
  api/
    demo.py               SSE streaming endpoints (chat, benchmark, GEPA events)
    openenv.py            OpenEnv spec routes (/reset, /step, /state)
  env/
    sql_env.py            SQLAgentEnv: episode management, LLM calls
    database.py           SQLite + PostgreSQL abstraction
    tasks.py              Benchmark task definitions and grader
  rl/
    types.py              RepairAction enum, RLState, featurize()
    bandit.py             LinUCB contextual bandit
    repair_strategies.py  8 repair prompt templates
    grader.py             Shaped reward function
  gepa/
    optimizer.py          GEPA: reflect → mutate → score → pareto front
```
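
The final `score → pareto front` step in `optimizer.py` keeps more than one prompt alive. A minimal sketch of Pareto selection, assuming each candidate prompt has been scored per benchmark task (higher is better); the candidate names and scores below are illustrative, not results from this repo:

```python
def dominates(a: list[float], b: list[float]) -> bool:
    """True if a is at least as good as b on every task and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: dict[str, list[float]]) -> list[str]:
    """Return the candidates (e.g. prompt versions) that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    ]


# Illustrative per-task scores for three prompt candidates (not real results)
scores = {
    "gen0": [1.0, 0.0, 0.5],
    "gen1": [1.0, 1.0, 0.5],  # at least as good as gen0 everywhere, better on task 2
    "gen2": [0.0, 1.0, 1.0],  # the only candidate that solves task 3, so it survives
}
print(pareto_front(scores))  # ['gen1', 'gen2']
```

The usual motivation for keeping the whole front instead of a single best prompt is diversity: a candidate that is strong on only a subset of questions can still be a useful parent for the next mutation.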

---

## Background

> Origin: This is a Python port of [gepa-tuned-sql-agent](https://github.com/Ar9av/gepa-tuned-sql-agent), which was first written in TypeScript.

The original was built in a Next.js stack, starting about a week before the submission deadline. When it became clear that the submission required a Python OpenEnv environment, the whole stack was migrated. The original explored three research ideas:

1. Self-debug loop: the agent critiques and fixes its own SQL errors without human intervention.
2. GEPA prompt evolution: after user feedback, an LLM reflects on failures and evolves the system prompt.
3. Mini-RL environment: a LinUCB contextual bandit learns which repair strategy works best for each error class.

### Key differences from the original

| | gepa-tuned-sql-agent | Self-Improving SQL Agent (this repo) |
|---|---|---|
| Backend | Next.js API routes (TypeScript) | FastAPI (Python) |
| Frontend | Next.js pages | React + Vite (static, served by FastAPI) |
| LLM | Azure OpenAI | HF Router (Qwen 2.5-72B) |
| Deployment | Vercel / local | Hugging Face Spaces (Docker) |
| DB support | SQLite, PostgreSQL, MySQL | SQLite file + PostgreSQL DSN |
| Repair context | Single-shot per attempt | Multi-turn: full failure history passed to each retry |