Spaces:
Sleeping
Sleeping
| title: Self-Improving SQL Agent | |
| emoji: π§ | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: docker | |
| pinned: false | |
| tags: | |
| - sql | |
| - reinforcement-learning | |
| - contextual-bandit | |
| - linucb | |
| - gepa | |
| # Self-Improving SQL Agent | |
| > **Live demo:** [huggingface.co/spaces/ar9av/sql-agent-openenv](https://huggingface.co/spaces/ar9av/sql-agent-openenv) | |
| > **GitHub:** [Ar9av/sql-agent-openenv](https://github.com/Ar9av/sql-agent-openenv) | |
| A SQL agent that gets better the more you use it. Ask questions in plain English β the agent writes SQL, executes it, and repairs its own mistakes using reinforcement learning. Every failure feeds back into a prompt evolution cycle (GEPA) that makes the next attempt smarter. | |
| --- | |
| ## What it does | |
| 1. **Natural language β SQL** β type a question, get a query | |
| 2. **Self-repair loop** β if the SQL fails, the agent diagnoses the error and retries with a different strategy (up to 5 attempts). Each retry sees the full history of previous failures so it doesn't repeat the same mistake | |
| 3. **Reinforcement learning** β a LinUCB contextual bandit learns which of 8 repair strategies works best for each error class (wrong column, bad JOIN, syntax error, wrong dialect, etc.) | |
| 4. **Prompt evolution (GEPA)** β every N queries the system reflects on its failure patterns and rewrites its own system prompt to be more accurate going forward | |
| 5. **Connect your own DB** β drop in any SQLite file or PostgreSQL connection string; the agent introspects the schema and generates relevant example questions automatically | |
| --- | |
| ## Quickstart | |
| ### Run locally | |
| ```bash | |
| # 1. Clone | |
| git clone https://github.com/Ar9av/sql-agent-openenv | |
| cd sql-agent-openenv | |
| # 2. Install backend dependencies | |
| cd backend | |
| pip install -r requirements.txt | |
| # 3. Set environment variables | |
| export HF_TOKEN=your_huggingface_token # required β no default | |
| export API_BASE_URL=https://router.huggingface.co/v1 # optional | |
| export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct # optional | |
| # 4. Build the frontend | |
| cd ../frontend | |
| npm install && npm run build | |
| # 5. Start the server | |
| cd ../backend | |
| uvicorn main:app --host 0.0.0.0 --port 8000 | |
| ``` | |
| Open [http://localhost:8000](http://localhost:8000). | |
| ### Run with Docker | |
| ```bash | |
| docker build -t self-improving-sql-agent . | |
| docker run -p 7860:7860 \ | |
| -e HF_TOKEN=your_token \ | |
| self-improving-sql-agent | |
| ``` | |
| ### Environment variables | |
| | Variable | Default | Required | | |
| |---|---|---| | |
| | `HF_TOKEN` | β | **Yes** | | |
| | `API_BASE_URL` | `https://router.huggingface.co/v1` | No | | |
| | `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | No | | |
| | `GEPA_OPTIMIZE_EVERY` | `4` | No | | |
| | `DATA_DIR` | `./data` | No | | |
| --- | |
| ## Using the UI | |
| ### Chat tab | |
| Type any question about your data. The agent streams SQL token-by-token, executes it, and shows results in a table. If it fails, watch it diagnose the error and retry with a new strategy. | |
| - **Correct / Wrong buttons** β rate the result. Wrong answers open a remark field; your feedback is fed directly into the next GEPA optimization cycle | |
| - **Retry differently** β re-runs the query with the previous bad SQL as context so the agent avoids repeating the same approach | |
| ### ER Diagram tab | |
| Visual schema explorer showing all tables, columns, and foreign key relationships. | |
| ### Benchmark tab *(built-in DB only)* | |
| Run the agent against a fixed set of easy / medium / hard questions and get an overall accuracy score. | |
| ### Right sidebar β System Prompt & GEPA | |
| See the live system prompt the agent is using. A progress bar shows how far through the current optimization cycle you are (e.g. `2/4 Β· optimizes every 4 queries`). After each cycle the prompt is rewritten and the generation badge updates. | |
| ### Connect your own database | |
| Click **Connect DB** in the top-right: | |
| - **SQLite:** `/path/to/database.db` or `:memory:` | |
| - **PostgreSQL:** `postgresql://user:password@host:5432/dbname` | |
| The agent auto-detects the dialect (SQLite vs PostgreSQL), adjusts its prompt, introspects the schema, and uses the LLM to generate 5 example questions specific to your data. The Benchmark tab and difficulty controls are hidden for custom databases. | |
| --- | |
| ## OpenEnv API | |
| The environment exposes a standard OpenEnv interface for agent training: | |
| | Endpoint | Method | Description | | |
| |---|---|---| | |
| | `POST /reset` | β | Start a new episode, returns `Observation` | | |
| | `POST /step` | β | Execute one repair action, returns `{observation, reward}` | | |
| | `GET /state` | β | Current episode state | | |
| | `GET /env/tasks` | β | List all tasks and questions | | |
| | `GET /env/info` | β | Environment metadata (action/observation space) | | |
| **Stdout** emits structured logs for each episode: | |
| ``` | |
| [START] {"task_id": "...", "question": "...", "max_attempts": 5} | |
| [STEP] {"attempt": 1, "action": "generate", "reward": 0.8, "success": true, "done": true} | |
| [END] {"success": true, "attempts": 1, "total_reward": 0.8} | |
| ``` | |
| **Action space** β 8 discrete repair strategies: | |
| `generate`, `rewrite_full`, `fix_column`, `fix_table`, `add_groupby`, `rewrite_cte`, `fix_syntax`, `change_dialect`, `relax_filter` | |
| --- | |
| ## Architecture | |
| ``` | |
| frontend/ React + Vite (served as static files by FastAPI) | |
| backend/ | |
| main.py FastAPI entry point | |
| api/ | |
| demo.py SSE streaming endpoints (chat, benchmark, GEPA events) | |
| openenv.py OpenEnv spec routes (/reset, /step, /state) | |
| env/ | |
| sql_env.py SQLAgentEnv β episode management, LLM calls | |
| database.py SQLite + PostgreSQL abstraction | |
| tasks.py Benchmark task definitions and grader | |
| rl/ | |
| types.py RepairAction enum, RLState, featurize() | |
| bandit.py LinUCB contextual bandit | |
| repair_strategies.py 8 repair prompt templates | |
| grader.py Shaped reward function | |
| gepa/ | |
| optimizer.py GEPA: reflect β mutate β score β pareto front | |
| ``` | |
| --- | |
| ## Background | |
| > **Origin:** This is a port of [gepa-tuned-sql-agent](https://github.com/Ar9av/gepa-tuned-sql-agent) initally was trying to make it in typescript later to realise we need submission in specific format | |
| The original explored three research ideas in a Next.js stack, started ~1 week before the submission deadline. When it became clear the submission required a Python OpenEnv environment, the whole stack was migrated. | |
| 1. **Self-debug loop** β the agent critiques and fixes its own SQL errors without human intervention | |
| 2. **GEPA prompt evolution** β after user feedback, an LLM reflects on failures and evolves the system prompt | |
| 3. **Mini-RL environment** β a LinUCB contextual bandit learns which repair strategy works best for each error class | |
| ### Key differences from the original | |
| | | gepa-tuned-sql-agent | Self-Improving SQL Agent (this repo) | | |
| |---|---|---| | |
| | Backend | Next.js API routes (TypeScript) | FastAPI (Python) | | |
| | Frontend | Next.js pages | React + Vite (static, served by FastAPI) | | |
| | LLM | Azure OpenAI | HF Router (Qwen 2.5-72B) | | |
| | Deployment | Vercel / local | Hugging Face Spaces (Docker) | | |
| | DB support | SQLite, PostgreSQL, MySQL | SQLite file + PostgreSQL DSN | | |
| | Repair context | Single-shot per attempt | Multi-turn β full failure history passed to each retry | | |