Spaces:
Sleeping
title: SQL Database Engineer Agent
emoji: ποΈ
colorFrom: blue
colorTo: green
sdk: docker
pinned: true
tags:
- openenv
- reinforcement-learning
- sql
- database
- engineering
- long-horizon
- self-improvement
- wildcard
license: mit
SQL Database Engineer Agent β OpenEnv Environment
META Γ PyTorch Γ SST OpenEnv Hackathon | Finals April 25β26, 2026 | Bangalore Evolved from SQL Query Debugger (Round 1 β all 4 checks passed β )
An OpenEnv-compliant reinforcement learning environment where AI agents learn to act like senior database engineers. The agent manages a simulated production database over 50+ steps β inspecting slow queries, creating indexes, rewriting queries, and partitioning tables.
π Quick Links
| Resource | Link |
|---|---|
| Live Demo | https://huggingface.co/spaces/junaid0600/sql-db-agent-demo-ui |
| Training Notebook | https://huggingface.co/spaces/junaid0600/sql-db-engineer-agent/blob/main/SDEA_Training_Notebook.ipynb |
| **Google Collab ** | https://colab.research.google.com/drive/1dTRcnVb9VotCFUnGeZSacaznb4fn_PD7?usp=sharing |
| Blog Post | https://huggingface.co/spaces/junaid0600/sql-db-engineer-agent/blob/main/blog_post.md |
| Source Code (HF Space) | https://huggingface.co/spaces/junaid0600/sql-db-engineer-agent |
| Source Code (GitHub) | https://github.com/Mdjunaid06/sql-db-engineer-agent |
From Round 1 β Round 2
| Round 1 β SQL Query Debugger | Round 2 β SQL Database Engineer Agent | |
|---|---|---|
| Task | Fix one broken SQL query | Optimize entire production database |
| Steps | 20 per episode | 50 per episode |
| Actions | 6 (identify, fix, submit...) | 15 (inspect, index, rewrite, partition...) |
| Reward | Dense per step | Dense + milestone bonuses |
| Scenarios | 15 single-query tasks | 30 total (15 new + 15 original) |
| Training | Rule-based baseline | Unsloth + GRPO on Qwen2.5-7B |
| Theme | Real-world SQL | Long-Horizon + World Modeling + Wildcard |
Motivation
Every production database degrades over time.
Your app launches. Queries run in 50ms. Six months later, users are complaining. P95 query time: 8,500ms. A senior DBA sits down β runs EXPLAIN queries, finds missing indexes, rewrites bad JOINs, partitions 50-million-row tables. This takes 10 years to learn.
We asked: can we train an LLM to do it?
SQL database engineering is uniquely well-suited for RL:
- 100% measurable β query time in milliseconds, index hit rates, performance scores
- Long-horizon β real fixes require 10-50 careful, ordered steps
- World modeling β agent must maintain internal model of DB state, indexes, query plans
- Self-improving β curriculum generates harder scenarios as agent improves
- Novel β no OpenEnv environment for DB engineering exists anywhere
π Training Results
Trained Qwen2.5-7B-Instruct with GRPO using Unsloth (only 0.53% of parameters via LoRA):
GRPO Training Curves β 200 Steps
| Metric | Value |
|---|---|
| Training steps | 200 |
| Loss | 4.92e-07 β 1.23e-05 |
| Reward | 0.235 β 0.456 |
| Improvement | +94% |
| Model | Qwen2.5-7B (0.53% trainable via LoRA) |
| Epochs | 29 |
| Batch size | 8 (4 Γ 2 grad accum Γ 1 GPU) |
β οΈ Note: GRPO policy loss rises as the model becomes more confident β this is expected behaviour, not divergence. The reward curve confirms consistent improvement.
Evaluation β Trained vs Random Agent (15 Scenarios)
| Agent | Avg Improvement | Best Scenario | Worst Scenario |
|---|---|---|---|
| Random (wrong index) | +0.0 pts | 0 pts | 0 pts |
| Trained (GRPO) | +31.4 pts | +59 pts (Scenario 8 ) | +10 pts |
- Trained agent outperformed random baseline on every single scenario
- Scenario 8 flagged as outlier (Β±1.5Ο) β agent found especially impactful index combination
- Relative gain: β (baseline scored exactly 0 on all scenarios)
Training Progression
| Stage | Avg Reward | Agent Behavior |
|---|---|---|
| Before training | 0.05 | Random actions, no strategy |
| 50 steps | 0.25 | Learns to inspect before acting |
| 200 steps | 0.456 | Multi-step planning emerges |
Environment Overview
| Property | Value |
|---|---|
| Domain | Database Engineering |
| Tasks | 30 (15 Round 2 scenarios + 15 Round 1 cases) |
| Max Steps | 50 per episode |
| Reward Type | Dense + milestone bonuses |
| Performance Score | 0β100 (real DB metric) |
| API Port | 7860 |
| Themes | Long-Horizon (2) + World Modeling (3.1) + Self-Improvement (4) + Wildcard (5) |
Action Space (15 Actions)
Round 2 β DB Engineering Actions
| Action | What It Does | Reward |
|---|---|---|
inspect_query |
EXPLAIN a slow query β scan type, rows examined, cost | +0.05 |
analyze_indexes |
Show all indexes + missing index hints | +0.05 |
create_index |
Add composite index on specified columns | +0.10 + delta |
rewrite_query |
Submit rewritten SQL β measures improvement | +0.15 + delta |
add_column |
Add denormalization column to reduce JOINs | +0.08 + delta |
drop_index |
Remove unused index (reduce write overhead) | +0.05 + delta |
partition_table |
Partition large table by date/ID range | +0.15 + delta |
analyze_statistics |
Update table statistics for query planner | +0.05 + delta |
request_hint |
Get progressive hint | β0.10 penalty |
submit_report |
TERMINAL: Final optimization report + full score | 0.0β1.0 |
Round 1 β SQL Debugging Actions (backward compatible)
identify_error Β· propose_fix Β· submit_answer Β· explain_issue Β· optimize_query Β· request_hint
Observation Space
Every observation contains the full DB state:
{
"task_id": "medium_s001",
"task_description": "E-commerce DB: 50K orders. P95 query time > 8s. Target: < 500ms.",
"current_context": {
"performance_score": 12.5,
"target_score": 75.0,
"tables": [
{"name": "orders", "rows": 50000, "indexes": ["PRIMARY"], "size_mb": 280},
{"name": "users", "rows": 8000, "indexes": ["PRIMARY", "email_idx"]}
],
"slow_queries": [
{"id": "q1", "sql": "SELECT * FROM orders WHERE user_id=? AND status=?", "avg_ms": 8500},
{"id": "q2", "sql": "SELECT COUNT(*) FROM orders o JOIN users u ON o.user_id=u.id", "avg_ms": 3200}
],
"improvement_history": [12.5],
"milestones_earned": [],
"steps_remaining": 50
},
"step_count": 0,
"difficulty": "medium",
"max_steps": 50
}
Reward Design
Dense reward at every step + milestone bonuses:
inspect_query / analyze_indexes β +0.05 (investigation rewarded)
create_index with improvement β +0.10 + delta_reward
Milestone: 25% improvement β +0.15 ONE-TIME bonus
Milestone: 50% improvement β +0.25 ONE-TIME bonus
Milestone: 75% improvement β +0.40 ONE-TIME bonus
submit_report (terminal) β 0.0β1.0 full score
Efficiency bonus (< 70% budget) β +0.10
Loop penalty (same action x2+) β β0.08
Hint penalty β β0.10
Backtrack penalty β β0.05
Budget exhaustion β β0.15
GRPO Reward Breakdown (Expected per action)
inspect_query / analyze_indexes β ~0.10
create_index (no table/col match) β ~0.10
create_index (partial hint match) β ~0.20β0.45
create_index (perfect hint match) β ~0.55β0.80
create_index (simulator confirms) β ~0.75β0.99
Milestones: 25%=+0.15 50%=+0.25 75%=+0.40 (cumulative)
Terminal Score Formula
perf_improvement = (final_score - baseline) / (100 - baseline)
step_efficiency = 1.0 - (steps_used / max_steps)
terminal_score = (perf_improvement * 0.60) + (step_efficiency * 0.20) + 0.10
Scenarios β 30 Tasks
Round 2: DB Engineering (15 new tasks)
Easy (15 steps, target 80+)
| ID | Description |
|---|---|
| easy_s001 | User lookup β missing email index on 10K users |
| easy_s002 | Order status β composite index on 50K orders |
| easy_s003 | Product search β LIKE query on 20K products |
| easy_s004 | Session lookup β 15K sessions, no index |
| easy_s005 | Log filter β compound index on 30K logs |
Medium (25β30 steps, target 72β78)
| ID | Description |
|---|---|
| medium_s001 | E-commerce: 50K orders + 8K users, 2 slow queries |
| medium_s002 | Blog: 100K posts + 20K authors, search slow |
| medium_s003 | Inventory: 200K stock movements, rewrite + index |
| medium_s004 | Ticketing: 60K tickets, status queue degraded |
| medium_s005 | Analytics: 150K events, funnel query slow |
Hard (50 steps, target 65β70)
| ID | Description |
|---|---|
| hard_s001 | Financial: 500K transactions, 4 tables, 3 slow queries |
| hard_s002 | SaaS: 8-table schema, 2M activity log, dashboard 20s+ |
| hard_s003 | Healthcare: 1M patient records, compliance queries |
| hard_s004 | Gaming: 2M players, 5M matches, leaderboard degraded |
| hard_s005 | Logistics: 6 tables, 3M shipments + 10M tracking rows |
Round 1: SQL Debugging (15 original tasks β backward compatible)
Easy: syntax errors Β· Medium: logic bugs Β· Hard: performance anti-patterns
Self-Improving Curriculum
Agent avg score > 0.75 β Advance to harder tier
Agent avg score < 0.30 β Drop back a tier
Ultra tier (tier 3) β Auto-generated 5-8 table scenarios, no hints
The environment gets harder as the agent gets smarter. Genuine adaptive curriculum.
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
/health |
GET | Liveness check β always 200 |
/reset |
POST | Start new episode β Observation |
/step |
POST | Submit action β (obs, reward, done, info) |
/state |
GET | Current episode state |
/tasks |
GET | All 30 tasks + action schema |
/grader |
POST | Grade an episode β float score |
/baseline |
POST | Run baseline agent β scores |
/progress |
GET | DB performance history + milestones |
Live Demo
# Reset with e-commerce scenario
curl -X POST https://junaid0600-sql-db-engineer-agent.hf.space/reset \
-H "Content-Type: application/json" \
-d '{"difficulty": "easy", "task_id": "easy_s001"}'
# Agent inspects slow query β sees FULL TABLE SCAN
curl -X POST https://junaid0600-sql-db-engineer-agent.hf.space/step \
-H "Content-Type: application/json" \
-d '{"action_type": "inspect_query", "payload": {"query_id": "q1"}}'
# Agent creates index β performance score 8.0 β 82.0
curl -X POST https://junaid0600-sql-db-engineer-agent.hf.space/step \
-H "Content-Type: application/json" \
-d '{"action_type": "create_index", "payload": {"table": "users", "columns": ["email"]}}'
# Agent submits report β terminal score 0.82
curl -X POST https://junaid0600-sql-db-engineer-agent.hf.space/step \
-H "Content-Type: application/json" \
-d '{"action_type": "submit_report", "payload": {"summary": "Added email index. Performance 8 to 82."}}'
Project Structure
sql-query-debugger/
βββ .env # Environment variables
βββ .env.example # Environment variables template
βββ .gitignore
βββ Dockerfile # Container definition
βββ README.md # This file
βββ blog_post.md # HF blog post (separate from README)
βββ loss_curve.png # GRPO training curves β
evidence
βββ reward_curve.png # Evaluation results β
evidence
βββ openenv.yaml # OpenEnv metadata (v2.0.0)
βββ pyproject.toml
βββ requirements.txt # Pinned dependencies
βββ uv.lock
βββ baseline.py # Rule-based baseline agent
βββ demo_app.py # Gradio demo app
βββ inference.py # LLM inference agent
β
βββ api/
β βββ __init__.py
β βββ server.py # FastAPI β 11 endpoints
β
βββ dataset/
β βββ easy_cases.json # Round 1: easy SQL tasks
β βββ easy_scenarios.json # Round 2: easy DB scenarios
β βββ hard_cases.json # Round 1: hard SQL tasks
β βββ hard_scenarios.json # Round 2: hard DB scenarios
β βββ medium_cases.json # Round 1: medium SQL tasks
β βββ medium_scenarios.json # Round 2: medium DB scenarios
β
βββ env/
β βββ __init__.py
β βββ scenarios/ # Scenario definitions
β βββ curriculum.py # Self-improving curriculum
β βββ db_simulator.py # DB performance simulator
β βββ environment.py # Core: reset() step() state()
β βββ graders.py # Deterministic graders
β βββ models.py # Pydantic models (15 action types)
β βββ reward.py # Dense reward + milestones
β βββ scenario_generator.py # Dynamic scenario generation
β βββ tasks.py # Task manager (30 tasks)
β
βββ sdea-trained/
β βββ eval_results.json # Evaluation results JSON
β
βββ training/
β βββ colab_notebook.py # Colab training notebook
β βββ evaluate_agent.py # Evaluation + reward curve generator
β βββ generate_plots.py # Fixed plot generator
β βββ generate_training_data.py # Expert trajectory collector
β βββ train_agent.py # Unsloth + GRPO training script
β
βββ tests/
βββ __init__.py
βββ test_environment.py # Environment tests
βββ test_graders.py # Grader tests
βββ test_reward.py # Reward tests
βββ test_tasks.py # Task tests
Setup & Installation
# Clone
git clone https://github.com/Mdjunaid06/sql-db-engineer-agent
cd sql-db-engineer-agent
# Install
pip install -r requirements.txt
# Configure
cp .env.example .env
# Add HF_TOKEN to .env
# Run
uvicorn api.server:app --host 0.0.0.0 --port 7860 --reload
# Verify
curl http://localhost:7860/health
# {"status":"ok","version":"2.0.0"}
# Open demo
# http://localhost:7860/demo
Validation
pytest tests/ -v # 24/24 passed
openenv validate . # [OK] Ready for multi-mode deployment
Built For
META Γ PyTorch Γ SST OpenEnv Hackathon Finals: April 25β26, 2026 | Bangalore
"We didn't build an environment. We built a DBA training simulator."

