	---
	title: SQL Database Engineer Agent
	emoji: 🗄️
	colorFrom: blue
	colorTo: green
	sdk: docker
	pinned: true
	tags:
	- openenv
	- reinforcement-learning
	- sql
	- database
	- engineering
	- long-horizon
	- self-improvement
	- wildcard
	license: mit
	---

	# SQL Database Engineer Agent — OpenEnv Environment

	> META × PyTorch × SST OpenEnv Hackathon | Finals April 25–26, 2026 | Bangalore
	> Evolved from SQL Query Debugger (Round 1 — all 4 checks passed ✅)

	An OpenEnv-compliant reinforcement learning environment where AI agents learn to act like senior database engineers. The agent manages a simulated production database over episodes of up to 50 steps: inspecting slow queries, creating indexes, rewriting queries, and partitioning tables.

	---

	## From Round 1 → Round 2

	| | Round 1 — SQL Query Debugger | Round 2 — SQL Database Engineer Agent |
	|---|---|---|
	| Task | Fix one broken SQL query | Optimize an entire production database |
	| Steps | 20 per episode | Up to 50 per episode |
	| Actions | 6 (identify, fix, submit...) | 15 (inspect, index, rewrite, partition...) |
	| Reward | Dense per step | Dense + milestone bonuses |
	| Scenarios | 15 single-query tasks | 30 total (15 new + 15 original) |
	| Training | Rule-based baseline | Unsloth + GRPO on Qwen2.5-7B |
	| Theme | Real-world SQL | Long-Horizon + World Modeling + Wildcard |

	---

	## Motivation

	Every production database degrades over time.

	Your app launches. Queries run in 50ms. Six months later, users are complaining. P95 query time: 8,500ms. A senior DBA sits down — runs EXPLAIN queries, finds missing indexes, rewrites bad JOINs, partitions 50-million-row tables. This takes 10 years to learn.

	We asked: can we train an LLM to do it?

	SQL database engineering is uniquely well-suited for RL:
	1. 100% measurable — query time in milliseconds, index hit rates, performance scores
	2. Long-horizon — real fixes require 10-50 careful, ordered steps
	3. World modeling — agent must maintain internal model of DB state, indexes, query plans
	4. Self-improving — curriculum generates harder scenarios as agent improves
	5. Novel — no OpenEnv environment for DB engineering exists anywhere

	---

	## Environment Overview

	| Property | Value |
	|---|---|
	| Domain | Database Engineering |
	| Tasks | 30 (15 Round 2 scenarios + 15 Round 1 cases) |
	| Max Steps | 50 per episode |
	| Reward Type | Dense + milestone bonuses |
	| Performance Score | 0–100 (real DB metric) |
	| API Port | 7860 |
	| Themes | Long-Horizon (2) + World Modeling (3.1) + Self-Improvement (4) + Wildcard (5) |

	---

	## Action Space (15 Actions)

	### Round 2 — DB Engineering Actions
	| Action | What It Does | Reward |
	|---|---|---|
	| `inspect_query` | EXPLAIN a slow query — scan type, rows examined, cost | +0.05 |
	| `analyze_indexes` | Show all indexes + missing index hints | +0.05 |
	| `create_index` | Add composite index on specified columns | +0.10 + delta |
	| `rewrite_query` | Submit rewritten SQL — measures improvement | +0.15 + delta |
	| `add_column` | Add denormalization column to reduce JOINs | +0.08 + delta |
	| `drop_index` | Remove unused index (reduce write overhead) | +0.05 + delta |
	| `partition_table` | Partition large table by date/ID range | +0.15 + delta |
	| `analyze_statistics` | Update table statistics for query planner | +0.05 + delta |
	| `request_hint` | Get progressive hint | −0.10 penalty |
	| `submit_report` | TERMINAL: Final optimization report + full score | 0.0–1.0 |

	### Round 1 — SQL Debugging Actions (backward compatible)
	`identify_error` · `propose_fix` · `submit_answer` · `explain_issue` · `optimize_query` · `request_hint`

	`request_hint` is shared with the Round 2 table, so the two sets combine to 15 distinct actions.
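As a rough illustration of how a client might assemble an action, the sketch below collects the action names and base rewards from the tables above. `BASE_REWARD`, `ALL_ACTIONS`, and `make_action` are hypothetical names introduced here, not part of the repository; the `action_type` and `payload` keys follow the curl examples in the Live Demo section, and the "+ delta" component is left out because it depends on the simulator.

```python
from typing import Any, Optional

# Base rewards copied from the Round 2 action table.
# The "+ delta" part is computed by the environment and is not modelled here.
BASE_REWARD: dict[str, float] = {
    "inspect_query": 0.05,
    "analyze_indexes": 0.05,
    "create_index": 0.10,
    "rewrite_query": 0.15,
    "add_column": 0.08,
    "drop_index": 0.05,
    "partition_table": 0.15,
    "analyze_statistics": 0.05,
    "request_hint": -0.10,
}

# Round 1 actions listed in the README (request_hint appears in both rounds).
ROUND1_ACTIONS = {
    "identify_error", "propose_fix", "submit_answer",
    "explain_issue", "optimize_query", "request_hint",
}

# submit_report is terminal and scored 0.0-1.0, so it has no fixed base reward.
TERMINAL_ACTION = "submit_report"

ALL_ACTIONS = set(BASE_REWARD) | ROUND1_ACTIONS | {TERMINAL_ACTION}


def make_action(action_type: str, payload: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build an action dict, rejecting names that are not in the README tables."""
    if action_type not in ALL_ACTIONS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    return {"action_type": action_type, "payload": payload or {}}


action = make_action("create_index", {"table": "users", "columns": ["email"]})
```

The payload fields each action expects are defined by the environment's Pydantic models; only the ones shown in the curl examples are known from this README.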

	---

	## Observation Space

	Every observation contains the full DB state:
	```json
	{
	  "task_id": "medium_s001",
	  "task_description": "E-commerce DB: 50K orders. P95 query time > 8s. Target: < 500ms.",
	  "current_context": {
	    "performance_score": 12.5,
	    "target_score": 75.0,
	    "tables": [
	      {"name": "orders", "rows": 50000, "indexes": ["PRIMARY"], "size_mb": 280},
	      {"name": "users", "rows": 8000, "indexes": ["PRIMARY", "email_idx"]}
	    ],
	    "slow_queries": [
	      {"id": "q1", "sql": "SELECT * FROM orders WHERE user_id=? AND status=?", "avg_ms": 8500},
	      {"id": "q2", "sql": "SELECT COUNT(*) FROM orders o JOIN users u ON o.user_id=u.id", "avg_ms": 3200}
	    ],
	    "improvement_history": [12.5],
	    "milestones_earned": [],
	    "steps_remaining": 50
	  },
	  "step_count": 0,
	  "difficulty": "medium",
	  "max_steps": 50
	}
	```
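To show how an agent might read this structure, here is a minimal sketch that picks the slowest query and computes the remaining gap to the target score. The helper names `slowest_query` and `score_gap` are hypothetical; the field names come from the sample observation above, trimmed for brevity.

```python
import json

# Trimmed copy of the sample observation shown above.
SAMPLE_OBSERVATION = json.loads("""
{
  "task_id": "medium_s001",
  "current_context": {
    "performance_score": 12.5,
    "target_score": 75.0,
    "slow_queries": [
      {"id": "q1", "sql": "SELECT * FROM orders WHERE user_id=? AND status=?", "avg_ms": 8500},
      {"id": "q2", "sql": "SELECT COUNT(*) FROM orders o JOIN users u ON o.user_id=u.id", "avg_ms": 3200}
    ],
    "steps_remaining": 50
  }
}
""")


def slowest_query(observation: dict) -> dict:
    """Return the slow-query entry with the highest average latency."""
    queries = observation["current_context"]["slow_queries"]
    return max(queries, key=lambda q: q["avg_ms"])


def score_gap(observation: dict) -> float:
    """Points still needed to reach the target performance score."""
    ctx = observation["current_context"]
    return ctx["target_score"] - ctx["performance_score"]


worst = slowest_query(SAMPLE_OBSERVATION)
gap = score_gap(SAMPLE_OBSERVATION)
```

A simple policy could call `inspect_query` on `worst["id"]` first, since that query contributes the most latency in the sample.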

	---

	## Reward Design

	Dense reward at every step + milestone bonuses:

	```
	inspect_query / analyze_indexes   → +0.05 (investigation rewarded)
	create_index with improvement     → +0.10 + delta_reward
	Milestone: 25% improvement        → +0.15 ONE-TIME bonus
	Milestone: 50% improvement        → +0.25 ONE-TIME bonus
	Milestone: 75% improvement        → +0.40 ONE-TIME bonus
	submit_report (terminal)          → 0.0–1.0 full score
	Efficiency bonus (< 70% budget)   → +0.10
	Loop penalty (same action x2+)    → −0.08
	Hint penalty                      → −0.10
	Backtrack penalty                 → −0.05
	Budget exhaustion                 → −0.15
	```
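The one-time milestone bonuses can be sketched as a small tracker. This is an illustration under stated assumptions, not the code in `env/reward.py`: it assumes "improvement" is the fraction of the possible gain already achieved (0.0 to 1.0), and that crossing several thresholds in a single step pays each of them once. `MILESTONES` and `milestone_bonus` are hypothetical names.

```python
# (threshold, bonus) pairs taken from the reward list above.
MILESTONES = [(0.25, 0.15), (0.50, 0.25), (0.75, 0.40)]


def milestone_bonus(improvement: float, earned: set[float]) -> float:
    """Return the bonus for newly crossed milestones and record them in `earned`.

    Assumption: `improvement` is a fraction in [0, 1]; each threshold pays once.
    """
    bonus = 0.0
    for threshold, reward in MILESTONES:
        if improvement >= threshold and threshold not in earned:
            earned.add(threshold)
            bonus += reward
    return bonus


earned: set[float] = set()
first = milestone_bonus(0.30, earned)   # crosses the 25% milestone
repeat = milestone_bonus(0.30, earned)  # same level again, nothing new
jump = milestone_bonus(0.80, earned)    # crosses 50% and 75% together
```

Tracking earned thresholds in a set is what keeps the bonuses one-time: an agent cannot farm a milestone by oscillating around it.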

	### Terminal Score Formula
	```python
	perf_improvement = (final_score - baseline) / (100 - baseline)
	step_efficiency = 1.0 - (steps_used / max_steps)
	terminal_score = (perf_improvement * 0.60) + (step_efficiency * 0.20) + 0.10
	```
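A worked example may make the weights concrete. The function below wraps the three lines above; the clamp to the 0.0–1.0 range and the guard against a baseline of 100 are assumptions added for the sketch, and the input numbers are illustrative rather than measured results.

```python
def terminal_score(final_score: float, baseline: float,
                   steps_used: int, max_steps: int) -> float:
    """Terminal score per the README formula, clamped to [0, 1] (assumption)."""
    if baseline >= 100:
        raise ValueError("baseline must be below 100 to leave room for improvement")
    perf_improvement = (final_score - baseline) / (100 - baseline)
    step_efficiency = 1.0 - (steps_used / max_steps)
    raw = (perf_improvement * 0.60) + (step_efficiency * 0.20) + 0.10
    # Clamping is an assumption: the README only states the 0.0-1.0 range.
    return max(0.0, min(1.0, raw))


# Illustrative inputs: score goes from 12.5 to 82.0 using 20 of 50 steps.
#   perf_improvement = 69.5 / 87.5 ≈ 0.7943  -> contributes ≈ 0.4766
#   step_efficiency  = 1 - 20/50   = 0.60    -> contributes   0.12
#   constant term                            -> contributes   0.10
score = terminal_score(final_score=82.0, baseline=12.5, steps_used=20, max_steps=50)
```

With these weights the three terms sum to at most 0.90 (full improvement in zero steps), so any score above that would have to come from bonuses outside this formula.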

	---

	## Scenarios — 30 Tasks

	### Round 2: DB Engineering (15 new tasks)

	#### Easy (15 steps, target 80+)
	| ID | Description |
	|---|---|
	| easy_s001 | User lookup — missing email index on 10K users |
	| easy_s002 | Order status — composite index on 50K orders |
	| easy_s003 | Product search — LIKE query on 20K products |
	| easy_s004 | Session lookup — 15K sessions, no index |
	| easy_s005 | Log filter — compound index on 30K logs |

	#### Medium (25–30 steps, target 72–78)
	| ID | Description |
	|---|---|
	| medium_s001 | E-commerce: 50K orders + 8K users, 2 slow queries |
	| medium_s002 | Blog: 100K posts + 20K authors, search slow |
	| medium_s003 | Inventory: 200K stock movements, rewrite + index |
	| medium_s004 | Ticketing: 60K tickets, status queue degraded |
	| medium_s005 | Analytics: 150K events, funnel query slow |

	#### Hard (50 steps, target 65–70)
	| ID | Description |
	|---|---|
	| hard_s001 | Financial: 500K transactions, 4 tables, 3 slow queries |
	| hard_s002 | SaaS: 8-table schema, 2M activity log, dashboard 20s+ |
	| hard_s003 | Healthcare: 1M patient records, compliance queries |
	| hard_s004 | Gaming: 2M players, 5M matches, leaderboard degraded |
	| hard_s005 | Logistics: 6 tables, 3M shipments + 10M tracking rows |

	### Round 1: SQL Debugging (15 original tasks — backward compatible)
	Easy: syntax errors · Medium: logic bugs · Hard: performance anti-patterns

	---

	## Self-Improving Curriculum

	```
	Agent avg score > 0.75  → Advance to harder tier
	Agent avg score < 0.30  → Drop back a tier
	Ultra tier (tier 3)     → Auto-generated 5–8 table scenarios, no hints
	```

	The environment gets harder as the agent improves and eases off when scores drop, so the curriculum adapts to the agent's measured performance.
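The tier rules above could be expressed roughly as follows. This is a sketch, not the contents of `env/curriculum.py`: it assumes tiers are numbered 0 (easy) through 3 (ultra), that the average is taken over a recent window of episode scores, and that an empty window leaves the tier unchanged. `next_tier` is a hypothetical name.

```python
ADVANCE_THRESHOLD = 0.75  # avg score above this -> harder tier
DEMOTE_THRESHOLD = 0.30   # avg score below this -> easier tier
MIN_TIER = 0              # assumption: easy = 0, medium = 1, hard = 2
MAX_TIER = 3              # README: "Ultra tier (tier 3)"


def next_tier(current_tier: int, recent_scores: list[float]) -> int:
    """Pick the next curriculum tier from the average of recent episode scores."""
    if not recent_scores:
        return current_tier
    avg = sum(recent_scores) / len(recent_scores)
    if avg > ADVANCE_THRESHOLD:
        return min(current_tier + 1, MAX_TIER)
    if avg < DEMOTE_THRESHOLD:
        return max(current_tier - 1, MIN_TIER)
    return current_tier


promoted = next_tier(1, [0.80, 0.90])  # strong scores on medium
demoted = next_tier(2, [0.20, 0.25])   # weak scores on hard
held = next_tier(1, [0.50, 0.55])      # in between: stay put
```

The gap between the two thresholds acts as a dead band, which keeps the agent from bouncing between tiers on small score fluctuations.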

	---

	## Training Results

	Trained Qwen2.5-7B-Instruct with GRPO using Unsloth:

	| Stage | Avg Reward | Agent Behavior |
	|---|---|---|
	| Before training | 0.05 | Random actions, no strategy |
	| 50 steps | 0.25 | Learns to inspect before acting |
	| 200 steps | 0.55 | Multi-step planning emerges |
	| 500 steps | 0.82 | Senior DBA behavior pattern |

	![Reward Curve](reward_curve.png)
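For readers unfamiliar with GRPO, its core idea is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below shows that group-relative normalization in isolation. It is not taken from `training/train_agent.py`; the function name, the use of the population standard deviation, and the sample rewards are assumptions for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps). Libraries
    differ on population vs. sample std; population std is used here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative environment rewards for four rollouts of the same scenario.
group_rewards = [0.10, 0.30, 0.30, 0.70]
advantages = group_relative_advantages(group_rewards)
```

Rollouts that beat their group's average receive a positive advantage and are reinforced; when every rollout in a group earns the same reward, all advantages are zero and that group contributes no gradient signal.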

	---

	## API Endpoints

	| Endpoint | Method | Description |
	|---|---|---|
	| `/health` | GET | Liveness check — always 200 |
	| `/reset` | POST | Start new episode → Observation |
	| `/step` | POST | Submit action → (obs, reward, done, info) |
	| `/state` | GET | Current episode state |
	| `/tasks` | GET | All 30 tasks + action schema |
	| `/grader` | POST | Grade an episode → float score |
	| `/baseline` | POST | Run baseline agent → scores |
	| `/progress` | GET | DB performance history + milestones |

	---

	## Live Demo

	```bash
	# Reset with the easy user-lookup scenario (easy_s001)
	curl -X POST https://junaid0600-sql-db-engineer-agent.hf.space/reset \
	  -H "Content-Type: application/json" \
	  -d '{"difficulty": "easy", "task_id": "easy_s001"}'

	# Agent inspects slow query → sees FULL TABLE SCAN
	curl -X POST https://junaid0600-sql-db-engineer-agent.hf.space/step \
	  -H "Content-Type: application/json" \
	  -d '{"action_type": "inspect_query", "payload": {"query_id": "q1"}}'

	# Agent creates index → performance score 8.0 → 82.0
	curl -X POST https://junaid0600-sql-db-engineer-agent.hf.space/step \
	  -H "Content-Type: application/json" \
	  -d '{"action_type": "create_index", "payload": {"table": "users", "columns": ["email"]}}'

	# Agent submits report → terminal score 0.82
	curl -X POST https://junaid0600-sql-db-engineer-agent.hf.space/step \
	  -H "Content-Type: application/json" \
	  -d '{"action_type": "submit_report", "payload": {"summary": "Added email index. Performance 8 to 82."}}'
	```
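The same sequence can be driven from Python using only the standard library. This is a minimal sketch: `build_request`, `post`, and `run_demo` are hypothetical helper names, the request bodies mirror the curl commands above, and the JSON layout of the responses is not documented here, so the sketch just prints whatever comes back.

```python
import json
import urllib.request

BASE_URL = "https://junaid0600-sql-db-engineer-agent.hf.space"


def build_request(path: str, body: dict) -> urllib.request.Request:
    """Create a JSON POST request for one of the environment's endpoints."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(path: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(path, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_demo() -> None:
    """Replay the curl sequence: reset, inspect, create index, submit report."""
    print(post("/reset", {"difficulty": "easy", "task_id": "easy_s001"}))
    steps = [
        {"action_type": "inspect_query", "payload": {"query_id": "q1"}},
        {"action_type": "create_index",
         "payload": {"table": "users", "columns": ["email"]}},
        {"action_type": "submit_report",
         "payload": {"summary": "Added email index. Performance 8 to 82."}},
    ]
    for action in steps:
        print(post("/step", action))
```

Calling `run_demo()` performs live HTTP requests, so it needs network access and a running Space; point `BASE_URL` at `http://localhost:7860` to use a local server instead.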

	---

	## Project Structure

	```
	sql-db-engineer-agent/
	├── openenv.yaml                   # OpenEnv metadata (v2.0.0)
	├── Dockerfile                     # Container definition
	├── requirements.txt               # Pinned dependencies
	├── README.md                      # This file
	├── baseline.py                    # Rule-based baseline agent
	├── inference.py                   # LLM inference agent
	├── env/
	│   ├── environment.py             # Core: reset() step() state()
	│   ├── db_simulator.py            # NEW: DB performance simulator
	│   ├── curriculum.py              # NEW: Self-improving curriculum
	│   ├── scenario_generator.py      # NEW: Dynamic scenario generation
	│   ├── models.py                  # Pydantic models (15 action types)
	│   ├── tasks.py                   # Task manager (30 tasks)
	│   ├── graders.py                 # Deterministic graders
	│   └── reward.py                  # Dense reward + milestones
	├── api/
	│   └── server.py                  # FastAPI — 8 endpoints
	├── dataset/
	│   ├── easy_cases.json            # Round 1: 5 syntax tasks
	│   ├── medium_cases.json          # Round 1: 5 logic tasks
	│   ├── hard_cases.json            # Round 1: 5 performance tasks
	│   ├── easy_scenarios.json        # Round 2: 5 easy DB scenarios
	│   ├── medium_scenarios.json      # Round 2: 5 medium DB scenarios
	│   └── hard_scenarios.json        # Round 2: 5 hard DB scenarios
	├── training/
	│   ├── train_agent.py             # Unsloth + GRPO training
	│   ├── evaluate_agent.py          # Reward curve generator
	│   ├── generate_training_data.py  # Expert trajectory collector
	│   └── colab_notebook.py          # Venue GPU training notebook
	├── blog/
	│   └── mini_blog.md               # HF blog post
	└── tests/
	    ├── test_environment.py        # 12 environment tests
	    └── test_graders.py            # 12 grader tests
	```

	---

	## Setup & Installation

	```bash
	# Clone
	git clone https://github.com/Mdjunaid06/sql-db-engineer-agent
	cd sql-db-engineer-agent

	# Install
	pip install -r requirements.txt

	# Configure
	cp .env.example .env
	# Add HF_TOKEN to .env

	# Run
	uvicorn api.server:app --host 0.0.0.0 --port 7860 --reload

	# Verify
	curl http://localhost:7860/health
	# {"status":"ok","version":"2.0.0"}
	```

	---

	## Validation

	```bash
	pytest tests/ -v # 24/24 passed
	openenv validate . # [OK] Ready for multi-mode deployment
	```

	---

	## Colab Training Notebook

	[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1xviukNsgrOCP25W2Z6ocUzvD_C7g6quw?usp=sharing)

	---

	## Built For

	META × PyTorch × SST OpenEnv Hackathon
	Finals: April 25–26, 2026 | Bangalore

	"We didn't build an environment. We built a DBA training simulator."