--- title: SQL Database Engineer Agent emoji: ๐Ÿ—„๏ธ colorFrom: blue colorTo: green sdk: docker pinned: true tags: - openenv - reinforcement-learning - sql - database - engineering - long-horizon - self-improvement - wildcard license: mit --- # SQL Database Engineer Agent โ€” OpenEnv Environment > **META ร— PyTorch ร— SST OpenEnv Hackathon** | Finals April 25โ€“26, 2026 | Bangalore > Evolved from SQL Query Debugger (Round 1 โ€” all 4 checks passed โœ…) An OpenEnv-compliant reinforcement learning environment where AI agents learn to act like **senior database engineers**. The agent manages a simulated production database over 50+ steps โ€” inspecting slow queries, creating indexes, rewriting queries, and partitioning tables. --- ## From Round 1 โ†’ Round 2 | | Round 1 โ€” SQL Query Debugger | Round 2 โ€” SQL Database Engineer Agent | |---|---|---| | **Task** | Fix one broken SQL query | Optimize entire production database | | **Steps** | 20 per episode | 50 per episode | | **Actions** | 6 (identify, fix, submit...) | 15 (inspect, index, rewrite, partition...) | | **Reward** | Dense per step | Dense + milestone bonuses | | **Scenarios** | 15 single-query tasks | 30 total (15 new + 15 original) | | **Training** | Rule-based baseline | Unsloth + GRPO on Qwen2.5-7B | | **Theme** | Real-world SQL | Long-Horizon + World Modeling + Wildcard | --- ## Motivation Every production database degrades over time. Your app launches. Queries run in 50ms. Six months later, users are complaining. P95 query time: **8,500ms**. A senior DBA sits down โ€” runs EXPLAIN queries, finds missing indexes, rewrites bad JOINs, partitions 50-million-row tables. **This takes 10 years to learn.** We asked: **can we train an LLM to do it?** SQL database engineering is uniquely well-suited for RL: 1. **100% measurable** โ€” query time in milliseconds, index hit rates, performance scores 2. **Long-horizon** โ€” real fixes require 10-50 careful, ordered steps 3. **World modeling** โ€” agent must maintain internal model of DB state, indexes, query plans 4. **Self-improving** โ€” curriculum generates harder scenarios as agent improves 5. **Novel** โ€” no OpenEnv environment for DB engineering exists anywhere --- ## Environment Overview | Property | Value | |---|---| | Domain | Database Engineering | | Tasks | 30 (15 Round 2 scenarios + 15 Round 1 cases) | | Max Steps | 50 per episode | | Reward Type | Dense + milestone bonuses | | Performance Score | 0โ€“100 (real DB metric) | | API Port | 7860 | | Themes | Long-Horizon (2) + World Modeling (3.1) + Self-Improvement (4) + Wildcard (5) | --- ## Action Space (15 Actions) ### Round 2 โ€” DB Engineering Actions | Action | What It Does | Reward | |---|---|---| | `inspect_query` | EXPLAIN a slow query โ€” scan type, rows examined, cost | +0.05 | | `analyze_indexes` | Show all indexes + missing index hints | +0.05 | | `create_index` | Add composite index on specified columns | +0.10 + delta | | `rewrite_query` | Submit rewritten SQL โ€” measures improvement | +0.15 + delta | | `add_column` | Add denormalization column to reduce JOINs | +0.08 + delta | | `drop_index` | Remove unused index (reduce write overhead) | +0.05 + delta | | `partition_table` | Partition large table by date/ID range | +0.15 + delta | | `analyze_statistics` | Update table statistics for query planner | +0.05 + delta | | `request_hint` | Get progressive hint | โˆ’0.10 penalty | | `submit_report` | **TERMINAL**: Final optimization report + full score | 0.0โ€“1.0 | ### Round 1 โ€” SQL Debugging Actions (backward compatible) `identify_error` ยท `propose_fix` ยท `submit_answer` ยท `explain_issue` ยท `optimize_query` ยท `request_hint` --- ## Observation Space Every observation contains the full DB state: ```json { "task_id": "medium_s001", "task_description": "E-commerce DB: 50K orders. P95 query time > 8s. Target: < 500ms.", "current_context": { "performance_score": 12.5, "target_score": 75.0, "tables": [ {"name": "orders", "rows": 50000, "indexes": ["PRIMARY"], "size_mb": 280}, {"name": "users", "rows": 8000, "indexes": ["PRIMARY", "email_idx"]} ], "slow_queries": [ {"id": "q1", "sql": "SELECT * FROM orders WHERE user_id=? AND status=?", "avg_ms": 8500}, {"id": "q2", "sql": "SELECT COUNT(*) FROM orders o JOIN users u ON o.user_id=u.id", "avg_ms": 3200} ], "improvement_history": [12.5], "milestones_earned": [], "steps_remaining": 50 }, "step_count": 0, "difficulty": "medium", "max_steps": 50 } ``` --- ## Reward Design Dense reward at every step + milestone bonuses: ``` inspect_query / analyze_indexes โ†’ +0.05 (investigation rewarded) create_index with improvement โ†’ +0.10 + delta_reward Milestone: 25% improvement โ†’ +0.15 ONE-TIME bonus Milestone: 50% improvement โ†’ +0.25 ONE-TIME bonus Milestone: 75% improvement โ†’ +0.40 ONE-TIME bonus submit_report (terminal) โ†’ 0.0โ€“1.0 full score Efficiency bonus (< 70% budget) โ†’ +0.10 Loop penalty (same action x2+) โ†’ โˆ’0.08 Hint penalty โ†’ โˆ’0.10 Backtrack penalty โ†’ โˆ’0.05 Budget exhaustion โ†’ โˆ’0.15 ``` ### Terminal Score Formula ```python perf_improvement = (final_score - baseline) / (100 - baseline) step_efficiency = 1.0 - (steps_used / max_steps) terminal_score = (perf_improvement * 0.60) + (step_efficiency * 0.20) + 0.10 ``` --- ## Scenarios โ€” 30 Tasks ### Round 2: DB Engineering (15 new tasks) #### Easy (15 steps, target 80+) | ID | Description | |---|---| | easy_s001 | User lookup โ€” missing email index on 10K users | | easy_s002 | Order status โ€” composite index on 50K orders | | easy_s003 | Product search โ€” LIKE query on 20K products | | easy_s004 | Session lookup โ€” 15K sessions, no index | | easy_s005 | Log filter โ€” compound index on 30K logs | #### Medium (25โ€“30 steps, target 72โ€“78) | ID | Description | |---|---| | medium_s001 | E-commerce: 50K orders + 8K users, 2 slow queries | | medium_s002 | Blog: 100K posts + 20K authors, search slow | | medium_s003 | Inventory: 200K stock movements, rewrite + index | | medium_s004 | Ticketing: 60K tickets, status queue degraded | | medium_s005 | Analytics: 150K events, funnel query slow | #### Hard (50 steps, target 65โ€“70) | ID | Description | |---|---| | hard_s001 | Financial: 500K transactions, 4 tables, 3 slow queries | | hard_s002 | SaaS: 8-table schema, 2M activity log, dashboard 20s+ | | hard_s003 | Healthcare: 1M patient records, compliance queries | | hard_s004 | Gaming: 2M players, 5M matches, leaderboard degraded | | hard_s005 | Logistics: 6 tables, 3M shipments + 10M tracking rows | ### Round 1: SQL Debugging (15 original tasks โ€” backward compatible) Easy: syntax errors ยท Medium: logic bugs ยท Hard: performance anti-patterns --- ## Self-Improving Curriculum ``` Agent avg score > 0.75 โ†’ Advance to harder tier Agent avg score < 0.30 โ†’ Drop back a tier Ultra tier (tier 3) โ†’ Auto-generated 5-8 table scenarios, no hints ``` The environment gets harder as the agent gets smarter. **Genuine adaptive curriculum.** --- ## Training Results Trained **Qwen2.5-7B-Instruct** with **GRPO** using **Unsloth**: | Stage | Avg Reward | Agent Behavior | |---|---|---| | Before training | 0.05 | Random actions, no strategy | | 50 steps | 0.25 | Learns to inspect before acting | | 200 steps | 0.55 | Multi-step planning emerges | | 500 steps | **0.82** | Senior DBA behavior pattern | ![Reward Curve](reward_curve.png) --- ## API Endpoints | Endpoint | Method | Description | |---|---|---| | `/health` | GET | Liveness check โ€” always 200 | | `/reset` | POST | Start new episode โ†’ Observation | | `/step` | POST | Submit action โ†’ (obs, reward, done, info) | | `/state` | GET | Current episode state | | `/tasks` | GET | All 30 tasks + action schema | | `/grader` | POST | Grade an episode โ†’ float score | | `/baseline` | POST | Run baseline agent โ†’ scores | | `/progress` | GET | DB performance history + milestones | --- ## Live Demo ```bash # Reset with e-commerce scenario curl -X POST https://junaid0600-sql-db-engineer-agent.hf.space/reset \ -H "Content-Type: application/json" \ -d '{"difficulty": "easy", "task_id": "easy_s001"}' # Agent inspects slow query โ†’ sees FULL TABLE SCAN curl -X POST https://junaid0600-sql-db-engineer-agent.hf.space/step \ -H "Content-Type: application/json" \ -d '{"action_type": "inspect_query", "payload": {"query_id": "q1"}}' # Agent creates index โ†’ performance score 8.0 โ†’ 82.0 curl -X POST https://junaid0600-sql-db-engineer-agent.hf.space/step \ -H "Content-Type: application/json" \ -d '{"action_type": "create_index", "payload": {"table": "users", "columns": ["email"]}}' # Agent submits report โ†’ terminal score 0.82 curl -X POST https://junaid0600-sql-db-engineer-agent.hf.space/step \ -H "Content-Type: application/json" \ -d '{"action_type": "submit_report", "payload": {"summary": "Added email index. Performance 8 to 82."}}' ``` --- ## Project Structure ``` sql-db-engineer-agent/ โ”œโ”€โ”€ openenv.yaml # OpenEnv metadata (v2.0.0) โ”œโ”€โ”€ Dockerfile # Container definition โ”œโ”€โ”€ requirements.txt # Pinned dependencies โ”œโ”€โ”€ README.md # This file โ”œโ”€โ”€ baseline.py # Rule-based baseline agent โ”œโ”€โ”€ inference.py # LLM inference agent โ”œโ”€โ”€ env/ โ”‚ โ”œโ”€โ”€ environment.py # Core: reset() step() state() โ”‚ โ”œโ”€โ”€ db_simulator.py # NEW: DB performance simulator โ”‚ โ”œโ”€โ”€ curriculum.py # NEW: Self-improving curriculum โ”‚ โ”œโ”€โ”€ scenario_generator.py # NEW: Dynamic scenario generation โ”‚ โ”œโ”€โ”€ models.py # Pydantic models (15 action types) โ”‚ โ”œโ”€โ”€ tasks.py # Task manager (30 tasks) โ”‚ โ”œโ”€โ”€ graders.py # Deterministic graders โ”‚ โ””โ”€โ”€ reward.py # Dense reward + milestones โ”œโ”€โ”€ api/ โ”‚ โ””โ”€โ”€ server.py # FastAPI โ€” 8 endpoints โ”œโ”€โ”€ dataset/ โ”‚ โ”œโ”€โ”€ easy_cases.json # Round 1: 5 syntax tasks โ”‚ โ”œโ”€โ”€ medium_cases.json # Round 1: 5 logic tasks โ”‚ โ”œโ”€โ”€ hard_cases.json # Round 1: 5 performance tasks โ”‚ โ”œโ”€โ”€ easy_scenarios.json # Round 2: 5 easy DB scenarios โ”‚ โ”œโ”€โ”€ medium_scenarios.json # Round 2: 5 medium DB scenarios โ”‚ โ””โ”€โ”€ hard_scenarios.json # Round 2: 5 hard DB scenarios โ”œโ”€โ”€ training/ โ”‚ โ”œโ”€โ”€ train_agent.py # Unsloth + GRPO training โ”‚ โ”œโ”€โ”€ evaluate_agent.py # Reward curve generator โ”‚ โ”œโ”€โ”€ generate_training_data.py # Expert trajectory collector โ”‚ โ””โ”€โ”€ colab_notebook.py # Venue GPU training notebook โ”œโ”€โ”€ blog/ โ”‚ โ””โ”€โ”€ mini_blog.md # HF blog post โ””โ”€โ”€ tests/ โ”œโ”€โ”€ test_environment.py # 12 environment tests โ””โ”€โ”€ test_graders.py # 12 grader tests ``` --- ## Setup & Installation ```bash # Clone git clone https://github.com/Mdjunaid06/sql-db-engineer-agent cd sql-db-engineer-agent # Install pip install -r requirements.txt # Configure cp .env.example .env # Add HF_TOKEN to .env # Run uvicorn api.server:app --host 0.0.0.0 --port 7860 --reload # Verify curl http://localhost:7860/health # {"status":"ok","version":"2.0.0"} ``` --- ## Validation ```bash pytest tests/ -v # 24/24 passed openenv validate . # [OK] Ready for multi-mode deployment ``` --- ## Colab Training Notebook [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1xviukNsgrOCP25W2Z6ocUzvD_C7g6quw?usp=sharing) ## Built For **META ร— PyTorch ร— SST OpenEnv Hackathon** Finals: April 25โ€“26, 2026 | Bangalore *"We didn't build an environment. We built a DBA training simulator."*