Spaces:
Paused
Paused
| title: SQL Query Optimization Env | |
| emoji: ποΈ | |
| colorFrom: indigo | |
| colorTo: blue | |
| sdk: docker | |
| app_file: server/app.py | |
| pinned: false | |
| tags: | |
| - openenv | |
| # ποΈ SQL Query Optimization Environment | |
| **OpenEnv Hackathon β Phase 1 & 2 Validated β ** | |
| > **The only OpenEnv submission where your optimized SQL is actually executed.** | |
| > Reward is computed from real DuckDB query timing + result correctness β not keyword matching. | |
| --- | |
| ## π What Makes This Unique | |
| Every other environment grades agents by checking if they *mentioned* the right keywords. | |
| This environment **actually runs both queries** against a realistic in-memory DuckDB database | |
| (500,000 orders Β· 1,000,000 events) and measures: | |
| | What we measure | How | | |
| |---|---| | |
| | ποΈ Real speedup | `original_ms / optimized_ms` via DuckDB timing | | |
| | β Result correctness | Both queries must return identical data | | |
| | π Issue detection | Keyword match against ground-truth anti-patterns | | |
| | π Analysis quality | Summary depth + improvement estimate | | |
| The agent receives **execution feedback** after every step (`last_execution` in observation) | |
| and can **refine its rewrite** in subsequent steps β a genuine iterative optimization loop. | |
| --- | |
| ## π¦ Environment at a Glance | |
| | Property | Value | | |
| |---|---| | |
| | SQL Engine | DuckDB in-memory (real execution) | | |
| | Tables | users (10k), orders (500k), products (1k), events (1M) | | |
| | Tasks | 5 (easy β expert) | | |
| | Reward | Float 0.0β1.0 (execution-grounded) | | |
| | Max runtime | < 20 min (DuckDB warm-up ~3s, queries ~5β200ms each) | | |
| --- | |
| ## π§ Observation Space | |
| ```json | |
| { | |
| "task_id": "string", | |
| "task_name": "string", | |
| "task_description": "string", | |
| "sql_query": "string β the bad query to optimize (executable against DuckDB)", | |
| "schema_info": "string β table sizes, columns, indexing notes", | |
| "dialect": "duckdb/postgresql", | |
| "difficulty": "easy | medium | medium-hard | hard | expert", | |
| "step_count": 0, | |
| "max_steps": 5, | |
| "issues_found_so_far": ["issue types flagged in previous steps"], | |
| "last_execution": { | |
| "original_ms": 145.7, | |
| "optimized_ms": 9.3, | |
| "speedup": 15.67, | |
| "results_match": true, | |
| "verdict": "β 15.7x faster with correct results" | |
| } | |
| } | |
| ``` | |
| ## β‘ Action Space | |
| ```json | |
| { | |
| "suggestions": [ | |
| { | |
| "issue_type": "correlated_subquery", | |
| "line": 4, | |
| "description": "Correlated subquery scans 500k orders for each of 3,300 premium users", | |
| "severity": "critical", | |
| "fix": "Rewrite as LEFT JOIN with GROUP BY aggregation" | |
| } | |
| ], | |
| "optimized_query": "SELECT ... FROM users u LEFT JOIN (SELECT ...) s ON ...", | |
| "summary": "Three correlated subqueries cause ~10M row reads. Single JOIN reduces this to one 500k-row scan.", | |
| "estimated_improvement": "15-20x faster β eliminates N+1 subquery pattern", | |
| "approved": false | |
| } | |
| ``` | |
| --- | |
| ## π Five Tasks | |
| | # | Task | Difficulty | Key Anti-Pattern | Expected Speedup | | |
| |---|---|---|---|---| | |
| | 1 | Basic Anti-pattern Detection | Easy | SELECT \*, CAST on filter, YEAR() | 2β5x | | |
| | 2 | N+1 Correlated Subquery Elimination | Medium | 3 correlated subqueries β JOIN | 8β25x | | |
| | 3 | Wildcard LIKE & Projection | Medium-Hard | `LIKE '%purchase%'` on 1M rows | 3β10x | | |
| | 4 | Implicit Cross Join & Scalar Subqueries | Hard | Comma-syntax join + 2 global aggregates | 10β30x | | |
| | 5 | Window Function Full-Scan Audit | Expert | 5 OVER() on unfiltered 1M-row table | 5β20x | | |
| --- | |
| ## π Reward Function | |
| | Component | Weight | Measured By | | |
| |---|---|---| | |
| | ποΈ Real Execution Speedup | **35%** | `original_ms / optimized_ms` via DuckDB | | |
| | β Result Correctness | **20%** | Sorted row-set equality check | | |
| | π Issue Detection | **25%** | Keyword match vs ground truth | | |
| | β Approval Correctness | **8%** | Bool match vs expected | | |
| | π Summary Quality | **7%** | Analysis length & depth | | |
| | π·οΈ Severity Labels | **5%** | Severity values present | | |
| --- | |
| ## π‘ API Endpoints | |
| | Endpoint | Method | Description | | |
| |---|---|---| | |
| | `/` | GET | Health check + table stats | | |
| | `/reset` | POST | Start episode (`{"task_id": "..."}`) | | |
| | `/step` | POST | Submit action β real execution | | |
| | `/state` | GET | Current episode state | | |
| | `/tasks` | GET | All 5 tasks with schema | | |
| | `/grader` | POST | Grade without advancing episode | | |
| | `/baseline` | POST | Run inference.py | | |
| | **`/execute`** | POST | **Run your SQL against DuckDB, get timing + verdict** | | |
| | **`/leaderboard`** | GET | **Real-time best scores & speedups per task** | | |
| ### π₯ Try /execute right now: | |
| ```bash | |
| curl -X POST https://laterabhi-sql-query-env.hf.space/execute \ | |
| -H "Content-Type: application/json" \ | |
| -d '{ | |
| "task_id": "task_1_basic_antipatterns", | |
| "optimized_query": "SELECT id, customer_id, status, total FROM orders WHERE customer_id = 5000 AND created_at >= DATE '\''2024-01-01'\'' AND created_at < DATE '\''2025-01-01'\''" | |
| }' | |
| ``` | |
| --- | |
| ## π Local Setup | |
| ```bash | |
| git clone https://github.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment- | |
| cd SQL-Query-Optimization-Environment- | |
| pip install -r requirements.txt | |
| uvicorn server.app:app --host 0.0.0.0 --port 7860 | |
| ``` | |
| ```bash | |
| # Run inference | |
| export API_BASE_URL=https://router.huggingface.co/v1 | |
| export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct | |
| export HF_TOKEN=hf_... | |
| python inference.py | |
| ``` | |
| --- | |
| ## π Baseline Scores (Qwen2.5-72B) | |
| | Task | Score | Speedup | Correct? | | |
| |---|---|---|---| | |
| | Basic Anti-patterns (Easy) | ~0.82 | ~4x | β | | |
| | N+1 Subqueries (Medium) | ~0.71 | ~12x | β | | |
| | Wildcard LIKE (Medium-Hard) | ~0.60 | ~6x | β | | |
| | Implicit Join (Hard) | ~0.52 | ~8x | β | | |
| | Window Functions (Expert) | ~0.44 | ~7x | β | | |
| --- | |
| *Built with β€οΈ for the OpenEnv Hackathon β Phase 1 & 2 Validated* | |