---
title: SQL Query Optimization Env
emoji: 🗄️
colorFrom: indigo
colorTo: blue
sdk: docker
app_file: server/app.py
pinned: false
tags:
  - openenv
---
# 🗄️ SQL Query Optimization Environment

**OpenEnv Hackathon – Phase 1 & 2 Validated** ✅

The only OpenEnv submission where your optimized SQL is actually executed. The reward is computed from real DuckDB query timing plus result correctness, not keyword matching.
## 🚀 What Makes This Unique

Most other environments grade agents by checking whether they mentioned the right keywords. This environment actually runs both queries against a realistic in-memory DuckDB database (500,000 orders · 1,000,000 events) and measures:
| What we measure | How |
|---|---|
| 🗄️ Real speedup | `original_ms / optimized_ms` via DuckDB timing |
| ✅ Result correctness | Both queries must return identical data |
| 🔍 Issue detection | Keyword match against ground-truth anti-patterns |
| 📝 Analysis quality | Summary depth + improvement estimate |
The agent receives execution feedback after every step (`last_execution` in the observation) and can refine its rewrite in subsequent steps: a genuine iterative optimization loop.
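The execution-grounded check at the heart of this loop can be sketched in a few lines. This is an illustrative stand-in only: it uses Python's built-in `sqlite3` in place of DuckDB, and `grade_rewrite` is a hypothetical helper, not the environment's actual grader.

```python
import sqlite3
import time

def grade_rewrite(conn, original_sql, optimized_sql):
    """Time both queries and check sorted row-set equality,
    mirroring the feedback in the `last_execution` field."""
    def run(sql):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        return rows, (time.perf_counter() - start) * 1000.0

    orig_rows, orig_ms = run(original_sql)
    opt_rows, opt_ms = run(optimized_sql)
    return {
        "original_ms": round(orig_ms, 2),
        "optimized_ms": round(opt_ms, 2),
        "speedup": round(orig_ms / max(opt_ms, 1e-6), 2),
        "results_match": sorted(orig_rows) == sorted(opt_rows),
    }

# Tiny demo table standing in for the 500k-row `orders` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i * 1.5, "paid") for i in range(5000)])

report = grade_rewrite(
    conn,
    "SELECT id FROM orders WHERE CAST(total AS TEXT) = '15.0'",  # anti-pattern
    "SELECT id FROM orders WHERE total = 15.0",                  # sargable rewrite
)
```

On a table this small the timing gap is noise; in the real environment the same comparison runs against DuckDB at realistic row counts.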
## 📦 Environment at a Glance

| Property | Value |
|---|---|
| SQL Engine | DuckDB in-memory (real execution) |
| Tables | `users` (10k), `orders` (500k), `products` (1k), `events` (1M) |
| Tasks | 5 (easy → expert) |
| Reward | Float 0.0–1.0 (execution-grounded) |
| Max runtime | < 20 min (DuckDB warm-up ~3 s, queries ~5–200 ms each) |
## 🧠 Observation Space

```json
{
  "task_id": "string",
  "task_name": "string",
  "task_description": "string",
  "sql_query": "string – the bad query to optimize (executable against DuckDB)",
  "schema_info": "string – table sizes, columns, indexing notes",
  "dialect": "duckdb/postgresql",
  "difficulty": "easy | medium | medium-hard | hard | expert",
  "step_count": 0,
  "max_steps": 5,
  "issues_found_so_far": ["issue types flagged in previous steps"],
  "last_execution": {
    "original_ms": 145.7,
    "optimized_ms": 9.3,
    "speedup": 15.67,
    "results_match": true,
    "verdict": "✅ 15.7x faster with correct results"
  }
}
```
## ⚡ Action Space

```json
{
  "suggestions": [
    {
      "issue_type": "correlated_subquery",
      "line": 4,
      "description": "Correlated subquery scans 500k orders for each of 3,300 premium users",
      "severity": "critical",
      "fix": "Rewrite as LEFT JOIN with GROUP BY aggregation"
    }
  ],
  "optimized_query": "SELECT ... FROM users u LEFT JOIN (SELECT ...) s ON ...",
  "summary": "Three correlated subqueries cause ~10M row reads. Single JOIN reduces this to one 500k-row scan.",
  "estimated_improvement": "15-20x faster – eliminates N+1 subquery pattern",
  "approved": false
}
```
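Actions are plain JSON, so a step can be submitted with nothing but the standard library. A minimal sketch, assuming a running episode; `submit_step` is a hypothetical helper, not part of the repo:

```python
import json
import urllib.request

def submit_step(base_url, action):
    """POST an action dict to /step and return the JSON response."""
    req = urllib.request.Request(
        f"{base_url}/step",
        data=json.dumps(action).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

action = {
    "suggestions": [{
        "issue_type": "correlated_subquery",
        "line": 4,
        "description": "Correlated subquery re-scans orders once per user",
        "severity": "critical",
        "fix": "Rewrite as LEFT JOIN with GROUP BY aggregation",
    }],
    "optimized_query": ("SELECT u.id, s.n FROM users u LEFT JOIN "
                        "(SELECT customer_id, COUNT(*) AS n FROM orders "
                        "GROUP BY customer_id) s ON s.customer_id = u.id"),
    "summary": "Replace three per-user subqueries with one pre-aggregated JOIN.",
    "estimated_improvement": "~15x",
    "approved": False,
}

# result = submit_step("https://laterabhi-sql-query-env.hf.space", action)
```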
## 📋 Five Tasks

| # | Task | Difficulty | Key Anti-Pattern | Expected Speedup |
|---|---|---|---|---|
| 1 | Basic Anti-pattern Detection | Easy | `SELECT *`, `CAST` on filter, `YEAR()` | 2–5x |
| 2 | N+1 Correlated Subquery Elimination | Medium | 3 correlated subqueries → JOIN | 8–25x |
| 3 | Wildcard LIKE & Projection | Medium-Hard | `LIKE '%purchase%'` on 1M rows | 3–10x |
| 4 | Implicit Cross Join & Scalar Subqueries | Hard | Comma-syntax join + 2 global aggregates | 10–30x |
| 5 | Window Function Full-Scan Audit | Expert | 5 `OVER()` on unfiltered 1M-row table | 5–20x |
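The task-2 rewrite can be demonstrated end to end. A minimal sketch using a hypothetical, much smaller schema and Python's built-in `sqlite3` in place of DuckDB; the point is that both forms return identical rows, which is exactly what the correctness check enforces:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, tier TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, "premium" if i % 3 == 0 else "free") for i in range(100)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 100, float(i)) for i in range(2000)])

# Anti-pattern: the subquery re-runs once per premium user.
slow = """
SELECT u.id,
       (SELECT COUNT(*) FROM orders o WHERE o.customer_id = u.id) AS n
FROM users u
WHERE u.tier = 'premium'
"""

# Rewrite: aggregate orders once, then join the summary back in.
fast = """
SELECT u.id, COALESCE(s.n, 0) AS n
FROM users u
LEFT JOIN (SELECT customer_id, COUNT(*) AS n
           FROM orders GROUP BY customer_id) s ON s.customer_id = u.id
WHERE u.tier = 'premium'
"""

assert sorted(conn.execute(slow)) == sorted(conn.execute(fast))
```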
## 🏆 Reward Function

| Component | Weight | Measured By |
|---|---|---|
| 🗄️ Real Execution Speedup | 35% | `original_ms / optimized_ms` via DuckDB |
| ✅ Result Correctness | 20% | Sorted row-set equality check |
| 🔍 Issue Detection | 25% | Keyword match vs ground truth |
| ✅ Approval Correctness | 8% | Bool match vs expected |
| 📝 Summary Quality | 7% | Analysis length & depth |
| 🏷️ Severity Labels | 5% | Severity values present |
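The combination itself is a plain weighted sum. A sketch of how the table's weights combine, assuming each component has already been scored into 0.0–1.0 (the speedup-to-score normalization is internal to the environment and not reproduced here):

```python
def total_reward(components):
    """Combine per-component scores (each assumed in 0.0-1.0)
    using the weights from the table above."""
    weights = {
        "speedup": 0.35,
        "correctness": 0.20,
        "issue_detection": 0.25,
        "approval": 0.08,
        "summary": 0.07,
        "severity": 0.05,
    }
    return sum(weights[k] * components.get(k, 0.0) for k in weights)

# A perfect episode scores 1.0; a correct but unimproved rewrite
# forfeits the 35% speedup share.
perfect = total_reward({k: 1.0 for k in
                        ("speedup", "correctness", "issue_detection",
                         "approval", "summary", "severity")})
```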
## 📡 API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Health check + table stats |
| `/reset` | POST | Start an episode (`{"task_id": "..."}`) |
| `/step` | POST | Submit an action → real execution |
| `/state` | GET | Current episode state |
| `/tasks` | GET | All 5 tasks with schema |
| `/grader` | POST | Grade without advancing the episode |
| `/baseline` | POST | Run `inference.py` |
| `/execute` | POST | Run your SQL against DuckDB, get timing + verdict |
| `/leaderboard` | GET | Real-time best scores & speedups per task |
## 🔥 Try `/execute` right now

```bash
curl -X POST https://laterabhi-sql-query-env.hf.space/execute \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "task_1_basic_antipatterns",
    "optimized_query": "SELECT id, customer_id, status, total FROM orders WHERE customer_id = 5000 AND created_at >= DATE '\''2024-01-01'\'' AND created_at < DATE '\''2025-01-01'\''"
  }'
```
## 🛠 Local Setup

```bash
git clone https://github.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-
cd SQL-Query-Optimization-Environment-
pip install -r requirements.txt
uvicorn server.app:app --host 0.0.0.0 --port 7860

# Run inference
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
export HF_TOKEN=hf_...
python inference.py
```
## 📊 Baseline Scores (Qwen2.5-72B)

| Task | Score | Speedup | Correct? |
|---|---|---|---|
| Basic Anti-patterns (Easy) | ~0.82 | ~4x | ✅ |
| N+1 Subqueries (Medium) | ~0.71 | ~12x | ✅ |
| Wildcard LIKE (Medium-Hard) | ~0.60 | ~6x | ✅ |
| Implicit Join (Hard) | ~0.52 | ~8x | ✅ |
| Window Functions (Expert) | ~0.44 | ~7x | ✅ |
Built with ❤️ for the OpenEnv Hackathon – Phase 1 & 2 Validated ✅