SQL-Query-Env / README.md
Abhinav Singh
fix: use valid HF Space colorTo value (blue not cyan)
e08fd70
metadata
title: SQL Query Optimization Env
emoji: πŸ—„οΈ
colorFrom: indigo
colorTo: blue
sdk: docker
app_file: server/app.py
pinned: false
tags:
  - openenv

πŸ—„οΈ SQL Query Optimization Environment

OpenEnv Hackathon β€” Phase 1 & 2 Validated βœ…

The only OpenEnv submission where your optimized SQL is actually executed. Reward is computed from real DuckDB query timing + result correctness β€” not keyword matching.


πŸš€ What Makes This Unique

Every other environment grades agents by checking if they mentioned the right keywords. This environment actually runs both queries against a realistic in-memory DuckDB database (500,000 orders Β· 1,000,000 events) and measures:

What we measure How
🏎️ Real speedup original_ms / optimized_ms via DuckDB timing
βœ… Result correctness Both queries must return identical data
πŸ” Issue detection Keyword match against ground-truth anti-patterns
πŸ“ Analysis quality Summary depth + improvement estimate

The agent receives execution feedback after every step (last_execution in observation) and can refine its rewrite in subsequent steps β€” a genuine iterative optimization loop.


πŸ“¦ Environment at a Glance

Property Value
SQL Engine DuckDB in-memory (real execution)
Tables users (10k), orders (500k), products (1k), events (1M)
Tasks 5 (easy β†’ expert)
Reward Float 0.0–1.0 (execution-grounded)
Max runtime < 20 min (DuckDB warm-up ~3s, queries ~5–200ms each)

🧠 Observation Space

{
  "task_id": "string",
  "task_name": "string",
  "task_description": "string",
  "sql_query": "string β€” the bad query to optimize (executable against DuckDB)",
  "schema_info": "string β€” table sizes, columns, indexing notes",
  "dialect": "duckdb/postgresql",
  "difficulty": "easy | medium | medium-hard | hard | expert",
  "step_count": 0,
  "max_steps": 5,
  "issues_found_so_far": ["issue types flagged in previous steps"],
  "last_execution": {
    "original_ms": 145.7,
    "optimized_ms": 9.3,
    "speedup": 15.67,
    "results_match": true,
    "verdict": "βœ… 15.7x faster with correct results"
  }
}

⚑ Action Space

{
  "suggestions": [
    {
      "issue_type": "correlated_subquery",
      "line": 4,
      "description": "Correlated subquery scans 500k orders for each of 3,300 premium users",
      "severity": "critical",
      "fix": "Rewrite as LEFT JOIN with GROUP BY aggregation"
    }
  ],
  "optimized_query": "SELECT ... FROM users u LEFT JOIN (SELECT ...) s ON ...",
  "summary": "Three correlated subqueries cause ~10M row reads. Single JOIN reduces this to one 500k-row scan.",
  "estimated_improvement": "15-20x faster β€” eliminates N+1 subquery pattern",
  "approved": false
}

πŸ“‹ Five Tasks

# Task Difficulty Key Anti-Pattern Expected Speedup
1 Basic Anti-pattern Detection Easy SELECT *, CAST on filter, YEAR() 2–5x
2 N+1 Correlated Subquery Elimination Medium 3 correlated subqueries β†’ JOIN 8–25x
3 Wildcard LIKE & Projection Medium-Hard LIKE '%purchase%' on 1M rows 3–10x
4 Implicit Cross Join & Scalar Subqueries Hard Comma-syntax join + 2 global aggregates 10–30x
5 Window Function Full-Scan Audit Expert 5 OVER() on unfiltered 1M-row table 5–20x

πŸ† Reward Function

Component Weight Measured By
🏎️ Real Execution Speedup 35% original_ms / optimized_ms via DuckDB
βœ… Result Correctness 20% Sorted row-set equality check
πŸ” Issue Detection 25% Keyword match vs ground truth
βœ… Approval Correctness 8% Bool match vs expected
πŸ“ Summary Quality 7% Analysis length & depth
🏷️ Severity Labels 5% Severity values present

πŸ“‘ API Endpoints

Endpoint Method Description
/ GET Health check + table stats
/reset POST Start episode ({"task_id": "..."})
/step POST Submit action β†’ real execution
/state GET Current episode state
/tasks GET All 5 tasks with schema
/grader POST Grade without advancing episode
/baseline POST Run inference.py
/execute POST Run your SQL against DuckDB, get timing + verdict
/leaderboard GET Real-time best scores & speedups per task

πŸ”₯ Try /execute right now:

curl -X POST https://laterabhi-sql-query-env.hf.space/execute \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "task_1_basic_antipatterns",
    "optimized_query": "SELECT id, customer_id, status, total FROM orders WHERE customer_id = 5000 AND created_at >= DATE '\''2024-01-01'\'' AND created_at < DATE '\''2025-01-01'\''"
  }'

πŸš€ Local Setup

git clone https://github.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-
cd SQL-Query-Optimization-Environment-
pip install -r requirements.txt
uvicorn server.app:app --host 0.0.0.0 --port 7860
# Run inference
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
export HF_TOKEN=hf_...
python inference.py

πŸ“Š Baseline Scores (Qwen2.5-72B)

Task Score Speedup Correct?
Basic Anti-patterns (Easy) ~0.82 ~4x βœ…
N+1 Subqueries (Medium) ~0.71 ~12x βœ…
Wildcard LIKE (Medium-Hard) ~0.60 ~6x βœ…
Implicit Join (Hard) ~0.52 ~8x βœ…
Window Functions (Expert) ~0.44 ~7x βœ…

Built with ❀️ for the OpenEnv Hackathon β€” Phase 1 & 2 Validated