---
title: SQL Tutor Env
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
tags:
  - openenv
  - openenv-main
  - rl-environment
base_path: /web
---

# SQL Tutor Environment

An **OpenEnv** reinforcement learning environment that trains LLM agents to identify and fix bugs in SQL queries.

Built for the [Meta x Hugging Face x PyTorch India Hackathon 2026](https://www.scaler.com/school-of-technology/meta-pytorch-hackathon).

---

## Task

In each episode the agent receives:

- A **broken SQL query** with a deliberate bug
- The **database schema** (tables and columns)
- A **task description** of what the correct query should return

The agent must either:

1. **Submit a fix** (`submit_fix`): provide a corrected SQL query
2. **Request a hint** (`request_hint`): get a progressive hint (with a small reward penalty)

The episode ends when the agent submits a correct query or exhausts its 5 allowed actions.

---

## Reward Structure

| Outcome | Reward |
|---|---|
| Correct fix, no hints, first try | **+1.0** |
| Correct fix with hints / retries | **+0.1 to +0.95** (scaled down) |
| SQL syntax error | **-0.1** |
| Wrong query (valid SQL, wrong result) | **-0.05** |
| Requesting a hint | **-0.1** |
| Max steps reached without solving | **0** |

---

## Challenge Types (5 built-in)

| ID | Bug Type |
|---|---|
| `wrong_aggregate` | Missing `SUM()` + `GROUP BY` |
| `wrong_join` | `INNER JOIN` should be `LEFT JOIN` |
| `off_by_one_filter` | Wrong comparison operator in `WHERE` |
| `missing_having` | `WHERE` used instead of `HAVING` for aggregate filter |
| `wrong_order_limit` | `ASC` should be `DESC` for top-N query |

---

## Quick Start

```python
from sql_tutor_env.client import SQLTutorEnv
from sql_tutor_env.models import SQLAction

# Connect to the running HF Space
env = SQLTutorEnv(base_url="https://your-username-sql-tutor-env.hf.space")

# Start an episode
obs, state = env.reset()
print(f"Task: {obs.task_description}")
print(f"Broken query: {obs.broken_query}")

# Submit a fix
result = env.step(SQLAction(
    action_type="submit_fix",
    sql_query=(
        "SELECT customer_id, SUM(amount) AS total_amount "
        "FROM orders WHERE status = 'completed' "
        "GROUP BY customer_id ORDER BY customer_id;"
    ),
))
print(f"Correct: {result.observation.is_correct}, Reward: {result.reward}")
```

---

## Integration with TRL / GRPOTrainer

```python
from trl import GRPOTrainer, GRPOConfig
from sql_tutor_env.client import SQLTutorEnv
from sql_tutor_env.models import SQLAction

def rollout_func(prompts, env):
    obs, _ = env.reset()
    # ... build prompt from obs, call model, parse SQL, step env
    pass

env = SQLTutorEnv(base_url="https://your-space.hf.space")

trainer = GRPOTrainer(
    model=model,  # assumed to be loaded earlier
    args=GRPOConfig(...),
    rollout_func=rollout_func,
    env=env,
)
trainer.train()
```

---

## Project Structure

```
sql_tutor_env/
|-- __init__.py
|-- models.py          # SQLAction, SQLObservation, SQLState
|-- client.py          # SQLTutorEnv (EnvClient subclass)
|-- openenv.yaml
|-- pyproject.toml
|-- README.md
`-- server/
    |-- __init__.py
    |-- app.py              # FastAPI app via create_app()
    |-- sql_environment.py  # Core reset/step/state logic
    |-- challenges.py       # Bank of SQL bug challenges
    |-- requirements.txt
    `-- Dockerfile
```
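
---

## Example: Grading a Submission (Sketch)

The Reward Structure table separates a syntax error, a valid query with the wrong result, and a correct fix. That distinction can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the environment's actual grader: the use of SQLite, the `orders` rows, and the names `make_demo_db`, `grade`, `REFERENCE_SQL`, and `BROKEN_SQL` are all assumptions. Only the three fixed reward values are covered; the scaled reward for hints and retries is left out because its formula is not stated here.

```python
import sqlite3

# Fixed reward values copied from the Reward Structure table.
REWARD_CORRECT_FIRST_TRY = 1.0
REWARD_SYNTAX_ERROR = -0.1
REWARD_WRONG_RESULT = -0.05

# Hypothetical reference and broken queries for a `wrong_aggregate`-style bug.
REFERENCE_SQL = (
    "SELECT customer_id, SUM(amount) AS total_amount "
    "FROM orders WHERE status = 'completed' "
    "GROUP BY customer_id ORDER BY customer_id;"
)
BROKEN_SQL = (
    "SELECT customer_id, amount AS total_amount "
    "FROM orders WHERE status = 'completed' ORDER BY customer_id;"
)


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory `orders` table (assumed schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer_id INTEGER, amount REAL, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 10.0, "completed"),
            (1, 5.0, "completed"),
            (2, 7.5, "completed"),
            (2, 3.0, "cancelled"),
        ],
    )
    return conn


def grade(conn: sqlite3.Connection, submitted_sql: str, reference_sql: str):
    """Return (outcome, reward) for a first-try, no-hint submission.

    Any sqlite3 error is treated as a syntax error in this sketch, and
    results are compared as ordered lists of rows.
    """
    expected = conn.execute(reference_sql).fetchall()
    try:
        actual = conn.execute(submitted_sql).fetchall()
    except sqlite3.Error:
        return "syntax_error", REWARD_SYNTAX_ERROR
    if actual == expected:
        return "correct", REWARD_CORRECT_FIRST_TRY
    return "wrong_result", REWARD_WRONG_RESULT
```

Calling `grade(make_demo_db(), BROKEN_SQL, REFERENCE_SQL)` classifies the unaggregated query as a wrong result, because it returns one row per order instead of one row per customer. A real grader would also need to guard against destructive statements, which this sketch does not attempt.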