---
title: SQL Tutor Env
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
tags:
  - openenv
  - openenv-main
  - rl-environment
base_path: /web
---

# SQL Tutor Environment

An **OpenEnv** reinforcement learning environment that trains LLM agents to identify and fix bugs in SQL queries.

Built for the [Meta x Hugging Face x PyTorch India Hackathon 2026](https://www.scaler.com/school-of-technology/meta-pytorch-hackathon).

---

## Task

At the start of each episode, the agent receives:

- A **broken SQL query** with a deliberate bug
- The **database schema** (tables and columns)
- A **task description** of what the correct query should return

On each step, the agent takes one of two actions:

1. **Submit a fix** (`submit_fix`): provide a corrected SQL query
2. **Request a hint** (`request_hint`): get a progressive hint (with a small reward penalty)

The episode ends when the agent submits a correct query or uses up its 5 allowed actions.

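
The termination rule above can be sketched as a small replay loop. This is only an illustration under stated assumptions: `Action`, `run_episode`, and the `is_correct` callback are hypothetical stand-ins, not the environment's actual classes, and the only facts taken from the description are the two action types and the 5-action budget.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MAX_ACTIONS = 5  # action budget per episode, as stated in the task description


@dataclass
class Action:
    """Hypothetical stand-in for the environment's action type."""
    action_type: str      # "submit_fix" or "request_hint"
    sql_query: str = ""   # only meaningful for "submit_fix"


def run_episode(actions: List[Action],
                is_correct: Callable[[str], bool]) -> Tuple[int, bool]:
    """Replay actions until a correct fix or the budget runs out.

    Returns (actions_used, solved).
    """
    used = 0
    for action in actions[:MAX_ACTIONS]:
        used += 1
        if action.action_type == "submit_fix" and is_correct(action.sql_query):
            return used, True   # a correct submission ends the episode
    return used, False          # budget exhausted, or no actions left to replay


# Usage: one hint, then a correct fix, ends the episode after two actions.
checker = lambda q: q.strip() == "SELECT 1"
print(run_episode([Action("request_hint"), Action("submit_fix", "SELECT 1")], checker))
# (2, True)
```
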

---

## Reward Structure

| Outcome | Reward |
|---|---|
| Correct fix, no hints, first try | **+1.0** |
| Correct fix with hints / retries | **+0.1 to +0.95** (scaled down) |
| SQL syntax error | **-0.1** |
| Wrong query (valid SQL, wrong result) | **-0.05** |
| Requesting a hint | **-0.1** |
| Max steps reached without solving | **0** |

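
As a rough illustration of how the table could translate into code, the sketch below encodes the per-step penalties and one possible scaling for a correct fix. The table does not give the scaling formula, so the linear rule in `correct_fix_reward` is purely an assumption chosen to stay inside the documented +0.1 to +0.95 range; the names `STEP_PENALTY` and `correct_fix_reward` are hypothetical.

```python
# Per-step penalties, copied from the reward table.
STEP_PENALTY = {
    "syntax_error": -0.1,   # submitted SQL failed to parse or run
    "wrong_result": -0.05,  # valid SQL, wrong result
    "hint": -0.1,           # agent requested a hint
}


def correct_fix_reward(hints_used: int, failed_attempts: int) -> float:
    """Reward for a correct fix.

    ASSUMPTION: linear scaling that reuses the per-step penalty sizes and
    clamps to the documented range [0.1, 0.95]. The environment's real
    formula is not documented here and may differ.
    """
    if hints_used == 0 and failed_attempts == 0:
        return 1.0  # correct fix, no hints, first try
    scaled = 1.0 - 0.1 * hints_used - 0.05 * failed_attempts
    return round(min(0.95, max(0.1, scaled)), 2)


# Usage: a first-try fix versus a fix after two hints and one wrong attempt.
print(correct_fix_reward(0, 0))
print(correct_fix_reward(2, 1))
```

Under this assumed rule, every hint or failed attempt lowers the final reward but a correct fix never drops below the +0.1 floor from the table.
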

---

## Challenge Types (5 built-in)

| ID | Bug Type |
|---|---|
| `wrong_aggregate` | Missing `SUM()` + `GROUP BY` |
| `wrong_join` | `INNER JOIN` should be `LEFT JOIN` |
| `off_by_one_filter` | Wrong comparison operator in `WHERE` |
| `missing_having` | `WHERE` used instead of `HAVING` for aggregate filter |
| `wrong_order_limit` | `ASC` should be `DESC` for top-N query |

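
To make one of these bug types concrete, here is a minimal sketch of a `wrong_join` case checked against an in-memory SQLite database. The schema, the sample rows, and the `grade` helper are assumptions made for illustration; the actual challenge bank and grading logic in the environment may look different.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny illustrative database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
        INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 25.0), (3, 2, 10.0);
    """)
    return conn


# Task: count orders per customer, including customers with no orders.
BROKEN = ("SELECT c.name, COUNT(o.id) FROM customers c "
          "INNER JOIN orders o ON o.customer_id = c.id "
          "GROUP BY c.id ORDER BY c.id")
FIXED = BROKEN.replace("INNER JOIN", "LEFT JOIN")


def grade(conn: sqlite3.Connection, submitted: str, reference: str) -> str:
    """Classify a submission by comparing result sets with a reference query."""
    try:
        got = conn.execute(submitted).fetchall()
    except sqlite3.Error:
        return "sql_error"  # covers syntax errors and other execution failures
    expected = conn.execute(reference).fetchall()
    return "correct" if got == expected else "wrong_result"


conn = make_db()
print(conn.execute(BROKEN).fetchall())  # the customer with no orders is missing
print(conn.execute(FIXED).fetchall())   # that customer appears with a count of 0
print(grade(conn, BROKEN, FIXED))       # wrong_result
```

The broken query is valid SQL, so it falls under the "wrong result" outcome rather than the syntax-error one.
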

---

## Quick Start

```python
from sql_tutor_env.client import SQLTutorEnv
from sql_tutor_env.models import SQLAction

# Connect to the running HF Space
env = SQLTutorEnv(base_url="https://your-username-sql-tutor-env.hf.space")

# Start an episode
obs, state = env.reset()
print(f"Task: {obs.task_description}")
print(f"Broken query: {obs.broken_query}")

# Submit a fix
result = env.step(SQLAction(
    action_type="submit_fix",
    sql_query="SELECT customer_id, SUM(amount) AS total_amount FROM orders WHERE status = 'completed' GROUP BY customer_id ORDER BY customer_id;"
))
print(f"Correct: {result.observation.is_correct}, Reward: {result.reward}")
```


---

## Integration with TRL / GRPOTrainer

```python
from trl import GRPOTrainer, GRPOConfig
from sql_tutor_env.client import SQLTutorEnv
from sql_tutor_env.models import SQLAction


def rollout_func(prompts, env):
    obs, _ = env.reset()
    # ... build prompt from obs, call model, parse SQL, step env
    pass


env = SQLTutorEnv(base_url="https://your-space.hf.space")

# Outline only: `model` must be defined beforehand, and the keyword
# arguments should be checked against your installed TRL version.
trainer = GRPOTrainer(
    model=model,
    config=GRPOConfig(...),
    rollout_func=rollout_func,
    env=env,
)
trainer.train()
```


---

## Project Structure

```
sql_tutor_env/
|-- __init__.py
|-- models.py               # SQLAction, SQLObservation, SQLState
|-- client.py               # SQLTutorEnv (EnvClient subclass)
|-- openenv.yaml
|-- pyproject.toml
|-- README.md
`-- server/
    |-- __init__.py
    |-- app.py              # FastAPI app via create_app()
    |-- sql_environment.py  # Core reset/step/state logic
    |-- challenges.py       # Bank of SQL bug challenges
    |-- requirements.txt
    `-- Dockerfile
```