---

title: SQL Tutor Env
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
tags:
  - openenv
  - openenv-main
  - rl-environment
base_path: /web
---


# SQL Tutor Environment

An **OpenEnv** reinforcement learning environment that trains LLM agents to identify and fix bugs in SQL queries.

Built for the [Meta x Hugging Face x PyTorch India Hackathon 2026](https://www.scaler.com/school-of-technology/meta-pytorch-hackathon).

---

## Task

In each episode, the agent receives:
- A **broken SQL query** with a deliberate bug
- The **database schema** (tables and columns)
- A **task description** of what the correct query should return

The agent must either:
1. **Submit a fix** (`submit_fix`) - provide a corrected SQL query
2. **Request a hint** (`request_hint`) - get a progressive hint (with a small reward penalty)

The episode ends when the agent submits a correct query or exhausts its 5 allowed actions.
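
The termination rule above can be sketched as a small state object. This is a minimal illustration, not the environment's actual code: `ToyEpisode` and its fields are hypothetical names, it assumes hints count toward the 5-action budget, and it uses string equality in place of a real correctness check.

```python
from dataclasses import dataclass

MAX_ACTIONS = 5  # action budget stated in the Task section


@dataclass
class ToyEpisode:
    """Hypothetical stand-in for server-side episode state (not the real SQLState)."""
    solution: str
    actions_used: int = 0
    solved: bool = False

    @property
    def done(self) -> bool:
        # The episode ends on a correct fix or when the budget is spent.
        return self.solved or self.actions_used >= MAX_ACTIONS

    def act(self, action_type: str, sql_query: str = "") -> bool:
        """Consume one action and return whether the episode is now over."""
        if self.done:
            raise RuntimeError("episode already finished")
        if action_type not in ("submit_fix", "request_hint"):
            raise ValueError(f"unknown action_type: {action_type}")
        self.actions_used += 1
        if action_type == "submit_fix":
            # Assumption: a real checker would compare query results;
            # string equality keeps this sketch short.
            self.solved = sql_query.strip() == self.solution
        return self.done


episode = ToyEpisode(solution="SELECT 1")
episode.act("request_hint")            # uses one action, episode continues
episode.act("submit_fix", "SELECT 1")  # correct fix ends the episode
```

Under these assumptions, an agent that only requests hints also ends the episode after five actions, unsolved.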

---

## Reward Structure

| Outcome | Reward |
|---|---|
| Correct fix, no hints, first try | **+1.0** |
| Correct fix with hints / retries | **+0.1 to +0.95** (scaled down) |
| SQL syntax error | **-0.1** |
| Wrong query (valid SQL, wrong result) | **-0.05** |
| Requesting a hint | **-0.1** |
| Max steps reached without solving | **0** |
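
One way to tell the three submission outcomes in the table apart is to run the candidate and a reference query against the same database and compare the rows. The sketch below assumes an in-memory SQLite database; the `grade` helper, the `orders` schema, and the sample rows are hypothetical, and the environment's own checker may work differently.

```python
import sqlite3


def grade(conn, candidate_sql, reference_sql):
    """Return 'syntax_error', 'wrong_result', or 'correct' for a candidate query."""
    expected = conn.execute(reference_sql).fetchall()
    try:
        actual = conn.execute(candidate_sql).fetchall()
    except sqlite3.Error:
        # Note: this also catches non-syntax failures such as a missing table.
        return "syntax_error"
    return "correct" if actual == expected else "wrong_result"


# Penalties for the two failure outcomes, copied from the table above.
PENALTY = {"syntax_error": -0.1, "wrong_result": -0.05}

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "completed"), (1, 5.0, "completed"), (2, 7.0, "pending")],
)

reference = (
    "SELECT customer_id, SUM(amount) FROM orders "
    "WHERE status = 'completed' GROUP BY customer_id ORDER BY customer_id"
)
unaggregated = (
    "SELECT customer_id, amount FROM orders "
    "WHERE status = 'completed' ORDER BY customer_id"
)

outcome = grade(conn, unaggregated, reference)
print(outcome, PENALTY.get(outcome, 0.0))
```

Comparing ordered row lists is a deliberate simplification; a checker for tasks without an `ORDER BY` requirement would likely compare rows as multisets instead.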

---

## Challenge Types (5 built-in)

| ID | Bug Type |
|---|---|
| `wrong_aggregate` | Missing `SUM()` + `GROUP BY` |
| `wrong_join` | `INNER JOIN` should be `LEFT JOIN` |
| `off_by_one_filter` | Wrong comparison operator in `WHERE` |
| `missing_having` | `WHERE` used instead of `HAVING` for aggregate filter |
| `wrong_order_limit` | `ASC` should be `DESC` for top-N query |
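
To make one of these bug types concrete, the sketch below reproduces a `wrong_join` bug in an in-memory SQLite database. The schema, data, and task wording are hypothetical and are not taken from the environment's challenge bank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1, 20.0);
""")

# Hypothetical task: list every customer with their order count,
# including customers who have placed no orders.
broken = """
    SELECT c.name, COUNT(o.id)
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
"""
# The fix keeps unmatched customers by switching the join type.
fixed = broken.replace("INNER JOIN", "LEFT JOIN")

broken_rows = conn.execute(broken).fetchall()
fixed_rows = conn.execute(fixed).fetchall()

print("broken:", broken_rows)
print("fixed: ", fixed_rows)
```

Both queries are valid SQL, so this bug falls under the "wrong query" outcome rather than the syntax-error one: the `INNER JOIN` version silently drops the customer with no orders.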

---

## Quick Start

```python
from sql_tutor_env.client import SQLTutorEnv
from sql_tutor_env.models import SQLAction

# Connect to the running HF Space
env = SQLTutorEnv(base_url="https://your-username-sql-tutor-env.hf.space")

# Start an episode
obs, state = env.reset()
print(f"Task: {obs.task_description}")
print(f"Broken query: {obs.broken_query}")

# Submit a fix
result = env.step(SQLAction(
    action_type="submit_fix",
    sql_query="SELECT customer_id, SUM(amount) AS total_amount FROM orders WHERE status = 'completed' GROUP BY customer_id ORDER BY customer_id;"
))
print(f"Correct: {result.observation.is_correct}, Reward: {result.reward}")
```

---

## Integration with TRL / GRPOTrainer

```python
from trl import GRPOTrainer, GRPOConfig
from sql_tutor_env.client import SQLTutorEnv
from sql_tutor_env.models import SQLAction

# Schematic outline: `model` must be defined beforehand, and the keyword
# arguments below should be checked against your installed TRL version.

def rollout_func(prompts, env):
    obs, _ = env.reset()
    # ... build prompt from obs, call model, parse SQL, step env
    pass

env = SQLTutorEnv(base_url="https://your-space.hf.space")
trainer = GRPOTrainer(
    model=model,
    config=GRPOConfig(...),
    rollout_func=rollout_func,
    env=env,
)
trainer.train()
```

---

## Project Structure

```
sql_tutor_env/
|-- __init__.py
|-- models.py              # SQLAction, SQLObservation, SQLState
|-- client.py              # SQLTutorEnv (EnvClient subclass)
|-- openenv.yaml
|-- pyproject.toml
|-- README.md
`-- server/
    |-- __init__.py
    |-- app.py             # FastAPI app via create_app()
    |-- sql_environment.py # Core reset/step/state logic
    |-- challenges.py      # Bank of SQL bug challenges
    |-- requirements.txt
    `-- Dockerfile
```