title: SQL Tutor Env
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
tags:
  - openenv
  - openenv-main
  - rl-environment
base_path: /web

SQL Tutor Environment

An OpenEnv reinforcement learning environment for training LLM agents to identify and fix bugs in SQL queries.

Built for the Meta x Hugging Face x PyTorch India Hackathon 2026.


Task

At the start of each episode, the agent receives:

  • A broken SQL query with a deliberate bug
  • The database schema (tables and columns)
  • A task description of what the correct query should return

On each step, the agent takes one of two actions:

  1. Submit a fix (submit_fix) - provide a corrected SQL query
  2. Request a hint (request_hint) - receive the next progressive hint, at a reward penalty of -0.1

The episode ends when the agent submits a correct query or exhausts its 5 allowed actions.
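
The episode rules above can be sketched as a small local loop. This is only an illustration, not the environment's server code: `Action` is a hypothetical stand-in for `SQLAction`, `run_episode`, `policy`, and `is_correct` are made-up names, and the sketch assumes that a hint request consumes one of the 5 allowed actions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MAX_ACTIONS = 5  # from the text above: at most 5 actions per episode


@dataclass
class Action:
    """Hypothetical stand-in for SQLAction, using the Quick Start field names."""
    action_type: str                 # "submit_fix" or "request_hint"
    sql_query: Optional[str] = None  # only used by "submit_fix"


def run_episode(
    policy: Callable[[int, int], Action],
    is_correct: Callable[[str], bool],
) -> Tuple[bool, int, int]:
    """Run one episode and return (solved, actions_used, hints_used)."""
    hints_used = 0
    for step in range(1, MAX_ACTIONS + 1):
        action = policy(step, hints_used)
        if action.action_type == "request_hint":
            hints_used += 1  # assumption: a hint consumes one of the 5 actions
        elif action.action_type == "submit_fix":
            if is_correct(action.sql_query or ""):
                return True, step, hints_used  # a correct fix ends the episode
        else:
            raise ValueError(f"unknown action_type: {action.action_type!r}")
    return False, MAX_ACTIONS, hints_used  # action budget exhausted


def hint_then_fix(step: int, hints_used: int) -> Action:
    """Toy policy: ask for one hint, submit a wrong fix, then the right one."""
    if step == 1:
        return Action("request_hint")
    if step == 2:
        return Action("submit_fix", "SELECT 1")
    return Action("submit_fix", "SELECT 2")


outcome = run_episode(hint_then_fix, lambda sql: sql == "SELECT 2")
print(outcome)  # (True, 3, 1)
```

A policy that only ever requests hints would use up all 5 actions and end the episode unsolved.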


Reward Structure

| Outcome | Reward |
|---|---|
| Correct fix, no hints, first try | +1.0 |
| Correct fix with hints / retries | +0.1 to +0.95 (scaled down) |
| SQL syntax error | -0.1 |
| Wrong query (valid SQL, wrong result) | -0.05 |
| Requesting a hint | -0.1 |
| Max steps reached without solving | 0 |
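
One way to tell the three submission outcomes apart is to run the submitted query and a reference query against the same database and compare the rows. The sketch below does this with Python's built-in `sqlite3` module. It is an assumption about how such a grader might work, not the environment's actual implementation: the `grade` function, the use of SQLite, and the sample `orders` rows are all hypothetical. Only the first-try reward values are taken from the table above; the scaling for hints and retries is not modelled.

```python
import sqlite3

# First-try rewards copied from the reward table above.
REWARD_CORRECT = 1.0
REWARD_SYNTAX_ERROR = -0.1
REWARD_WRONG_RESULT = -0.05


def grade(conn: sqlite3.Connection, candidate_sql: str, reference_sql: str):
    """Return (outcome, reward) for one submitted query."""
    expected = conn.execute(reference_sql).fetchall()
    try:
        actual = conn.execute(candidate_sql).fetchall()
    except sqlite3.Error:
        # Any SQLite error is mapped to the "SQL syntax error" row here.
        return "syntax_error", REWARD_SYNTAX_ERROR
    if actual == expected:
        return "correct", REWARD_CORRECT
    return "wrong_result", REWARD_WRONG_RESULT


# Hypothetical sample data for the wrong_aggregate challenge.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL, status TEXT);
    INSERT INTO orders VALUES
        (1, 10.0, 'completed'), (1, 5.0, 'completed'),
        (2, 7.0, 'completed'), (2, 3.0, 'cancelled');
""")

reference = (
    "SELECT customer_id, SUM(amount) AS total_amount FROM orders "
    "WHERE status = 'completed' GROUP BY customer_id ORDER BY customer_id"
)
missing_sum = (
    "SELECT customer_id, amount AS total_amount FROM orders "
    "WHERE status = 'completed' ORDER BY customer_id"
)

results = [
    grade(conn, reference, reference),
    grade(conn, missing_sum, reference),
    grade(conn, "SELEC customer_id FROM orders", reference),
]
print(results)
```

Comparing ordered row lists is strict about row order, which suits queries that end in ORDER BY; an order-insensitive comparison would be needed for tasks that do not specify one.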

Challenge Types (5 built-in)

| ID | Bug Type |
|---|---|
| wrong_aggregate | Missing SUM() + GROUP BY |
| wrong_join | INNER JOIN should be LEFT JOIN |
| off_by_one_filter | Wrong comparison operator in WHERE |
| missing_having | WHERE used instead of HAVING for aggregate filter |
| wrong_order_limit | ASC should be DESC for top-N query |
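
To make one of these bug types concrete, the following sketch reproduces a wrong_join style bug with Python's `sqlite3` module. The tables, rows, and queries are invented for illustration and are not taken from the environment's challenge bank. The broken query is valid SQL, but the INNER JOIN silently drops customers who have no orders.

```python
import sqlite3

# Hypothetical schema and data: Ben has no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0);
""")

# Task: list every customer with their number of orders.
broken = (
    "SELECT c.name, COUNT(o.id) FROM customers c "
    "INNER JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.id, c.name ORDER BY c.name"
)
fixed = broken.replace("INNER JOIN", "LEFT JOIN")

broken_rows = conn.execute(broken).fetchall()
fixed_rows = conn.execute(fixed).fetchall()

print(broken_rows)  # [('Asha', 2)]
print(fixed_rows)   # [('Asha', 2), ('Ben', 0)]
```

Because both queries run without error, a bug like this would fall under the "valid SQL, wrong result" outcome rather than a syntax error.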

Quick Start

from sql_tutor_env.client import SQLTutorEnv
from sql_tutor_env.models import SQLAction

# Connect to the running HF Space
env = SQLTutorEnv(base_url="https://your-username-sql-tutor-env.hf.space")

# Start an episode
obs, state = env.reset()
print(f"Task: {obs.task_description}")
print(f"Broken query: {obs.broken_query}")

# Submit a fix
result = env.step(SQLAction(
    action_type="submit_fix",
    sql_query="SELECT customer_id, SUM(amount) AS total_amount FROM orders WHERE status = 'completed' GROUP BY customer_id ORDER BY customer_id;"
))
print(f"Correct: {result.observation.is_correct}, Reward: {result.reward}")

Integration with TRL / GRPOTrainer

from trl import GRPOTrainer, GRPOConfig
from sql_tutor_env.client import SQLTutorEnv
from sql_tutor_env.models import SQLAction

def rollout_func(prompts, env):
    obs, _ = env.reset()
    # ... build prompt from obs, call model, parse SQL, step env
    pass

env = SQLTutorEnv(base_url="https://your-space.hf.space")
trainer = GRPOTrainer(
    model=model,
    args=GRPOConfig(...),  # GRPOTrainer receives its GRPOConfig through `args`
    rollout_func=rollout_func,
    env=env,
)
trainer.train()
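
The placeholder comment in rollout_func mentions building a prompt from the observation and parsing SQL out of the model's reply. A minimal sketch of those two helpers follows. The names `build_prompt` and `extract_sql`, the prompt wording, and the `schema` argument are assumptions for illustration; the README only shows `obs.task_description` and `obs.broken_query`, so the name of the observation field that carries the schema is not known here.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker

# Matches the first fenced block, with an optional "sql" language tag.
_SQL_BLOCK = re.compile(FENCE + r"(?:sql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def build_prompt(task_description: str, broken_query: str, schema: str) -> str:
    """Assemble a plain-text prompt from the three pieces of an observation."""
    return (
        "You are fixing a buggy SQL query.\n\n"
        f"Task: {task_description}\n\n"
        f"Schema:\n{schema}\n\n"
        f"Broken query:\n{broken_query}\n\n"
        "Reply with the corrected query only."
    )


def extract_sql(completion: str) -> str:
    """Return the SQL inside the first fenced block, or the whole reply if unfenced."""
    match = _SQL_BLOCK.search(completion)
    sql = match.group(1) if match else completion
    return sql.strip()


reply = "Here is the fix:\n" + FENCE + "sql\nSELECT 1;\n" + FENCE
print(extract_sql(reply))  # SELECT 1;
```

The extracted string could then be passed as `sql_query` in a `submit_fix` action.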

Project Structure

sql_tutor_env/
|-- __init__.py
|-- models.py              # SQLAction, SQLObservation, SQLState
|-- client.py              # SQLTutorEnv (EnvClient subclass)
|-- openenv.yaml
|-- pyproject.toml
|-- README.md
`-- server/
    |-- __init__.py
    |-- app.py             # FastAPI app via create_app()
    |-- sql_environment.py # Core reset/step/state logic
    |-- challenges.py      # Bank of SQL bug challenges
    |-- requirements.txt
    `-- Dockerfile