Spaces:
Sleeping
title: SQLDrift
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
short_description: SQL repair and optimization under live schema drift
base_path: /web
SQLDrift
An OpenEnv gym that teaches an LLM agent to repair and optimize SQL when the database schema and business rules shift out from under it.
Why this problem exists
Production analytics breaks in familiar ways: a deploy renames a column, splits an enum into two fields, or changes how dates are stored. The dashboard query that worked on Monday may error or return wrong results on Tuesday. LLMs are a natural fitβthey can read changelogs, run diagnostics, and propose rewritesβbut static SQL benchmarks assume a fixed schema. They cannot reward an agent for noticing that the world changed during a task or for trading correctness against latency under a tool budget.
SQLDrift closes that gap: each episode is a small DuckDB world, a slow baseline
SELECT, a fixed tool budget, and (for drift scenarios) a mid-episode schema or rule
change the agent must recover from before submitting a rewrite that is both semantically
correct and materially faster than the baseline.
Live environment (Hugging Face Space)
Run the OpenEnv server without cloning the repo:
SQLDrift on Hugging Face Spaces
For TRL / client rollouts, point SQL_DRIFT_ENV_URL (or your SqlDriftEnv /
SqlDriftToolEnv base URL) at the Space runtime, for example
https://visheshrathi-sql-drift-env.hf.space β same host the training notebook
health-checks by default. If you need many concurrent WebSocket sessions (e.g.
num_generations > 1 in GRPO), duplicate the Space to your own account per the
TRL OpenEnv concurrency notes.
SQLDrift is a production-grade OpenEnv
environment for RL fine-tuning with TRLβs
GRPOTrainer (and
Unsloth LoRA on consumer GPUs). Each episode
gives the agent a slow baseline query, a lightly populated DuckDB database, read-only
introspection and execution tools, and a 25-step budget. Mid-episode, a schema or
business-rule drift event may fire; the agent must read the deploy changelog,
re-diagnose, and submit a rewrite that is (a) semantically correct and (b) β₯ 1.2Γ
faster than the baseline. A hierarchical rubric turns that into six reward signals
(correctness, drift-adapt, speedup, step-tax, gatekeepers, consult-dba).
Highlights:
- 10 hand-crafted scenarios β 6 classic anti-patterns (correlated subqueries,
SELECT *joins, Cartesian joins,DISTINCTgroupbys, nested subqueries,HAVING-as-WHERE) and 4 drift events (column rename, date format flip, enum rule split, field deprecation). - Deterministic fixtures β 15β30-column schemas, 2β4 tables per scenario,
regenerated in-process at
reset()from a seed. No Parquet files, no pre-baked sqlite dumps. - Sqlglot-canonicalised baseline-verbatim gate β agents that rename whitespace and resubmit the baseline cannot farm the +1.0 correct bonus.
- Skill library β 12 pre-seeded playbook/drift-card entries plus an on-disk,
fcntl.flock-guarded JSON store that grows as the agent solves episodes, surfaced nextreset()via Jaccard retrieval. - Feature-flagged DBA Oracle β 3-tier escalating hints per scenario, penalised by
the
ConsultDBArubric. Off by default. - Dockerised β
server/Dockerfilelayers the env onghcr.io/meta-pytorch/openenv-baseand exposes/health,/reset,/step,/wsout-of-the-box.
How the environment works
reset(scenario_id, seed, β¦)builds a fresh in-memory DuckDB instance, loads the scenarioβs tables and rows, and returns an observation with the baseline SQL, phase (DIAGNOSEβ β¦), step budget, andlearned_hintsfrom the skill library.- Tooling β The agent acts via OpenEnv actions: list/describe tables, sample rows,
EXPLAIN, timed read-onlyrun_query,read_changelog, optionalconsult_dba, and terminalsubmit_rewrite. Each step consumes budget and accrues shaping reward (e.g. step tax and gatekeeper penalties). - Drift β On drift scenarios, an event fires in a configured step window: DDL / rule
operators run idempotently (
engine/drift.py); the changelog updates; ground truth and baseline behaviour may change; the observation signalsdrift_recoveryso the agent can adapt. - Verification β Submissions are checked with an order-independent result hash
(
engine/verifier.py) and median-of-3 timed execution against the baseline (engine/profiler.py). Correctness and speedup components of the rubric enforce the β₯ 1.2Γ speedup bar for the top correctness tier. - Reward β
SqlDriftRubriccomposes six child rubrics; every observation exposesreward_components(e.g.r_correct,r_drift,r_speedup,r_step_tax,r_gatekeepers,r_consult_dba) for analysis and logging. - Server β
server/app.pyuses the OpenEnv FastAPI factory;client.pyprovides a WebSocketSqlDriftEnvfor remote episodes. Seeopenenv.yamlfor the Space-oriented manifest.
Results
Random baseline (floor)
We ship a reproducible random-agent evaluation under
outputs/evals/baseline_random_v1/ (50 episodes, 5 seeds Γ 10 scenarios). It establishes
a floor for any trained policy:
| Metric | Value |
|---|---|
| Mean total reward | β2.048 (Ο β 0.70) |
| Pass rate (reward β₯ 0.5) | 0% |
| Submit rate | 80% |
Mean reward is roughly β1.8 to β2.3 per scenario; drift scenarios trend slightly
harder than static anti-pattern scenarios. See
outputs/evals/baseline_random_v1/report.md
for the full per-scenario table and component bars.
Trained policies
GRPO training is driven by training/grpo_train.py and sql_drift_grpo_training.ipynb.
After a run, compare checkpoints with:
uv run python -m training.eval \
--checkpoint path/to/adapter \
--scenarios 1-10 \
--seeds-per-scenario 5 \
--out outputs/evals/my_run
Training logs and plots: training/grpo_train.py can emit step-wise JSONL;
utilities/plot_curves.py produces reward/loss figures under training/evidence/ when
that log exists.
Quick start
git clone <your-fork>/sql_drift_env.git
cd sql_drift_env
uv sync # install deps
uv run pytest -q # 300+ tests, ~90s on CPU
uv run uvicorn server.app:app --reload # serve the env on :8000
Or via Docker:
docker build -f server/Dockerfile -t sql-drift-env:latest .
docker run -p 8000:8000 sql-drift-env:latest
curl -s http://localhost:8000/health
Programmatic rollout
A five-step in-process episode against scenario 01_correlated_subquery:
from client import SqlDriftEnv
from server import SqlDriftEnvironment
env = SqlDriftEnvironment()
obs = env.reset(seed=42, scenario_id="01_correlated_subquery")
print(obs.learned_hints) # surfaced from skill library
obs = env.step(SqlDriftEnv.action_list_tables())
obs = env.step(SqlDriftEnv.action_describe_table("users"))
obs = env.step(SqlDriftEnv.action_run_query("SELECT COUNT(*) FROM users"))
obs = env.step(SqlDriftEnv.action_submit_rewrite(
"SELECT u.*, COALESCE(c.n, 0) FROM users u "
"LEFT JOIN (SELECT user_id, COUNT(*) AS n FROM orders GROUP BY 1) c "
"ON c.user_id = u.user_id"
))
print(obs.reward, obs.reward_components)
env.close()
End-to-end over an HTTP+WS OpenEnv server, see SqlDriftEnv in client.py and the
integration test suite in tests/integration/test_client_server.py /
tests/integration/test_state_no_leak.py.
Evaluation
uv run python -m training.eval \
--checkpoint base \
--scenarios 1-10 \
--seeds-per-scenario 5 \
--out outputs/evals/my_run
Emits report.md, per_episode.csv, and summary.json.
Training (GPU)
training/grpo_train.py contains the GRPO entrypoint used by the hackathon training
notebook: it builds the curriculum dataset, loads Qwen/Qwen3-1.7B (or your
SQL_DRIFT_MODEL_NAME) via transformers.AutoModelForCausalLM + BitsAndBytesConfig
4-bit nf4 (QLoRA), attaches a PEFT LoRA adapter, and lets TRL's GRPOTrainer drive
multi-turn OpenEnv rollouts through SqlDriftToolEnv via environment_factory. The
install/runtime stack mirrors Hugging Face TRL's reference notebooks
(grpo_trl_lora_qlora.ipynb + openenv_wordle_grpo.ipynb). Open
sql_drift_grpo_training.ipynb on a free Colab T4, set SQL_DRIFT_ENV_URL to your
deployed SQLDrift Space (see Live environment),
and run all cells.
uv sync --extra train # installs trl, transformers, accelerate, unsloth
uv sync --extra evidence # matplotlib + pandas for utilities/plot_curves.py
Repository layout
sql_drift_env/
βββ models.py # Pydantic v2 action/observation/state
βββ client.py # SqlDriftEnv EnvClient (/ws)
βββ engine/
β βββ runtime.py # private RuntimeEpisodeState
β βββ drift.py # 4 DDL drift operators
β βββ reward.py # SqlDriftRubric (6 child rubrics)
β βββ verifier.py
| |-- profiler.py
βββ scenarios/ # 10 hand-crafted scenario modules + registry
βββ skill_library/ # pre-seeds + JSON store + Jaccard retrieval
βββ actors/ # engineering_manager (changelog), dba_oracle
βββ server/ # FastAPI app, Dockerfile, env wrapper class
βββ training/ # config, prompt, random_agent, grpo_train, eval
βββ utilities/ # env_loader, logger, plot_curves, run_training_job (HF Jobs), β¦
βββ tests/ # 300+ unit + integration tests
βββ outputs/evals/ # baseline eval artifacts
βββ design/ # design docs (ignored by docker)
References
- SQLDrift β Hugging Face Space β deployed OpenEnv server (Docker SDK).