
Feature Demo: F006 — GRPO Training Pipeline

Generated: 2026-03-28T07:42:55Z
Context source: spec + discovery only (implementation not read)
Feature entry: FEATURES.json #F006


What This Feature Does

This feature gives you a single notebook workflow to train an SQLEnv policy with GRPO, then compare behavior before vs after training. The user-facing goal is simple: run one notebook and see whether the trained policy explores the database more strategically than a random baseline.

From a user perspective, success means the workflow is reproducible, the learning signal is visible, and the random-vs-trained comparison is easy to inspect in one place.
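For readers new to GRPO, its core idea fits in a few lines. The trainer samples several rollouts for the same question and scores each one with the reward functions. It then normalizes every reward against its own group's mean and standard deviation, and the result is that rollout's advantage. The sketch below is a minimal, stdlib-only illustration of that normalization, not the notebook's or TRL's code. The function name, the `eps` value and the example rewards are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Score each rollout relative to the other rollouts for the same question.

    GRPO replaces a learned value baseline with group statistics:
    advantage_i = (r_i - mean(r)) / (std(r) + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four episodes on the same question
# (1.0 = correct answer, 0.0 = wrong answer).
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

If every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal. This sketch uses the population standard deviation; implementations differ on that detail.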


What Is Already Proven

Verified in This Demo Run

  • Confirmed the training extra can import TRL GRPO classes locally (trl-grpo-import-ok).
  • Ran error-handling unit suite (6 passed) covering model-load failure, question-load failure modes, OOM guidance, and parse-fallback logging behavior.
  • Ran notebook-oriented E2E smoke suite (5 passed) covering structure, difficulty filtering, training step execution, and transcript generation.
  • Ran integration suite (2 passed) covering rollout + reward flow and unparseable-action recovery.
  • Attempted to launch the notebook UI; the local environment currently lacks the jupyter binary (output captured below).

Previously Verified Evidence

  • FEATURES.json (F006) records independent verification as 68/68 tests passed with verifier result approved at 2026-03-28T07:37:20Z.
  • Section 7 of the implementation spec records that the full verification command passed, along with an earlier TRL import check.

What Still Needs User Verification

  • Open and run notebooks/train_grpo.ipynb interactively on a machine where Jupyter is available.
  • Validate the visual learning curve in the notebook output.
  • Validate side-by-side transcript quality (random vs trained) with your preferred model/runtime.

Quickstart / Verification Steps

Run these commands to see the feature in action:

uv sync --extra training
uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v

If you want the interactive notebook UI, install Jupyter in your environment first.
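Before launching anything, a small preflight script can report whether both prerequisites are visible from the current environment. This is a hypothetical helper; `preflight` is not part of the repo. It only checks two things: that a `jupyter` executable is on `PATH`, and that the `trl` package can be found. It does not check that `GRPOConfig` and `GRPOTrainer` import successfully, so the TRL import one-liner in the quickstart commands is still the stronger check.

```python
import importlib.util
import shutil


def preflight() -> dict[str, bool]:
    """Report whether the notebook workflow's prerequisites are visible here."""
    return {
        # The notebook UI needs a `jupyter` executable on PATH.
        "jupyter_on_path": shutil.which("jupyter") is not None,
        # TRL comes from the `training` extra; find_spec only locates the
        # top-level package and does not import GRPOConfig/GRPOTrainer.
        "trl_importable": importlib.util.find_spec("trl") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

You could save it as, say, `preflight.py` (a hypothetical filename) and run it with `uv run --extra training python preflight.py`.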


Live Local Proof

Attempt to Launch the Training Notebook UI

This is the user-facing entrypoint described in the spec.

uv run jupyter notebook "notebooks/train_grpo.ipynb" --no-browser --port 8899
error: Failed to spawn: `jupyter`
  Caused by: No such file or directory (os error 2)

What to notice: the command points at the notebook path given in the spec, but this environment does not have Jupyter installed, so interactive verification is handed off to the user.

Verify GRPO Training Dependencies Resolve Locally

uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
trl-grpo-import-ok

What to notice: the TRL GRPO surface required by the notebook is available in this environment when using the training extra.


Existing Evidence

  • Source: specs/FEATURES.json (F006.verification_evidence)
    • tests_run: 68, tests_passed: 68, verifier_result: approved
    • Command recorded: uv run --with pytest pytest tests/unit/test_grpo_config.py tests/unit/test_prompts.py tests/unit/test_rollout.py tests/unit/test_rewards.py tests/unit/test_error_handling.py tests/integration/test_training_pipeline.py tests/e2e/test_training_e2e.py -v

Manual Verification Checklist

  1. Install notebook runtime (jupyter) and training deps (uv sync --extra training).
  2. Launch notebook: jupyter notebook notebooks/train_grpo.ipynb.
  3. Run all cells end-to-end.
  4. Confirm training completes without runtime errors.
  5. Confirm reward/learning curve is rendered.
  6. Confirm random vs trained transcript comparison appears and is readable.
  7. Confirm model artifacts are written to the configured output directory.
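Step 7 can be scripted. The sketch assumes only that training writes files somewhere under a single output directory. The directory itself is whatever the notebook is configured to use, which this demo does not state. `check_artifacts` is a hypothetical helper.

```python
from pathlib import Path


def check_artifacts(output_dir: str) -> list[str]:
    """Return files under output_dir; raise if the directory is missing or empty."""
    root = Path(output_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"output directory not found: {root}")
    # Collect every file (recursively) as a path relative to the output root.
    files = sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())
    if not files:
        raise RuntimeError(f"output directory is empty: {root}")
    return files
```

Usage: `check_artifacts("path/to/configured/output_dir")`, with the placeholder replaced by the notebook's configured output directory.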

Edge Cases Exercised

Error-path handling (bad model, missing/invalid questions, parse fallback)

uv run --with pytest pytest tests/unit/test_error_handling.py -v
============================= test session starts ==============================
platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpA8Pzif/bin/python
collecting ... collected 6 items

tests/unit/test_error_handling.py::test_model_load_error_bad_name PASSED [ 16%]
tests/unit/test_error_handling.py::test_question_load_missing_file PASSED [ 33%]
tests/unit/test_error_handling.py::test_question_load_empty_file PASSED  [ 50%]
tests/unit/test_error_handling.py::test_question_load_invalid_json PASSED [ 66%]
tests/unit/test_error_handling.py::test_oom_guidance PASSED              [ 83%]
tests/unit/test_error_handling.py::test_action_parse_fallback_logged PASSED [100%]

============================== 6 passed in 4.68s ===============================

Why this matters: it verifies that the most important failure modes fail loudly with a clear error instead of silently.
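The three question-load cases (missing file, empty file, invalid JSON) follow a common pattern. The loader checks each precondition explicitly and raises one descriptive error type, so a raw `FileNotFoundError` or `JSONDecodeError` never surfaces mid-training. The sketch below is minimal and assumes questions are stored as a JSON list. `load_questions` and `QuestionLoadError` are hypothetical names, not the pipeline's actual API.

```python
import json
from pathlib import Path


class QuestionLoadError(ValueError):
    """Hypothetical error raised when the question file cannot be used."""


def load_questions(path: str) -> list[dict]:
    p = Path(path)
    if not p.is_file():
        raise QuestionLoadError(f"question file not found: {p}")
    text = p.read_text(encoding="utf-8")
    if not text.strip():
        raise QuestionLoadError(f"question file is empty: {p}")
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Chain the original error so the line/column details are preserved.
        raise QuestionLoadError(f"question file is not valid JSON: {p} ({exc})") from exc
    if not isinstance(data, list) or not data:
        raise QuestionLoadError(f"expected a non-empty JSON list of questions in {p}")
    return data
```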

Unparseable action recovery in integration flow

uv run --with pytest pytest tests/integration/test_training_pipeline.py -v
============================= test session starts ==============================
platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpn3aEqJ/bin/python
collecting ... collected 2 items

tests/integration/test_training_pipeline.py::test_training_pipeline_flow_with_reward_functions PASSED [ 50%]
tests/integration/test_training_pipeline.py::test_unparseable_action_recovers_and_episode_continues PASSED [100%]

============================== 2 passed in 3.87s ===============================

Why this matters: malformed model output does not crash the episode loop; training can continue.
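Recovering from unparseable output usually means parsing with a fallback. If the model's text does not match the expected action format, the parser logs a warning and substitutes a safe default, so the episode keeps going. The sketch assumes a hypothetical `<KIND> <argument>` action grammar with four kinds, plus a hypothetical no-op fallback query. The real SQLEnv action format and fallback may differ.

```python
import logging
import re
from typing import NamedTuple

logger = logging.getLogger("grpo_rollout")  # hypothetical logger name


class Action(NamedTuple):
    kind: str
    argument: str


# Hypothetical grammar: "<KIND> <argument>", e.g. "QUERY SELECT * FROM users".
_ACTION_RE = re.compile(
    r"^\s*(DESCRIBE|SAMPLE|QUERY|ANSWER)\s+(.+?)\s*$", re.IGNORECASE | re.DOTALL
)


def parse_action(text: str, fallback: Action = Action("QUERY", "SELECT 1")) -> Action:
    """Parse model output into an Action; on failure, log and return the fallback."""
    match = _ACTION_RE.match(text)
    if match is None:
        # Log a truncated copy of the raw output so malformed turns stay visible.
        logger.warning("Unparseable action, using fallback: %r", text[:200])
        return fallback
    return Action(match.group(1).upper(), match.group(2))
```

The test name `test_action_parse_fallback_logged` suggests the pipeline also logs these cases. A pipeline could alternatively, or additionally, assign a negative reward to unparseable turns.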

Verification command mismatch in this environment (--timeout flag)

uv run --with pytest pytest tests/e2e/test_training_e2e.py -v --timeout=300
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --timeout=300
  inifile: /Users/hjerp/Projects/sql-env/pyproject.toml
  rootdir: /Users/hjerp/Projects/sql-env

Why this matters: the command listed in the spec assumes the pytest timeout plugin is installed. It is not installed in this environment, so the E2E suite had to be run locally without --timeout.
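Instead of dropping the flag, you can request the plugin alongside pytest in the same `uv run` call; pytest-timeout is the package that provides `--timeout`. The hypothetical helper below builds either variant of the command. It does not settle whether the project should add pytest-timeout or drop the flag.

```python
import shlex
from typing import Optional


def e2e_command(timeout_s: Optional[int] = 300) -> list[str]:
    """Build the E2E test command, adding pytest-timeout only when a timeout is set."""
    cmd = ["uv", "run", "--with", "pytest"]
    if timeout_s is not None:
        cmd += ["--with", "pytest-timeout"]  # the plugin that defines --timeout
    cmd += ["pytest", "tests/e2e/test_training_e2e.py", "-v"]
    if timeout_s is not None:
        cmd.append(f"--timeout={timeout_s}")
    return cmd


print(shlex.join(e2e_command()))
# uv run --with pytest --with pytest-timeout pytest tests/e2e/test_training_e2e.py -v --timeout=300
```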


Test Evidence (Optional)

Supplementary evidence from the suites re-run in this demo. The Live Local Proof section above shows how to use the feature; this section shows that it was tested.

| Test suite | Tests | Status |
|---|---|---|
| Error handling unit tests | 6 | All passed |
| E2E training notebook smoke tests | 5 | All passed |
| Integration training pipeline tests | 2 | All passed |

Representative command (run in this demo):

uv run --with pytest pytest tests/e2e/test_training_e2e.py -v

Result summary:

5 passed in 3.83s
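Adding up the per-suite summary lines shows how much of the recorded 68-test verification this demo re-ran. The small sketch below tallies pytest's final summary lines. `tally` is a hypothetical helper, and its regex only covers the passed, failed, skipped and error outcomes.

```python
import re

# Matches e.g. "6 passed", "1 failed", "2 errors" inside a pytest summary line.
_SUMMARY_RE = re.compile(r"(\d+) (passed|failed|skipped|errors?)\b")


def tally(summaries: dict[str, str]) -> dict[str, int]:
    """Sum outcome counts across several pytest summary lines."""
    totals: dict[str, int] = {}
    for line in summaries.values():
        for count, outcome in _SUMMARY_RE.findall(line):
            key = "error" if outcome.startswith("error") else outcome
            totals[key] = totals.get(key, 0) + int(count)
    return totals


summaries = {
    "error handling (unit)": "6 passed in 4.68s",
    "training pipeline (integration)": "2 passed in 3.87s",
    "training notebook (e2e)": "5 passed in 3.83s",
}
print(tally(summaries))  # {'passed': 13}
```

So 13 of the 68 tests recorded in FEATURES.json were re-run here. The config, prompts, rollout and rewards suites from the recorded command were not re-run in this demo.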

Feature Links

  • Implementation spec: specs/F006-IMPLEMENTATION_SPEC.md
  • Verification spec: specs/F006-VERIFICATION_SPEC.md

Demo generated by the feature-demo agent. Re-run with /feature-demo F006 to refresh.