# Feature Demo: F006 — GRPO Training Pipeline
> **Generated:** 2026-03-28T07:42:55Z
> **Context source:** spec + discovery only (implementation not read)
> **Feature entry:** [FEATURES.json #F006](FEATURES.json)
---
## What This Feature Does
This feature gives you a single notebook workflow to train an SQLEnv policy with GRPO, then compare behavior before vs after training. The user-facing goal is simple: run one notebook and see whether the trained policy explores the database more strategically than a random baseline.
From a user perspective, success means the workflow is reproducible, the learning signal is visible, and the random-vs-trained comparison is easy to inspect in one place.
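For readers new to GRPO, the "learning signal" comes from comparing several rollouts of the same question against each other. The sketch below shows the usual group-relative advantage normalization in plain Python. It is an illustration only, not the notebook's code: the function name, the `eps` value, and the sample rewards are assumptions, and TRL's exact normalization may differ in detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Score each rollout relative to the other rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division finite when every rollout earned the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one question.
advantages = group_relative_advantages([0.0, 0.5, 1.0, 0.5])
for i, adv in enumerate(advantages):
    print(f"rollout {i}: advantage {adv:+.2f}")
```

Rollouts that beat the group average get a positive advantage and are reinforced; rollouts below it get a negative one. A group in which every rollout earns the same reward yields zero advantage for all of them, which is one reason a flat learning curve can appear early in training.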
---
## What Is Already Proven
### Verified in This Demo Run
- Confirmed the training extra can import TRL GRPO classes locally (`trl-grpo-import-ok`).
- Ran error-handling unit suite (`6 passed`) covering model-load failure, question-load failure modes, OOM guidance, and parse-fallback logging behavior.
- Ran notebook-oriented E2E smoke suite (`5 passed`) covering structure, difficulty filtering, training step execution, and transcript generation.
- Ran integration suite (`2 passed`) covering rollout + reward flow and unparseable-action recovery.
- Attempted to launch the notebook UI; the local environment currently lacks the `jupyter` binary (output captured below).
### Previously Verified Evidence
- `FEATURES.json` (F006) records independent verification as **68/68 tests passed** with verifier result `approved` at `2026-03-28T07:37:20Z`.
- Implementation spec Section 7 records full verification command passing and prior TRL import check.
---
## What Still Needs User Verification
- Open and run `notebooks/train_grpo.ipynb` interactively on a machine with Jupyter available.
- Validate the visual learning curve in the notebook output.
- Validate side-by-side transcript quality (random vs trained) with your preferred model/runtime.
---
## Quickstart / Verification Steps
> Run these commands to see the feature in action:
```bash
uv sync --extra training
uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v
```
If you want the interactive notebook UI, install Jupyter in your environment first.
---
## Live Local Proof
### Attempt to Launch the Training Notebook UI
This is the user-facing entrypoint described in the spec.
```bash
uv run jupyter notebook "notebooks/train_grpo.ipynb" --no-browser --port 8899
```
```
error: Failed to spawn: `jupyter`
Caused by: No such file or directory (os error 2)
```
What to notice: the notebook launch path is correct, but this environment does not currently have Jupyter installed, so interactive verification is handed off to the user.
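The `Failed to spawn` error means no `jupyter` executable was found on the `PATH` of the environment that `uv run` used. A minimal preflight check, sketched with the standard library only (the helper name `find_executable` is hypothetical):

```python
import shutil
from typing import Optional


def find_executable(name: str) -> Optional[str]:
    """Return the full path of an executable on PATH, or None if it is absent."""
    return shutil.which(name)


path = find_executable("jupyter")
if path is None:
    print("jupyter not found on PATH; install it before launching the notebook")
else:
    print(f"jupyter found at {path}")
```

Run it through the same environment as the launch command (for example via `uv run python`) so that it inspects the same `PATH`.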
### Verify GRPO Training Dependencies Resolve Locally
```bash
uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
```
```
trl-grpo-import-ok
```
What to notice: the TRL GRPO surface required by the notebook is available in this environment when using the `training` extra.
---
## Existing Evidence
- Source: `specs/FEATURES.json` (F006.verification_evidence)
- `tests_run: 68`, `tests_passed: 68`, `verifier_result: approved`
- Command recorded: `uv run --with pytest pytest tests/unit/test_grpo_config.py tests/unit/test_prompts.py tests/unit/test_rollout.py tests/unit/test_rewards.py tests/unit/test_error_handling.py tests/integration/test_training_pipeline.py tests/e2e/test_training_e2e.py -v`
---
## Manual Verification Checklist
1. Install notebook runtime (`jupyter`) and training deps (`uv sync --extra training`).
2. Launch notebook: `jupyter notebook notebooks/train_grpo.ipynb`.
3. Run all cells end-to-end.
4. Confirm training completes without runtime errors.
5. Confirm reward/learning curve is rendered.
6. Confirm random vs trained transcript comparison appears and is readable.
7. Confirm model artifacts are written to the configured output directory.
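Step 7 of the checklist above can be checked with a short script instead of by eye. This is a sketch under assumptions: `outputs/grpo` is a placeholder path, not the notebook's actual setting, and the helper name `list_artifacts` is hypothetical. Substitute the output directory configured in your run.

```python
from pathlib import Path


def list_artifacts(output_dir):
    """Return relative paths of all files under output_dir, or [] if it is missing."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


# "outputs/grpo" is a placeholder; use the output directory set in the notebook.
files = list_artifacts("outputs/grpo")
print(f"{len(files)} file(s) found")
for name in files:
    print(" ", name)
```

An empty listing after a completed training run suggests the notebook wrote its artifacts somewhere else, or that saving was skipped.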
---
## Edge Cases Exercised
### Error-path handling (bad model, missing/invalid questions, parse fallback)
```bash
uv run --with pytest pytest tests/unit/test_error_handling.py -v
```
```
============================= test session starts ==============================
platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpA8Pzif/bin/python
collecting ... collected 6 items
tests/unit/test_error_handling.py::test_model_load_error_bad_name PASSED [ 16%]
tests/unit/test_error_handling.py::test_question_load_missing_file PASSED [ 33%]
tests/unit/test_error_handling.py::test_question_load_empty_file PASSED [ 50%]
tests/unit/test_error_handling.py::test_question_load_invalid_json PASSED [ 66%]
tests/unit/test_error_handling.py::test_oom_guidance PASSED [ 83%]
tests/unit/test_error_handling.py::test_action_parse_fallback_logged PASSED [100%]
============================== 6 passed in 4.68s ===============================
```
Why this matters: this verifies the most important failure modes fail clearly instead of silently.
### Unparseable action recovery in integration flow
```bash
uv run --with pytest pytest tests/integration/test_training_pipeline.py -v
```
```
============================= test session starts ==============================
platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpn3aEqJ/bin/python
collecting ... collected 2 items
tests/integration/test_training_pipeline.py::test_training_pipeline_flow_with_reward_functions PASSED [ 50%]
tests/integration/test_training_pipeline.py::test_unparseable_action_recovers_and_episode_continues PASSED [100%]
============================== 2 passed in 3.87s ===============================
```
Why this matters: malformed model output does not crash the episode loop; training can continue.
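The recovery behavior can be pictured with a small sketch. Because this demo was generated without reading the implementation, everything here is hypothetical: the `<TYPE>: <argument>` action format, the set of action types, the fallback action, and the function name are illustrative assumptions, not the project's actual parser.

```python
import logging

logger = logging.getLogger("parse_fallback_sketch")

# Hypothetical action vocabulary and fallback; the real environment may differ.
KNOWN_TYPES = {"DESCRIBE", "QUERY", "ANSWER"}
FALLBACK_ACTION = ("QUERY", "")


def parse_action(text):
    """Parse '<TYPE>: <argument>'; log and return a fallback instead of raising."""
    head, sep, tail = text.partition(":")
    action_type = head.strip().upper()
    if not sep or action_type not in KNOWN_TYPES:
        logger.warning("Unparseable action %r; using fallback", text)
        return FALLBACK_ACTION
    return action_type, tail.strip()


# The loop keeps going even when one model output is malformed.
for raw in ["describe: employees", "no structure here", "query: SELECT 1"]:
    print(parse_action(raw))
```

The design point is that the parser never raises on bad input: it logs the problem and hands the episode loop a well-formed action, so one malformed completion costs a step rather than the whole rollout.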
### Verification command mismatch in this environment (`--timeout` flag)
```bash
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v --timeout=300
```
```
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --timeout=300
inifile: /Users/hjerp/Projects/sql-env/pyproject.toml
rootdir: /Users/hjerp/Projects/sql-env
```
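Why this matters: the spec-listed command passes `--timeout`, an option pytest itself does not provide (it typically comes from the `pytest-timeout` plugin). With only `--with pytest`, the option is unrecognized, so the local run fell back to the same command without `--timeout`.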
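One way to avoid this mismatch is to add the timeout flag only when a plugin that supports it is importable. The sketch below assumes the flag comes from `pytest-timeout`, whose importable module is `pytest_timeout`; the helper names are hypothetical and the script only prints the command rather than running it.

```python
import importlib.util


def plugin_installed(module_name="pytest_timeout"):
    """True if the named module can be imported in the current environment."""
    return importlib.util.find_spec(module_name) is not None


def build_pytest_args(test_path, timeout_seconds, timeout_supported):
    """Build the pytest argument list, adding --timeout only when supported."""
    args = [test_path, "-v"]
    if timeout_supported:
        args.append(f"--timeout={timeout_seconds}")
    return args


args = build_pytest_args("tests/e2e/test_training_e2e.py", 300, plugin_installed())
print("pytest " + " ".join(args))
```

Alternatively, making the plugin available to the run should let the spec-listed command work unchanged.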
---
## Test Evidence (Optional)
> Supplementary proof that the feature works correctly across the scenarios listed below.
> The Live Local Proof section above shows how to use the feature; this section shows it was tested.
| Test Suite | Tests | Status |
|---|---|---|
| Error handling unit tests | 6 | All passed |
| E2E training notebook smoke tests | 5 | All passed |
| Integration training pipeline tests | 2 | All passed |
Representative command (run in this demo):
```bash
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v
```
Result summary:
```
5 passed in 3.83s
```
---
## Feature Links
- Implementation spec: `specs/F006-IMPLEMENTATION_SPEC.md`
- Verification spec: `specs/F006-VERIFICATION_SPEC.md`
---
*Demo generated by `feature-demo` agent. Re-run with `/feature-demo F006` to refresh.*