# Feature Demo: F006 — GRPO Training Pipeline
> **Generated:** 2026-03-28T07:42:55Z
> **Context source:** spec + discovery only (implementation not read)
> **Feature entry:** [FEATURES.json #F006](FEATURES.json)
---
## What This Feature Does
This feature gives you a single notebook workflow to train an SQLEnv policy with GRPO, then compare behavior before vs after training. The user-facing goal is simple: run one notebook and see whether the trained policy explores the database more strategically than a random baseline.
From a user perspective, success means the workflow is reproducible, the learning signal is visible, and the random-vs-trained comparison is easy to inspect in one place.
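
For readers new to GRPO, the learning signal comes from scoring several rollouts of the same prompt and comparing each one against its own group. The sketch below illustrates that group-relative normalization in plain Python. It is not taken from this feature's implementation (which was not read for this demo); the function name `group_relative_advantages`, the epsilon value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list, eps: float = 1e-4) -> list:
    """Normalize each reward against the mean and spread of its own group."""
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; libraries may differ
    # eps keeps the division defined when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one question: two rewarded, two not.
advantages = group_relative_advantages([0.0, 1.0, 0.0, 1.0])
print(advantages)
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one, which is why a group of identical rewards carries no learning signal.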
---
## What Is Already Proven
### Verified in This Demo Run
- Confirmed the training extra can import TRL GRPO classes locally (`trl-grpo-import-ok`).
- Ran error-handling unit suite (`6 passed`) covering model-load failure, question-load failure modes, OOM guidance, and parse-fallback logging behavior.
- Ran notebook-oriented E2E smoke suite (`5 passed`) covering structure, difficulty filtering, training step execution, and transcript generation.
- Ran integration suite (`2 passed`) covering rollout + reward flow and unparseable-action recovery.
- Attempted to launch the notebook UI; the local environment currently lacks the `jupyter` binary (captured below).
### Previously Verified Evidence
- `FEATURES.json` (F006) records independent verification as **68/68 tests passed** with verifier result `approved` at `2026-03-28T07:37:20Z`.
- Implementation spec Section 7 records the full verification command passing and a prior TRL import check.
---
## What Still Needs User Verification
- Open and run `notebooks/train_grpo.ipynb` interactively on a machine with Jupyter available.
- Validate the visual learning curve in the notebook output.
- Validate side-by-side transcript quality (random vs trained) with your preferred model/runtime.
---
## Quickstart / Verification Steps
> Run these commands to see the feature in action:
```bash
uv sync --extra training
uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v
```
If you want the interactive notebook UI, install Jupyter in your environment first.
---
## Live Local Proof
### Attempt to Launch the Training Notebook UI
This is the user-facing entrypoint described in the spec.
```bash
uv run jupyter notebook "notebooks/train_grpo.ipynb" --no-browser --port 8899
```
```
error: Failed to spawn: `jupyter`
Caused by: No such file or directory (os error 2)
```
What to notice: the launch fails because the `jupyter` executable is missing from this environment, not because of an error raised by the notebook itself, so interactive verification is handed off to the user.
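
A quick preflight can tell you whether the same spawn error would occur on your machine before you try the launch command. This is a minimal sketch using only the standard library; the helper name `find_executable` is hypothetical, and it inspects the current `PATH`, which may differ from what `uv run` resolves inside a project environment.

```python
import shutil
from typing import Optional


def find_executable(name: str) -> Optional[str]:
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


jupyter_path = find_executable("jupyter")
if jupyter_path is None:
    print("jupyter not found on PATH; install it before launching the notebook")
else:
    print(f"jupyter found at {jupyter_path}")
```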
### Verify GRPO Training Dependencies Resolve Locally
```bash
uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
```
```
trl-grpo-import-ok
```
What to notice: the TRL GRPO surface required by the notebook is available in this environment when using the `training` extra.
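
If you want a check that does not import the library (for example in a lightweight preflight script), the standard library can report whether a top-level module is locatable. The sketch below is an illustration only; `is_importable` is a hypothetical helper, and it is a weaker check than the command above, which imports `GRPOConfig` and `GRPOTrainer` and therefore also catches a broken or outdated install.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("trl",):
    status = "available" if is_importable(name) else "missing (try the `training` extra)"
    print(f"{name}: {status}")
```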
---
## Existing Evidence
- Source: `specs/FEATURES.json` (F006.verification_evidence)
- `tests_run: 68`, `tests_passed: 68`, `verifier_result: approved`
- Command recorded: `uv run --with pytest pytest tests/unit/test_grpo_config.py tests/unit/test_prompts.py tests/unit/test_rollout.py tests/unit/test_rewards.py tests/unit/test_error_handling.py tests/integration/test_training_pipeline.py tests/e2e/test_training_e2e.py -v`
---
## Manual Verification Checklist
1. Install notebook runtime (`jupyter`) and training deps (`uv sync --extra training`).
2. Launch notebook: `jupyter notebook notebooks/train_grpo.ipynb`.
3. Run all cells end-to-end.
4. Confirm training completes without runtime errors.
5. Confirm reward/learning curve is rendered.
6. Confirm random vs trained transcript comparison appears and is readable.
7. Confirm model artifacts are written to the configured output directory.
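
Step 7 can be checked with a few lines of Python once training finishes. This is a sketch under stated assumptions: the helper `list_artifacts` and the path `outputs/grpo` are hypothetical, so substitute whatever output directory your notebook configuration uses.

```python
from pathlib import Path


def list_artifacts(output_dir: str) -> list:
    """Return relative paths of all files under output_dir, sorted."""
    root = Path(output_dir)
    if not root.is_dir():
        return []  # directory missing: nothing was written
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())


artifacts = list_artifacts("outputs/grpo")  # hypothetical path
print(f"{len(artifacts)} artifact file(s) found")
```

An empty result means either the directory does not exist or training wrote nothing to it.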
---
## Edge Cases Exercised
### Error-path handling (bad model, missing/invalid questions, parse fallback)
```bash
uv run --with pytest pytest tests/unit/test_error_handling.py -v
```
```
============================= test session starts ==============================
platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpA8Pzif/bin/python
collecting ... collected 6 items
tests/unit/test_error_handling.py::test_model_load_error_bad_name PASSED [ 16%]
tests/unit/test_error_handling.py::test_question_load_missing_file PASSED [ 33%]
tests/unit/test_error_handling.py::test_question_load_empty_file PASSED [ 50%]
tests/unit/test_error_handling.py::test_question_load_invalid_json PASSED [ 66%]
tests/unit/test_error_handling.py::test_oom_guidance PASSED [ 83%]
tests/unit/test_error_handling.py::test_action_parse_fallback_logged PASSED [100%]
============================== 6 passed in 4.68s ===============================
```
Why this matters: this verifies the most important failure modes fail clearly instead of silently.
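
To make "fail clearly" concrete, here is a sketch of a loader that distinguishes the three question-load failures named in the test IDs (missing file, empty file, invalid JSON). It is illustrative only: `load_questions`, `QuestionLoadError`, and the assumption that questions are stored as a JSON list are hypothetical, not the project's actual code.

```python
import json
from pathlib import Path


class QuestionLoadError(Exception):
    """Raised with an actionable message when the questions file is unusable."""


def load_questions(path: str) -> list:
    file = Path(path)
    if not file.is_file():
        raise QuestionLoadError(f"Questions file not found: {path}")
    text = file.read_text(encoding="utf-8")
    if not text.strip():
        raise QuestionLoadError(f"Questions file is empty: {path}")
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Chain the original error so the parse position is not lost
        raise QuestionLoadError(f"Questions file is not valid JSON: {path} ({exc})") from exc
    if not isinstance(data, list) or not data:
        raise QuestionLoadError(f"Expected a non-empty JSON list in {path}")
    return data
```

Each branch raises the same exception type with a distinct message, so a caller can catch one error class while the user still sees which problem occurred.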
### Unparseable action recovery in integration flow
```bash
uv run --with pytest pytest tests/integration/test_training_pipeline.py -v
```
```
============================= test session starts ==============================
platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpn3aEqJ/bin/python
collecting ... collected 2 items
tests/integration/test_training_pipeline.py::test_training_pipeline_flow_with_reward_functions PASSED [ 50%]
tests/integration/test_training_pipeline.py::test_unparseable_action_recovers_and_episode_continues PASSED [100%]
============================== 2 passed in 3.87s ===============================
```
Why this matters: malformed model output does not crash the episode loop; training can continue.
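
The recovery pattern being tested can be sketched as a parser that logs and substitutes a fallback instead of raising. Everything here is hypothetical and for illustration: the `TYPE: argument` action format, the `QUERY`/`ANSWER` action names, and the `FALLBACK_ACTION` value are assumptions, not the SQLEnv action schema.

```python
import logging
import re

logger = logging.getLogger("grpo_demo_sketch")

# Hypothetical action format: "<TYPE>: <argument>"
ACTION_PATTERN = re.compile(r"^\s*(QUERY|ANSWER)\s*:\s*(.+)$", re.IGNORECASE | re.DOTALL)
FALLBACK_ACTION = ("QUERY", "")  # hypothetical no-op used when parsing fails


def parse_action(raw: str) -> tuple:
    match = ACTION_PATTERN.match(raw)
    if match is None:
        # Log the failure so it stays visible, then keep the episode alive
        logger.warning("Unparseable action %r; using fallback", raw[:80])
        return FALLBACK_ACTION
    return match.group(1).upper(), match.group(2).strip()


def run_episode(model_outputs: list) -> list:
    """Parse every output in order; malformed ones become the fallback."""
    return [parse_action(output) for output in model_outputs]


steps = run_episode(["QUERY: SELECT 1", "not an action", "ANSWER: 42"])
print(steps)
```

The middle output is malformed, yet the third step is still processed, which is the behavior the integration test name describes.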
### Verification command mismatch in this environment (`--timeout` flag)
```bash
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v --timeout=300
```
```
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --timeout=300
inifile: /Users/hjerp/Projects/sql-env/pyproject.toml
rootdir: /Users/hjerp/Projects/sql-env
```
Why this matters: the spec-listed command assumes the `pytest-timeout` plugin, which provides the `--timeout` flag, is installed. It is not available in this environment, so the suite had to be run without `--timeout`.
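
One way to avoid this mismatch is to add the timeout flag only when the plugin is present. The sketch below assumes the flag comes from the `pytest-timeout` plugin (module name `pytest_timeout`); the helper `build_pytest_args` is hypothetical and not part of the project.

```python
import importlib.util


def build_pytest_args(test_path: str, timeout_s: int = 300) -> list:
    """Build a pytest argument list, adding --timeout only if supported."""
    args = ["pytest", test_path, "-v"]
    # --timeout is registered by the pytest-timeout plugin, not by pytest itself
    if importlib.util.find_spec("pytest_timeout") is not None:
        args.append(f"--timeout={timeout_s}")
    return args


print(" ".join(build_pytest_args("tests/e2e/test_training_e2e.py")))
```

Alternatively, requesting the plugin alongside pytest (for example `uv run --with pytest --with pytest-timeout pytest ...`) should make the spec-listed command work unchanged, though that was not tried in this demo run.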
---
## Test Evidence (Optional)
> Supplementary proof that the feature behaves correctly in the scenarios these suites cover.
> The Live Local Proof section above shows how to use the feature; this section shows it was tested.
| Test Suite | Tests | Status |
|---|---|---|
| Error handling unit tests | 6 | All passed |
| E2E training notebook smoke tests | 5 | All passed |
| Integration training pipeline tests | 2 | All passed |
Representative command (run in this demo):
```bash
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v
```
Result summary:
```
5 passed in 3.83s
```
---
## Feature Links
- Implementation spec: `specs/F006-IMPLEMENTATION_SPEC.md`
- Verification spec: `specs/F006-VERIFICATION_SPEC.md`
---
*Demo generated by `feature-demo` agent. Re-run with `/feature-demo F006` to refresh.*