# Feature Demo: F006 — GRPO Training Pipeline

> **Generated:** 2026-03-28T07:42:55Z
> **Context source:** spec + discovery only (implementation not read)
> **Feature entry:** [FEATURES.json #F006](FEATURES.json)

---

## What This Feature Does

This feature gives you a single notebook workflow to train an SQLEnv policy with GRPO, then compare behavior before vs after training. The user-facing goal is simple: run one notebook and see whether the trained policy explores the database more strategically than a random baseline.

From a user perspective, success means the workflow is reproducible, the learning signal is visible, and the random-vs-trained comparison is easy to inspect in one place.

---

## What Is Already Proven

### Verified in This Demo Run

- Confirmed the training extra can import TRL GRPO classes locally (`trl-grpo-import-ok`).
- Ran error-handling unit suite (`6 passed`) covering model-load failure, question-load failure modes, OOM guidance, and parse-fallback logging behavior.
- Ran notebook-oriented E2E smoke suite (`5 passed`) covering structure, difficulty filtering, training step execution, and transcript generation.
- Ran integration suite (`2 passed`) covering rollout + reward flow and unparseable-action recovery.
- Attempted to launch the notebook UI; the local environment currently lacks the `jupyter` binary (error captured below).

### Previously Verified Evidence

- `FEATURES.json` (F006) records independent verification as **68/68 tests passed** with verifier result `approved` at `2026-03-28T07:37:20Z`.
- Implementation spec Section 7 records full verification command passing and prior TRL import check.

---

## What Still Needs User Verification

- Open and run `notebooks/train_grpo.ipynb` interactively on a machine with Jupyter available.
- Validate the visual learning curve in the notebook output.
- Validate side-by-side transcript quality (random vs trained) with your preferred model/runtime.

---

## Quickstart / Verification Steps

> Run these commands to see the feature in action:

```bash
uv sync --extra training
uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v
```

If you want the interactive notebook UI, install Jupyter in your environment first.

---

## Live Local Proof

### Attempt to Launch the Training Notebook UI

This is the user-facing entrypoint described in the spec.

```bash
uv run jupyter notebook "notebooks/train_grpo.ipynb" --no-browser --port 8899
```

```
error: Failed to spawn: `jupyter`
  Caused by: No such file or directory (os error 2)
```

What to notice: the notebook launch path is correct, but this environment does not currently have Jupyter installed, so interactive verification is handed off to the user.
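A preflight check can catch this kind of missing executable before the launch attempt. The sketch below is illustrative only and not part of the F006 implementation; the helper name `find_missing_tools` is hypothetical. It relies on `shutil.which`, which searches the current `PATH`, so it should be run inside the same environment that will launch the notebook.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # `uv` and `jupyter` are the two executables used by the launch command above.
    missing = find_missing_tools(["uv", "jupyter"])
    if missing:
        print("missing executables:", ", ".join(missing))
    else:
        print("all required executables found")
```

An empty result means both executables resolve; any name in the list needs to be installed before the notebook can be opened.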

### Verify GRPO Training Dependencies Resolve Locally

```bash
uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
```

```
trl-grpo-import-ok
```

What to notice: the TRL GRPO surface required by the notebook is available in this environment when using the `training` extra.

---

## Existing Evidence

- Source: `specs/FEATURES.json` (F006.verification_evidence)
  - `tests_run: 68`, `tests_passed: 68`, `verifier_result: approved`
  - Command recorded: `uv run --with pytest pytest tests/unit/test_grpo_config.py tests/unit/test_prompts.py tests/unit/test_rollout.py tests/unit/test_rewards.py tests/unit/test_error_handling.py tests/integration/test_training_pipeline.py tests/e2e/test_training_e2e.py -v`
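The recorded evidence can also be checked programmatically. The following is a minimal sketch that assumes the evidence is available as a dictionary with the three field names quoted above (`tests_run`, `tests_passed`, `verifier_result`); how that dictionary is nested inside `FEATURES.json` is not shown here, and the helper name `evidence_is_clean` is hypothetical.

```python
def evidence_is_clean(evidence):
    """Return True when at least one test ran, all passed, and the verifier approved."""
    tests_run = evidence.get("tests_run", 0)
    return (
        tests_run > 0
        and tests_run == evidence.get("tests_passed")
        and evidence.get("verifier_result") == "approved"
    )


# Values copied from the evidence listed above.
sample = {"tests_run": 68, "tests_passed": 68, "verifier_result": "approved"}
print(evidence_is_clean(sample))
```

Requiring `tests_run > 0` keeps an empty or missing record from being treated as a pass.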

---

## Manual Verification Checklist

1. Install notebook runtime (`jupyter`) and training deps (`uv sync --extra training`).
2. Launch notebook: `jupyter notebook notebooks/train_grpo.ipynb`.
3. Run all cells end-to-end.
4. Confirm training completes without runtime errors.
5. Confirm reward/learning curve is rendered.
6. Confirm random vs trained transcript comparison appears and is readable.
7. Confirm model artifacts are written to the configured output directory.
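Step 7 can be checked with a short script instead of by eye. This is a sketch under assumptions: the output directory path is whatever the notebook was configured with (not specified here), and the helper name `list_artifacts` is hypothetical. It only reports which files exist; it does not validate their contents.

```python
from pathlib import Path


def list_artifacts(output_dir):
    """Return sorted relative paths of all files under output_dir, or [] if it is missing."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )
```

An empty list after a completed training run suggests that the output directory is misconfigured or that saving failed.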

---

## Edge Cases Exercised

### Error-path handling (bad model, missing/invalid questions, parse fallback)

```bash
uv run --with pytest pytest tests/unit/test_error_handling.py -v
```

```
============================= test session starts ==============================
platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpA8Pzif/bin/python
collecting ... collected 6 items

tests/unit/test_error_handling.py::test_model_load_error_bad_name PASSED [ 16%]
tests/unit/test_error_handling.py::test_question_load_missing_file PASSED [ 33%]
tests/unit/test_error_handling.py::test_question_load_empty_file PASSED  [ 50%]
tests/unit/test_error_handling.py::test_question_load_invalid_json PASSED [ 66%]
tests/unit/test_error_handling.py::test_oom_guidance PASSED              [ 83%]
tests/unit/test_error_handling.py::test_action_parse_fallback_logged PASSED [100%]

============================== 6 passed in 4.68s ===============================
```

Why this matters: the covered failure modes (bad model name; missing, empty, or invalid question file; OOM; unparseable action) fail clearly instead of silently.

### Unparseable action recovery in integration flow

```bash
uv run --with pytest pytest tests/integration/test_training_pipeline.py -v
```

```
============================= test session starts ==============================
platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpn3aEqJ/bin/python
collecting ... collected 2 items

tests/integration/test_training_pipeline.py::test_training_pipeline_flow_with_reward_functions PASSED [ 50%]
tests/integration/test_training_pipeline.py::test_unparseable_action_recovers_and_episode_continues PASSED [100%]

============================== 2 passed in 3.87s ===============================
```

Why this matters: malformed model output does not crash the episode loop; training can continue.
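The recovery pattern exercised by the second test can be sketched as follows. This is not the project's implementation (which was not read for this demo); the action format, `parse_action`, `FALLBACK_ACTION`, and `run_episode` are all hypothetical names chosen for illustration. The point is only the shape of the loop: a parse failure is logged and replaced by a fallback action, and the episode moves on to the next step.

```python
import logging

logger = logging.getLogger("f006_sketch")

# Hypothetical placeholder used when the model output cannot be parsed.
FALLBACK_ACTION = {"type": "noop"}


def parse_action(text):
    """Hypothetical parser: accepts 'QUERY: <sql>' and raises ValueError otherwise."""
    prefix = "QUERY:"
    stripped = text.strip()
    if not stripped.startswith(prefix):
        raise ValueError(f"unparseable action: {text!r}")
    return {"type": "query", "sql": stripped[len(prefix):].strip()}


def run_episode(model_outputs):
    """Parse each output in order; on failure, log it and substitute the fallback."""
    actions = []
    for step, text in enumerate(model_outputs):
        try:
            action = parse_action(text)
        except ValueError as exc:
            logger.warning("step %d: %s; using fallback action", step, exc)
            action = FALLBACK_ACTION
        actions.append(action)
    return actions
```

Catching only `ValueError` keeps unrelated bugs visible: anything other than a parse failure still propagates and stops the episode.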

### Verification command mismatch in this environment (`--timeout` flag)

```bash
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v --timeout=300
```

```
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --timeout=300
  inifile: /Users/hjerp/Projects/sql-env/pyproject.toml
  rootdir: /Users/hjerp/Projects/sql-env
```

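Why this matters: the spec-listed command passes `--timeout`, a flag provided by the `pytest-timeout` plugin, which is not available in this environment; the local run had to fall back to the same command without `--timeout`.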
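One way to make the verification command portable is to add the timeout flag only when the plugin that provides it is importable. The sketch below assumes the flag comes from the `pytest-timeout` plugin, whose module is `pytest_timeout`; the helper names are hypothetical, and the snippet only builds the argument list without running pytest.

```python
import importlib.util


def timeout_plugin_installed():
    """Return True if the pytest-timeout plugin module can be imported."""
    return importlib.util.find_spec("pytest_timeout") is not None


def build_pytest_args(test_path, timeout_seconds, timeout_supported):
    """Build a pytest argument list, adding --timeout only when it is supported."""
    args = ["pytest", test_path, "-v"]
    if timeout_supported:
        args.append(f"--timeout={timeout_seconds}")
    return args


if __name__ == "__main__":
    args = build_pytest_args(
        "tests/e2e/test_training_e2e.py", 300, timeout_plugin_installed()
    )
    print(" ".join(args))
```

Alternatively, supplying the plugin for the run (for example with an extra `--with pytest-timeout` on the `uv run` command) would let the spec-listed command work unchanged.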

---

## Test Evidence (Optional)

> Supplementary proof that the feature works correctly across the tested scenarios.
> The Live Local Proof section above shows how to use the feature; this section shows it was tested.

| Test Suite | Tests | Status |
|---|---|---|
| Error handling unit tests | 6 | All passed |
| E2E training notebook smoke tests | 5 | All passed |
| Integration training pipeline tests | 2 | All passed |

Representative command (run in this demo):

```bash
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v
```

Result summary:

```
5 passed in 3.83s
```

---

## Feature Links

- Implementation spec: `specs/F006-IMPLEMENTATION_SPEC.md`
- Verification spec: `specs/F006-VERIFICATION_SPEC.md`

---

*Demo generated by `feature-demo` agent. Re-run with `/feature-demo F006` to refresh.*