# Feature Demo: F006 — GRPO Training Pipeline

> **Generated:** 2026-03-28T07:42:55Z
> **Context source:** spec + discovery only (implementation not read)
> **Feature entry:** [FEATURES.json #F006](FEATURES.json)

---

## What This Feature Does

This feature gives you a single notebook workflow to train an SQLEnv policy with GRPO, then compare behavior before vs after training. The user-facing goal is simple: run one notebook and see whether the trained policy explores the database more strategically than a random baseline.

From a user perspective, success means the workflow is reproducible, the learning signal is visible, and the random-vs-trained comparison is easy to inspect in one place.

---

## What Is Already Proven

### Verified in This Demo Run

- Confirmed the training extra can import TRL GRPO classes locally (`trl-grpo-import-ok`).
- Ran error-handling unit suite (`6 passed`) covering model-load failure, question-load failure modes, OOM guidance, and parse-fallback logging behavior.
- Ran notebook-oriented E2E smoke suite (`5 passed`) covering structure, difficulty filtering, training step execution, and transcript generation.
- Ran integration suite (`2 passed`) covering rollout + reward flow and unparseable-action recovery.
- Attempted to launch the notebook UI; the local environment currently lacks the `jupyter` binary (output captured below).

### Previously Verified Evidence

- `FEATURES.json` (F006) records independent verification as **68/68 tests passed** with verifier result `approved` at `2026-03-28T07:37:20Z`.
- Implementation spec Section 7 records the full verification command passing and a prior TRL import check.

---

## What Still Needs User Verification

- Open and run `notebooks/train_grpo.ipynb` interactively on a machine with Jupyter available.
- Validate the visual learning curve in the notebook output.
- Validate side-by-side transcript quality (random vs trained) with your preferred model/runtime.

---

## Quickstart / Verification Steps

> Run these commands to see the feature in action:

```bash
uv sync --extra training
uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v
```

If you want the interactive notebook UI, install Jupyter in your environment first.

---

## Live Local Proof

### Attempt to Launch the Training Notebook UI

This is the user-facing entrypoint described in the spec.

```bash
uv run jupyter notebook "notebooks/train_grpo.ipynb" --no-browser --port 8899
```

```
error: Failed to spawn: `jupyter`
  Caused by: No such file or directory (os error 2)
```

What to notice: the notebook launch path is correct, but this environment does not currently have Jupyter installed, so interactive verification is handed off to the user.
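A preflight check can surface this kind of gap before the launch fails. The following is a minimal sketch, not part of the feature itself: the `preflight` helper and its key format are hypothetical, and it only assumes that the notebook needs the `jupyter` executable on `PATH` and an importable `trl` module.

```python
import importlib.util
import shutil


def preflight(executables=("jupyter",), modules=("trl",)):
    """Map each requirement to True (available) or False (missing)."""
    status = {}
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        status[f"bin:{exe}"] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec returns None for a top-level module that cannot be imported.
        status[f"module:{mod}"] = importlib.util.find_spec(mod) is not None
    return status


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run inside the project environment (for example via `uv run --extra training python`), this reports which of the two prerequisites is missing without attempting to start the notebook server.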

### Verify GRPO Training Dependencies Resolve Locally

```bash
uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
```

```
trl-grpo-import-ok
```

What to notice: the TRL GRPO surface required by the notebook is available in this environment when using the `training` extra.

---

## Existing Evidence

- Source: `specs/FEATURES.json` (F006.verification_evidence)
  - `tests_run: 68`, `tests_passed: 68`, `verifier_result: approved`
  - Command recorded: `uv run --with pytest pytest tests/unit/test_grpo_config.py tests/unit/test_prompts.py tests/unit/test_rollout.py tests/unit/test_rewards.py tests/unit/test_error_handling.py tests/integration/test_training_pipeline.py tests/e2e/test_training_e2e.py -v`
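
The recorded evidence can be sanity-checked mechanically. The sketch below uses the three field names and values listed above; the flat dict layout and the `evidence_is_clean` helper are assumptions for illustration, since the actual nesting inside `FEATURES.json` is not shown in this document.

```python
# Values copied from the evidence listed above; the flat dict layout is an assumption.
evidence = {"tests_run": 68, "tests_passed": 68, "verifier_result": "approved"}


def evidence_is_clean(ev):
    """True when at least one test ran, all of them passed, and the verifier approved."""
    return (
        ev.get("tests_run", 0) > 0
        and ev.get("tests_run") == ev.get("tests_passed")
        and ev.get("verifier_result") == "approved"
    )


print(evidence_is_clean(evidence))  # True
```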

---

## Manual Verification Checklist

1. Install notebook runtime (`jupyter`) and training deps (`uv sync --extra training`).
2. Launch notebook: `jupyter notebook notebooks/train_grpo.ipynb`.
3. Run all cells end-to-end.
4. Confirm training completes without runtime errors.
5. Confirm reward/learning curve is rendered.
6. Confirm random vs trained transcript comparison appears and is readable.
7. Confirm model artifacts are written to the configured output directory.
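
Step 7 can be checked with a short script instead of by eye. This is a sketch under stated assumptions: the `list_artifacts` helper is hypothetical, the file names in the demonstration are stand-ins, and this document does not say which files the notebook actually writes, so the script only reports what non-empty files exist.

```python
import tempfile
from pathlib import Path


def list_artifacts(output_dir):
    """Return POSIX-style relative paths of all non-empty files under output_dir."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size > 0
    )


# Demonstration with a throwaway directory standing in for a real training run.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "checkpoint").mkdir()
    (Path(tmp) / "checkpoint" / "weights.bin").write_bytes(b"\x00" * 8)  # stand-in file
    (Path(tmp) / "empty.log").write_text("")  # empty files are ignored
    demo_listing = list_artifacts(tmp)

print(demo_listing)  # ['checkpoint/weights.bin']
```

After a real run, pass the output directory configured in the notebook to `list_artifacts`; an empty list means nothing was written.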

---

## Edge Cases Exercised

### Error-path handling (bad model, missing/invalid questions, parse fallback)

```bash
uv run --with pytest pytest tests/unit/test_error_handling.py -v
```

```
============================= test session starts ==============================
platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpA8Pzif/bin/python
collecting ... collected 6 items

tests/unit/test_error_handling.py::test_model_load_error_bad_name PASSED [ 16%]
tests/unit/test_error_handling.py::test_question_load_missing_file PASSED [ 33%]
tests/unit/test_error_handling.py::test_question_load_empty_file PASSED  [ 50%]
tests/unit/test_error_handling.py::test_question_load_invalid_json PASSED [ 66%]
tests/unit/test_error_handling.py::test_oom_guidance PASSED              [ 83%]
tests/unit/test_error_handling.py::test_action_parse_fallback_logged PASSED [100%]

============================== 6 passed in 4.68s ===============================
```

Why this matters: this verifies that the most important failure modes (bad model name, missing or invalid question files, out-of-memory, unparseable actions) fail clearly instead of silently.

### Unparseable action recovery in integration flow

```bash
uv run --with pytest pytest tests/integration/test_training_pipeline.py -v
```

```
============================= test session starts ==============================
platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpn3aEqJ/bin/python
collecting ... collected 2 items

tests/integration/test_training_pipeline.py::test_training_pipeline_flow_with_reward_functions PASSED [ 50%]
tests/integration/test_training_pipeline.py::test_unparseable_action_recovers_and_episode_continues PASSED [100%]

============================== 2 passed in 3.87s ===============================
```

Why this matters: malformed model output does not crash the episode loop; training can continue.
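
The recovery pattern being tested can be illustrated as follows. This is a sketch of the general parse-with-fallback technique, not the project's implementation: the `TYPE: argument` action format, the `KNOWN_ACTIONS` vocabulary, and the fallback action are all hypothetical, because the real SQLEnv action format is not given in this document.

```python
import logging

logger = logging.getLogger("parse_fallback_sketch")

# Hypothetical action vocabulary and fallback; not taken from SQLEnv.
KNOWN_ACTIONS = {"QUERY", "ANSWER"}
FALLBACK_ACTION = ("QUERY", "SELECT 1")


def parse_action(text):
    """Parse 'TYPE: argument'; raise ValueError for anything else."""
    kind, sep, arg = text.partition(":")
    kind, arg = kind.strip().upper(), arg.strip()
    if not sep or kind not in KNOWN_ACTIONS or not arg:
        raise ValueError(f"unparseable action: {text!r}")
    return kind, arg


def parse_with_fallback(text):
    """Return the parsed action, or log a warning and return the fallback."""
    try:
        return parse_action(text)
    except ValueError as exc:
        # Logging keeps the failure visible while the episode continues.
        logger.warning("falling back to default action: %s", exc)
        return FALLBACK_ACTION
```

The design point is that a parse failure is logged and replaced with a harmless default rather than raised, so one malformed completion costs a step instead of ending the episode.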

### Verification command mismatch in this environment (`--timeout` flag)

```bash
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v --timeout=300
```

```
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --timeout=300
  inifile: /Users/hjerp/Projects/sql-env/pyproject.toml
  rootdir: /Users/hjerp/Projects/sql-env
```

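Why this matters: the spec-listed command assumes the `pytest-timeout` plugin, which provides the `--timeout` flag, is installed. It is not available in this environment, so the suite was run without `--timeout` as a local fallback.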
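
One way to keep a single command working in both environments is to add the flag only when the plugin is present. The sketch below assumes the flag comes from the `pytest-timeout` plugin (importable as `pytest_timeout`); the `build_pytest_args` helper is hypothetical and not part of the project.

```python
import importlib.util


def build_pytest_args(test_path, timeout_s=300, have_timeout_plugin=None):
    """Build pytest arguments, adding --timeout only if pytest-timeout is importable."""
    if have_timeout_plugin is None:
        # Auto-detect the plugin unless the caller states its availability explicitly.
        have_timeout_plugin = importlib.util.find_spec("pytest_timeout") is not None
    args = [test_path, "-v"]
    if have_timeout_plugin:
        args.append(f"--timeout={timeout_s}")
    return args


if __name__ == "__main__":
    print(build_pytest_args("tests/e2e/test_training_e2e.py"))
```

The returned list can be passed to `pytest.main(...)` or joined into a shell command. Alternatively, installing the plugin for the run (for example by adding `--with pytest-timeout` to the `uv run` invocation) lets the spec-listed command work unchanged.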

---

## Test Evidence (Optional)

> Supplementary proof that the feature works correctly across all scenarios.
> The Live Local Proof section above shows how to use the feature; this section shows it was tested.

| Test Suite | Tests | Status |
|---|---|---|
| Error handling unit tests | 6 | All passed |
| E2E training notebook smoke tests | 5 | All passed |
| Integration training pipeline tests | 2 | All passed |

Representative command (run in this demo):

```bash
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v
```

Result summary:

```
5 passed in 3.83s
```

---

## Feature Links

- Implementation spec: `specs/F006-IMPLEMENTATION_SPEC.md`
- Verification spec: `specs/F006-VERIFICATION_SPEC.md`

---

*Demo generated by `feature-demo` agent. Re-run with `/feature-demo F006` to refresh.*