	# Feature Demo: F006 — GRPO Training Pipeline

	> Generated: 2026-03-28T07:42:55Z
	> Context source: spec + discovery only (implementation not read)
	> Feature entry: [FEATURES.json #F006](FEATURES.json)

	---

	## What This Feature Does

	This feature gives you a single notebook workflow to train an SQLEnv policy with GRPO, then compare behavior before vs after training. The user-facing goal is simple: run one notebook and see whether the trained policy explores the database more strategically than a random baseline.

	From a user perspective, success means the workflow is reproducible, the learning signal is visible, and the random-vs-trained comparison is easy to inspect in one place.
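GRPO (Group Relative Policy Optimization) trains without a separate value model. For each prompt it:

- samples a group of completions,
- scores each completion with the reward functions, and
- normalizes every reward against the rest of its group.

The minimal sketch below shows only that normalization step in plain Python, assuming the rewards are already computed. `group_advantages` is a hypothetical helper, not the notebook's code. It uses the population standard deviation; implementations such as TRL's may use a different estimator or scaling option.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Group-relative advantages for completions sampled from one prompt.

    Each reward is centred on the group mean and scaled by the group's
    (population) standard deviation; eps avoids division by zero when
    every reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts of the same question, scored by a reward function.
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_advantages(rewards)])  # → [1.0, -1.0, -1.0, 1.0]
```

This bears on the "learning signal is visible" goal. If every rollout for a question earns the same reward, all of its advantages are zero, and that question contributes no reward-driven gradient.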

	---

	## What Is Already Proven

	### Verified in This Demo Run

	- Confirmed the training extra can import TRL GRPO classes locally (`trl-grpo-import-ok`).
	- Ran error-handling unit suite (`6 passed`) covering model-load failure, question-load failure modes, OOM guidance, and parse-fallback logging behavior.
	- Ran notebook-oriented E2E smoke suite (`5 passed`) covering structure, difficulty filtering, training step execution, and transcript generation.
	- Ran integration suite (`2 passed`) covering rollout + reward flow and unparseable-action recovery.
	- Attempted to launch the notebook UI; the local environment currently lacks the `jupyter` binary (output captured below).

	### Previously Verified Evidence

	- `FEATURES.json` (F006) records independent verification as 68/68 tests passed with verifier result `approved` at `2026-03-28T07:37:20Z`.
	- Section 7 of the implementation spec records that the full verification command passed, along with an earlier TRL import check.

	---

	## What Still Needs User Verification

	- Open and run `notebooks/train_grpo.ipynb` interactively on a machine with Jupyter installed.
	- Validate the visual learning curve in the notebook output.
	- Validate side-by-side transcript quality (random vs trained) with your preferred model/runtime.

	---

	## Quickstart / Verification Steps

	> Run these commands to see the feature in action:

	```bash
	uv sync --extra training
	uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
	uv run --with pytest pytest tests/e2e/test_training_e2e.py -v
	```

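	For the interactive notebook UI, install Jupyter in your environment first, or add it for a single run with `uv run --extra training --with jupyter jupyter notebook notebooks/train_grpo.ipynb`.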
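The Jupyter failure in the live proof below comes from a missing executable, not from the notebook itself. A pre-flight check like the one below can tell the two prerequisites apart before launching. `preflight` is a hypothetical helper. It only checks two things:

- whether a `jupyter` executable is on `PATH`;
- whether the `trl` package (installed by the `training` extra) can be found by the current interpreter.

```python
import importlib.util
import shutil


def preflight() -> dict[str, bool]:
    """Report whether the notebook prerequisites are visible to this interpreter."""
    return {
        # `uv run jupyter ...` fails with "Failed to spawn" when this is False.
        "jupyter_on_path": shutil.which("jupyter") is not None,
        # True when TRL is installed, e.g. via `uv sync --extra training`.
        "trl_importable": importlib.util.find_spec("trl") is not None,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

Run it through the project environment so that environment's executables are on `PATH`. For example, use `uv run --extra training python preflight.py`; the filename `preflight.py` is hypothetical.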

	---

	## Live Local Proof

	### Attempt to Launch the Training Notebook UI

	This is the user-facing entrypoint described in the spec.

	```bash
	uv run jupyter notebook "notebooks/train_grpo.ipynb" --no-browser --port 8899
	```

	```
	error: Failed to spawn: `jupyter`
	Caused by: No such file or directory (os error 2)
	```

	What to notice: the command fails before the notebook is opened because `jupyter` is not installed in this environment. The error concerns the missing executable, not the notebook path, so interactive verification is handed off to the user.

	### Verify GRPO Training Dependencies Resolve Locally

	```bash
	uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
	```

	```
	trl-grpo-import-ok
	```

	What to notice: the TRL GRPO surface required by the notebook is available in this environment when using the `training` extra.

	---

	## Existing Evidence

	- Source: `specs/FEATURES.json` (F006.verification_evidence)
	- `tests_run: 68`, `tests_passed: 68`, `verifier_result: approved`
	- Command recorded: `uv run --with pytest pytest tests/unit/test_grpo_config.py tests/unit/test_prompts.py tests/unit/test_rollout.py tests/unit/test_rewards.py tests/unit/test_error_handling.py tests/integration/test_training_pipeline.py tests/e2e/test_training_e2e.py -v`

	---

	## Manual Verification Checklist

	1. Install notebook runtime (`jupyter`) and training deps (`uv sync --extra training`).
	2. Launch notebook: `jupyter notebook notebooks/train_grpo.ipynb`.
	3. Run all cells end-to-end.
	4. Confirm training completes without runtime errors.
	5. Confirm reward/learning curve is rendered.
	6. Confirm random vs trained transcript comparison appears and is readable.
	7. Confirm model artifacts are written to the configured output directory.
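Step 7 can also be checked from a script. This minimal sketch assumes the notebook saves the model with Hugging Face's `save_pretrained`, which writes a config file plus weights:

- for a full model, `config.json` and `*.safetensors` or `pytorch_model*.bin`;
- for a PEFT adapter, `adapter_config.json` and adapter weights.

`OUTPUT_DIR` and `artifacts_written` are hypothetical. Substitute the output directory actually configured in the notebook.

```python
from pathlib import Path

# Hypothetical: replace with the output directory configured in the notebook.
OUTPUT_DIR = Path("outputs/grpo-sqlenv")

# Config and weight filenames written by `save_pretrained` (full model or PEFT adapter).
CONFIG_NAMES = ("config.json", "adapter_config.json")
WEIGHT_PATTERNS = ("*.safetensors", "pytorch_model*.bin", "adapter_model*.bin")


def artifacts_written(output_dir: Path) -> bool:
    """Return True if output_dir holds a model config and at least one weights file."""
    if not output_dir.is_dir():
        return False
    has_config = any((output_dir / name).is_file() for name in CONFIG_NAMES)
    has_weights = any(any(output_dir.glob(pattern)) for pattern in WEIGHT_PATTERNS)
    return has_config and has_weights


if __name__ == "__main__":
    status = "ok" if artifacts_written(OUTPUT_DIR) else "missing artifacts"
    print(f"{OUTPUT_DIR}: {status}")
```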

	---

	## Edge Cases Exercised

	### Error-path handling (bad model, missing/invalid questions, parse fallback)

	```bash
	uv run --with pytest pytest tests/unit/test_error_handling.py -v
	```

	```
	============================= test session starts ==============================
	platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpA8Pzif/bin/python
	collecting ... collected 6 items

	tests/unit/test_error_handling.py::test_model_load_error_bad_name PASSED [ 16%]
	tests/unit/test_error_handling.py::test_question_load_missing_file PASSED [ 33%]
	tests/unit/test_error_handling.py::test_question_load_empty_file PASSED [ 50%]
	tests/unit/test_error_handling.py::test_question_load_invalid_json PASSED [ 66%]
	tests/unit/test_error_handling.py::test_oom_guidance PASSED [ 83%]
	tests/unit/test_error_handling.py::test_action_parse_fallback_logged PASSED [100%]

	============================== 6 passed in 4.68s ===============================
	```

	Why this matters: each of these failure modes surfaces clearly, as an error or a logged fallback, instead of failing silently.

	### Unparseable action recovery in integration flow

	```bash
	uv run --with pytest pytest tests/integration/test_training_pipeline.py -v
	```

	```
	============================= test session starts ==============================
	platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpn3aEqJ/bin/python
	collecting ... collected 2 items

	tests/integration/test_training_pipeline.py::test_training_pipeline_flow_with_reward_functions PASSED [ 50%]
	tests/integration/test_training_pipeline.py::test_unparseable_action_recovers_and_episode_continues PASSED [100%]

	============================== 2 passed in 3.87s ===============================
	```

	Why this matters: malformed model output does not crash the episode loop; training can continue.
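The recovery test checks that malformed model output does not end the episode. A common pattern is to parse each completion into a structured action. If parsing fails, the parser logs a warning and substitutes a safe fallback so the rollout keeps going. Several pieces of the sketch are hypothetical, and SQLEnv's real action schema may differ:

- the `KIND: argument` format;
- the `ACTION_TYPES` vocabulary;
- the `FALLBACK` action.

```python
import logging
import re
from typing import NamedTuple

logger = logging.getLogger("rollout")

# Hypothetical action vocabulary and format, e.g. "QUERY: SELECT 1".
ACTION_TYPES = {"DESCRIBE", "SAMPLE", "QUERY", "ANSWER"}
ACTION_RE = re.compile(r"^\s*([A-Z]+)\s*:\s*(.+?)\s*$", re.DOTALL)


class Action(NamedTuple):
    kind: str
    argument: str


# Hypothetical harmless default so the episode can continue.
FALLBACK = Action("QUERY", "SELECT 1")


def parse_action(completion: str) -> Action:
    """Parse model output into an Action, logging and falling back if malformed."""
    match = ACTION_RE.match(completion)
    if match and match.group(1) in ACTION_TYPES:
        return Action(match.group(1), match.group(2))
    logger.warning("Unparseable action %r; using fallback %s", completion[:80], FALLBACK)
    return FALLBACK
```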

	### Verification command mismatch in this environment (`--timeout` flag)

	```bash
	uv run --with pytest pytest tests/e2e/test_training_e2e.py -v --timeout=300
	```

	```
	ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
	pytest: error: unrecognized arguments: --timeout=300
	inifile: /Users/hjerp/Projects/sql-env/pyproject.toml
	rootdir: /Users/hjerp/Projects/sql-env
	```

	Why this matters: the spec-listed command passes `--timeout`, an option provided by the `pytest-timeout` plugin. The plugin is not installed here, so the suite had to be run without that flag.
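Adding the plugin to the ad-hoc environment, as in `uv run --with pytest --with pytest-timeout pytest ... --timeout=300`, should let the spec-listed command parse. Alternatively, a wrapper script can decide at runtime. `e2e_command` is a hypothetical helper that appends the flag only when the plugin's `pytest_timeout` module is importable.

```python
import importlib.util
import sys

E2E_TESTS = "tests/e2e/test_training_e2e.py"


def e2e_command(timeout_s: int = 300) -> list[str]:
    """Build the E2E pytest command, adding --timeout only if pytest-timeout is installed."""
    cmd = [sys.executable, "-m", "pytest", E2E_TESTS, "-v"]
    if importlib.util.find_spec("pytest_timeout") is not None:
        cmd.append(f"--timeout={timeout_s}")
    return cmd


if __name__ == "__main__":
    import subprocess

    # Propagate pytest's exit code (0 means all collected tests passed).
    sys.exit(subprocess.call(e2e_command()))
```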

	---

	## Test Evidence (Optional)

	> Supplementary proof that the feature works correctly across the tested scenarios.
	> The Live Local Proof section above shows how to use the feature; this section shows it was tested.

	| Test Suite | Tests | Status |
	|---|---|---|
	| Error handling unit tests | 6 | All passed |
	| E2E training notebook smoke tests | 5 | All passed |
	| Integration training pipeline tests | 2 | All passed |

	Representative command (run in this demo):

	```bash
	uv run --with pytest pytest tests/e2e/test_training_e2e.py -v
	```

	Result summary:

	```
	5 passed in 3.83s
	```
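To refresh the table from new runs instead of copying numbers by hand, the counts can be read from pytest's final summary line. Examples are `5 passed in 3.83s`, or `1 failed, 4 passed in 2.10s` when something breaks. `parse_summary` is a hypothetical helper, and it assumes pytest's default terminal summary format.

```python
import re

# Outcome words pytest prints in its final summary line.
_COUNT_RE = re.compile(
    r"(\d+) (passed|failed|errors?|skipped|xfailed|xpassed|deselected|warnings?)"
)


def parse_summary(line: str) -> dict[str, int]:
    """Map each outcome in a pytest summary line to its count."""
    counts: dict[str, int] = {}
    for number, outcome in _COUNT_RE.findall(line):
        # Normalize plural forms ("errors", "warnings") to a single key.
        key = outcome.rstrip("s") if outcome in ("errors", "warnings") else outcome
        counts[key] = counts.get(key, 0) + int(number)
    return counts
```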

	---

	## Feature Links

	- Implementation spec: `specs/F006-IMPLEMENTATION_SPEC.md`
	- Verification spec: `specs/F006-VERIFICATION_SPEC.md`

	---

	Demo generated by the `feature-demo` agent. Re-run with `/feature-demo F006` to refresh.