GRPO Training: Wildfire Containment Simulator
How to run
Open grpo_colab.ipynb in Colab (T4 GPU runtime) and run cells in order:
| Section | Cell(s) | What it does |
|---|---|---|
| 1 - Setup | 1-3 | Installs deps, clones repo, loads Qwen-2.5-1.5B with LoRA |
| 2 - Rollout | 4-5 | Defines collect_rollout() using env + serializer + parser |
| 3 - Training | 6-8 | Builds GRPO dataset, trains 50 steps with curriculum |
| 4 - Checkpointing | 9 | Saves final adapter, verifies reload |
| 5 - Plot | 10 | Plots reward curve with tier-promotion markers |
Resume from checkpoint: The first cell of Section 3 auto-detects the latest checkpoints/step_* folder and loads it. Re-run from that cell to continue training.
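The auto-detection step can be pictured roughly as follows. This is a minimal sketch, not the notebook's actual code: the helper name `find_latest_checkpoint` is hypothetical, and it assumes each folder is named `step_<integer>` directly under `checkpoints/`.

```python
import re
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(root: str = "checkpoints") -> Optional[Path]:
    """Return the step_* directory with the highest step number, or None.

    Hypothetical helper: assumes folders are named step_<integer>.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return None

    best_step, best_path = -1, None
    for path in root_path.glob("step_*"):
        match = re.fullmatch(r"step_(\d+)", path.name)
        # Skip stray files and names like step_final that carry no number.
        if path.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

Comparing the parsed integers rather than the folder names matters here: a plain string sort would rank `step_9` above `step_10`.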
Expected runtime on T4
About 45 minutes for 50 GRPO steps. Actual time varies with the episode length of each curriculum tier.
Downloading the trained adapter
After training completes, run in a Colab cell:
from google.colab import files
import shutil
shutil.make_archive('wildfire_adapter', 'zip', 'checkpoints/final')
files.download('wildfire_adapter.zip')
Local validation (no GPU needed)
python training/test_notebook_imports.py
This checks all imports and runs a quick env smoke test without loading model weights.
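For a quick look at which dependencies are missing before running the script, an import check along these lines may help. It is a sketch only and does not reproduce `test_notebook_imports.py`; the function name `missing_modules` is hypothetical, and the package list in the usage comment is an assumption about what the notebook installs.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located.

    find_spec() only locates a top-level module; it does not import it,
    so heavy packages are not loaded and no GPU is touched.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example usage (the package list is an assumption, adjust to the repo):
# print(missing_modules(["torch", "transformers", "peft", "trl"]))
```

An empty list means every named package can be found in the current environment. Pass top-level names only, since a dotted name makes `find_spec` import the parent package.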