# GRPO Training — Wildfire Containment Simulator [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Abrodolph/Wildfire-Containment-Simulator/blob/main/training/grpo_colab.ipynb) ## How to run Open `grpo_colab.ipynb` in Colab (T4 GPU runtime) and run cells in order: | Section | Cell(s) | What it does | |---------|---------|--------------| | 1 — Setup | 1–3 | Installs deps, clones repo, loads Qwen-2.5-1.5B with LoRA | | 2 — Rollout | 4–5 | Defines `collect_rollout()` using env + serializer + parser | | 3 — Training | 6–8 | Builds GRPO dataset, trains 50 steps with curriculum | | 4 — Checkpointing | 9 | Saves final adapter, verifies reload | | 5 — Plot | 10 | Plots reward curve with tier-promotion markers | **Resume from checkpoint:** The first cell of Section 3 auto-detects the latest `checkpoints/step_*` folder and loads it. Re-run from that cell to continue training. ## Expected runtime on T4 ~45 minutes for 50 GRPO steps (depends on episode length per tier). ## Downloading the trained adapter After training completes, run in a Colab cell: ```python from google.colab import files import shutil shutil.make_archive('wildfire_adapter', 'zip', 'checkpoints/final') files.download('wildfire_adapter.zip') ``` ## Local validation (no GPU needed) ```bash python training/test_notebook_imports.py ``` This checks all imports and runs a quick env smoke test without loading model weights.