# GRPO Training – Wildfire Containment Simulator

[Open in Colab](https://colab.research.google.com/github/Abrodolph/Wildfire-Containment-Simulator/blob/main/training/grpo_colab.ipynb)

## How to run

Open `grpo_colab.ipynb` in Colab with a T4 GPU runtime and run the cells in order:

| Section | Cell(s) | What it does |
|---------|---------|--------------|
| 1 – Setup | 1–3 | Installs dependencies, clones the repo, loads Qwen-2.5-1.5B with LoRA |
| 2 – Rollout | 4–5 | Defines `collect_rollout()` using the env, serializer, and parser |
| 3 – Training | 6–8 | Builds the GRPO dataset, trains for 50 steps with a curriculum |
| 4 – Checkpointing | 9 | Saves the final adapter and verifies that it reloads |
| 5 – Plot | 10 | Plots the reward curve with tier-promotion markers |

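
For readers new to GRPO, the idea behind the training section can be sketched briefly: several rollouts are sampled for the same scenario, and each rollout's reward is normalized against the mean and standard deviation of its group. The block below is a minimal sketch of that normalization, not the notebook's code. The function name `group_relative_advantages`, the choice of population standard deviation, and the `eps` value are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one group's rewards to zero mean and roughly unit scale.

    Hypothetical helper for illustration. `eps` guards against division by
    zero when every rollout in the group earned the same reward.
    """
    mean = fmean(rewards)
    # Population std is an assumption; some implementations use the sample std.
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts of the same scenario with made-up rewards.
advantages = group_relative_advantages([0.2, 0.5, 0.9, 0.4])
print([round(a, 3) for a in advantages])
```

Rollouts that beat their group's average get a positive advantage and are reinforced; those below average get a negative one.
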
**Resume from checkpoint:** The first cell of Section 3 (cell 6) auto-detects the latest `checkpoints/step_*` folder and loads it. To continue an interrupted run, re-run from that cell.

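
The checkpoint auto-detection described above could look something like the following sketch. It is not the notebook's actual code: the helper name `find_latest_checkpoint` is hypothetical, and it assumes the folders are named `step_<integer>` (for example `step_20`), which the `checkpoints/step_*` pattern suggests but does not confirm.

```python
import re
from pathlib import Path


def find_latest_checkpoint(root="checkpoints"):
    """Return the step_* directory with the highest step number, or None."""
    best_step, best_path = -1, None
    for path in Path(root).glob("step_*"):
        match = re.fullmatch(r"step_(\d+)", path.name)
        # Skip stray files and names that do not end in an integer step.
        if path.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


latest = find_latest_checkpoint()
print(f"Resuming from {latest}" if latest else "No checkpoint found, starting fresh")
```

Comparing the parsed step numbers rather than the folder names matters here: a plain string sort would rank `step_9` after `step_10`.
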
## Expected runtime on T4

About 45 minutes for 50 GRPO steps, or roughly 54 seconds per step on average. The actual time depends on episode length, which varies by curriculum tier.

## Downloading the trained adapter

After training completes, run the following in a Colab cell:

```python
import shutil

from google.colab import files

# Zip the final adapter directory and download it through the browser.
shutil.make_archive('wildfire_adapter', 'zip', 'checkpoints/final')
files.download('wildfire_adapter.zip')
```

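
Before downloading, it can be worth confirming that the archive actually contains the adapter. The sketch below lists the zip's contents and looks for `adapter_config.json`, the config file that PEFT's `save_pretrained()` writes. It assumes the notebook saves the adapter with PEFT, which this README implies but does not state, and the helper name `archive_has_adapter` is hypothetical.

```python
import zipfile


def archive_has_adapter(zip_path, required="adapter_config.json"):
    """Return True if the archive contains a file named `required`."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
    # Compare base names so the check works even if files sit in a subfolder.
    return any(name.split("/")[-1] == required for name in names)


# Usage in Colab after make_archive (file name taken from the cell above):
# archive_has_adapter("wildfire_adapter.zip")
```
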
## Local validation (no GPU needed)

```bash
python training/test_notebook_imports.py
```

This script checks that all imports resolve and runs a quick environment smoke test without loading model weights.

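
An import check of this kind can run without a GPU because it only asks whether each package is installed. The following is a minimal sketch of the idea, not the contents of `test_notebook_imports.py`; the helper name `missing_modules` and the package list are assumptions.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed dependency list; adjust it to match the notebook's setup cell.
REQUIRED = ["torch", "transformers", "peft", "trl"]

missing = missing_modules(REQUIRED)
print("All imports available" if not missing else f"Missing: {', '.join(missing)}")
```

Using `find_spec` instead of a real `import` keeps the check fast, since heavy packages are located but never loaded.
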