# GRPO Training: Wildfire Containment Simulator
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Abrodolph/Wildfire-Containment-Simulator/blob/main/training/grpo_colab.ipynb)
## How to run
Open `grpo_colab.ipynb` in Colab (T4 GPU runtime) and run cells in order:
| Section | Cell(s) | What it does |
|---------|---------|--------------|
| 1: Setup | 1–3 | Installs deps, clones repo, loads Qwen-2.5-1.5B with LoRA |
| 2: Rollout | 4–5 | Defines `collect_rollout()` using env + serializer + parser |
| 3: Training | 6–8 | Builds GRPO dataset, trains 50 steps with curriculum |
| 4: Checkpointing | 9 | Saves final adapter, verifies reload |
| 5: Plot | 10 | Plots reward curve with tier-promotion markers |
**Resume from checkpoint:** The first cell of Section 3 auto-detects the latest `checkpoints/step_*` folder and loads it. Re-run from that cell to continue training.
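
For reference, here is a minimal sketch of how the latest `checkpoints/step_*` folder could be detected. It is not the notebook's actual code: the helper name `latest_checkpoint` is hypothetical, and it assumes the folder suffix is an integer step count. Steps are compared numerically so that `step_10` ranks above `step_9`.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(root: str = "checkpoints") -> Optional[Path]:
    """Return the step_* directory with the highest step number, or None."""
    best_step, best_path = -1, None
    for path in Path(root).glob("step_*"):
        # Assumption: folders are named step_<integer>; anything else is skipped.
        match = re.fullmatch(r"step_(\d+)", path.name)
        if path.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


print(latest_checkpoint())  # None when no step_* folder exists yet
```

A plain string sort would place `step_9` after `step_10`, which is why the sketch parses the step number before comparing.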
## Expected runtime on T4
About 45 minutes for 50 GRPO steps. The exact time depends on episode length, which varies by curriculum tier.
## Downloading the trained adapter
After training completes, run in a Colab cell:
```python
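import shutil

from google.colab import files

# Zip the final adapter directory into wildfire_adapter.zip
shutil.make_archive('wildfire_adapter', 'zip', 'checkpoints/final')
# Trigger a browser download of the archive
files.download('wildfire_adapter.zip')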
```
## Local validation (no GPU needed)
```bash
python training/test_notebook_imports.py
```
This checks all imports and runs a quick env smoke test without loading model weights.
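
To illustrate the import-check half of such a script, here is a minimal sketch. It is not the contents of `test_notebook_imports.py`: the `REQUIRED_MODULES` list is a hypothetical placeholder, and the helper `missing_modules` is an illustrative name. It locates modules without importing them, so nothing heavy is loaded.

```python
import importlib.util

# Hypothetical placeholder list; substitute the top-level modules the notebook imports.
REQUIRED_MODULES = ["json", "math"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

The sketch handles top-level names only. For a dotted name, `find_spec` imports the parent package first and raises `ModuleNotFoundError` if the parent is absent.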