
GRPO Training: Wildfire Containment Simulator

How to run

Open grpo_colab.ipynb in Colab (T4 GPU runtime) and run cells in order:

| Section | Cell(s) | What it does |
|---|---|---|
| 1. Setup | 1–3 | Installs deps, clones repo, loads Qwen-2.5-1.5B with LoRA |
| 2. Rollout | 4–5 | Defines collect_rollout() using env + serializer + parser |
| 3. Training | 6–8 | Builds GRPO dataset, trains 50 steps with curriculum |
| 4. Checkpointing | 9 | Saves final adapter, verifies reload |
| 5. Plot | 10 | Plots reward curve with tier-promotion markers |
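For readers new to GRPO, the core idea behind the training section can be sketched in a few lines: several completions are sampled for the same prompt, and each one is scored relative to the rest of its group. The function name `group_relative_advantages` and the `eps` parameter are illustrative assumptions, not names from the notebook, and this README does not say whether the notebook computes this itself or delegates it to a training library.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize the rewards of one sampled group to zero mean and roughly unit spread.

    Hypothetical helper for illustration only; not taken from grpo_colab.ipynb.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division finite when every rollout earned the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three rollouts for the same prompt with made-up episode rewards
    print(group_relative_advantages([1.0, 2.0, 3.0]))
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline.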

Resume from checkpoint: The first cell of Section 3 auto-detects the latest checkpoints/step_* folder and loads it. Re-run from that cell to continue training.
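A minimal sketch of how such auto-detection might work, assuming each folder is named `step_<number>`. The helper `find_latest_checkpoint` is a hypothetical name, and the notebook's actual logic may differ.

```python
import re
from pathlib import Path
from typing import Optional

# Matches folder names such as step_10; the naming scheme is assumed from this README
STEP_DIR = re.compile(r"^step_(\d+)$")


def find_latest_checkpoint(root: str = "checkpoints") -> Optional[Path]:
    """Return the step_* subfolder with the highest step number, or None if there is none."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    best_step, best_dir = -1, None
    for child in root_path.iterdir():
        match = STEP_DIR.match(child.name)
        if child.is_dir() and match:
            step = int(match.group(1))
            # Compare as integers: a plain string sort would rank step_5 above step_10
            if step > best_step:
                best_step, best_dir = step, child
    return best_dir


if __name__ == "__main__":
    latest = find_latest_checkpoint()
    print(f"Resuming from {latest}" if latest else "No checkpoint found, starting fresh")
```

Folders that do not match the pattern, such as `checkpoints/final`, are ignored by this sketch.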

Expected runtime on T4

About 45 minutes for 50 GRPO steps; the exact time depends on episode length in each curriculum tier.

Downloading the trained adapter

After training completes, run in a Colab cell:

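# Zip the final adapter folder and download it through the browser (Colab only).
from google.colab import files
import shutil
shutil.make_archive('wildfire_adapter', 'zip', 'checkpoints/final')  # writes wildfire_adapter.zip
files.download('wildfire_adapter.zip')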

Local validation (no GPU needed)

python training/test_notebook_imports.py

This checks all imports and runs a quick env smoke test without loading model weights.
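The import check can be approximated without importing heavy packages at all. The sketch below is not the contents of `training/test_notebook_imports.py`; `missing_modules` and the module list in the usage example are assumptions chosen for illustration.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in names:
        try:
            # For a top-level name, find_spec locates the module without executing it
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when the parent package of a dotted name is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative list; the real script may check different packages
    print(missing_modules(["json", "torch", "transformers"]))
```

An empty list means every named module was found, which makes this usable as a fast pre-flight check on a machine without a GPU.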