	# GRPO Training — Wildfire Containment Simulator

	[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Abrodolph/Wildfire-Containment-Simulator/blob/main/training/grpo_colab.ipynb)

	## How to run

	Open `grpo_colab.ipynb` in Colab with a T4 GPU runtime and run the cells in order:

	| Section | Cell(s) | What it does |
	|---------|---------|--------------|
	| 1 — Setup | 1–3 | Installs deps, clones repo, loads Qwen-2.5-1.5B with LoRA |
	| 2 — Rollout | 4–5 | Defines `collect_rollout()` using env + serializer + parser |
	| 3 — Training | 6–8 | Builds GRPO dataset, trains 50 steps with curriculum |
	| 4 — Checkpointing | 9 | Saves final adapter, verifies reload |
	| 5 — Plot | 10 | Plots reward curve with tier-promotion markers |
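For readers new to GRPO, the core idea behind the training section can be sketched in a few lines: several completions are sampled for the same prompt, and each one is scored relative to the rest of its group. The snippet below is a minimal illustration of that normalization, not the notebook's code. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for this example; the training library may differ in these details.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and spread of its own group.

    Hypothetical helper for illustration only; eps avoids division by zero
    when every completion in the group receives the same reward.
    """
    center = mean(rewards)
    spread = pstdev(rewards)  # population standard deviation (an assumption)
    return [(r - center) / (spread + eps) for r in rewards]


# Example: rewards for four rollouts sampled from the same starting state.
advantages = group_relative_advantages([0.2, 0.5, 0.5, 0.9])
```

Completions that beat their group average get a positive advantage and the others a negative one, so the update does not need a separate value model.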

	To resume from a checkpoint, re-run from the first cell of Section 3 (cell 6): it auto-detects the latest `checkpoints/step_*` folder and loads it before training continues.
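The auto-detection step could look roughly like the sketch below. It assumes the folders are named `step_<integer>` (the README only shows the `step_*` pattern), and `latest_checkpoint` is a hypothetical name, not necessarily what the notebook uses. The step numbers are compared as integers so that `step_10` ranks above `step_9`, which a plain string sort would get wrong.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(root: str = "checkpoints") -> Optional[Path]:
    """Return the step_<N> directory with the largest N, or None if there is none.

    Hypothetical helper; assumes checkpoint folders are named step_<integer>.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return None

    pattern = re.compile(r"^step_(\d+)$")
    candidates = []
    for child in root_path.iterdir():
        match = pattern.match(child.name)
        if match and child.is_dir():
            candidates.append((int(match.group(1)), child))

    if not candidates:
        return None
    # Pick the numerically highest step, not the lexicographically last name.
    return max(candidates, key=lambda pair: pair[0])[1]
```

A `None` result would mean there is nothing to resume and training starts from the base model.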

	## Expected runtime on T4

	About 45 minutes for 50 GRPO steps, or roughly 54 seconds per step on average. The actual time depends on episode length in each curriculum tier.

	## Downloading the trained adapter

	After training completes, run in a Colab cell:

	```python
	from google.colab import files
	import shutil

	# Zip the final adapter directory, then trigger a browser download.
	shutil.make_archive('wildfire_adapter', 'zip', 'checkpoints/final')
	files.download('wildfire_adapter.zip')
	```
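Before downloading, it can be worth checking that the archive contains what you expect. The sketch below lists the zip contents using only the standard library. Both helper names are hypothetical, and the check for `adapter_config.json` assumes the adapter was saved in the usual PEFT layout, which this README does not confirm.

```python
import zipfile
from typing import List


def list_archive(zip_path: str = "wildfire_adapter.zip") -> List[str]:
    """Return the sorted entry names stored in the zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


def has_adapter_config(names: List[str]) -> bool:
    # Assumption: a PEFT-style adapter directory includes adapter_config.json
    # at its top level. Adjust if the notebook saves a different layout.
    return "adapter_config.json" in names
```

In Colab, `has_adapter_config(list_archive())` returning `False` would suggest that `checkpoints/final` was empty or saved in a different format.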

	## Local validation (no GPU needed)

	```bash
	python training/test_notebook_imports.py
	```

	This checks all imports and runs a quick env smoke test without loading model weights.
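If the script reports a failure, a quick way to see which dependencies are missing is to probe for them without importing anything heavy. The sketch below is not the contents of `test_notebook_imports.py`. It is a minimal standalone check, and both the function name and the example package list are assumptions.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be located in this environment.

    find_spec() locates a top-level module without executing it, so this
    does not load model weights or initialize a GPU.
    """
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised for dotted names with a missing parent, or odd module state.
            found = False
        if not found:
            missing.append(name)
    return missing


# Hypothetical dependency list; replace with the packages the notebook installs.
print(missing_modules(["torch", "transformers", "peft", "trl"]))
```

An empty list means every listed package can be found; any names printed need to be installed before the notebook cells will run.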