Paused RL SurviveCity v2 — GRPO training 🧟 5-agent zombie-survival GRPO training (Qwen 2.5-3B + LoRA)