Agents Supersede Base vs Trained 🧠
Live comparison of base vs GRPO-trained Qwen2.5-3B on supersession
Running on Zero