blog: expand GRPO mechanics, reward shaping narrative, and lessons learned 2fd537a Mohammed-Altaf Claude Sonnet 4.6 committed on Apr 26