SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published Jun 30, 2025 • 51 • 6
Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries Article • aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 148
Towards a Mechanistic Understanding of Propositional Logical Reasoning in Large Language Models Paper • 2601.04260 • Published Jan 7 • 1
SeeUPO: Sequence-Level Agentic-RL with Convergence Guarantees Paper • 2602.06554 • Published Feb 6 • 6