Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction • Paper • 2605.12070 • Published 2 days ago • 13
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective • Paper • 2509.22613 • Published Sep 26, 2025 • 10