The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL Paper • 2606.19162 • Published 10 days ago • 20
InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning Paper • 2602.06960 • Published Feb 6 • 14
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 107
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment Paper • 2410.01679 • Published Oct 2, 2024 • 27