Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning • Paper • 2510.20150 • Published Oct 23, 2025 • 5
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B • Paper • 2511.06221 • Published Nov 9, 2025 • 133
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning • Paper • 2508.10433 • Published Aug 14, 2025 • 144
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices • Paper • 2512.01374 • Published Dec 1, 2025 • 102
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning • Paper • 2511.22570 • Published Nov 27, 2025 • 90
GARDO: Reinforcing Diffusion Models without Reward Hacking • Paper • 2512.24138 • Published about 1 month ago • 29
Controlled Self-Evolution for Algorithmic Code Optimization • Paper • 2601.07348 • Published 17 days ago • 113
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability • Paper • 2601.18778 • Published 3 days ago • 31
Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow • Paper • 2601.14243 • Published 9 days ago • 18
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents • Paper • 2601.16973 • Published 6 days ago • 35
Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation • Paper • 2601.11258 • Published 13 days ago • 5
RL's Razor: Why Online Reinforcement Learning Forgets Less • Paper • 2509.04259 • Published Sep 4, 2025 • 6
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? • Paper • 2504.13837 • Published Apr 18, 2025 • 139