Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
Paper
• 2510.20150
• Published • 6
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper
• 2511.06221
• Published • 133
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Paper
• 2508.10433
• Published • 146
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper
• 2512.01374
• Published • 106
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper
• 2511.22570
• Published • 92
GARDO: Reinforcing Diffusion Models without Reward Hacking
Paper
• 2512.24138
• Published • 30
Controlled Self-Evolution for Algorithmic Code Optimization
Paper
• 2601.07348
• Published • 116
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper
• 2601.18778
• Published • 41
Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow
Paper
• 2601.14243
• Published • 23
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
Paper
• 2601.16973
• Published • 40
Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
Paper
• 2601.11258
• Published • 10
RL's Razor: Why Online Reinforcement Learning Forgets Less
Paper
• 2509.04259
• Published • 7
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper
• 2504.13837
• Published • 141