Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs Paper • 2505.18573 • Published May 24, 2025
Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains? Paper • 2510.11184 • Published Oct 13, 2025
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text Paper • 2601.10355 • Published 11 days ago • 38
Rethinking Expert Trajectory Utilization in LLM Post-training Paper • 2512.11470 • Published Dec 12, 2025 • 8 • 4
State over Tokens: Characterizing the Role of Reasoning Tokens Paper • 2512.12777 • Published Dec 14, 2025 • 5 • 6
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 133 • 12
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation Paper • 2511.06307 • Published Nov 9, 2025 • 52 • 5
AMO-Bench: Large Language Models Still Struggle in High School Math Competitions Paper • 2510.26768 • Published Oct 30, 2025 • 34
Learning from the Best, Differently: A Diversity-Driven Rethinking on Data Selection Paper • 2510.18909 • Published Oct 21, 2025 • 5 • 3
First Try Matters: Revisiting the Role of Reflection in Reasoning Models Paper • 2510.08308 • Published Oct 9, 2025 • 24 • 4
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications Paper • 2509.26490 • Published Sep 30, 2025 • 20