Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models Paper • 2602.12036 • Published 2 days ago • 86
Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners Paper • 2509.26226 • Published Sep 30, 2025 • 34