Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? • Paper • 2603.24472 • Published 13 days ago • 49
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models • Paper • 2602.12036 • Published Feb 12 • 93