Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? Paper • 2603.24472 • Published Mar 25 • 55
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published Feb 1 • 44