From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models • Paper • 2605.20177 • Published May 19 • 10
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? • Paper • 2603.24472 • Published Mar 25 • 57
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning • Paper • 2602.01058 • Published Feb 1 • 45