Learning from the Self-future: On-policy Self-distillation for dLLMs Paper • 2606.18195 • Published 2 days ago • 26
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models Paper • 2505.18536 • Published May 24, 2025 • 18