To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models Paper • 2602.12566 • Published Feb 13
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published 26 days ago
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published 26 days ago