To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models Paper • 2602.12566 • Published Feb 13 • 1 • 1
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 113 • 6
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published Apr 14 • 19 • 7
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 113 • 6
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published Apr 14 • 19 • 7