Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding Paper • 2605.02290 • Published 16 days ago • 35
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex Paper • 2605.06139 • Published 13 days ago • 65
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 134