MOPD: Multi-Teacher On-Policy Distillation for Capability Integration in LLM Post-Training Paper • 2606.30406 • Published 3 days ago • 4
HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing Paper • 2602.03560 • Published Feb 3 • 49
GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation Paper • 2512.17495 • Published Dec 19, 2025 • 20
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 107
Lansechen/deepseek-v2-lite-16b-chat-R1-Distill-bs17k-batch32 Text Generation • 16B • Updated Feb 22, 2025 • 6 • 1
Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers Paper • 2510.11370 • Published Oct 13, 2025 • 4
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published May 12, 2025 • 86