- **MergeMix: Optimizing Mid-Training Data Mixtures via Learnable Model Merging** — Paper 2601.17858 • Published 16 days ago
- **Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation** — Paper 2510.22115 • Published Oct 25, 2025
- **WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training** — Paper 2507.17634 • Published Jul 23, 2025
- **Ring** (Collection) — Ring is a reasoning MoE LLM open-sourced by InclusionAI, derived from Ling • 5 items • Updated 13 days ago
- **Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models** — Paper 2507.17702 • Published Jul 23, 2025