MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning Paper • 2603.16929 • Published 18 days ago • 13
On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models Paper • 2603.27481 • Published 3 days ago • 30
HyperAlign: Hypernetwork for Efficient Test-Time Alignment of Diffusion Models Paper • 2601.15968 • Published Jan 22 • 9
The Smol Training Playbook 📚 • The secrets to building world-class LLMs • 3.07k likes
The Art of Scaling Reinforcement Learning Compute for LLMs Paper • 2510.13786 • Published Oct 15, 2025 • 33
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems Paper • 2510.02263 • Published Oct 2, 2025 • 9
MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks Paper • 2509.14638 • Published Sep 18, 2025 • 14
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation Paper • 2509.15194 • Published Sep 18, 2025 • 33
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing Paper • 2509.08721 • Published Sep 10, 2025 • 664
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs Paper • 2508.05257 • Published Aug 7, 2025 • 13
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts Paper • 2508.07785 • Published Aug 11, 2025 • 29