Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?
This is the official Hugging Face repository for the paper "Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?".
Our extensive study (250+ MoE training runs at the 2B and 7B scales) provides strong evidence that MoE architectures with optimized backbones and activation rates consistently outperform dense counterparts on both upstream and downstream tasks, even under strictly equal resources.
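For intuition, the "activation rate" is the fraction of a model's parameters activated per token. A minimal toy calculation (all parameter counts below are invented for illustration, not configurations from the paper):

```python
def moe_activation_rate(n_experts: int, n_active: int,
                        expert_params: int, shared_params: int) -> float:
    """Fraction of total parameters activated per token in a routed MoE."""
    total = shared_params + n_experts * expert_params
    active = shared_params + n_active * expert_params
    return active / total

# Hypothetical example: 64 experts, 8 routed per token,
# plus shared weights (attention, embeddings, etc.).
rate = moe_activation_rate(n_experts=64, n_active=8,
                           expert_params=30_000_000,
                           shared_params=500_000_000)
print(f"Activation rate: {rate:.2%}")  # ~30.58%
```

A dense model of the same total size has an activation rate of 100%, which is what makes the resource-matched comparison non-trivial.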
The checkpoints are released in this repository.
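A minimal sketch of fetching the released checkpoints with the `huggingface_hub` library; the repository id below is a placeholder, since the exact Hub id is not stated in this README:

```python
from huggingface_hub import snapshot_download

# Placeholder repo id -- replace with this repository's actual
# Hugging Face Hub id.
REPO_ID = "your-org/moe-surpass-dense"

# Download all checkpoint files to a local cache directory.
local_dir = snapshot_download(repo_id=REPO_ID)
print(f"Checkpoints downloaded to: {local_dir}")
```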
More details:
Paper: https://www.arxiv.org/abs/2506.12119
GitHub page: https://kamanphoebe.github.io/moe-surpass-dense.github.io/
Citation:
@misc{li2025mixtureofexpertssurpassdensellms,
  title = {Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?},
  author = {Houyi Li and Ka Man Lo and Ziqi Wang and Zili Wang and Wenzhen Zheng and Shuigeng Zhou and Xiangyu Zhang and Daxin Jiang},
  year = {2025},
  eprint = {2506.12119},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL},
  url = {https://arxiv.org/abs/2506.12119},
}