---
license: apache-2.0
datasets:
- liuhaotian/LLaVA-Instruct-150K
language:
- en
base_model:
- microsoft/Phi-3.5-mini-instruct
pipeline_tag: text-generation
---
# CompeteSMoE-5.1B
CompeteSMoE-5.1B is a lightweight, integrated variant of the Mixture-of-Experts (MoE) architecture, built on the Phi-3.5 Mini and SigLIP baselines and incorporating the latest CompeteSMoE algorithm enhancements. It performs strongly across a range of MoE routing strategies, from standard to state-of-the-art routing methods, and achieves competitive results against recent MoE architectures such as SharedE-V2 and SharedE-V3, which are inspired by DeepSeek. Despite the architectural innovations of those models, especially their use of shared experts, CompeteSMoE-5.1B consistently delivers superior or comparable results.
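To give a rough feel for the competition idea behind CompeteSMoE, the sketch below routes each token to the experts whose outputs produce the strongest neural response, rather than relying on a learned gate alone. This is a minimal, illustrative PyTorch sketch of that idea, not the released implementation; all class, parameter, and dimension names are hypothetical.

```python
import torch
import torch.nn as nn


class CompetitionRouting(nn.Module):
    """Illustrative competition-based MoE layer (names and sizes hypothetical)."""

    def __init__(self, d_model: int = 3072, d_ff: int = 8192,
                 n_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Every expert sees every token here, so this
        # is the expensive "competition" pass, not the routed inference pass.
        outs = torch.stack([expert(x) for expert in self.experts], dim=1)
        scores = outs.norm(dim=-1)          # response strength per expert
        weights = scores.softmax(dim=-1)    # (tokens, n_experts)
        top_w, top_i = weights.topk(self.top_k, dim=-1)
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)  # renormalize winners
        idx = top_i.unsqueeze(-1).expand(-1, -1, outs.size(-1))
        # Mix the winning experts' outputs by their normalized scores.
        return (outs.gather(1, idx) * top_w.unsqueeze(-1)).sum(dim=1)


# Usage: layer = CompetitionRouting(); y = layer(torch.randn(5, 3072))
```

In the paper's framing, a router is trained to predict these competition outcomes, so that at inference time tokens can be dispatched without evaluating every expert.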
**Note:** This version of CompeteSMoE-5.1B was trained on a small-scale dataset. We're actively working on a stronger, more robust release, coming soon. Stay tuned for updates!
### Hardware Resources
| Stage | MoE Method | Hardware |
|-------------------|----------------------|-----------|
| Pre-Training | | 4xH100 |
| Pre-FineTuning | | 4xH100 |
| VIT (Visual Instruction Tuning) | CompeteSMoE | 4xH100 |
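
For reference, a checkpoint like this one would typically be loaded through the standard `transformers` API. The sketch below is illustrative only: the repo id is a placeholder, and the exact loading flags (e.g. `trust_remote_code` for custom MoE layers) depend on the released checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ORG/CompeteSMoE-5.1B"  # placeholder repo id, not the real one

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",        # use the checkpoint's native precision
    trust_remote_code=True,    # likely needed for the custom MoE blocks
)

prompt = "Explain mixture-of-experts routing in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```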
---
### Citation Information
More details can be found in our paper.
If you use CompeteSMoE, please cite it using this BibTeX:
```bibtex
@article{Nguyen2025CompeteSMoES,
title={CompeteSMoE - Statistically Guaranteed Mixture of Experts Training via Competition},
author={Nam V. Nguyen and Huy Nguyen and Quang Pham and Van Nguyen and Savitha Ramasamy and Nhat Ho},
journal={ArXiv},
year={2025},
volume={abs/2505.13380},
url={https://api.semanticscholar.org/CorpusID:278769210}
}
```