---
license: apache-2.0
datasets:
- liuhaotian/LLaVA-Instruct-150K
language:
- en
base_model:
- microsoft/Phi-3.5-mini-instruct
pipeline_tag: text-generation
---

🎉 CompeteSMoE-5.1B

CompeteSMoE-5.1B is a lightweight and integrated variant of the Mixture-of-Experts (MoE) architecture, built upon the Phi-3.5 Mini and SigLIP baselines. This version incorporates the latest CompeteSMoE algorithm enhancements. CompeteSMoE-5.1B demonstrates strong performance across a range of MoE routing strategies, including both standard and state-of-the-art routing methods. It achieves competitive results compared to recent MoE architectures such as SharedE-V2 and SharedE-V3, which are inspired by DeepSeek. Despite the architectural innovations of these models, especially their use of shared experts, CompeteSMoE-5.1B consistently delivers superior or comparable results.
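For readers new to MoE routing, the sketch below illustrates the standard top-k softmax router that "standard" routing strategies usually refer to. It is not the CompeteSMoE algorithm, which is described in the paper; the function names `top_k_route` and `moe_forward` are hypothetical and chosen only for illustration, and experts are modeled as plain Python callables on scalars.

```python
import math

def top_k_route(logits, k=2):
    """Return [(expert_index, weight), ...] for the k largest router logits."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Pick the k experts with the highest router scores (ties keep index order).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, shifted by the max for stability.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k=2):
    """Combine the outputs of the selected experts, weighted by the router."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))
```

Only the selected experts are evaluated for a given input, which is what keeps the compute cost of an MoE layer well below that of running every expert.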

📝 Note: This version of CompeteSMoE-5.1B was trained on a small-scale dataset. 🚧 We're actively working on a stronger, more robust release, coming soon! 🚀 Stay tuned for updates. 💡

### Hardware Resources

| Stage             | MoE Method           | Hardware  |
|-------------------|----------------------|-----------|
| Pre-Training      |                      | 4xH100    |
| Pre-FineTuning    |                      | 4xH100    |
| VIT               | CompeteSMoE          | 4xH100    |

--- 

### Citation Information
More details can be found in our paper.

If you use CompeteSMoE, please cite it using this BibTeX:

```
@article{Nguyen2025CompeteSMoES,
  title={CompeteSMoE - Statistically Guaranteed Mixture of Experts Training via Competition},
  author={Nam V. Nguyen and Huy Nguyen and Quang Pham and Van Nguyen and Savitha Ramasamy and Nhat Ho},
  journal={ArXiv},
  year={2025},
  volume={abs/2505.13380},
  url={https://api.semanticscholar.org/CorpusID:278769210}
}
```