# Model Card for DMoE-Bloom-560M-16Experts-128Langs
This model extends Bloom-560m and was further trained on 17.7B tokens of text covering the 128 languages of MADLAD-400.
## Code
The code used to train this model is available in the GitHub repo.
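Below is a minimal inference sketch, not taken from the card: it assumes the checkpoint loads through the standard `transformers` Auto classes. The repo id is taken from the card title and may need an organization prefix on the Hub, and `trust_remote_code=True` is assumed to be required for the custom DMoE layers.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id from the card title; prepend the Hub organization as needed.
model_id = "DMoE-Bloom-560M-16Experts-128Langs"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code=True is an assumption, in case the DMoE architecture
# ships its own modeling code with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Generate a short continuation in any of the 128 supported languages.
inputs = tokenizer("Bonjour, je suis", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```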
## Citation
```bibtex
@inproceedings{li-etal-2025-DMoE,
  author    = {Chong Li and Yingzhuo Deng and Jiajun Zhang and Chengqing Zong},
  title     = {Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
  year      = {2025},
  address   = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
}
```