Model Card for DMoE-Bloom-560M-16Experts-128Langs

This model extends Bloom-560m and is further trained on 17.7B tokens of text covering the 128 languages of MADLAD-400.

Code

The code used to train this model is available in the accompanying GitHub repository.
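
A minimal sketch of loading the released checkpoint with the Hugging Face transformers library. The repository id below is a placeholder, and trust_remote_code is an assumption in case the DMoE layers ship as custom modeling code with the checkpoint; check the repository for the exact id and loading path.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "DMoE-Bloom-560M-16Experts-128Langs"  # placeholder; use the actual Hub id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    trust_remote_code=True,      # assumption: DMoE may use custom modeling code
)

# Generate a short continuation; the model covers 128 languages from MADLAD-400.
inputs = tokenizer("La tour Eiffel se trouve à", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))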

Citation

@inproceedings{li-etal-2025-DMoE,
  author    = {Chong Li and
               Yingzhuo Deng and
               Jiajun Zhang and
               Chengqing Zong},
  title     = {Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
  year      = {2025},
  address   = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
}
Model size: 2B parameters (BF16, Safetensors)