| --- |
| license: apache-2.0 |
| tags: |
| - moe |
| - mixture-of-experts |
| - causal-lm |
| - olmoe |
| - distributed-training |
| - decentralized-training |
| - sparse-sync |
| language: |
| - en |
| pipeline_tag: text-generation |
| --- |
| |
| # SPES-9B |
|
|
SPES-9B is a pretrained language model released as part of the paper:
|
|
| **Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm** |
|
|
| ## Model Details |
|
|
| - **Model name:** SPES-9B |
| - **Model type:** Causal language model |
| - **Parameters:** 9B |
| - **Framework:** SPES |
| - **License:** Apache-2.0 |
|
|
| ## Project Links |
|
|
| - **GitHub:** https://github.com/zjr2000/SPES |
| - **Paper:** https://huggingface.co/papers/2602.11543 |
|
|
| ## Intended Use |
|
|
| This model is intended for: |
|
|
| - research on decentralized LLM pretraining |
| - research on MoE training and synchronization |
| - experimentation and evaluation of pretrained language models |
|
|
| ## Citation |
|
|
If you use this model, please cite the SPES paper:
|
|
```bibtex
@article{zhang2026spes,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
  year={2026}
}
```