---
license: apache-2.0
tags:
- moe
- mixture-of-experts
- causal-lm
- olmoe
- distributed-training
- decentralized-training
- sparse-sync
language:
- en
pipeline_tag: text-generation
---

# SPES-9B

SPES-9B is a pretrained language model released as part of the paper **Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm**.

## Model Details

- **Model name:** SPES-9B
- **Model type:** Causal language model
- **Parameters:** 9B
- **Framework:** SPES
- **License:** Apache-2.0

## Project Links

- **GitHub:** https://github.com/zjr2000/SPES
- **Paper:** https://huggingface.co/papers/2602.11543

## Intended Use

This model is intended for:

- research on decentralized LLM pretraining
- research on MoE training and synchronization
- experimentation and evaluation of pretrained language models

## Citation

If you use this model, please cite the SPES paper.

```bibtex
@article{zhang2026spes,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
  year={2026}
}
```