---
license: apache-2.0
tags:
- moe
- mixture-of-experts
- causal-lm
- olmoe
- distributed-training
- decentralized-training
- sparse-sync
language:
- en
pipeline_tag: text-generation
---
# SPES-9B
SPES-9B is a pretrained language model released as part of the paper:
**Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm**
## Model Details
- **Model name:** SPES-9B
- **Model type:** Causal language model
- **Parameters:** 9B
- **Framework:** SPES
- **License:** Apache-2.0
## Project Links
- **GitHub:** https://github.com/zjr2000/SPES
- **Paper:** https://huggingface.co/papers/2602.11543
## Intended Use
This model is intended for:
- research on decentralized LLM pretraining
- research on MoE training and synchronization
- experimentation and evaluation of pretrained language models
## Citation
If you use this model, please cite the SPES paper.
```bibtex
@article{zhang2026spes,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
  journal={arXiv preprint arXiv:2602.11543},
  year={2026}
}
```