---
license: apache-2.0
tags:
- moe
- mixture-of-experts
- causal-lm
- olmoe
- distributed-training
- decentralized-training
- sparse-sync
language:
- en
pipeline_tag: text-generation
---

# SPES-9B
SPES-9B is a pretrained language model released as part of the paper *Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm*.

## Model Details
- Model name: SPES-9B
- Model type: Causal language model
- Parameters: 9B
- Framework: SPES
- License: Apache-2.0

## Project Links

## Intended Use
This model is intended for:
- research on decentralized LLM pretraining
- research on MoE training and synchronization
- experimentation and evaluation of pretrained language models (see the loading sketch below)
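
Below is a minimal loading sketch for evaluation with Hugging Face Transformers. It assumes the checkpoint follows a standard causal-LM layout (e.g., the OLMoE-style architecture the tags suggest) and can be loaded via `AutoModelForCausalLM`; the repo id is a placeholder, not a confirmed Hub path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SPES/SPES-9B"  # placeholder repo id; replace with the actual Hub path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a bf16-capable GPU
    device_map="auto",           # requires the `accelerate` package
)

prompt = "Decentralized pretraining of large language models"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the model's MoE architecture is not yet supported by your installed Transformers version, loading may additionally require `trust_remote_code=True`.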

## Citation

If you use this model, please cite the SPES paper:
```bibtex
@article{zhang2026spes,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
  year={2026}
}
```