SPES-9B
SPES-9B is a 9B-parameter Mixture-of-Experts (MoE) language model pretrained using SPES (SParse Expert Synchronization), a memory-efficient decentralized pretraining paradigm.
The model and framework were introduced in the paper: Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm.
Authors
Jinrui Zhang, Chaodong Xiao, Aoqi Wu, Xindong Zhang, and Lei Zhang.
Model Details
- Model name: SPES-9B
- Model type: Causal Mixture-of-Experts (MoE) language model
- Parameters: 9B (upcycled from a dense Qwen3-1.7B checkpoint; see the upcycling sketch after this list)
- Framework: SPES
- License: Apache-2.0
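The upcycling step mentioned above is not detailed in this card. As a rough, hypothetical illustration of the general dense-to-MoE upcycling recipe (toy sizes and expert count chosen for readability, not the exact SPES-9B procedure), each expert can be initialized as a copy of the pretrained dense FFN while a freshly initialized router learns to dispatch tokens:

```python
# Illustrative dense-to-MoE upcycling sketch (hypothetical shapes/expert count,
# NOT the exact SPES-9B recipe). Every expert starts as a copy of the dense FFN,
# and a new, randomly initialized router dispatches tokens among experts.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, num_experts = 64, 256, 8  # toy sizes for illustration only

# Stand-ins for pretrained dense FFN weights (e.g. from a dense checkpoint).
dense_w_in = rng.standard_normal((d_model, d_ff)) * 0.02
dense_w_out = rng.standard_normal((d_ff, d_model)) * 0.02

# Upcycle: each expert is initialized from the same dense weights.
experts = [
    {"w_in": dense_w_in.copy(), "w_out": dense_w_out.copy()}
    for _ in range(num_experts)
]

# The router is new and randomly initialized.
router_w = rng.standard_normal((d_model, num_experts)) * 0.02

def moe_ffn(x, top_k=2):
    """Route each token to its top-k experts and average their outputs."""
    logits = x @ router_w                                # (tokens, num_experts)
    top = np.argsort(-logits, axis=-1)[:, :top_k]
    out = np.zeros_like(x)
    for t, chosen in enumerate(top):
        for e in chosen:
            h = np.maximum(x[t] @ experts[e]["w_in"], 0.0)  # ReLU for brevity
            out[t] += h @ experts[e]["w_out"]
    return out / top_k

tokens = rng.standard_normal((4, d_model))
print(moe_ffn(tokens).shape)  # (4, 64)
```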
Description
SPES-9B was developed to address the memory constraints of GPU nodes in decentralized training environments. By training only a subset of experts per node, the SPES framework significantly reduces the memory footprint and eliminates the need for full-parameter transmission and high-speed cross-node interconnects. This allows the model to be trained effectively over standard internet connections while maintaining competitive performance compared to centralized baselines.
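The exact synchronization mechanism is described in the paper; the toy sketch below (simplified and hypothetical, not the authors' implementation) only illustrates the core idea stated above: each node owns and updates a subset of experts locally, so cross-node communication can be limited to the comparatively small shared (non-expert) parameters.

```python
# Toy sketch of the per-node expert-subset idea (simplified, hypothetical;
# NOT the authors' SPES implementation). Each node holds only its assigned
# experts plus a replica of the shared (non-expert) parameters, so only the
# shared parameters need to be exchanged between nodes.
import numpy as np

rng = np.random.default_rng(0)
num_experts, num_nodes, dim = 8, 4, 16

shared = rng.standard_normal(dim)                      # shared (non-expert) params
experts = [rng.standard_normal(dim) for _ in range(num_experts)]

# Assign a disjoint subset of experts to each node (2 experts per node here).
assignment = {n: list(range(n * 2, n * 2 + 2)) for n in range(num_nodes)}

# Local training: each node updates only its own experts and its replica of
# the shared parameters (random perturbations stand in for gradient steps).
node_shared = {n: shared.copy() for n in range(num_nodes)}
for n, expert_ids in assignment.items():
    for e in expert_ids:
        experts[e] -= 0.01 * rng.standard_normal(dim)
    node_shared[n] -= 0.01 * rng.standard_normal(dim)

# Synchronization: only the shared parameters are averaged across nodes;
# expert weights never leave the node that owns them.
shared = np.mean([node_shared[n] for n in range(num_nodes)], axis=0)
print("shared params synced, shape:", shared.shape)
```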
Project Links
- GitHub Repository: zjr2000/SPES
- Paper: Hugging Face Papers
- Training Logs: Weights & Biases
Intended Use
This model is intended for research on:
- Decentralized LLM pretraining paradigms.
- Mixture-of-Experts (MoE) training and synchronization mechanisms.
- Evaluation of pretrained language models trained under computational and bandwidth constraints.
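For experimentation, a minimal loading sketch is shown below. The Hub repo id and whether the released checkpoint loads directly through Transformers are assumptions; check the GitHub repository for the authors' supported loading path.

```python
# Hypothetical loading snippet: the repo id "zjr2000/SPES-9B" and
# Transformers compatibility are assumptions, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zjr2000/SPES-9B"  # assumed repo id; may differ
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Decentralized pretraining is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```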
Citation
If you use this model or the SPES framework in your research, please cite:
@article{zhang2026pretraining,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
  journal={arXiv preprint arXiv:2602.11543},
  year={2026}
}
Acknowledgements
The SPES codebase is built upon the modeling and training infrastructure provided by OLMo (Allen Institute for AI) and utilizes MegaBlocks (Databricks) for efficient MoE operations.