SPES-9B

SPES-9B is a 9B-parameter Mixture-of-Experts (MoE) language model pretrained using SPES (SParse Expert Synchronization), a memory-efficient decentralized pretraining paradigm.

The model and framework were introduced in the paper "Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm".

Authors

Jinrui Zhang, Chaodong Xiao, Aoqi Wu, Xindong Zhang, and Lei Zhang.

Model Details

  • Model name: SPES-9B
  • Model type: Causal Mixture-of-Experts (MoE) language model
  • Parameters: 9B (upcycled from a dense Qwen3-1.7B checkpoint; see the sketch after this list)
  • Framework: SPES
  • Precision: BF16 (Safetensors)
  • License: Apache-2.0
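
Upcycling initializes every expert from the pretrained dense feed-forward weights, so the MoE starts out computing roughly what the dense model did (modulo the new router) and diverges only during further training. Below is a minimal sketch of that general recipe; the expert count, layer sizes, and module layout are illustrative assumptions, not the authors' exact configuration:

```python
import copy

import torch.nn as nn

def upcycle_ffn(dense_ffn: nn.Module, num_experts: int) -> nn.ModuleList:
    """Replicate a dense feed-forward block into identical experts.

    Each expert starts as a copy of the pretrained dense weights, so the
    upcycled MoE initially matches the dense model and only drifts apart
    as training proceeds.
    """
    return nn.ModuleList(copy.deepcopy(dense_ffn) for _ in range(num_experts))

# Illustrative sizes only, not Qwen3-1.7B's actual dimensions.
hidden, inner = 2048, 6144
dense_mlp = nn.Sequential(
    nn.Linear(hidden, inner),
    nn.SiLU(),
    nn.Linear(inner, hidden),
)
experts = upcycle_ffn(dense_mlp, num_experts=8)
router = nn.Linear(hidden, len(experts))  # freshly initialized gating network
```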

Description

SPES-9B was developed to address the memory constraints of GPU nodes in decentralized training environments. By training only a subset of experts on each node, the SPES framework significantly reduces the per-node memory footprint and eliminates the need for full-parameter transmission and high-speed cross-node interconnects. This allows the model to be trained effectively over standard internet connections while maintaining performance competitive with centralized baselines.
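
The communication pattern can be pictured as follows: every node replicates the shared backbone but holds only its assigned slice of the experts, and periodic synchronization averages the replicated parameters while expert weights never leave their node. The toy simulation below illustrates that pattern only; it is a schematic sketch, not the SPES implementation, and all names in it are invented:

```python
import torch

NUM_NODES, NUM_EXPERTS, DIM = 4, 8, 16

# Each node replicates the shared backbone but owns a disjoint expert subset.
nodes = []
for rank in range(NUM_NODES):
    owned = [e for e in range(NUM_EXPERTS) if e % NUM_NODES == rank]
    nodes.append({
        "shared": torch.zeros(DIM),                       # backbone (replicated)
        "experts": {e: torch.zeros(DIM) for e in owned},  # experts (never transmitted)
    })

def local_step(node, lr=0.01):
    """Stand-in for a local training step; replicas drift between syncs."""
    node["shared"] -= lr * torch.randn(DIM)
    for w in node["experts"].values():
        w -= lr * torch.randn(DIM)

def sync_shared(nodes):
    """Average only the replicated backbone across nodes.

    Expert weights are excluded from the exchange, so per-sync traffic is
    a small fraction of the total parameter count.
    """
    mean = torch.stack([n["shared"] for n in nodes]).mean(dim=0)
    for n in nodes:
        n["shared"] = mean.clone()

for step in range(100):
    for node in nodes:
        local_step(node)
    if (step + 1) % 10 == 0:  # infrequent sync keeps bandwidth needs modest
        sync_shared(nodes)
```

In this picture, per-sync traffic scales with the shared-parameter count rather than the full model size, which is what removes the dependence on high-speed interconnects.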

Intended Use

This model is intended for research on:

  • Decentralized LLM pretraining paradigms.
  • Mixture-of-Experts (MoE) training and synchronization mechanisms.
  • Evaluation of language models pretrained under compute and bandwidth constraints (a loading sketch follows this list).
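
For the evaluation use case, the checkpoint can in principle be loaded like any Hugging Face causal LM. A minimal sketch, assuming the repository is compatible with the transformers auto classes; custom MoE modules may additionally require trust_remote_code=True:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zjr2000/SPES-9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
    # trust_remote_code=True,    # uncomment if the repo ships custom MoE code
)

prompt = "Decentralized pretraining works by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```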

Citation

If you use this model or the SPES framework in your research, please cite:

@article{zhang2026pretraining,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
  journal={arXiv preprint arXiv:2602.11543},
  year={2026}
}

Acknowledgements

The SPES codebase is built upon the modeling and training infrastructure provided by OLMo (Allen Institute for AI) and utilizes MegaBlocks (Databricks) for efficient MoE operations.
