SPES-9B

SPES-9B is a 9B-parameter Mixture-of-Experts (MoE) language model pretrained using SPES (SParse Expert Synchronization), a memory-efficient decentralized pretraining paradigm.

The model and framework were introduced in the paper "Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm".

Authors

Jinrui Zhang, Chaodong Xiao, Aoqi Wu, Xindong Zhang, and Lei Zhang.

Model Details

  • Model name: SPES-9B
  • Model type: Causal Mixture-of-Experts (MoE) language model
  • Parameters: 9B (upcycled from a dense Qwen3-1.7B checkpoint; see the sketch after this list)
  • Framework: SPES
  • Precision: BF16 (Safetensors)
  • License: Apache-2.0
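
Upcycling initializes every expert from the pretrained dense feed-forward weights, so the MoE starts out computing roughly what the dense model did (modulo the new router) and diverges only during further training. Below is a minimal sketch of that general recipe; the expert count, layer sizes, and module layout are illustrative assumptions, not the authors' exact configuration:

```python
import copy

import torch.nn as nn

def upcycle_ffn(dense_ffn: nn.Module, num_experts: int) -> nn.ModuleList:
    """Replicate a dense feed-forward block into identical experts.

    Each expert starts as a copy of the pretrained dense weights, so the
    upcycled MoE initially matches the dense model and only drifts apart
    as training proceeds.
    """
    return nn.ModuleList(copy.deepcopy(dense_ffn) for _ in range(num_experts))

# Illustrative sizes only, not Qwen3-1.7B's actual dimensions.
hidden, inner = 2048, 6144
dense_mlp = nn.Sequential(
    nn.Linear(hidden, inner),
    nn.SiLU(),
    nn.Linear(inner, hidden),
)
experts = upcycle_ffn(dense_mlp, num_experts=8)
router = nn.Linear(hidden, len(experts))  # freshly initialized gating network
```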

Description

SPES-9B was developed to address the memory constraints of GPU nodes in decentralized training environments. By training only a subset of experts on each node, the SPES framework significantly reduces the per-node memory footprint and eliminates the need for full-parameter transmission and high-speed cross-node interconnects. This allows the model to be trained effectively over standard internet connections while maintaining performance competitive with centralized baselines.
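
The communication pattern can be pictured as follows: every node replicates the shared backbone but holds only its assigned slice of the experts, and periodic synchronization averages the replicated parameters while expert weights never leave their node. The toy simulation below illustrates that pattern only; it is a schematic sketch, not the SPES implementation, and all names in it are invented:

```python
import torch

NUM_NODES, NUM_EXPERTS, DIM = 4, 8, 16

# Each node replicates the shared backbone but owns a disjoint expert subset.
nodes = []
for rank in range(NUM_NODES):
    owned = [e for e in range(NUM_EXPERTS) if e % NUM_NODES == rank]
    nodes.append({
        "shared": torch.zeros(DIM),                       # backbone (replicated)
        "experts": {e: torch.zeros(DIM) for e in owned},  # experts (never transmitted)
    })

def local_step(node, lr=0.01):
    """Stand-in for a local training step; replicas drift between syncs."""
    node["shared"] -= lr * torch.randn(DIM)
    for w in node["experts"].values():
        w -= lr * torch.randn(DIM)

def sync_shared(nodes):
    """Average only the replicated backbone across nodes.

    Expert weights are excluded from the exchange, so per-sync traffic is
    a small fraction of the total parameter count.
    """
    mean = torch.stack([n["shared"] for n in nodes]).mean(dim=0)
    for n in nodes:
        n["shared"] = mean.clone()

for step in range(100):
    for node in nodes:
        local_step(node)
    if (step + 1) % 10 == 0:  # infrequent sync keeps bandwidth needs modest
        sync_shared(nodes)
```

In this picture, per-sync traffic scales with the shared-parameter count rather than the full model size, which is what removes the dependence on high-speed interconnects.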

Intended Use

This model is intended for research on:

  • Decentralized LLM pretraining paradigms.
  • Mixture-of-Experts (MoE) training and synchronization mechanisms.
  • Evaluation of language models pretrained under compute and bandwidth constraints (a loading sketch follows this list).
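
For the evaluation use case, the checkpoint can in principle be loaded like any Hugging Face causal LM. A minimal sketch, assuming the repository is compatible with the transformers auto classes; custom MoE modules may additionally require trust_remote_code=True:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zjr2000/SPES-9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
    # trust_remote_code=True,    # uncomment if the repo ships custom MoE code
)

prompt = "Decentralized pretraining works by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```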

Citation

If you use this model or the SPES framework in your research, please cite:

@article{zhang2026pretraining,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
  journal={arXiv preprint arXiv:2602.11543},
  year={2026}
}

Acknowledgements

The SPES codebase is built upon the modeling and training infrastructure provided by OLMo (Allen Institute for AI) and utilizes MegaBlocks (Databricks) for efficient MoE operations.
