SPES-7B

SPES-7B is a 7B-parameter Mixture-of-Experts (MoE) large language model pretrained with SPES (SParse Expert Sync), a memory-efficient decentralized training framework.

This model was introduced in the paper "Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm".

Authors: Jinrui Zhang, Chaodong Xiao, Aoqi Wu, Xindong Zhang, Lei Zhang.

Model Details

  • Model name: SPES-7B
  • Model type: Causal language model (MoE)
  • Parameters: 7B
  • Precision: BF16 (Safetensors)
  • Architecture: OLMoE
  • Framework: SPES
  • License: Apache-2.0

Introduction

SPES (SParse Expert Sync) is designed for pretraining MoE LLMs across geographically distributed GPU nodes. It addresses memory and bandwidth constraints by training only a subset of experts on each node, which significantly lowers the per-node memory footprint and removes the need to transmit the full parameter set during synchronization. SPES-7B achieves performance competitive with centrally trained models under similar computational budgets.
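
To make the synchronization idea concrete, the toy sketch below shows one plausible reading of sparse expert sync: each node holds and updates only its assigned experts, and a sync round exchanges only those expert weights (plus the comparatively small shared, non-expert weights), never the full model. The names ExpertShard and sync_owned_experts, the ownership scheme, and the data layout are illustrative assumptions, not the implementation described in the paper.

# Conceptual toy sketch of sparse expert synchronization.
# All names and the ownership scheme below are assumptions for illustration;
# they are not taken from the SPES codebase.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ExpertShard:
    node_id: int
    owned_experts: List[int]  # expert indices this node trains
    expert_params: Dict[int, List[float]] = field(default_factory=dict)  # weights per owned expert
    shared_params: List[float] = field(default_factory=list)             # router / attention / embeddings

def sync_owned_experts(shards: List[ExpertShard]) -> Dict[int, List[float]]:
    """Collect only each node's owned experts; the full model is never sent."""
    merged: Dict[int, List[float]] = {}
    for shard in shards:
        for idx in shard.owned_experts:
            # In this toy ownership scheme each expert lives on exactly one
            # node, so collecting it is enough; no cross-node averaging occurs.
            merged[idx] = shard.expert_params[idx]
    return merged

Only the entries gathered by sync_owned_experts, together with the shared weights, would ever cross the network in this reading, which is what keeps per-node traffic well below a full-parameter transfer.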

Project Links

Intended Use

This model is intended for research on:

  • Decentralized LLM pretraining paradigms.
  • Mixture-of-Experts (MoE) training and synchronization.
  • Evaluation of pretrained language models trained under constrained bandwidth conditions; a minimal loading sketch follows this list.
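
For such experiments, a minimal loading sketch is given below. It assumes the weights are hosted on the Hugging Face Hub under the repo id zjr2000/SPES-7B and that the installed transformers release supports the OLMoE architecture; it is a sketch, not an official usage script.

# Minimal loading sketch (assumes Hub repo id zjr2000/SPES-7B and OLMoE
# support in the installed transformers version; not an official script).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zjr2000/SPES-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",           # requires accelerate; remove for CPU-only use
)

prompt = "Decentralized pretraining of mixture-of-experts language models"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))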

Citation

If you use this model, please cite the SPES paper:

@article{zhang2026pretraining,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
  journal={arXiv preprint arXiv:2602.11543},
  year={2026}
}