SPES-2B

SPES-2B is a 2B-parameter Mixture-of-Experts (MoE) pretrained language model introduced in the paper:

Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm

Model Details

  • Model name: SPES-2B
  • Model type: Causal language model (MoE)
  • Architecture: OLMoE
  • Parameters: 2B
  • Framework: SPES (SParse Expert Synchronization)
  • Precision: BF16 (Safetensors weights)
  • License: Apache-2.0

Description

SPES-2B was trained using SPES, a memory-efficient decentralized framework. Unlike traditional centralized training that requires high-bandwidth interconnects, SPES enables pretraining across geographically distributed GPU nodes by training only a subset of experts per node and periodically synchronizing them. This model was trained using 16 standalone 48GB GPUs over standard internet connections.
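
Below is a minimal, illustrative sketch of this idea, not the released SPES code: each simulated node keeps gradients and optimizer state only for its own subset of experts and calls a placeholder synchronization step at a fixed interval. All names and hyperparameters here are hypothetical.

import torch
import torch.nn as nn

NUM_EXPERTS = 8        # hypothetical total number of experts
EXPERTS_PER_NODE = 2   # hypothetical experts trained on each node
SYNC_EVERY = 100       # hypothetical number of local steps between syncs
HIDDEN = 64            # toy hidden size

def make_expert() -> nn.Module:
    # A toy feed-forward expert standing in for an MoE expert MLP.
    return nn.Sequential(nn.Linear(HIDDEN, 4 * HIDDEN), nn.GELU(), nn.Linear(4 * HIDDEN, HIDDEN))

def train_node(node_rank: int, experts: list[nn.Module], steps: int = 300) -> None:
    # Only this node's expert subset receives gradients and optimizer state,
    # which is what keeps per-node memory low in the decentralized setting.
    local_ids = list(range(node_rank * EXPERTS_PER_NODE, (node_rank + 1) * EXPERTS_PER_NODE))
    params = [p for i in local_ids for p in experts[i].parameters()]
    opt = torch.optim.AdamW(params, lr=1e-4)

    for step in range(1, steps + 1):
        x = torch.randn(4, HIDDEN)                                   # stand-in for routed tokens
        loss = sum(experts[i](x).pow(2).mean() for i in local_ids)   # stand-in objective
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % SYNC_EVERY == 0:
            sync_experts(local_ids, experts)

def sync_experts(local_ids: list[int], experts: list[nn.Module]) -> None:
    # Placeholder for the periodic synchronization step: in a real decentralized
    # run this would push locally updated expert weights to peer nodes over
    # standard internet connections and pull the experts trained elsewhere.
    for i in local_ids:
        _ = {k: v.detach().cpu() for k, v in experts[i].state_dict().items()}

experts = [make_expert() for _ in range(NUM_EXPERTS)]
train_node(node_rank=0, experts=experts)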

Intended Use

This model is intended for:

  • Research on decentralized LLM pretraining.
  • Research on Mixture-of-Experts (MoE) training and synchronization.
  • Experimentation with and evaluation of pretrained language models (a minimal usage sketch follows below).
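
Usage Example

A minimal usage sketch, assuming the checkpoint loads through the standard transformers AutoModelForCausalLM interface (OLMoE support requires a recent transformers release). The repo id is taken from this model card; the prompt is only an example.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zjr2000/SPES-2B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Decentralized pretraining of large language models"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))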

Citation

If you use this model, please cite the SPES paper:

@article{zhang2026pretraining,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
  journal={arXiv preprint arXiv:2602.11543},
  year={2026}
}