SPES-2B is a 2B-parameter Mixture-of-Experts (MoE) pretrained language model introduced in the paper:
Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm
SPES-2B was trained with SPES, a memory-efficient decentralized training framework. Unlike traditional centralized training, which requires high-bandwidth interconnects, SPES enables pretraining across geographically distributed GPU nodes: each node trains only a subset of the experts, and the nodes periodically synchronize, as sketched below. This model was trained on 16 standalone 48GB GPUs communicating over standard internet connections.
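To make the expert-subset idea concrete, here is a minimal, self-contained sketch, not the authors' implementation: every name in it (`TinyMoELayer`, `local_expert_ids`, `sync_every`, the no-op `sync_fn`, and the toy objective) is hypothetical, and the real SPES framework handles routing, optimizer state, and cross-node synchronization far more carefully.

```python
import torch
import torch.nn as nn

NUM_EXPERTS = 8   # illustrative; not the SPES-2B configuration
MODEL_DIM = 256

class TinyMoELayer(nn.Module):
    """Toy top-1-routed MoE layer, used only to illustrate expert-subset training."""

    def __init__(self, num_experts: int, dim: int):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        top1 = self.router(x).argmax(dim=-1)          # hard top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i
            if mask.any():
                out[mask] = expert(x[mask])           # each token goes to one expert
        return out

def train_node(model, local_expert_ids, data_iter, steps, sync_every, sync_fn):
    """Train only this node's assigned experts; call sync_fn periodically."""
    # Freeze every expert that is not assigned to this node.
    for i, expert in enumerate(model.experts):
        for p in expert.parameters():
            p.requires_grad_(i in local_expert_ids)
    opt = torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)
    for step in range(1, steps + 1):
        x = next(data_iter)
        loss = (model(x) - x).pow(2).mean()           # toy reconstruction objective
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % sync_every == 0:
            # In a real deployment this would exchange expert weights with
            # the other nodes over the internet; here it is a stub.
            sync_fn(model, local_expert_ids)

if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyMoELayer(NUM_EXPERTS, MODEL_DIM)
    data = iter(lambda: torch.randn(4, MODEL_DIM), None)  # endless random batches
    # One simulated node that owns experts {0, 1}; sync is a no-op placeholder.
    train_node(model, {0, 1}, data, steps=20, sync_every=10,
               sync_fn=lambda m, ids: None)
```

Because each node optimizes only its assigned experts, the per-node memory footprint stays bounded, and only expert weights (not activations or gradients for every step) need to cross the slow links, which is the property the paper's decentralized setting relies on.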
This model is intended for research use, for example studying memory-efficient decentralized pretraining of MoE language models.
If you use this model, please cite the SPES paper:
@article{zhang2026pretraining,
title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
journal={arXiv preprint arXiv:2602.11543},
year={2026}
}