---
license: apache-2.0
tags:
- moe
- mixture-of-experts
- causal-lm
- olmoe
- distributed-training
- decentralized-training
- sparse-sync
language:
- en
pipeline_tag: text-generation
---

# SPES-9B
SPES-9B is a pretrained language model released as part of the paper *Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm*.

## Model Details
- Model name: SPES-9B
- Model type: Causal language model
- Parameters: 9B
- Framework: SPES
- License: Apache-2.0

## Project Links

## Intended Use
This model is intended for:
- research on decentralized LLM pretraining
- research on MoE training and synchronization
- experimentation and evaluation of pretrained language models (see the loading sketch below)
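
Below is a minimal loading sketch for evaluation with Hugging Face Transformers. It assumes the checkpoint follows a standard causal-LM layout (e.g., the OLMoE-style architecture the tags suggest) and can be loaded via `AutoModelForCausalLM`; the repo id is a placeholder, not a confirmed Hub path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SPES/SPES-9B"  # placeholder repo id; replace with the actual Hub path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a bf16-capable GPU
    device_map="auto",           # requires the `accelerate` package
)

prompt = "Decentralized pretraining of large language models"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the model's MoE architecture is not yet supported by your installed Transformers version, loading may additionally require `trust_remote_code=True`.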

## Citation

If you use this model, please cite the SPES paper:
```bibtex
@article{zhang2026spes,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
  year={2026}
}
```