| --- |
| license: apache-2.0 |
| tags: |
| - moe |
| - mixture-of-experts |
| - causal-lm |
| - olmoe |
| - distributed-training |
| - decentralized-training |
| - sparse-sync |
| language: |
| - en |
| pipeline_tag: text-generation |
| --- |
| |
| # SPES-9B |
|
|
SPES-9B is a pretrained language model released as part of the paper:
|
|
| **Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm** |
|
|
| ## Model Details |
|
|
| - **Model name:** SPES-9B |
| - **Model type:** Causal language model |
| - **Parameters:** 9B |
| - **Framework:** SPES |
| - **License:** Apache-2.0 |
|
|
| ## Project Links |
|
|
| - **GitHub:** https://github.com/zjr2000/SPES |
| - **Paper:** https://huggingface.co/papers/2602.11543 |
|
|
| ## Intended Use |
|
|
| This model is intended for: |
|
|
| - research on decentralized LLM pretraining |
| - research on MoE training and synchronization |
| - experimentation and evaluation of pretrained language models |
|
|
| ## Citation |
|
|
If you use this model, please cite the SPES paper:
|
|
```bibtex
@article{zhang2026spes,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
  year={2026}
}
```