zjr2000/SPES-9B


zjr2000 committed on Mar 10

Commit f0fda12 · verified · 1 Parent(s): b5d59cd

Update README.md

Files changed (1)

README.md +53 -3

@@ -1,3 +1,53 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+tags:
+- moe
+- mixture-of-experts
+- causal-lm
+- olmoe
+- distributed-training
+- decentralized-training
+- sparse-sync
+language:
+- en
+pipeline_tag: text-generation
+---
+# SPES-9B
+SPES-9B is a pretrained language model released as part of the paper:
+**Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm**
+## Model Details
+- **Model name:** SPES-9B
+- **Model type:** Causal language model
+- **Parameters:** 9B
+- **Framework:** SPES
+- **License:** Apache-2.0
+## Project Links
+- **GitHub:** https://github.com/zjr2000/SPES
+- **Paper:** https://huggingface.co/papers/2602.11543
+## Intended Use
+This model is intended for:
+- research on decentralized LLM pretraining
+- research on MoE training and synchronization
+- experimentation and evaluation of pretrained language models
+## Citation
+If you use this model, please cite the SPES paper.
+```bibtex
+@article{zhang2026spes,
+  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
+  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
+  year={2026}
+}
+```