zjr2000 committed f0fda12 (verified) · 1 parent: b5d59cd

Update README.md

Files changed (1)
  1. README.md +53 -3
README.md CHANGED
@@ -1,3 +1,53 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ tags:
+ - moe
+ - mixture-of-experts
+ - causal-lm
+ - olmoe
+ - distributed-training
+ - decentralized-training
+ - sparse-sync
+ language:
+ - en
+ pipeline_tag: text-generation
+ ---
+
+ # SPES-9B
+
+ SPES-9B is a pretrained language model released as part of the paper:
+
+ **Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm**
+
+ ## Model Details
+
+ - **Model name:** SPES-9B
+ - **Model type:** Causal language model
+ - **Parameters:** 9B
+ - **Framework:** SPES
+ - **License:** Apache-2.0
+
+ ## Project Links
+
+ - **GitHub:** https://github.com/zjr2000/SPES
+ - **Paper:** https://huggingface.co/papers/2602.11543
+
+ ## Intended Use
+
+ This model is intended for:
+
+ - research on decentralized LLM pretraining
+ - research on MoE training and synchronization
+ - experimentation and evaluation of pretrained language models
+
+
+ ## Citation
+
+ If you use this model, please cite the SPES paper.
+
+ ```bibtex
+ @article{zhang2026spes,
+ title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
+ author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
+ year={2026}
+ }
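The metadata part of this commit lives in the YAML front matter: the old README declared only `license`, while the new one also declares `tags`, `language`, and `pipeline_tag`. The sketch below makes that difference concrete. It relies on a hypothetical helper, `parse_front_matter`, that understands only the flat `key: value` pairs and `- item` lists used in this README. It is not a general YAML parser and makes no claim about how the Hugging Face Hub itself reads model cards.

```python
# Hypothetical helper: reads only the restricted front-matter subset used in
# this README (flat "key: value" pairs and "key:" followed by "- item" lines).
# It is NOT a general YAML parser and is not how the Hugging Face Hub parses
# model cards; use a real YAML library for anything beyond this sketch.


def parse_front_matter(text):
    """Return the front-matter fields of a README as a dict."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README does not start with a '---' fence")
    meta = {}
    list_key = None  # key whose "- item" entries are currently being collected
    for raw in lines[1:]:
        line = raw.strip()
        if line == "---":  # closing fence ends the metadata block
            return meta
        if not line:
            continue
        if line.startswith("- "):
            if list_key is None:
                raise ValueError(f"list item without a key: {line!r}")
            meta[list_key].append(line[2:].strip())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"unexpected line: {line!r}")
        key, value = key.strip(), value.strip()
        if value:  # scalar field, e.g. "license: apache-2.0"
            meta[key] = value
            list_key = None
        else:  # "key:" opens a list of "- item" entries
            meta[key] = []
            list_key = key
    raise ValueError("front matter has no closing '---' fence")


# Front matter before and after this commit, copied from the diff above.
OLD = """---
license: apache-2.0
---"""

NEW = """---
license: apache-2.0
tags:
- moe
- mixture-of-experts
- causal-lm
- olmoe
- distributed-training
- decentralized-training
- sparse-sync
language:
- en
pipeline_tag: text-generation
---"""

old, new = parse_front_matter(OLD), parse_front_matter(NEW)
print(sorted(set(new) - set(old)))  # → ['language', 'pipeline_tag', 'tags']
```

In practice a full YAML library (for example PyYAML's `yaml.safe_load`) is the safer choice; the hand-rolled reader only keeps this example dependency-free.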