Add library_name and improve model card metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +24 -18
README.md CHANGED
@@ -1,5 +1,9 @@
  ---
  license: apache-2.0
  tags:
  - moe
  - mixture-of-experts
@@ -8,46 +12,48 @@ tags:
  - distributed-training
  - decentralized-training
  - sparse-sync
- language:
- - en
- pipeline_tag: text-generation
  ---

  # SPES-2B

- SPES-2B is a pretrained language model released as part of paper:

- **Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm**

  ## Model Details

  - **Model name:** SPES-2B
- - **Model type:** Causal language model
  - **Parameters:** 2B
- - **Framework:** SPES
  - **License:** Apache-2.0

  ## Project Links

- - **GitHub:** https://github.com/zjr2000/SPES
- - **Paper:** https://huggingface.co/papers/2602.11543

  ## Intended Use

  This model is intended for:
-
- - research on decentralized LLM pretraining
- - research on MoE training and synchronization
- - experimentation and evaluation of pretrained language models
-

  ## Citation

- If you use this model, please cite the SPES paper.

  ```bibtex
- @article{zhang2026spes,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
- author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
  year={2026}
- }
 
 
@@ -1,5 +1,9 @@
  ---
+ language:
+ - en
  license: apache-2.0
+ pipeline_tag: text-generation
+ library_name: transformers
  tags:
  - moe
  - mixture-of-experts
@@ -8,46 +12,48 @@ tags:
  - distributed-training
  - decentralized-training
  - sparse-sync
  ---

  # SPES-2B

+ SPES-2B is a 2B-parameter Mixture-of-Experts (MoE) pretrained language model introduced in the paper:

+ **[Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm](https://huggingface.co/papers/2602.11543)**

  ## Model Details

  - **Model name:** SPES-2B
+ - **Model type:** Causal language model (MoE)
+ - **Architecture:** OLMoE
  - **Parameters:** 2B
+ - **Framework:** SPES (SParse Expert Synchronization)
  - **License:** Apache-2.0

+ ## Description
+
+ SPES-2B was trained using **SPES**, a memory-efficient decentralized framework. Unlike traditional centralized training that requires high-bandwidth interconnects, SPES enables pretraining across geographically distributed GPU nodes by training only a subset of experts per node and periodically synchronizing them. This model was trained using 16 standalone 48GB GPUs over standard internet connections.
+
  ## Project Links

+ - **GitHub:** [https://github.com/zjr2000/SPES](https://github.com/zjr2000/SPES)
+ - **Paper:** [https://huggingface.co/papers/2602.11543](https://huggingface.co/papers/2602.11543)

  ## Intended Use

  This model is intended for:
+ - Research on decentralized LLM pretraining.
+ - Research on Mixture-of-Experts (MoE) training and synchronization.
+ - Experimentation and evaluation of pretrained language models.

  ## Citation

+ If you use this model, please cite the SPES paper:

  ```bibtex
+ @article{zhang2026pretraining,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
+ author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
+ journal={arXiv preprint arXiv:2602.11543},
  year={2026}
+ }
+ ```
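
Review note: the front matter this change produces can be sanity-checked with a short script. The sketch below is illustrative only. It assumes the block stays flat (top-level `key: value` pairs and `- item` lists, as in this diff), and `parse_front_matter` and `README_HEAD` are hypothetical names, not part of any Hub tooling. Tags hidden between the two hunks are omitted from the sample text.

```python
# Minimal sketch: parse the flat YAML front matter of a model card without
# external dependencies. It only understands top-level "key: value" pairs and
# "- item" lists, which is all the front matter shown in this diff uses.

# Sample header rebuilt from the "+" and context lines of the diff above.
# Tags that the diff view elides between hunks are left out (assumption).
README_HEAD = """\
---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- moe
- mixture-of-experts
---

# SPES-2B
"""


def parse_front_matter(text):
    """Return the front matter as a dict mapping keys to a str or list of str."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the block
            break
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: attach it to the most recent key if that key is a list.
            if current_key is not None and isinstance(meta[current_key], list):
                meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value means list items follow on the next lines.
            meta[current_key] = value if value else []
    return meta


meta = parse_front_matter(README_HEAD)
required = ("language", "license", "pipeline_tag", "library_name")
missing = [key for key in required if key not in meta]
print("missing keys:", missing)
```

For anything beyond a flat block like this one, a real YAML parser would be the safer choice; the hand-rolled loop is only meant to keep the check dependency-free.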