Add library_name and improve model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +26 -17
README.md CHANGED
@@ -1,5 +1,9 @@
  ---
  license: apache-2.0
  tags:
  - moe
  - mixture-of-experts
@@ -8,46 +12,51 @@ tags:
  - distributed-training
  - decentralized-training
  - sparse-sync
- language:
- - en
- pipeline_tag: text-generation
  ---

  # SPES-7B

- SPES-7B is a pretrained language model released as part of paper:

- **Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm**

  ## Model Details

  - **Model name:** SPES-7B
- - **Model type:** Causal language model
  - **Parameters:** 7B
  - **Framework:** SPES
  - **License:** Apache-2.0

- ## Project Links

- - **GitHub:** https://github.com/zjr2000/SPES
- - **Paper:** https://huggingface.co/papers/2602.11543

- ## Intended Use

- This model is intended for:

- - research on decentralized LLM pretraining
- - research on MoE training and synchronization
- - experimentation and evaluation of pretrained language models


  ## Citation

- If you use this model, please cite the SPES paper.

  ```bibtex
- @article{zhang2026spes,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
  year={2026}
- }
 
 
@@ -1,5 +1,9 @@
  ---
+ language:
+ - en
  license: apache-2.0
+ pipeline_tag: text-generation
+ library_name: transformers
  tags:
  - moe
  - mixture-of-experts
@@ -8,46 +12,51 @@ tags:
  - distributed-training
  - decentralized-training
  - sparse-sync
  ---

  # SPES-7B

+ SPES-7B is a 7B-parameter Mixture-of-Experts (MoE) Large Language Model pretrained using **SPES** (**SP**arse **E**xpert **S**ync), a memory-efficient decentralized training framework.

+ This model was introduced in the paper: [Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm](https://huggingface.co/papers/2602.11543).
+
+ **Authors:** Jinrui Zhang, Chaodong Xiao, Aoqi Wu, Xindong Zhang, Lei Zhang.

  ## Model Details

  - **Model name:** SPES-7B
+ - **Model type:** Causal language model (MoE)
  - **Parameters:** 7B
+ - **Architecture:** Olmoe
  - **Framework:** SPES
  - **License:** Apache-2.0

+ ## Introduction

+ SPES (SParse Expert Sync) is designed for pretraining MoE LLMs across geographically distributed GPU nodes. It addresses memory and bandwidth constraints by training only a subset of experts per node, significantly lowering the individual memory footprint and eliminating the need for full-parameter transmission. SPES-7B achieves competitive performance with centrally trained models under similar computational budgets.

+ ## Project Links

+ - **GitHub:** [zjr2000/SPES](https://github.com/zjr2000/SPES)
+ - **Paper (arXiv):** [2602.11543](https://arxiv.org/abs/2602.11543)
+ - **Model Collection:** [SPES Collection](https://huggingface.co/collections/zjr2000/spes)

+ ## Intended Use

+ This model is intended for research on:
+ - Decentralized LLM pretraining paradigms.
+ - Mixture-of-Experts (MoE) training and synchronization.
+ - Evaluation of pretrained language models trained under constrained bandwidth conditions.

  ## Citation

+ If you use this model, please cite the SPES paper:

  ```bibtex
+ @article{zhang2026pretraining,
  title={Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm},
  author={Zhang, Jinrui and Xiao, Chaodong and Wu, Aoqi and Zhang, Xindong and Zhang, Lei},
+ journal={arXiv preprint arXiv:2602.11543},
  year={2026}
+ }
+ ```
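
The `library_name: transformers` and `pipeline_tag: text-generation` metadata added in this PR suggest that the checkpoint can be loaded through the generic `Auto*` classes. The following is a minimal sketch of what that might look like, not a snippet from the model card. It assumes `transformers` and `torch` are installed in a version that supports the Olmoe architecture, and that the repository ships a compatible config and tokenizer. The function name `generate_text` is hypothetical, and the repository id is left as a required argument because the PR does not state it.

```python
# Sketch only: assumes the checkpoint loads through the Auto* classes,
# as `library_name: transformers` in the model card metadata suggests.


def generate_text(repo_id: str, prompt: str, max_new_tokens: int = 32) -> str:
    """Load a causal LM from the Hub and return a greedy continuation of `prompt`.

    `repo_id` must be supplied by the caller; it is not given in this PR.
    """
    # Imported lazily so that defining this function does not require the
    # libraries to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    model.eval()

    # Tokenize the prompt into PyTorch tensors and decode greedily.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("<repo id of this model>", "Decentralized training is")` would download the weights on first use. A 7B-parameter checkpoint needs substantial memory, so this is better tried on a machine with a suitable GPU or plenty of RAM.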