lujangusface committed · Commit 871ad71 · verified · Parent(s): f600f87

Fix model card: correct base_model to zai-org/GLM-4.7-FP8, license to MIT, add blog link

Files changed (1): README.md (+7 -7)
@@ -1,9 +1,9 @@
  ---
  library_name: transformers
- license: apache-2.0
+ license: mit
  language:
  - en
- base_model: THUDM/GLM-4.7
+ base_model: zai-org/GLM-4.7-FP8
  pipeline_tag: text-generation
  tags:
  - eagle3
@@ -19,11 +19,11 @@ tags:

  # EAGLE3 Draft Head — GLM-4.7-FP8

- A lightweight EAGLE3 draft head for [GLM-4.7](https://huggingface.co/THUDM/GLM-4.7) (~218B MoE, 160 experts, sigmoid top-8 routing, ~40B active parameters per token). Trained with [SpecForge](https://github.com/tails-mpt/SpecForge) on 8x H200 GPUs using the [EAGLE-3](https://arxiv.org/abs/2503.01840) training-time test objective.
+ A lightweight EAGLE3 draft head for [GLM-4.7-FP8](https://huggingface.co/zai-org/GLM-4.7-FP8) (~218B MoE, 160 experts, sigmoid top-8 routing, ~40B active parameters per token). Trained with [SpecForge](https://github.com/tails-mpt/SpecForge) on 8x H200 GPUs using the [EAGLE-3](https://arxiv.org/abs/2503.01840) training-time test objective.

  GLM-4.7 uses sigmoid top-8 routing — activating 8 out of 160 experts per token rather than the typical 1-2 in most MoE models. This preserves high representational capacity at the cost of increased compute, making speculative decoding especially valuable: the draft head is tiny relative to the 218B target.

- **Blog post**: [TODO: link after publication]
+ **Blog post**: [1.7x Faster on a 218B Model: EAGLE3 Speculative Decoding for GLM-4.7](https://huggingface.co/blog/lujangusface/tw-eagle3-glm47-fp8)

  ## Usage

@@ -37,7 +37,7 @@ Requires our [SGLang fork](https://github.com/tails-mpt/sglang) for GLM-4.7 Eagl
  pip install 'git+https://github.com/tails-mpt/sglang.git#subdirectory=python'

  python -m sglang.launch_server \
- --model-path THUDM/GLM-4.7 \
+ --model-path zai-org/GLM-4.7-FP8 \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path thoughtworks/GLM-4.7-FP8-Eagle3 \
  --speculative-num-steps 3 \
@@ -52,7 +52,7 @@ python -m sglang.launch_server \

  ```bash
  python -m sglang.launch_server \
- --model-path THUDM/GLM-4.7 \
+ --model-path zai-org/GLM-4.7-FP8 \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path thoughtworks/GLM-4.7-FP8-Eagle3 \
  --speculative-num-steps 3 \
@@ -153,7 +153,7 @@ The final fine-tuning stage uses training data where the assistant responses wer

  ## License

- This draft head is released under Apache 2.0. Please verify the [GLM-4.7 license](https://huggingface.co/THUDM/GLM-4.7) for the target model.
+ This draft head is released under the [MIT License](https://opensource.org/licenses/MIT), matching the [GLM-4.7-FP8 license](https://huggingface.co/zai-org/GLM-4.7-FP8).

  ## Citation
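
The `sglang.launch_server` commands in the diff above start an OpenAI-compatible server; speculative decoding with the EAGLE3 draft head is transparent to clients, which talk to the standard chat-completions route. A minimal client sketch, assuming SGLang's default port 30000 and the `/v1/chat/completions` endpoint (adjust `SERVER_URL` to your deployment; the prompt and helper names here are illustrative, not from the model card):

```python
"""Minimal client sketch for an SGLang server running GLM-4.7-FP8
with the EAGLE3 draft head. Uses only the standard library."""
import json
import urllib.request

# Assumption: SGLang's default port; override for your deployment.
SERVER_URL = "http://localhost:30000/v1/chat/completions"


def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload.

    The EAGLE3 draft head only changes how the server generates
    tokens internally; the request shape is unchanged.
    """
    return {
        "model": "zai-org/GLM-4.7-FP8",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def query(prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(query("Explain speculative decoding in one sentence."))
```

Because acceptance-based speculative decoding never changes the target model's output distribution, responses should match plain GLM-4.7-FP8 decoding; only latency improves.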
159