Fix model card: correct base_model to zai-org/GLM-4.7-FP8, license to MIT, add blog link
README.md (CHANGED)
@@ -1,9 +1,9 @@
 ---
 library_name: transformers
-license:
+license: mit
 language:
 - en
-base_model:
+base_model: zai-org/GLM-4.7-FP8
 pipeline_tag: text-generation
 tags:
 - eagle3
@@ -19,11 +19,11 @@ tags:
 
 # EAGLE3 Draft Head — GLM-4.7-FP8
 
-A lightweight EAGLE3 draft head for [GLM-4.7](https://huggingface.co/
+A lightweight EAGLE3 draft head for [GLM-4.7-FP8](https://huggingface.co/zai-org/GLM-4.7-FP8) (~218B MoE, 160 experts, sigmoid top-8 routing, ~40B active parameters per token). Trained with [SpecForge](https://github.com/tails-mpt/SpecForge) on 8x H200 GPUs using the [EAGLE-3](https://arxiv.org/abs/2503.01840) training-time test objective.
 
 GLM-4.7 uses sigmoid top-8 routing — activating 8 out of 160 experts per token rather than the typical 1-2 in most MoE models. This preserves high representational capacity at the cost of increased compute, making speculative decoding especially valuable: the draft head is tiny relative to the 218B target.
 
-**Blog post**: [
+**Blog post**: [1.7x Faster on a 218B Model: EAGLE3 Speculative Decoding for GLM-4.7](https://huggingface.co/blog/lujangusface/tw-eagle3-glm47-fp8)
 
 ## Usage
 
@@ -37,7 +37,7 @@ Requires our [SGLang fork](https://github.com/tails-mpt/sglang) for GLM-4.7 Eagl
 pip install 'git+https://github.com/tails-mpt/sglang.git#subdirectory=python'
 
 python -m sglang.launch_server \
-    --model-path
+    --model-path zai-org/GLM-4.7-FP8 \
     --speculative-algorithm EAGLE3 \
     --speculative-draft-model-path thoughtworks/GLM-4.7-FP8-Eagle3 \
     --speculative-num-steps 3 \
@@ -52,7 +52,7 @@ python -m sglang.launch_server \
 
 ```bash
 python -m sglang.launch_server \
-    --model-path
+    --model-path zai-org/GLM-4.7-FP8 \
     --speculative-algorithm EAGLE3 \
     --speculative-draft-model-path thoughtworks/GLM-4.7-FP8-Eagle3 \
     --speculative-num-steps 3 \
@@ -153,7 +153,7 @@ The final fine-tuning stage uses training data where the assistant responses wer
 
 ## License
 
-This draft head is released under
+This draft head is released under the [MIT License](https://opensource.org/licenses/MIT), matching the [GLM-4.7-FP8 license](https://huggingface.co/zai-org/GLM-4.7-FP8).
 
 ## Citation
 
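As background for the routing claim in the card text above ("activating 8 out of 160 experts per token"), here is a minimal NumPy sketch of sigmoid top-k expert routing. The function name, hidden size, and router weight shape are illustrative assumptions, not GLM-4.7's actual implementation; the point is only that each expert gets an independent sigmoid gate and the eight highest-scoring experts are selected and renormalized.

```python
import numpy as np

NUM_EXPERTS = 160  # from the model card
TOP_K = 8          # sigmoid top-8 routing

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def route(hidden, router_weight):
    """Illustrative router: score every expert with an independent sigmoid
    gate, keep the TOP_K highest, and renormalize their weights to sum to 1."""
    logits = hidden @ router_weight            # (NUM_EXPERTS,) raw scores
    scores = sigmoid(logits)                   # per-expert gates in (0, 1)
    top = np.argsort(scores)[-TOP_K:][::-1]    # indices of the 8 largest scores
    weights = scores[top] / scores[top].sum()  # mixture weights over the winners
    return top, weights

rng = np.random.default_rng(0)
hidden = rng.standard_normal(64)                       # toy hidden state
router_weight = rng.standard_normal((64, NUM_EXPERTS)) # toy router matrix
experts, weights = route(hidden, router_weight)
print(len(experts), float(weights.sum()))
```

Only the 8 selected experts run their feed-forward pass for that token, which is why the card can describe ~40B active parameters out of ~218B total.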
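The `--speculative-num-steps 3` flag in the launch commands means the draft head proposes a few tokens per step before the target model verifies them. A heavily simplified greedy sketch of that draft/verify loop is below; EAGLE-3 actually speculates over a token tree and verifies it in one batched target pass, and `draft`/`target` here are toy stand-in callables, not the real models.

```python
def speculative_step(prefix, draft, target, num_steps=3):
    """Draft num_steps tokens, keep the longest prefix the target agrees
    with (greedy acceptance), then let the target emit one more token."""
    # Draft phase: cheap model proposes a run of tokens.
    proposal, ctx = [], list(prefix)
    for _ in range(num_steps):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)
    # Verify phase: target checks each proposed token in order.
    accepted, ctx = [], list(prefix)
    for tok in proposal:
        if target(ctx) == tok:   # in practice one batched target forward pass
            accepted.append(tok)
            ctx.append(tok)
        else:
            break                # first disagreement discards the rest
    # The target always contributes at least one token per step.
    accepted.append(target(ctx))
    return accepted

# Toy deterministic "models" that agree for short contexts, then diverge.
draft = lambda ctx: len(ctx) % 5
target = lambda ctx: len(ctx) % 5 if len(ctx) < 4 else 0
out = speculative_step([1, 2], draft, target, num_steps=3)
print(out)
```

Because the draft head is tiny relative to the 218B target, every accepted draft token is a target forward pass saved, which is where the reported 1.7x speedup comes from.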