Update README.md
#2
by rogerwyf - opened
README.md
CHANGED
|
@@ -74,13 +74,22 @@ Benchmarks were run using [SpecForge](https://github.com/sgl-project/SpecForge/b
|
|
| 74 |
### Requirements
|
| 75 |
|
| 76 |
- NVIDIA GPU with CUDA 12.0+
|
|
|
|
| 77 |
- [SGLang](https://github.com/sgl-project/sglang) ≥ 0.5.8
|
| 78 |
|
| 79 |
-
### Launch Server
|
| 80 |
|
| 81 |
```bash
|
| 82 |
python -m sglang.launch_server \
|
| 83 |
-
--model-path /
|
| 84 |
--tp 8 \
|
| 85 |
--trust-remote-code \
|
| 86 |
--speculative-algorithm EAGLE3 \
|
|
@@ -96,7 +105,7 @@ python -m sglang.launch_server \
|
|
| 96 |
|
| 97 |
```bash
|
| 98 |
python bench_eagle3.py \
|
| 99 |
-
--model-path /
|
| 100 |
--port 30000 \
|
| 101 |
--config-list 1,3,1,4 \
|
| 102 |
--benchmark-list <benchmark_name> \
|
|
@@ -104,3 +113,9 @@ python bench_eagle3.py \
|
|
| 104 |
```
|
| 105 |
|
| 106 |
`--config-list` format: `topk,num_steps,topk,num_draft_tokens`.
|
| 74 |
### Requirements
|
| 75 |
|
| 76 |
- NVIDIA GPU with CUDA 12.0+
|
| 77 |
+
- [vLLM](https://github.com/vllm-project/vllm) ≥ 0.18.0, or the [nightly wheel/Docker image](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#install-the-latest-code)
|
| 78 |
- [SGLang](https://github.com/sgl-project/sglang) ≥ 0.5.8
|
| 79 |
|
| 80 |
+
### Launch Server (vLLM)
|
| 81 |
+
```bash
|
| 82 |
+
vllm serve moonshotai/Kimi-K2.5 \
|
| 83 |
+
--tensor-parallel-size 8 \
|
| 84 |
+
--speculative-config '{"model": "lightseekorg/kimi-k2.5-eagle3", "method": "eagle3", "num_speculative_tokens": 3}' \
|
| 85 |
+
--trust-remote-code
|
| 86 |
+
```
|
| 87 |
+
|
| 88 |
+
### Launch Server (SGLang)
|
| 89 |
|
| 90 |
```bash
|
| 91 |
python -m sglang.launch_server \
|
| 92 |
+
--model-path moonshotai/Kimi-K2.5 \
|
| 93 |
--tp 8 \
|
| 94 |
--trust-remote-code \
|
| 95 |
--speculative-algorithm EAGLE3 \
|
|
|
|
| 105 |
|
| 106 |
```bash
|
| 107 |
python bench_eagle3.py \
|
| 108 |
+
--model-path moonshotai/Kimi-K2.5 \
|
| 109 |
--port 30000 \
|
| 110 |
--config-list 1,3,1,4 \
|
| 111 |
--benchmark-list <benchmark_name> \
|
|
|
|
| 113 |
```
|
| 114 |
|
| 115 |
`--config-list` format: `topk,num_steps,topk,num_draft_tokens`.
|
| 116 |
+
|
| 117 |
+
### Metrics
|
| 118 |
+
The two engines report different acceptance-length numbers for the same underlying run because:
|
| 119 |
+
|
| 120 |
+
- SGLang (`accept len`) adds 1 to each round's count of accepted draft tokens to include the guaranteed target-model token, then averages.
|
| 121 |
+
- vLLM (`mean acceptance length`) does not add 1; it counts only accepted draft tokens.
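
To make the difference concrete, here is a minimal sketch of the two averages as this section describes them. The function names and the per-round counts are hypothetical and for illustration only; neither engine is known to expose these helpers, and the actual definitions may differ between versions.

```python
# Illustrative only: reproduces the two averages as described in this section.
# The per-round counts below are made up, not measured results.

def sglang_accept_len(accepted_per_round):
    """Average of (accepted draft tokens + 1 target-model token) per round."""
    return sum(n + 1 for n in accepted_per_round) / len(accepted_per_round)


def vllm_mean_acceptance_length(accepted_per_round):
    """Average of accepted draft tokens per round, without the extra token."""
    return sum(accepted_per_round) / len(accepted_per_round)


# Hypothetical number of draft tokens accepted in each verification round.
rounds = [3, 1, 0, 2]

sglang_value = sglang_accept_len(rounds)
vllm_value = vllm_mean_acceptance_length(rounds)

print(f"SGLang-style accept len:          {sglang_value}")
print(f"vLLM-style mean acceptance length: {vllm_value}")
print(f"difference:                        {sglang_value - vllm_value}")
```

Under these definitions the two numbers always differ by exactly 1 for the same run, so adding 1 to the vLLM figure (or subtracting 1 from the SGLang figure) puts them on the same scale before comparing.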
|