Update README.md
README.md
CHANGED
@@ -103,8 +103,8 @@ print(tokenizer.decode(generate_ids[0], skip_special_tokens=True))
 DFlash consistently achieves higher speedups than the state-of-the-art speculative decoding method **EAGLE-3**. All experiments are conducted using **SGLang** on a single **B200 GPU**.
 
 For EAGLE-3, we evaluate two speculative decoding configurations:
-- `--speculative-num-steps 7`, `--speculative-eagle-topk
-- `--speculative-num-steps 7`, `--speculative-eagle-topk
+- `--speculative-num-steps 7`, `--speculative-eagle-topk 10`, `--speculative-num-draft-tokens 10`
+- `--speculative-num-steps 7`, `--speculative-eagle-topk 10`, `--speculative-num-draft-tokens 60`, which is the **official** setting used in the EAGLE-3 paper.
 
 For DFlash, we use a block size of 10 during speculation.
 
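For readers who want to reproduce the two EAGLE-3 settings listed in this change, the following is a minimal sketch of how the flags might be assembled into an SGLang server launch command. Only the three `--speculative-*` flags and their values come from the README. The `python -m sglang.launch_server` entry point and the `--model-path`, `--speculative-algorithm EAGLE3`, and `--speculative-draft-model-path` arguments are assumptions about the local SGLang installation and may differ between versions. The configuration names and the `build_launch_command` helper are hypothetical, and the model paths in the usage line are placeholders.

```python
import shlex

# The two EAGLE-3 configurations from the README, keyed by hypothetical names.
EAGLE3_CONFIGS = {
    "draft_tokens_10": {
        "--speculative-num-steps": 7,
        "--speculative-eagle-topk": 10,
        "--speculative-num-draft-tokens": 10,
    },
    # Described in the README as the official setting from the EAGLE-3 paper.
    "draft_tokens_60": {
        "--speculative-num-steps": 7,
        "--speculative-eagle-topk": 10,
        "--speculative-num-draft-tokens": 60,
    },
}


def build_launch_command(target_model, draft_model, config_name):
    """Return the launch command as a list of arguments (hypothetical helper)."""
    # Assumed entry point and non-speculative flags; check them against
    # the SGLang version you have installed.
    cmd = [
        "python", "-m", "sglang.launch_server",
        "--model-path", target_model,
        "--speculative-algorithm", "EAGLE3",
        "--speculative-draft-model-path", draft_model,
    ]
    # Append the speculative decoding flags taken from the README.
    for flag, value in EAGLE3_CONFIGS[config_name].items():
        cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # TARGET_MODEL and DRAFT_MODEL are placeholders, not real checkpoints.
    command = build_launch_command("TARGET_MODEL", "DRAFT_MODEL", "draft_tokens_60")
    print(shlex.join(command))
```

Keeping the settings in a single mapping makes it easy to run the same benchmark script once per configuration and compare the results against DFlash.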