jianchen0311 committed on
Commit f71e3b7 · verified · 1 Parent(s): ce45e9d

Update README.md

  1. README.md +2 -2
README.md CHANGED
@@ -103,8 +103,8 @@ print(tokenizer.decode(generate_ids[0], skip_special_tokens=True))
  DFlash consistently achieves higher speedups than the state-of-the-art speculative decoding method **EAGLE-3**. All experiments are conducted using **SGLang** on a single **B200 GPU**.
 
  For EAGLE-3, we evaluate two speculative decoding configurations:
- - `--speculative-num-steps 7`, `--speculative-eagle-topk 1`, `--speculative-num-draft-tokens 10`
- - `--speculative-num-steps 7`, `--speculative-eagle-topk 1`, `--speculative-num-draft-tokens 60`, which is the **official** setting used in the EAGLE-3 paper.
+ - `--speculative-num-steps 7`, `--speculative-eagle-topk 10`, `--speculative-num-draft-tokens 10`
+ - `--speculative-num-steps 7`, `--speculative-eagle-topk 10`, `--speculative-num-draft-tokens 60`, which is the **official** setting used in the EAGLE-3 paper.
 
  For DFlash, we use a block size of 10 during speculation.
 
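
For reference, the two EAGLE-3 configurations on the `+` side of this change could be turned into SGLang launch commands roughly as follows. This is a minimal sketch, not the authors' benchmark script. The three flag names and their values come from the README lines above. Everything else is an assumption: the `sglang.launch_server` entry point, the `--model-path`, `--speculative-algorithm EAGLE3`, and `--speculative-draft-model-path` arguments, the placeholder model names, and the labels `draft10` and `draft60`.

```python
import shlex

# Flag values copied from the updated README lines in this commit.
# The keys "draft10" and "draft60" are hypothetical labels for illustration.
EAGLE3_CONFIGS = {
    "draft10": {
        "speculative-num-steps": 7,
        "speculative-eagle-topk": 10,
        "speculative-num-draft-tokens": 10,
    },
    "draft60": {  # described in the README as the official EAGLE-3 paper setting
        "speculative-num-steps": 7,
        "speculative-eagle-topk": 10,
        "speculative-num-draft-tokens": 60,
    },
}


def build_launch_command(config_name, model_path, draft_model_path):
    """Return an argv list for launching an SGLang server with one config.

    The entry point and the model/algorithm arguments are assumptions;
    the README only specifies the three speculative-* values.
    """
    cmd = [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--speculative-algorithm", "EAGLE3",
        "--speculative-draft-model-path", draft_model_path,
    ]
    for name, value in EAGLE3_CONFIGS[config_name].items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    # Placeholder model names; substitute the real target and draft checkpoints.
    for name in EAGLE3_CONFIGS:
        argv = build_launch_command(name, "TARGET_MODEL", "EAGLE3_DRAFT_MODEL")
        print(shlex.join(argv))
```

Keeping the values in one dictionary makes the only difference between the two runs (`--speculative-num-draft-tokens` 10 versus 60) easy to see and hard to mistype.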