Update README.md

README.md

@@ -103,8 +103,8 @@ print(tokenizer.decode(generate_ids[0], skip_special_tokens=True))
 DFlash consistently achieves higher speedups than the state-of-the-art speculative decoding method **EAGLE-3**. All experiments are conducted using **SGLang** on a single **B200 GPU**.
 For EAGLE-3, we evaluate two speculative decoding configurations:
-- `--speculative-num-steps 7`, `--speculative-eagle-topk 1`, `--speculative-num-draft-tokens 10`
-- `--speculative-num-steps 7`, `--speculative-eagle-topk 1`, `--speculative-num-draft-tokens 60`, which is the **official** setting used in the EAGLE-3 paper.
+- `--speculative-num-steps 7`, `--speculative-eagle-topk 10`, `--speculative-num-draft-tokens 10`
+- `--speculative-num-steps 7`, `--speculative-eagle-topk 10`, `--speculative-num-draft-tokens 60`, which is the **official** setting used in the EAGLE-3 paper.
 For DFlash, we use a block size of 10 during speculation.
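For readers reproducing the comparison, the two updated EAGLE-3 settings can be written down as a small Python sketch. It renders only the three SGLang flags quoted in the diff. Model paths and the flag that selects the speculative algorithm are left out, and the names `Eagle3Config`, `max_tree_nodes`, and `CONFIGS` are hypothetical helpers, not part of SGLang. The size check assumes the draft tree has depth at most `--speculative-num-steps`, at most `--speculative-eagle-topk` children per node, and one extra root token. That is a loose bound assumed here, not SGLang's documented validation rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eagle3Config:
    """One EAGLE-3 setting from the benchmark (hypothetical helper, not part of SGLang)."""

    num_steps: int         # --speculative-num-steps: maximum draft depth
    eagle_topk: int        # --speculative-eagle-topk: top-k candidates per draft step
    num_draft_tokens: int  # --speculative-num-draft-tokens: draft token budget

    def cli_args(self) -> list[str]:
        # Only the three flags quoted in the README; model paths and the
        # algorithm-selection flag are intentionally left out.
        return [
            "--speculative-num-steps", str(self.num_steps),
            "--speculative-eagle-topk", str(self.eagle_topk),
            "--speculative-num-draft-tokens", str(self.num_draft_tokens),
        ]

    def max_tree_nodes(self) -> int:
        # Assumed loose bound: one root token plus a tree of depth <= num_steps
        # in which every node has at most eagle_topk children.
        return 1 + sum(self.eagle_topk ** d for d in range(1, self.num_steps + 1))


# The two EAGLE-3 configurations after this change.
CONFIGS = {
    "draft_tokens_10": Eagle3Config(num_steps=7, eagle_topk=10, num_draft_tokens=10),
    "draft_tokens_60_official": Eagle3Config(num_steps=7, eagle_topk=10, num_draft_tokens=60),
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        # Sanity check: the requested draft budget fits inside the assumed tree bound.
        assert cfg.num_draft_tokens <= cfg.max_tree_nodes(), name
        print(f"{name}: {' '.join(cfg.cli_args())}")
```

Under that assumed bound, the previous `--speculative-eagle-topk 1` settings describe a single chain of at most 8 tokens (7 drafted plus the root), which is fewer than the 10 or 60 draft tokens requested. With top-k 10, the bound is no longer the limiting factor.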