jianchen0311 committed on
Commit 2908986 · verified · 1 Parent(s): f8fc8df

Update README.md

Files changed (1): README.md +2 -2
README.md CHANGED
@@ -7,8 +7,8 @@ tags:
 - diffusion
 - efficiency
 - flash-decoding
-- qwen
 - diffusion-language-model
+- gpt-oss
 ---
 
 # gpt-oss-20b-DFlash
@@ -106,7 +106,7 @@ We use a **block size of 8 (7 draft tokens)** during speculation. DFlash consist
 
 The numbers reported are end-to-end speedup (including prefill time). You can specify a different block size during inference by passing the `--speculative-num-draft-tokens` argument when launching the server.
 
-The reasoning effort is set to medium for all tasks. Low reasoning effort will give an even higher acceptance length.
+The reasoning effort is set to **medium** for all tasks. Low reasoning effort will give an even higher acceptance length.
 
 | | Math500 | GSM8K | HumanEval | MT-Bench |
 |----------------|----------|--------|------------|-----------|
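The README text around this hunk ties the block size (8, i.e. 7 draft tokens) and the acceptance length to the reported speedup. As a rough back-of-envelope sketch of that relationship (the function and its numbers are my own illustration, not part of DFlash or the README, and it ignores draft-model overhead and prefill time):

```python
# Sketch: decode-phase tokens emitted per target forward pass in
# block-wise speculative decoding. With a block size of 8, the drafter
# proposes 7 tokens; each verification step emits the accepted draft
# tokens plus one token from the target model itself.

def tokens_per_target_pass(avg_accepted: float, draft_tokens: int = 7) -> float:
    """Accepted draft tokens (capped by the draft budget) plus the
    verifier's own token; the non-speculative baseline emits 1."""
    return min(avg_accepted, draft_tokens) + 1

# A higher acceptance length directly raises decode throughput,
# which is why lower reasoning effort (higher acceptance) helps.
print(tokens_per_target_pass(4.0))  # 5.0 tokens per target pass
```

This is why the page's note about acceptance length matters: the decode-phase speedup is roughly proportional to this per-pass token count, before prefill and draft costs are added back in.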