Update README.md
README.md CHANGED

@@ -7,8 +7,8 @@ tags:
 - diffusion
 - efficiency
 - flash-decoding
-- qwen
 - diffusion-language-model
+- gpt-oss
 ---
 
 # gpt-oss-20b-DFlash
@@ -106,7 +106,7 @@ We use a **block size of 8 (7 draft tokens)** during speculation. DFlash consist
 
 The numbers reported are end-to-end speedup (including prefill time). You can specify a different block size during inference by passing the `--speculative-num-draft-tokens` argument when launching the server.
 
-The reasoning effort is set to medium for all tasks. Low reasoning effort will give even higher acceptance length.
+The reasoning effort is set to **medium** for all tasks. Low reasoning effort will give an even higher acceptance length.
 
 | | Math500 | GSM8K | HumanEval | MT-Bench |
 |----------------|----------|--------|------------|-----------|
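The `--speculative-num-draft-tokens` flag mentioned in the diff can be sketched as a launch command. This is a hedged sketch, not a verified command from the model card: the model path, port, and launcher module are illustrative assumptions patterned on a typical SGLang-style server launch; only `--speculative-num-draft-tokens` itself is named in the text, and its value follows from the stated block size of 8 (i.e., 7 draft tokens).

```shell
# Hypothetical launch: only --speculative-num-draft-tokens comes from the
# README text; the model path and other arguments are placeholders.
python -m sglang.launch_server \
  --model-path openai/gpt-oss-20b \
  --speculative-num-draft-tokens 7 \
  --port 30000
```

Passing a different value here changes the speculation block size and, per the text above, affects the reported end-to-end speedup.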