Update README.md
## 🚀 Quick Start
### SGLang

DFlash is now supported in SGLang; vLLM integration is currently in progress.

#### Installation

```bash
uv pip install "git+https://github.com/sgl-project/sglang.git@refs/pull/16818/head#subdirectory=python"
```

#### Inference

```bash
python -m sglang.launch_server \
    --model-path Qwen/Qwen3-8B \
    --speculative-algorithm DFLASH \
    --speculative-draft-model-path z-lab/Qwen3-8B-DFlash-b16 \
    --tp-size 1 \
    --dtype bfloat16 \
    --attention-backend fa3 \
    --mem-fraction-static 0.75 \
    --trust-remote-code
```
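Once the server is up, it exposes an OpenAI-compatible API (SGLang listens on port 30000 by default), and speculative decoding with the DFlash drafter is transparent to the client. A minimal sketch of a chat request — the prompt is a placeholder, and a library such as `requests` (or `curl`) would be used to actually send it:

```python
import json

# By default SGLang serves an OpenAI-compatible endpoint on port 30000,
# so chat requests go to:
url = "http://localhost:30000/v1/chat/completions"

# A standard OpenAI-style chat request; nothing DFlash-specific is needed
# on the client side.
payload = {
    "model": "Qwen/Qwen3-8B",
    "messages": [
        {"role": "user", "content": "Explain speculative decoding in one sentence."}
    ],
    "max_tokens": 128,
}

body = json.dumps(payload)
print(body)
# Send with e.g. requests.post(url, json=payload) once the server is running.
```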
### Transformers
This model requires `trust_remote_code=True` to load the custom architecture for block diffusion generation.
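For background: in speculative decoding, the small drafter proposes several tokens ahead, and the target model verifies them in a single forward pass, keeping the longest prefix it agrees with. A torch-free sketch of greedy verification — purely illustrative, not the DFlash implementation:

```python
def accept_draft(draft_tokens, target_tokens):
    """Greedy verification: keep the drafter's tokens up to the first
    disagreement with the target model's predictions, substituting the
    target's own token at the mismatch position.

    draft_tokens:  tokens proposed by the drafter for the next block
    target_tokens: tokens the target model predicts at each position
                   (obtained from one verification forward pass)
    """
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)   # drafter matched the target
        else:
            accepted.append(t)   # fall back to the target's token
            break                # stop at the first mismatch
    return accepted

# If the drafter matches the target on 3 of 4 positions, a single
# verification pass yields 4 tokens instead of 1.
print(accept_draft([5, 9, 2, 7], [5, 9, 2, 4]))  # -> [5, 9, 2, 4]
```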

#### Installation

Ensure you have `transformers` and `torch` installed. Our evaluation was conducted with `torch==2.9.0` and `transformers==4.57.3`.

```bash
pip install transformers==4.57.3 torch==2.9.0 accelerate
```

#### Inference

The following example demonstrates how to load the DFlash drafter and the Qwen3-8B target model to perform speculative decoding.
```python
import torch