Update README.md
## 🚀 Quick Start
### SGLang

DFlash is now supported in SGLang; vLLM integration is currently in progress.

#### Installation

```bash
uv pip install "git+https://github.com/sgl-project/sglang.git@refs/pull/16818/head#subdirectory=python"
```

#### Inference

```bash
python -m sglang.launch_server \
    --model-path Qwen/Qwen3-8B \
    --speculative-algorithm DFLASH \
    --speculative-draft-model-path z-lab/Qwen3-8B-DFlash-b16 \
    --tp-size 1 \
    --dtype bfloat16 \
    --attention-backend fa3 \
    --mem-fraction-static 0.75 \
    --trust-remote-code
```
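Once the server is up, it exposes an OpenAI-compatible API (SGLang listens on port 30000 by default), and speculative decoding with the DFlash drafter is transparent to the client. A minimal sketch of a chat request — the prompt is a placeholder, and a library such as `requests` (or `curl`) would be used to actually send it:

```python
import json

# By default SGLang serves an OpenAI-compatible endpoint on port 30000,
# so chat requests go to:
url = "http://localhost:30000/v1/chat/completions"

# A standard OpenAI-style chat request; nothing DFlash-specific is needed
# on the client side.
payload = {
    "model": "Qwen/Qwen3-8B",
    "messages": [
        {"role": "user", "content": "Explain speculative decoding in one sentence."}
    ],
    "max_tokens": 128,
}

body = json.dumps(payload)
print(body)
# Send with e.g. requests.post(url, json=payload) once the server is running.
```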
### Transformers
This model requires `trust_remote_code=True` to load the custom architecture for block diffusion generation.
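For background: in speculative decoding, the small drafter proposes several tokens ahead, and the target model verifies them in a single forward pass, keeping the longest prefix it agrees with. A torch-free sketch of greedy verification — purely illustrative, not the DFlash implementation:

```python
def accept_draft(draft_tokens, target_tokens):
    """Greedy verification: keep the drafter's tokens up to the first
    disagreement with the target model's predictions, substituting the
    target's own token at the mismatch position.

    draft_tokens:  tokens proposed by the drafter for the next block
    target_tokens: tokens the target model predicts at each position
                   (obtained from one verification forward pass)
    """
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)   # drafter matched the target
        else:
            accepted.append(t)   # fall back to the target's token
            break                # stop at the first mismatch
    return accepted

# If the drafter matches the target on 3 of 4 positions, a single
# verification pass yields 4 tokens instead of 1.
print(accept_draft([5, 9, 2, 7], [5, 9, 2, 4]))  # -> [5, 9, 2, 4]
```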

#### Installation

Ensure you have `transformers` and `torch` installed. Our evaluation was conducted with `torch==2.9.0` and `transformers==4.57.3`.

```bash
pip install transformers==4.57.3 torch==2.9.0 accelerate
```

#### Inference

The following example demonstrates how to load the DFlash drafter and the Qwen3-8B target model to perform speculative decoding.
```python
import torch