jianchen0311 commited on
Commit
63940de
·
verified ·
1 Parent(s): c56ea49

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +25 -2
README.md CHANGED
@@ -24,15 +24,38 @@ This model is the **drafter** component. It must be used in conjunction with the
24
 
25
  ## 🚀 Quick Start
26
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
27
  This model requires `trust_remote_code=True` to load the custom architecture for block diffusion generation.
28
 
29
- ### Installation
30
  Ensure you have `transformers` and `torch` installed. Our evaluation is conducted with torch==2.9.0 and transformers=4.57.3.
31
  ```bash
32
  pip install transformers==4.57.3 torch==2.9.0 accelerate
33
  ```
34
 
35
- ### Inference Example
36
  The following example demonstrates how to load the DFlash drafter and the Qwen3-8B target model to perform speculative decoding.
37
  ```python
38
  import torch
 
24
 
25
  ## 🚀 Quick Start
26
 
27
+ ### SGLang
28
+ DFlash is now supported on SGLang. And vLLM integration is currently in progress.
29
+
30
+ #### Installation
31
+ ```bash
32
+ uv pip install "git+https://github.com/sgl-project/sglang.git@refs/pull/16818/head#subdirectory=python"
33
+ ```
34
+
35
+ #### Inference
36
+ ```bash
37
+ python -m sglang.launch_server \
38
+ --model-path Qwen/Qwen3-8B \
39
+ --speculative-algorithm DFLASH \
40
+ --speculative-draft-model-path z-lab/Qwen3-8B-DFlash-b16 \
41
+ --tp-size 1 \
42
+ --dtype bfloat16 \
43
+ --attention-backend fa3 \
44
+ --mem-fraction-static 0.75 \
45
+ --trust-remote-code \
46
+ ```
47
+
48
+ ### Transformers
49
+
50
  This model requires `trust_remote_code=True` to load the custom architecture for block diffusion generation.
51
 
52
+ #### Installation
53
  Ensure you have `transformers` and `torch` installed. Our evaluation is conducted with torch==2.9.0 and transformers=4.57.3.
54
  ```bash
55
  pip install transformers==4.57.3 torch==2.9.0 accelerate
56
  ```
57
 
58
+ #### Inference
59
  The following example demonstrates how to load the DFlash drafter and the Qwen3-8B target model to perform speculative decoding.
60
  ```python
61
  import torch