jianchen0311 committed on
Commit 09e7bd0 · verified · 1 Parent(s): 3088934

Update README.md

Files changed (1): README.md (+4 −4)
README.md CHANGED
@@ -11,7 +11,7 @@ tags:
 - diffusion-language-model
 ---

-# LLaMA3.1-8B-Instruct-DFlash-b10
+# LLaMA3.1-8B-Instruct-DFlash-UltraChat
 [**Paper (Coming Soon)**](#) | [**GitHub**](https://github.com/z-lab/dflash) | [**Blog**](https://z-lab.ai/projects/dflash/)

 **DFlash** is a novel speculative decoding method that utilizes a lightweight **block diffusion** model for drafting. It enables efficient, high-quality parallel drafting that pushes the limits of inference speed.
@@ -24,7 +24,7 @@ This model is the **drafter** component. It must be used in conjunction with the

 ## 📊 Training Data

-**LLaMA3.1-8B-Instruct-DFlash-b10** is trained on the **UltraChat-200K** and **ShareGPT** datasets, aiming to align with the EAGLE-3 training data. The assistant responses in the datasets are regenerated by `meta-llama/Llama-3.1-8B-Instruct`.
+**LLaMA3.1-8B-Instruct-DFlash-UltraChat** is trained on the **UltraChat-200K** and **ShareGPT** datasets, aiming to align with the EAGLE-3 training data. The assistant responses in the datasets are regenerated by `meta-llama/Llama-3.1-8B-Instruct`.

 ## 🚀 Quick Start

@@ -41,7 +41,7 @@ uv pip install "git+https://github.com/sgl-project/sglang.git@refs/pull/16818/he
 python -m sglang.launch_server \
     --model-path meta-llama/Llama-3.1-8B-Instruct \
     --speculative-algorithm DFLASH \
-    --speculative-draft-model-path z-lab/LLaMA3.1-8B-Instruct-DFlash-b10 \
+    --speculative-draft-model-path z-lab/LLaMA3.1-8B-Instruct-DFlash-UltraChat \
     --tp-size 1 \
     --dtype bfloat16 \
     --attention-backend fa3 \
@@ -61,7 +61,7 @@ pip install transformers==4.57.3 torch==2.9.0 accelerate
 from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

 model = AutoModel.from_pretrained(
-    "z-lab/LLaMA3.1-8B-Instruct-DFlash-b10",
+    "z-lab/LLaMA3.1-8B-Instruct-DFlash-UltraChat",
     trust_remote_code=True,
     dtype="auto",
     device_map="cuda:0"
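The commit only renames the drafter checkpoint, but for context, the speculative-decoding loop such a drafter participates in can be sketched roughly as follows. This is a toy greedy-acceptance illustration, not the actual DFlash implementation: block-diffusion drafting and sampling-based verification are simplified away, and `accept_drafted_block` is a hypothetical name.

```python
def accept_drafted_block(draft_block, target_block):
    """Toy speculative-decoding acceptance rule (illustration only):
    the target model re-scores the whole drafted block in one parallel
    pass; we keep the longest prefix the two models agree on, plus the
    target's first correction as one extra 'free' token."""
    accepted = []
    for drafted, target in zip(draft_block, target_block):
        if drafted == target:
            accepted.append(drafted)  # draft confirmed by the target
        else:
            accepted.append(target)   # target's correction, then stop
            break
    return accepted


# If 2 of 4 drafted tokens are accepted, one target forward pass emits
# 3 tokens instead of 1 -- the source of speculative decoding's speedup.
print(accept_drafted_block([5, 7, 9, 2], [5, 7, 4, 8]))  # [5, 7, 4]
```

The drafter's only job is to make this agreement prefix long; DFlash's claim is that a lightweight block-diffusion drafter produces such blocks in parallel rather than token by token.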