---
tags:
- diffusion-language-model
---

# LLaMA3.1-8B-Instruct-DFlash-UltraChat

[**Paper (Coming Soon)**](#) | [**GitHub**](https://github.com/z-lab/dflash) | [**Blog**](https://z-lab.ai/projects/dflash/)

**DFlash** is a novel speculative decoding method that uses a lightweight **block diffusion** model for drafting. It enables efficient, high-quality parallel drafting that pushes the limits of inference speed.

This model is the **drafter** component. It must be used in conjunction with the target model `meta-llama/Llama-3.1-8B-Instruct`.
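
Conceptually, the drafter proposes a block of tokens in parallel and the target model verifies them in a single forward pass, keeping the longest prefix it agrees with. The toy sketch below shows the greedy-verification special case only; `target_next_token` is a stand-in for a target-model call, not the DFlash API:

```python
# Toy sketch of one speculative-decoding step with greedy verification.
# Illustrative only -- not the DFlash implementation.
def speculative_step(target_next_token, draft_tokens, context):
    """Accept the longest prefix of draft_tokens that the target model
    would itself have produced greedily from `context`."""
    accepted = []
    for tok in draft_tokens:
        expected = target_next_token(context + accepted)
        if tok != expected:
            # First mismatch: take the target's own token and stop.
            accepted.append(expected)
            break
        accepted.append(tok)
    else:
        # All drafts accepted: the target contributes one bonus token.
        accepted.append(target_next_token(context + accepted))
    return accepted
```

Because verification is a single batched pass over the whole drafted block, each accepted token costs far less than one full target-model decode step.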

## 📊 Training Data

**LLaMA3.1-8B-Instruct-DFlash-UltraChat** is trained on the **UltraChat-200K** and **ShareGPT** datasets, aiming to align with the EAGLE-3 training data. The assistant responses in these datasets are regenerated by `meta-llama/Llama-3.1-8B-Instruct`, so the drafter learns to imitate the target model's own outputs.
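
The regeneration step can be sketched as below; `generate_reply` stands in for a call to `meta-llama/Llama-3.1-8B-Instruct`, and this is an illustration, not the project's actual data pipeline:

```python
# Illustrative sketch: replace every assistant turn in a chat-format
# record with a reply produced by the target model.
def regenerate_conversation(conversation, generate_reply):
    """conversation: list of {"role": ..., "content": ...} dicts.
    generate_reply: fn(history) -> str, e.g. a target-model call (stubbed here)."""
    history, rebuilt = [], []
    for turn in conversation:
        if turn["role"] == "assistant":
            # Rebind rather than mutate, so the input record is untouched.
            turn = {"role": "assistant",
                    "content": generate_reply(history)}
        history.append(turn)
        rebuilt.append(turn)
    return rebuilt
```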

## 🚀 Quick Start

Install SGLang with DFlash support from the pull request branch:

```shell
uv pip install "git+https://github.com/sgl-project/sglang.git@refs/pull/16818/head"
```

Then launch the server with this model as the speculative draft model:

```shell
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --speculative-algorithm DFLASH \
  --speculative-draft-model-path z-lab/LLaMA3.1-8B-Instruct-DFlash-UltraChat \
  --tp-size 1 \
  --dtype bfloat16 \
  --attention-backend fa3 \
```
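
Once the server is up, it can be queried like any SGLang deployment through its OpenAI-compatible API; speculative decoding is transparent to clients. Port 30000 is the SGLang default, and the small helper below is an illustrative sketch, not part of DFlash:

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    # OpenAI-style chat request; the client addresses the target model,
    # and the drafter is applied server-side.
    return {
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str, base_url: str = "http://localhost:30000") -> str:
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```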

Alternatively, install the dependencies and load the drafter directly with Transformers:

```shell
pip install transformers==4.57.3 torch==2.9.0 accelerate
```

```python
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

model = AutoModel.from_pretrained(
    "z-lab/LLaMA3.1-8B-Instruct-DFlash-UltraChat",
    trust_remote_code=True,
    dtype="auto",
    device_map="cuda:0"
)
```