Update README.md
README.md CHANGED
@@ -52,28 +52,27 @@ We introduce dParallel, a simple and effective method that unlocks the inherent
 <td><a href="https://arxiv.org/pdf/2509.26488">ArXiv-Link</a></td>
 </tr>
 <tr>
-<td>🤗 <strong>Model</strong></td>
-<td><a href="https://huggingface.co/Zigeng/dParallel-LLaDA-8B-instruct">dParallel-LLaDA-8B-instruct</a></td>
+<td>🤗 <strong>LLaDA Model</strong></td>
+<td><a href="https://huggingface.co/Zigeng/dParallel-LLaDA-8B-instruct">dParallel-LLaDA-8B-instruct</a></td>
 </tr>
 <tr>
-<td>📖 <strong>Data</strong></td>
+<td>🤗 <strong>Dream Model</strong></td>
+<td><a href="https://huggingface.co/Zigeng/dParallel_Dream_7B_Instruct">dParallel-Dream-7B-instruct</a></td>
+</tr>
+<tr>
+<td>📖 <strong>LLaDA Data</strong></td>
 <td><a href="https://huggingface.co/datasets/Zigeng/dParallel_LLaDA_Distill_Data">
 dParallel-LLaDA-Distill Dataset</a></td>
 </tr>
+<tr>
+<td>📖 <strong>Dream Data</strong></td>
+<td><a href="https://huggingface.co/datasets/Zigeng/dParallel_Dream_Distill_Data">
+dParallel-Dream-Distill Dataset</a></td>
+</tr>
 </tbody>
 </table>
 
-## 🔥 Updates
-* 🔥 **[Oct 1, 2025]**: Our arXiv paper is available.
-* 🔥 **[Oct 1, 2025]**: Code, model, and dataset are released.
 
-## 🔧 Installation:
-
-```bash
-conda create -n dparallel python==3.10
-conda activate dparallel
-pip3 install -r requirements.txt
-```
 
 ## 🚀 Quick Start:
 ```python
@@ -99,25 +98,6 @@ print("Response:",tokenizer.batch_decode(out[0][:, input_ids.shape[1]:], skip_sp
 print("NFE:",out[1])
 ```
 
-## ⚡ Evaluation:
-We provide evaluation scripts covering the GSM8K, Minerva_MATH, HumanEval, and MBPP benchmarks. Both our reported results and the accompanying code are obtained without caching or sparse-attention techniques; nevertheless, our method is fully compatible with these optimizations, and integrating them can yield even greater speedups.
-```bash
-sh eval.sh
-```
-
-## 🔥 Training
-### 1. Certainty-Forcing Distillation with LoRA:
-We provide training scripts for our proposed Certainty-Forcing Distillation process. The implementation uses LoRA during training, with the configuration specified in [config_lora_llada.yaml](https://github.com/czg1225/dParallel/blob/master/configs/config_lora_llada.yaml). Training can be completed on GPUs with 24 GB of memory.
-```bash
-deepspeed --master_port 29501 --include localhost:0,1,2,3,4,5,6,7 llada_train.py
-```
-
-### 2. LoRA Merge:
-After training, merge the LoRA weights to obtain the dParallel-dLLM.
-```bash
-python merge_lora.py
-```
-
 
 
 ## 📊 Experimental Results