Update README.md
README.md CHANGED
@@ -52,28 +52,27 @@ We introduce dParallel, a simple and effective method that unlocks the inherent
 <td><a href="https://arxiv.org/pdf/2509.26488">ArXiv-Link</a></td>
 </tr>
 <tr>
-<td>🤗 <strong>Model</strong></td>
-<td><a href="https://huggingface.co/Zigeng/dParallel-LLaDA-8B-instruct">dParallel-LLaDA-8B-instruct</a></td>
+<td>🤗 <strong>LLaDA Model</strong></td>
+<td><a href="https://huggingface.co/Zigeng/dParallel-LLaDA-8B-instruct">dParallel-LLaDA-8B-instruct</a></td>
 </tr>
 <tr>
-<td>📖 <strong>Data</strong></td>
+<td>🤗 <strong>Dream Model</strong></td>
+<td><a href="https://huggingface.co/Zigeng/dParallel_Dream_7B_Instruct">dParallel-Dream-7B-instruct</a></td>
+</tr>
+<tr>
+<td>📖 <strong>LLaDA Data</strong></td>
 <td><a href="https://huggingface.co/datasets/Zigeng/dParallel_LLaDA_Distill_Data">
 dParallel-LLaDA-Distill Dataset</a></td>
 </tr>
+<tr>
+<td>📖 <strong>Dream Data</strong></td>
+<td><a href="https://huggingface.co/datasets/Zigeng/dParallel_Dream_Distill_Data">
+dParallel-Dream-Distill Dataset</a></td>
+</tr>
 </tbody>
 </table>
 
-## 🔥 Updates
-* 🔥 **[Oct 1, 2025]**: Our arXiv paper is available.
-* 🔥 **[Oct 1, 2025]**: Code, model, and dataset are released.
 
-## 🔧 Installation:
-
-```bash
-conda create -n dparallel python==3.10
-conda activate dparallel
-pip3 install -r requirements.txt
-```
 
 ## 🚀 Quick Start:
 ```python
@@ -99,25 +98,6 @@ print("Response:",tokenizer.batch_decode(out[0][:, input_ids.shape[1]:], skip_sp
 print("NFE:",out[1])
 ```
 
-## ⚡ Evaluation:
-We provide evaluation scripts covering the GSM8K, Minerva_MATH, HumanEval, and MBPP benchmarks. Both our reported results and the accompanying code are obtained without caching or sparse-attention techniques; nevertheless, our method is fully compatible with these optimizations, and integrating them can yield even greater speedups.
-```bash
-sh eval.sh
-```
-
-## 🔥 Training
-### 1. Certainty-Forcing Distillation with LoRA:
-We provide training scripts for our proposed Certainty-Forcing Distillation process. The implementation uses LoRA during training, with the configuration specified in [config_lora_llada.yaml](https://github.com/czg1225/dParallel/blob/master/configs/config_lora_llada.yaml). Training can be completed on GPUs with 24 GB of memory.
-```bash
-deepspeed --master_port 29501 --include localhost:0,1,2,3,4,5,6,7 llada_train.py
-```
-
-### 2. LoRA Merge:
-After training, merge the LoRA weights to obtain the dParallel-dLLM.
-```bash
-python merge_lora.py
-```
-
 
 
 ## 📊 Experimental Results