</tbody>
</table>
## 🔥 Updates

* 🔥 **[Oct 2, 2025]**: Our arXiv paper is available.
* 🔥 **[Oct 1, 2025]**: Code, model, and dataset are released.
## 🔧 Installation:

```bash
conda create -n dparallel python=3.10
conda activate dparallel
pip3 install -r requirements.txt
```
## 🚀 Quick Start:

```python
...
print("Response:",tokenizer.batch_decode(out[0][:, input_ids.shape[1]:], skip_sp
print("NFE:",out[1])
```
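For context, NFE is the number of function evaluations, i.e. how many model forward passes the generation used; parallel decoding lowers it by committing several tokens per denoising step. A minimal accounting sketch (illustrative only — `nfe` is a hypothetical helper, not part of the dParallel API):

```python
import math

def nfe(total_tokens: int, tokens_per_step: int) -> int:
    """Number of forward passes when each denoising step
    commits `tokens_per_step` tokens."""
    return math.ceil(total_tokens / tokens_per_step)

# Sequential decoding: one token per pass.
print(nfe(256, 1))   # 256 passes
# Parallel decoding: committing 8 tokens per pass cuts NFE by 8x.
print(nfe(256, 8))   # 32 passes
```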
## 🔥 Training

### 1. Certainty-Forcing Distillation with LoRA:

We provide training scripts for our proposed Certainty-Forcing Distillation process. The implementation uses LoRA during training, with the configuration details specified in [config_lora_llada.yaml](https://github.com/czg1225/dParallel/blob/master/configs/config_lora_llada.yaml).

```bash
deepspeed --master_port 29501 --include localhost:0,1,2,3 llada_train.py
```
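As intuition only (this is not the actual training objective, which is described in the paper): certainty forcing aims to make the model's per-token predictions confident early, so that many low-entropy positions can be committed in a single denoising step. A toy, pure-Python illustration of that entropy-based notion of certainty — the distributions and the 0.6-nat threshold are made-up numbers:

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of one token's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Toy predictive distributions at 4 masked positions (5-word vocabulary).
positions = [
    [0.97, 0.01, 0.01, 0.005, 0.005],  # near-certain
    [0.40, 0.30, 0.20, 0.05, 0.05],    # uncertain
    [0.90, 0.04, 0.03, 0.02, 0.01],    # fairly certain
    [0.25, 0.25, 0.25, 0.15, 0.10],    # uncertain
]
threshold = 0.6
commit = [entropy(p) < threshold for p in positions]
print(commit)  # positions 0 and 2 are certain enough to decode together
```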
### 2. LoRA Merge:

After training, merge the LoRA weights to get the dParallel-dLLM.

```bash
python merge_lora.py
```
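For intuition, a LoRA merge folds the low-rank update into the frozen base weight, `W_merged = W + (alpha / r) * B @ A`, so inference no longer needs the adapter matmuls. A toy pure-Python sketch of that arithmetic (the repo's `merge_lora.py` presumably does the equivalent on the real checkpoints):

```python
def matmul(X, Y):
    """Plain-Python matrix multiply for the toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

r, alpha = 2, 4                       # LoRA rank and scaling factor
W = [[1.0, 0.0], [0.0, 1.0]]          # frozen base weight (2x2)
B = [[1.0, 0.0], [0.0, 1.0]]          # LoRA "up" matrix (2 x r)
A = [[0.5, 0.0], [0.0, 0.5]]          # LoRA "down" matrix (r x 2)

scale = alpha / r
delta = matmul(B, A)                  # low-rank update B @ A
W_merged = [[w + scale * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]
print(W_merged)  # [[2.0, 0.0], [0.0, 2.0]]
```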
## ⚡ Evaluation:

We provide evaluation scripts for the GSM8K, Minerva_MATH, HumanEval, and MBPP benchmarks. Although our approach does not rely on caching or sparse attention techniques, it is fully compatible with them and can achieve even greater speedups when combined.

```bash
sh eval.sh
```
## 📖 Experimental Results

### Results on LLaDA-8B-Instruct:

![llada_res](https://github.com/czg1225/dParallel/blob/master/assets/llada_result.png)

### Better Speed-Accuracy Trade-off:

![trade-off](https://github.com/czg1225/dParallel/blob/master/assets/trade-off.png)
## ☀️ Acknowledgement

Our code builds on [LLaDA](https://github.com/ML-GSAI/LLaDA), [Dream](https://github.com/DreamLM/Dream), [Fast-dLLM](https://github.com/NVlabs/Fast-dLLM/tree/main), and [dKV-Cache](https://github.com/horseee/dkv-cache); we thank these great works for laying the groundwork that made our approach possible.
## Citation

If our research assists your work, please give us a star ⭐ or cite us using:

```