Zigeng committed on
Commit 49c94ec · verified · 1 Parent(s): b50451d

Update README.md

Files changed (1):
  1. README.md +0 -35
README.md CHANGED

@@ -64,17 +64,6 @@ dParallel-LLaDA-Distill Dataset</a></td>
  </tbody>
  </table>
 
- ## 🔥Updates
- * 🔥 **[Oct 2, 2025]**: Our arxiv paper is available.
- * 🔥 **[Oct 1, 2025]**: Code, model and dataset are released.
-
- ## 🔧 Installation:
-
- ```bash
- conda create -n dparallel python==3.10
- conda activate dparallel
- pip3 install -r requirements.txt
- ```
 
  ## 🚀 Quick Start:
  ```python

@@ -100,27 +89,6 @@ print("Response:",tokenizer.batch_decode(out[0][:, input_ids.shape[1]:], skip_sp
  print("NFE:",out[1])
  ```
 
-
- ## 🔥 Training
- ### 1. Certainty-Forcing Distillation with LoRA:
- We provide training scripts for our proposed Certainty-Forcing Distillation process. The implementation utilizes LoRA during the training process, with the configuration details specified in [config_lora_llada.yaml](https://github.com/czg1225/dParallel/blob/master/configs/config_lora_llada.yaml).
- ```bash
- deepspeed --master_port 29501 --include localhost:0,1,2,3 llada_train.py
- ```
-
- ### 2. LoRA Merge:
- After training, merge the LoRA weights to get the dParallel-dLLM.
- ```bash
- python merge_lora.py
- ```
-
- ## ⚡ Evaluation:
- We provide evaluation scripts for the GSM8K, Minerva_MATH, HumanEval, and MBPP benchmarks. Although our approach does not rely on caching or sparse attention techniques, it is fully compatible with them and can achieve even greater speedups when combined.
- ```bash
- sh eval.sh
- ```
-
-
  ## 📖 Experimental Results
  ### Results on LLaDA-8B-Instruct:
  ![llada-exp](assets/llada_exp.png)

@@ -131,9 +99,6 @@ sh eval.sh
  ### Better Speed-Accuracy Trade-off:
  ![trade-off](assets/trade-off.png)
 
- ## ☀️ Acknowledgement
- Our code builds on [LLaDA](https://github.com/ML-GSAI/LLaDA), [Dream](https://github.com/DreamLM/Dream), [Fast-dLLM](https://github.com/NVlabs/Fast-dLLM/tree/main), and [dKV-Cache](https://github.com/horseee/dkv-cache), and we acknowledge these great works for laying the groundwork that made our approach possible.
-
  ## Citation
  If our research assists your work, please give us a star ⭐ or cite us using:
  ```