Zigeng committed (verified) · Commit 973a8e0 · 1 parent: 2337009

Update README.md

Files changed (1)
  1. README.md +12 -32
README.md CHANGED
@@ -52,28 +52,27 @@ We introduce dParallel, a simple and effective method that unlocks the inherent
  <td><a href="https://arxiv.org/pdf/2509.26488">ArXiv-Link</a></td>
  </tr>
  <tr>
- <td>🤖 <strong>Model</strong></td>
- <td><a href="https://huggingface.co/Zigeng/dParallel-LLaDA-8B-instruct">dParallel-LLaDA-8b-instruct</a></td>
+ <td>🤖 <strong>LLaDA Model</strong></td>
+ <td><a href="https://huggingface.co/Zigeng/dParallel-LLaDA-8B-instruct">dParallel-LLaDA-8B-instruct</a></td>
  </tr>
  <tr>
- <td>📊 <strong>Data</strong></td>
+ <td>🤖 <strong>Dream Model</strong></td>
+ <td><a href="https://huggingface.co/Zigeng/dParallel_Dream_7B_Instruct">dParallel-Dream-7B-instruct</a></td>
+ </tr>
+ <tr>
+ <td>📊 <strong>LLaDA Data</strong></td>
  <td><a href="https://huggingface.co/datasets/Zigeng/dParallel_LLaDA_Distill_Data">
  dParallel-LLaDA-Distill Dataset</a></td>
  </tr>
+ <tr>
+ <td>📊 <strong>Dream Data</strong></td>
+ <td><a href="https://huggingface.co/datasets/Zigeng/dParallel_Dream_Distill_Data">
+ dParallel-Dream-Distill Dataset</a></td>
+ </tr>
  </tbody>
  </table>
 
- ## 🔥Updates
- * 🔥 **[Oct 1, 2025]**: Our arxiv paper is available.
- * 🔥 **[Oct 1, 2025]**: Code, model and dataset are released.
 
- ## 🔧 Installation:
-
- ```bash
- conda create -n dparallel python==3.10
- conda activate dparallel
- pip3 install -r requirements.txt
- ```
 
  ## 🚀 Quick Start:
  ```python
@@ -99,25 +98,6 @@ print("Response:",tokenizer.batch_decode(out[0][:, input_ids.shape[1]:], skip_sp
  print("NFE:",out[1])
  ```
 
- ## ⚡ Evaluation:
- We provide evaluation scripts covering GSM8K, Minerva_MATH, HumanEval, and MBPP benchmarks. Importantly, both our reported results and the accompanying code are obtained without using caching or sparse attention techniques. Nevertheless, our method is fully compatible with these optimizations, and integrating them can yield even greater speedups.
- ```bash
- sh eval.sh
- ```
-
- ## 🔥 Training
- ### 1. Certainty-Forcing Distillation with LoRA:
- We provide training scripts for our proposed Certainty-Forcing Distillation process. The implementation utilizes LoRA during the training process, with the configuration details specified in [config_lora_llada.yaml](https://github.com/czg1225/dParallel/blob/master/configs/config_lora_llada.yaml). The training can be completed with 24 GB memory GPUs.
- ```python
- deepspeed --master_port 29501 --include localhost:0,1,2,3,4,5,6,7 llada_train.py
- ```
-
- ### 2. LoRA Merge:
- After training, merge the LoRA weights to get the dParallel-dLLM.
- ```python
- python merge_lora.py
- ```
-
 
 
  ## 📖 Experimental Results