Tags: Text Generation · Transformers · Safetensors · llada2_moe · conversational · custom_code
Zigeng committed · verified · Commit 6aec62a · 1 Parent(s): ff5c1d3

Update README.md
Files changed (1)
  1. README.md +0 -10
README.md CHANGED
@@ -38,16 +38,6 @@ base_model:
 <em>Superior Parallelism-Accuracy Trade-off, Increased TPF with Maintained Accuracy.</em>
 </div>
 
-## 💡 Introduction
-
-We present DMax, a new paradigm for efficient dLLMs. It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked dLLMs, which decode through a binary mask-to-token transition, DMax reformulates decoding as a progressive self-refinement from mask embeddings to token embeddings. At the core of our approach is On-Policy Uniform Training, a novel training strategy that efficiently unifies masked and uniform dLLMs, equipping the model to recover clean tokens from both masked inputs and its own erroneous predictions. Building on this foundation, we further introduce Soft Parallel Decoding. Extensive experiments across a variety of benchmarks demonstrate the effectiveness of DMax.
-
-<!-- ![figure](assets/intro.png) -->
-<div align="center">
-<img src="assets/train.png" width="100%" />
-<br>
-<em>Overview of the On-Policy Uniform Training.</em>
-</div>
 
 ## 💻 Model and Datasets
 
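For intuition, the "progressive self-refinement" decoding described in the removed introduction can be illustrated with a toy loop: start from an all-mask sequence and repredict every position in parallel each step, so the model can revise its own earlier errors rather than committing to a one-shot mask-to-token transition. This is only a minimal sketch; `refine`, the fix-two-positions-per-step rule, and the `MASK` constant are hypothetical stand-ins, not the DMax implementation.

```python
MASK = -1  # hypothetical mask token id

def refine(current, target):
    """Stand-in denoiser: repredicts every position in parallel.

    A trained dLLM would map embeddings to token logits; this toy
    instead corrects at most two positions per step that still
    disagree with `target`, mimicking a model that recovers clean
    tokens from its own erroneous predictions."""
    out = list(current)
    fixed = 0
    for i, (c, t) in enumerate(zip(current, target)):
        if c != t and fixed < 2:
            out[i] = t
            fixed += 1
    return out

def self_refinement_decode(length, target, max_steps=16):
    """Start from all-MASK and iteratively refine the whole sequence.

    Unlike binary mask-to-token decoding, positions are never frozen:
    every step re-emits the full sequence, so earlier mistakes could
    in principle be revised on later steps."""
    seq = [MASK] * length
    for step in range(max_steps):
        seq = refine(seq, target)
        if seq == target:
            return seq, step + 1
    return seq, max_steps

# Decoding 6 tokens at 2 corrections/step converges in 3 parallel steps.
decoded, steps = self_refinement_decode(6, [3, 1, 4, 1, 5, 9])
```

The point of the sketch is the loop shape, not the toy correction rule: fewer, more aggressive parallel steps trade off against the risk of uncorrected errors, which is the trade-off the removed text says On-Policy Uniform Training is meant to mitigate.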