---
base_model:
- inclusionAI/LLaDA2.0-mini
datasets:
- Zigeng/DMax-LLaDA-2.0-Mini-Code-Trajectories
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
# DMax-Coder-16B

DMax is a new paradigm for efficient diffusion language models (dLLMs) that enables aggressive decoding parallelism while preserving generation quality. This repository contains the DMax-Coder-16B model, specialized for highly parallel code generation.
## 💪 Highlights
- Aggressive Decoding Parallelism: Achieves 6.0 TPF (tokens per forward pass) on math and reasoning tasks and 6.6 TPF on code tasks while preserving accuracy.
- Self-Revising dLLM: Extends a pretrained masked diffusion language model (MDLM) into a uniform-state diffusion language model (UDLM) with an intrinsic ability to revise its own erroneous predictions during decoding.
- Soft Parallel Decoding: Interpolates between mask and token embeddings to propagate confidence priors from previous decoding steps.
The result is a superior parallelism-accuracy trade-off: increased TPF with maintained accuracy.
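The soft parallel decoding idea above can be illustrated with a minimal toy sketch. Note that the function name, the plain-list interface, and the scalar confidence here are illustrative assumptions, not the model's actual API: each position's input embedding is a confidence-weighted blend of the [MASK] embedding and the previously predicted token's embedding.

```python
def soft_embed(mask_emb, token_emb, confidence):
    """Blend a predicted token embedding with the [MASK] embedding.

    confidence is the previous step's confidence in the predicted token:
    0.0 keeps the pure [MASK] embedding, 1.0 commits fully to the token.
    (Toy sketch over plain lists; the real model operates on tensors.)
    """
    return [confidence * t + (1.0 - confidence) * m
            for m, t in zip(mask_emb, token_emb)]

# A half-confident prediction sits halfway between the two embeddings:
# soft_embed([0.0, 0.0], [1.0, 2.0], 0.5) -> [0.5, 1.0]
```

This lets low-confidence positions stay close to the mask state, so later steps can still revise them, while high-confidence positions effectively commit their token forward.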
## 💻 Models and Datasets
| Model | Description | Source Model | Link |
|---|---|---|---|
| π€ DMax-Math-16B | Highly parallel dLLM for math and reasoning. | LLaDA-2.0-mini | HF |
| π€ DMax-Coder-16B | Highly parallel dLLM for code generation. | LLaDA-2.0-mini | HF |

| Dataset | Description | Link |
|---|---|---|
| 📊 DMax-Math-Training-Data | Math trajectories generated by LLaDA-2.0-mini | HF |
| 📊 DMax-Code-Training-Data | Code trajectories generated by LLaDA-2.0-mini | HF |
## 🚀 Quick Start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (trust_remote_code is required for the
# custom generate_spd decoding method).
model = AutoModelForCausalLM.from_pretrained(
    "Zigeng/DMax-Coder-16B", trust_remote_code=True, device_map="cuda:0"
)
model = model.to(torch.bfloat16)
model.eval()
tokenizer = AutoTokenizer.from_pretrained("Zigeng/DMax-Coder-16B", trust_remote_code=True)

prompt = (
    "Write a python function to find the first repeated character in a given string.\n"
    "Please enclose your code within delimiters as follows:\n"
    "```python\n"
    "# YOUR CODE HERE\n"
    "```\n"
)

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
)

# Soft parallel decoding: returns the number of forward evaluations (nfe)
# alongside the generated token ids.
nfe, generated_tokens = model.generate_spd(
    inputs=input_ids,
    gen_length=2048,
    block_length=32,
    threshold=0.65,
)

generated_answer = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
print(generated_answer)
print("nfe:", nfe, "token length:", len(generated_tokens[0]))
```
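The `threshold` argument gates how many positions are committed per forward pass; the achieved TPF can be estimated from the printed values as roughly `len(generated_tokens[0]) / nfe`. A plausible sketch of threshold-gated selection is shown below. This is an illustrative assumption about confidence-thresholded parallel decoding in general, not the internals of `generate_spd`: commit every masked position whose confidence clears the threshold, falling back to the single most confident position so each step makes progress.

```python
def select_positions(confidences, threshold=0.65):
    """Pick which masked positions to commit this step.

    confidences: per-position prediction confidences for masked positions.
    Returns the indices to unmask: all positions at or above the threshold,
    or the single most confident one if none qualifies, so decoding
    always advances. (Illustrative sketch, not the model's actual code.)
    """
    above = [i for i, c in enumerate(confidences) if c >= threshold]
    if not above:
        above = [max(range(len(confidences)), key=confidences.__getitem__)]
    return above
```

Lowering `threshold` commits more tokens per forward pass (higher TPF, fewer NFEs) at some risk to accuracy; raising it decodes more conservatively.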
## 📖 Citation
```bibtex
@misc{chen2026dmaxaggressiveparalleldecoding,
title={DMax: Aggressive Parallel Decoding for dLLMs},
author={Zigeng Chen and Gongfan Fang and Xinyin Ma and Ruonan Yu and Xinchao Wang},
year={2026},
eprint={2604.08302},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2604.08302},
}
```