Add pipeline tag, library metadata, and improve model card (#1)
- Add pipeline tag, library metadata, and improve model card (65976a5556695e8d8d7f71da8a89c90cb408136e)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md
CHANGED
|
@@ -1,9 +1,11 @@
|
|
| 1 |
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
datasets:
|
| 4 |
-
- Zigeng/DMax-LLaDA-2.0-Mini-Math-Trajectories
|
| 5 |
base_model:
|
| 6 |
- inclusionAI/LLaDA2.0-mini
|
| 7 |
---
|
| 8 |
|
| 9 |
<div align="center">
|
|
@@ -12,7 +14,7 @@ base_model:
|
|
| 12 |
<a href="https://github.com/czg1225/DMax/blob/main/LICENSE">
|
| 13 |
<img alt="Apache" src="https://img.shields.io/badge/License-Apache-4E94CE.svg">
|
| 14 |
</a>
|
| 15 |
-
<a href="https://arxiv.org/
|
| 16 |
<img src="https://img.shields.io/badge/Paper-Arxiv-darkred.svg" alt="Paper">
|
| 17 |
</a>
|
| 18 |
<a href="https://github.com/czg1225/DMax">
|
|
@@ -21,10 +23,9 @@ base_model:
|
|
| 21 |
</div>
|
| 22 |
</div>
|
| 23 |
|
| 24 |
-
|
| 25 |
-
> [Zigeng Chen](https://czg1225.github.io/chenzigeng99/), [Gongfan Fang](https://fangggf.github.io/), [Xinyin Ma](https://horseee.github.io/), [Ruonan Yu](https://scholar.google.com/citations?user=UHP95egAAAAJ&hl=en), [Xinchao Wang](https://sites.google.com/site/sitexinchaowang/)
|
| 26 |
-
> [xML Lab](https://sites.google.com/view/xml-nus), National University of Singapore
|
| 27 |
|
|
|
|
| 28 |
|
| 29 |
## 💪 Highlights
|
| 30 |
|
|
@@ -65,7 +66,9 @@ model = model.to(torch.bfloat16)
|
|
| 65 |
model.eval()
|
| 66 |
tokenizer = AutoTokenizer.from_pretrained("Zigeng/DMax-Math-16B", trust_remote_code=True)
|
| 67 |
|
| 68 |
-
prompt = "A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts in total does it take?" + "
|
| 69 |
|
| 70 |
input_ids = tokenizer.apply_chat_template(
|
| 71 |
[{"role": "user", "content": prompt}],
|
|
@@ -94,5 +97,16 @@ print("nfe:",nfe,"token length",len(generated_tokens[0]))
|
|
| 94 |
|
| 95 |

|
| 96 |
|
| 97 |
-
|
| 98 |
-
|
| 1 |
---
|
| 2 |
base_model:
|
| 3 |
- inclusionAI/LLaDA2.0-mini
|
| 4 |
+
datasets:
|
| 5 |
+
- Zigeng/DMax-LLaDA-2.0-Mini-Math-Trajectories
|
| 6 |
+
license: apache-2.0
|
| 7 |
+
library_name: transformers
|
| 8 |
+
pipeline_tag: text-generation
|
| 9 |
---
|
| 10 |
|
| 11 |
<div align="center">
|
|
|
|
| 14 |
<a href="https://github.com/czg1225/DMax/blob/main/LICENSE">
|
| 15 |
<img alt="Apache" src="https://img.shields.io/badge/License-Apache-4E94CE.svg">
|
| 16 |
</a>
|
| 17 |
+
<a href="https://arxiv.org/abs/2604.08302">
|
| 18 |
<img src="https://img.shields.io/badge/Paper-Arxiv-darkred.svg" alt="Paper">
|
| 19 |
</a>
|
| 20 |
<a href="https://github.com/czg1225/DMax">
|
|
|
|
| 23 |
</div>
|
| 24 |
</div>
|
| 25 |
|
| 26 |
+
This repository contains the weights for **DMax-Math-16B**, presented in the paper [DMax: Aggressive Parallel Decoding for dLLMs](https://huggingface.co/papers/2604.08302).
|
| 27 |
|
| 28 |
+
DMax is a new paradigm for efficient diffusion language models (dLLMs) that mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality.
|
| 29 |
|
| 30 |
## 💪 Highlights
|
| 31 |
|
|
|
|
| 66 |
model.eval()
|
| 67 |
tokenizer = AutoTokenizer.from_pretrained("Zigeng/DMax-Math-16B", trust_remote_code=True)
|
| 68 |
|
| 69 |
+
prompt = "A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts in total does it take?" + "
|
| 70 |
+
Let's think step by step
|
| 71 |
+
"
|
| 72 |
|
| 73 |
input_ids = tokenizer.apply_chat_template(
|
| 74 |
[{"role": "user", "content": prompt}],
|
|
|
|
| 97 |
|
| 98 |

|
| 99 |
|
| 100 |
+
## 📚 Citation
|
| 101 |
+
|
| 102 |
+
```bibtex
|
| 103 |
+
@misc{chen2026dmaxaggressiveparalleldecoding,
|
| 104 |
+
title={DMax: Aggressive Parallel Decoding for dLLMs},
|
| 105 |
+
author={Zigeng Chen and Gongfan Fang and Xinyin Ma and Ruonan Yu and Xinchao Wang},
|
| 106 |
+
year={2026},
|
| 107 |
+
eprint={2604.08302},
|
| 108 |
+
archivePrefix={arXiv},
|
| 109 |
+
primaryClass={cs.LG},
|
| 110 |
+
url={https://arxiv.org/abs/2604.08302},
|
| 111 |
+
}
|
| 112 |
+
```
|
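The front-matter change in this commit (adding `library_name` and `pipeline_tag`, and reordering the existing keys) can be sanity-checked with a short script. The sketch below is only an illustration: `parse_front_matter` is a hypothetical helper, not part of the DMax repository or any Hugging Face library, and it handles just the flat `key: value` and `- item` forms that appear in this README rather than full YAML.

```python
# Hypothetical helper for illustration; not part of DMax or huggingface_hub.
# The header text below is copied from the new side of this diff.
README_HEAD = """---
base_model:
- inclusionAI/LLaDA2.0-mini
datasets:
- Zigeng/DMax-LLaDA-2.0-Mini-Math-Trajectories
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

<div align="center">
"""


def parse_front_matter(text):
    """Parse flat 'key: value' and '- item' front matter into a dict.

    Only scalars and one-level lists are supported; this is not a YAML parser.
    Returns an empty dict if the delimiters are missing.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return meta  # closing delimiter reached
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            # List item belonging to the most recent list-valued key.
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            meta[key] = value if value else []
            # Only keys with an empty value can collect list items.
            current_key = None if value else key
    return {}  # no closing delimiter found


metadata = parse_front_matter(README_HEAD)
print(metadata["pipeline_tag"], metadata["library_name"])
```

For anything beyond a quick check, a real YAML parser is the safer choice, since model card metadata can contain nested structures that this sketch ignores.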