---
pipeline_tag: text-generation
library_name: transformers
tags:
- diffusion
- reinforcement-learning
- grpo
---
# dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning
dUltra is an on-policy reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that learns unmasking strategies for efficient parallel decoding in masked diffusion language models (MDLMs).
By training an unmasking planner head that predicts per-token unmasking likelihoods, dUltra enables better exploitation of parallel generation. Across mathematical reasoning and code generation tasks, it achieves superior accuracy-efficiency trade-offs compared to state-of-the-art heuristic and distillation baselines.
- **Paper:** [dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning](https://huggingface.co/papers/2512.21446)
- **Repository:** [https://github.com/chinsengi/dUltra-os](https://github.com/chinsengi/dUltra-os)
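The planner-driven decoding idea can be illustrated with a minimal, self-contained sketch (hypothetical function and variable names; the actual planner head is part of `LLaDOUModelLM`): at each diffusion step, the planner scores every still-masked position, and the highest-scoring positions are unmasked in parallel.

```python
import numpy as np

def select_unmask(scores, masked, k):
    """Pick up to k masked positions with the highest planner scores.

    scores : per-token unmasking likelihoods from the planner head
    masked : boolean array, True where the token is still masked
    """
    candidates = np.where(masked)[0]
    # sort candidate positions by score, highest first
    ranked = candidates[np.argsort(scores[candidates])[::-1]]
    return ranked[:k]

# Toy example: 6 positions, 4 still masked; unmask 2 per step.
scores = np.array([0.1, 0.9, 0.4, 0.8, 0.2, 0.7])
masked = np.array([False, True, True, True, False, True])
print(select_unmask(scores, masked, k=2))  # -> [1 3]
```

In dUltra, these per-token likelihoods are learned with GRPO rather than fixed by a heuristic, which is what lets the model decide how many tokens it can safely reveal at once.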
## Sample Usage
To use this model, you can load it using the following code snippet from the official repository (requires `trust_remote_code=True`):
```python
import torch
from model.llada.lladou import LLaDOUModelLM
from transformers import AutoTokenizer
model = LLaDOUModelLM.from_pretrained(
    "sengi/dUltra-math",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("sengi/dUltra-math")
```
## Citation
If you find this work useful, please cite:
```bibtex
@misc{chen2025dultraultrafastdiffusionlanguage,
title={dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning},
author={Shirui Chen and Jiantao Jiao and Lillian J. Ratliff and Banghua Zhu},
year={2025},
eprint={2512.21446},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2512.21446},
}
```