---
library_name: transformers
pipeline_tag: text-generation
---

# dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning

dUltra is an on-policy reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that learns unmasking strategies for efficient parallel decoding in Masked Diffusion Language Models (MDLMs). By training an unmasking planner head, dUltra enables diffusion language models to achieve state-of-the-art accuracy-efficiency trade-offs.

- **Paper:** [dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning](https://huggingface.co/papers/2512.21446)
- **GitHub Repository:** [https://github.com/chinsengi/dUltra-os](https://github.com/chinsengi/dUltra-os)

## Model Description

Masked diffusion language models offer the potential for parallel token generation. dUltra introduces an unmasking planner head that predicts per-token unmasking likelihoods, modeled as independent Bernoulli distributions. The framework jointly optimizes the base diffusion LLM and the unmasking planner using a reward signal that combines a verifiable reward, a distillation reward, and the number of unmasking steps. dUltra achieves superior accuracy-efficiency trade-offs across mathematical reasoning and code generation tasks.
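To illustrate the decoding scheme described above, here is a minimal, self-contained sketch of one parallel unmasking step. The planner probabilities, the `<mask>` token string, and the candidate fill tokens are all hypothetical placeholders; the real model produces these from its planner head and base diffusion LLM.

```python
import random

MASK = "<mask>"  # placeholder mask token for this sketch

def parallel_unmask_step(tokens, unmask_probs, fills, rng):
    """One decoding step: each masked position is revealed independently
    with its planner-predicted Bernoulli probability, so several tokens
    can be unmasked in a single step."""
    out = list(tokens)
    for i, tok in enumerate(tokens):
        if tok == MASK and rng.random() < unmask_probs[i]:
            out[i] = fills[i]  # token proposed by the base diffusion model
    return out

rng = random.Random(0)  # fixed seed for reproducibility
tokens = [MASK, MASK, MASK, MASK]
probs = [0.9, 0.1, 0.8, 0.2]            # hypothetical planner outputs
fills = ["The", "cat", "sat", "down"]   # hypothetical model proposals
step1 = parallel_unmask_step(tokens, probs, fills, rng)
```

High-probability positions tend to be revealed early, while uncertain positions stay masked for later steps; repeating the step until no masks remain completes decoding.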

## Usage

To use the dUltra model, load it with the `transformers` library. The custom `LLaDOUModelLM` class is provided in the GitHub repository linked above, and `trust_remote_code=True` is required to load the custom model architecture.

```python
import torch
from transformers import AutoTokenizer

# LLaDOUModelLM is defined in the dUltra GitHub repository
from model.llada.lladou import LLaDOUModelLM

model = LLaDOUModelLM.from_pretrained(
    "sengi/dUltra-math",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("sengi/dUltra-math")
```

## Citation

```bibtex
@misc{chen2025dultraultrafastdiffusionlanguage,
      title={dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning},
      author={Shirui Chen and Jiantao Jiao and Lillian J. Ratliff and Banghua Zhu},
      year={2025},
      eprint={2512.21446},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.21446},
}
```