---
library_name: transformers
pipeline_tag: text-generation
---
# dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning
dUltra is an on-policy reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that learns unmasking strategies for efficient parallel decoding in Masked Diffusion Language Models (MDLMs). By training an unmasking planner head, dUltra enables diffusion language models to achieve state-of-the-art accuracy-efficiency trade-offs.
- **Paper:** [dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning](https://huggingface.co/papers/2512.21446)
- **GitHub Repository:** [https://github.com/chinsengi/dUltra-os](https://github.com/chinsengi/dUltra-os)
## Model Description
Masked diffusion language models offer the potential for parallel token generation. dUltra introduces an unmasking planner head that predicts per-token unmasking likelihoods under independent Bernoulli distributions. The framework jointly optimizes the base diffusion LLM and the unmasking-order planner using a reward signal that combines a verifiable reward, a distillation reward, and the number of unmasking steps. dUltra achieves superior accuracy-efficiency trade-offs across mathematical reasoning and code generation tasks.
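The per-token Bernoulli unmasking decision can be illustrated with a minimal sketch. The function name, tensor shapes, and the fallback rule below are illustrative assumptions, not the repository's actual implementation; the real planner head lives in the dUltra codebase.

```python
import torch

def sample_unmask_set(planner_logits: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Sample which still-masked positions to unmask at this decoding step.

    planner_logits: (seq_len,) per-token unmasking logits from a planner
        head (hypothetical interface for illustration).
    mask: (seq_len,) bool tensor, True where the token is still masked.
    Returns a (seq_len,) bool tensor marking positions to unmask now.
    """
    # Independent Bernoulli per position, as described in the model card.
    probs = torch.sigmoid(planner_logits)
    unmask = torch.bernoulli(probs).bool() & mask
    # Illustrative fallback to guarantee progress: if no masked position
    # was sampled, unmask the highest-probability masked position.
    if mask.any() and not unmask.any():
        masked_probs = probs.masked_fill(~mask, -1.0)
        unmask[masked_probs.argmax()] = True
    return unmask
```

Sampling several positions per step is what enables parallel decoding; the RL objective then trades off how many positions each step unmasks against answer quality.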
## Usage
To use the dUltra model, clone the [GitHub repository](https://github.com/chinsengi/dUltra-os) (the `LLaDOUModelLM` class is imported from its `model.llada.lladou` module) and load the checkpoint with the `transformers` library. Note that `trust_remote_code=True` is required to load the custom model architecture.
```python
import torch
from transformers import AutoTokenizer

# LLaDOUModelLM ships with the dUltra GitHub repository; run this from
# the repo root so `model.llada.lladou` is importable.
from model.llada.lladou import LLaDOUModelLM

# Load the math checkpoint in bfloat16.
model = LLaDOUModelLM.from_pretrained(
    "sengi/dUltra-math",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("sengi/dUltra-math")
```
## Citation
```bibtex
@misc{chen2025dultraultrafastdiffusionlanguage,
  title={dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning},
  author={Shirui Chen and Jiantao Jiao and Lillian J. Ratliff and Banghua Zhu},
  year={2025},
  eprint={2512.21446},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2512.21446},
}
```