---
library_name: transformers
pipeline_tag: text-generation
---

# dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning

dUltra is an on-policy reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that learns unmasking strategies for efficient parallel decoding in Masked Diffusion Language Models (MDLMs).

Existing acceleration methods for MDLMs often rely on fixed heuristics or distillation. dUltra instead introduces an unmasking planner head that predicts per-token unmasking likelihoods, allowing the model to learn task-specific unmasking trajectories. This approach achieves superior accuracy-efficiency trade-offs on mathematical reasoning and code generation tasks.

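The core decoding idea can be illustrated with a minimal, self-contained sketch (this is not the paper's implementation; random scores stand in for the planner head's predicted unmasking likelihoods, and the number of tokens unmasked per step is a made-up parameter):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sequence: True = position still masked. A planner head would assign each
# masked position an unmasking likelihood; random scores stand in for it here.
seq_len, tokens_per_step = 8, 3
masked = np.ones(seq_len, dtype=bool)

steps = 0
while masked.any():
    scores = rng.random(seq_len)      # stand-in for planner predictions
    scores[~masked] = -np.inf         # already-unmasked positions are ineligible
    k = min(tokens_per_step, int(masked.sum()))
    picks = np.argsort(scores)[-k:]   # unmask the k most likely positions
    masked[picks] = False
    steps += 1

print(steps)  # 3 decoding steps (ceil(8 / 3)) instead of 8 sequential ones
```

Unmasking several tokens per step is what makes decoding parallel; learning *which* tokens to unmask, rather than using a fixed confidence heuristic, is the part dUltra trains with GRPO.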
- **Paper:** [dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning](https://huggingface.co/papers/2512.21446)
- **GitHub Repository:** [chinsengi/dUltra-os](https://github.com/chinsengi/dUltra-os)

## Sample Usage

To use a trained dUltra model, run the following snippet from the official repository. Note that `trust_remote_code=True` is required for the custom architecture.

```python
import torch
from model.llada.lladou import LLaDOUModelLM
from transformers import AutoTokenizer

model = LLaDOUModelLM.from_pretrained(
    "sengi/dUltra-math",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("sengi/dUltra-math")
```

## Citation

```bibtex
@misc{chen2025dultraultrafastdiffusionlanguage,
  title={dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning},
  author={Shirui Chen and Jiantao Jiao and Lillian J. Ratliff and Banghua Zhu},
  year={2025},
  eprint={2512.21446},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2512.21446},
}
```