---
pipeline_tag: text-generation
library_name: transformers
tags:
- diffusion
- reinforcement-learning
- grpo
---

# dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning
|
|
dUltra is an on-policy reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that learns unmasking strategies for efficient parallel decoding in masked diffusion language models (MDLMs).
|
|
By training an unmasking planner head that predicts per-token unmasking likelihoods, dUltra enables better exploitation of parallel generation. Across mathematical reasoning and code generation tasks, it achieves superior accuracy-efficiency trade-offs compared to state-of-the-art heuristic and distillation baselines.
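To make the idea concrete, here is a minimal toy sketch of planner-guided parallel unmasking: at each step, every masked position whose predicted unmasking likelihood clears a confidence threshold is revealed in parallel. This is an illustrative assumption, not the released training or inference code; the function name, the threshold rule, and all variable names (`plan_unmask_step`, `probs`, `threshold`) are made up for this example.

```python
def plan_unmask_step(probs, mask, threshold=0.9):
    """Toy sketch of planner-guided parallel unmasking.

    probs: planner-predicted unmasking likelihood for each position.
    mask:  True where the token is still masked.
    Returns the positions to unmask this step: every masked position
    whose likelihood clears the threshold, or (to guarantee progress)
    the single most likely masked position if none do.
    """
    candidates = [i for i, m in enumerate(mask) if m]
    chosen = [i for i in candidates if probs[i] >= threshold]
    if not chosen:  # always unmask at least one token per step
        chosen = [max(candidates, key=lambda i: probs[i])]
    return chosen

# Positions 1 and 3 are confident enough to be unmasked in parallel.
probs = [0.2, 0.95, 0.4, 0.97, 0.1]
mask = [True] * 5
print(plan_unmask_step(probs, mask))  # -> [1, 3]
```

Heuristic decoders use fixed confidence rules of this shape; dUltra instead trains the likelihoods that drive the selection with GRPO, so the unmasking schedule itself is learned.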
|
|
- **Paper:** [dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning](https://huggingface.co/papers/2512.21446)
- **Repository:** [https://github.com/chinsengi/dUltra-os](https://github.com/chinsengi/dUltra-os)
|
|
## Sample Usage
|
|
Load the model with the following snippet from the official repository (note that `trust_remote_code=True` is required):
|
|
```python
import torch
from model.llada.lladou import LLaDOUModelLM
from transformers import AutoTokenizer

model = LLaDOUModelLM.from_pretrained(
    "sengi/dUltra-math",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("sengi/dUltra-math")
```
|
|
## Citation
|
|
If you find this work useful, please cite:
|
|
```bibtex
@misc{chen2025dultraultrafastdiffusionlanguage,
      title={dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning},
      author={Shirui Chen and Jiantao Jiao and Lillian J. Ratliff and Banghua Zhu},
      year={2025},
      eprint={2512.21446},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.21446},
}
```