---
library_name: transformers
pipeline_tag: text-generation
---

# dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning

dUltra is an on-policy reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that learns unmasking strategies for efficient parallel decoding in Masked Diffusion Language Models (MDLMs). By training an unmasking planner head, dUltra enables diffusion language models to achieve state-of-the-art accuracy-efficiency trade-offs.

## Model Description

Masked diffusion language models offer the potential for parallel token generation. dUltra introduces an unmasking planner head that predicts per-token unmasking likelihoods under independent Bernoulli distributions. The framework jointly optimizes the base diffusion LLM and the unmasking-order planner using a reward signal that combines a verifiable reward, a distillation reward, and the number of unmasking steps. dUltra achieves superior accuracy-efficiency trade-offs across mathematical reasoning and code generation tasks.
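
To make the mechanism concrete, here is a minimal PyTorch sketch of such a planner head. The class name, the single linear projection, and the reward weighting in the final comment are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn as nn

class UnmaskingPlannerHead(nn.Module):
    """Illustrative planner head: maps per-token hidden states to
    independent Bernoulli unmasking probabilities."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        # returns per-token unmasking probabilities: (batch, seq_len)
        return torch.sigmoid(self.proj(hidden_states)).squeeze(-1)

# One decoding step: sample which still-masked positions to unmask in parallel.
hidden = torch.randn(1, 8, 64)              # stand-in hidden states from the MDLM
planner = UnmaskingPlannerHead(64)
probs = planner(hidden)                     # Bernoulli parameters per token
unmask_now = torch.bernoulli(probs).bool()  # independent per-token decisions

# Reward shape used for GRPO-style updates (weighting is an assumption):
# reward = r_verifiable + r_distill - step_cost * num_unmasking_steps
```

Sampling each position independently is what allows many tokens to be unmasked in a single step, while the step-count term in the reward pushes the planner toward fewer, larger parallel steps.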

## Usage

To use the dUltra model, you can load it with the `transformers` library. Note that `trust_remote_code=True` is required to load the custom model architecture.

```python
import torch
from transformers import AutoTokenizer

from model.llada.lladou import LLaDOUModelLM

# Load the dUltra math checkpoint; trust_remote_code=True is required
# for the custom LLaDOU architecture.
model = LLaDOUModelLM.from_pretrained(
    "sengi/dUltra-math",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("sengi/dUltra-math")
```
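
A hypothetical generation call is sketched below. It assumes `LLaDOUModelLM` follows the standard `transformers` `generate()` API, which may differ for the custom diffusion decoding loop; consult the repository for the actual entry point.

```python
# Hypothetical usage, assuming the standard transformers generation API;
# the custom diffusion decoding entry point may differ.
prompt = "What is 12 * 7? Show your reasoning."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```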

## Citation

```bibtex
@misc{chen2025dultraultrafastdiffusionlanguage,
      title={dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning},
      author={Shirui Chen and Jiantao Jiao and Lillian J. Ratliff and Banghua Zhu},
      year={2025},
      eprint={2512.21446},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.21446},
}
```