---
library_name: transformers
pipeline_tag: text-generation
---

# dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning

dUltra is an on-policy reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that learns unmasking strategies for efficient parallel decoding in Masked Diffusion Language Models (MDLMs).

Existing acceleration methods for MDLMs typically rely on fixed heuristics or distillation. dUltra instead introduces an unmasking planner head that predicts per-token unmasking likelihoods, letting the model learn task-specific unmasking trajectories end to end. This approach achieves superior accuracy-efficiency trade-offs on mathematical reasoning and code generation tasks.
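To make the idea concrete, here is a minimal illustrative sketch (not the official dUltra implementation, whose planner head and training loop live in the repository): given per-token scores from a planner head, unmask the `k` most confident still-masked positions at each decoding step.

```python
import torch

def select_unmask_positions(planner_scores: torch.Tensor,
                            masked: torch.Tensor,
                            k: int) -> torch.Tensor:
    """Pick the k highest-scoring positions that are still masked.

    planner_scores: (seq_len,) per-token unmasking scores from the planner head.
    masked: (seq_len,) bool tensor, True where the token is still masked.
    """
    # Exclude already-unmasked positions from the top-k selection.
    scores = planner_scores.masked_fill(~masked, float("-inf"))
    k = min(k, int(masked.sum()))
    return torch.topk(scores, k).indices

# Toy example: 6 positions, 4 still masked, unmask 2 per step.
scores = torch.tensor([0.1, 2.0, -1.0, 0.5, 3.0, 0.0])
masked = torch.tensor([True, True, False, True, False, True])
positions = select_unmask_positions(scores, masked, k=2)  # indices 1 and 3
```

In dUltra, the scores themselves are learned with GRPO-style reinforcement learning rather than fixed confidence heuristics, so the unmasking schedule adapts to the task.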

## Sample Usage

To use a trained dUltra model, you can use the following snippet from the official repository. Note that `trust_remote_code=True` is required for the custom architecture.

```python
import torch
from transformers import AutoTokenizer

from model.llada.lladou import LLaDOUModelLM

model = LLaDOUModelLM.from_pretrained(
    "sengi/dUltra-math",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("sengi/dUltra-math")
```

## Citation

```bibtex
@misc{chen2025dultraultrafastdiffusionlanguage,
      title={dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning},
      author={Shirui Chen and Jiantao Jiao and Lillian J. Ratliff and Banghua Zhu},
      year={2025},
      eprint={2512.21446},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.21446},
}
```