nielsr HF Staff committed on
Commit
7c0045b
·
verified ·
1 Parent(s): ec9133d

Add model card and metadata for dUltra


Hi! I'm Niels from the Hugging Face community science team.

This PR adds a model card for dUltra, an on-policy reinforcement learning framework for masked diffusion language models.

The PR includes:
- Relevant metadata (pipeline tag and library name).
- Links to the paper and the official GitHub repository.
- A summary of the model's architecture and training approach.
- A sample usage code snippet based on the official README.
- BibTeX citation information.

Please feel free to merge if this looks good to you!

Files changed (1)
  1. README.md +50 -0
README.md ADDED
@@ -0,0 +1,50 @@
---
pipeline_tag: text-generation
library_name: transformers
tags:
- diffusion
- reinforcement-learning
- grpo
---

# dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning

dUltra is an on-policy reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that learns unmasking strategies for efficient parallel decoding in masked diffusion language models (MDLMs).
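For intuition, GRPO's core trick can be sketched in a few lines (an illustrative toy, not the paper's implementation; the function name is an assumption): rewards for a group of responses sampled from the same prompt are normalized by the group's mean and standard deviation, so no learned value critic is needed.

```python
# Toy sketch of GRPO's group-relative advantage (illustrative only).
# Each response's advantage is its reward, standardized within its group.

def group_relative_advantages(rewards, eps=1e-8):
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled responses for one prompt, scored 0/1 (e.g. answer correctness):
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advs)  # ≈ [1.0, -1.0, 1.0, -1.0]  (mean 0.5, std 0.5)
```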

By training an unmasking planner head that predicts per-token unmasking likelihoods, dUltra exploits parallel generation more effectively. Across mathematical reasoning and code generation tasks, it achieves a better accuracy-efficiency trade-off than state-of-the-art heuristic and distillation baselines.

- **Paper:** [dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning](https://huggingface.co/papers/2512.21446)
- **Repository:** [https://github.com/chinsengi/dUltra-os](https://github.com/chinsengi/dUltra-os)
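The planner-guided parallel decoding described above can be illustrated with a toy sketch (hypothetical names and threshold rule; this is not the repository's API): every masked position whose predicted unmasking likelihood clears a threshold is revealed in the same step, so several tokens are decoded in parallel rather than one at a time.

```python
# Toy sketch of one planner-guided parallel unmasking step (illustrative only).

MASK = "<mask>"

def parallel_unmask_step(tokens, likelihoods, proposals, threshold=0.5):
    """Reveal all masked positions whose planner likelihood >= threshold.

    tokens      : current partially masked sequence (list of str)
    likelihoods : planner score per position, in [0, 1]
    proposals   : the model's proposed token for each position
    """
    out = list(tokens)
    revealed = []
    for i, tok in enumerate(tokens):
        if tok == MASK and likelihoods[i] >= threshold:
            out[i] = proposals[i]
            revealed.append(i)
    return out, revealed

tokens = [MASK, "=", MASK, "+", MASK]
likelihoods = [0.9, 1.0, 0.3, 1.0, 0.8]   # planner is unsure about position 2
proposals = ["x", "=", "1", "+", "2"]

step1, revealed = parallel_unmask_step(tokens, likelihoods, proposals)
print(step1)     # ['x', '=', '<mask>', '+', '2']
print(revealed)  # [0, 4]  -- two tokens revealed in one step
```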

## Sample Usage

Load the model with the snippet below, adapted from the official repository (requires `trust_remote_code=True`); the `LLaDOUModelLM` class is provided by the repository code:

```python
import torch
from transformers import AutoTokenizer

# Provided by the official dUltra repository
from model.llada.lladou import LLaDOUModelLM

model = LLaDOUModelLM.from_pretrained(
    "sengi/dUltra-math",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("sengi/dUltra-math")
```

## Citation

If you find this work useful, please cite:

```bibtex
@misc{chen2025dultraultrafastdiffusionlanguage,
  title={dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning},
  author={Shirui Chen and Jiantao Jiao and Lillian J. Ratliff and Banghua Zhu},
  year={2025},
  eprint={2512.21446},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2512.21446},
}
```