---
library_name: transformers
pipeline_tag: text-generation
---

# dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning

dUltra is an on-policy reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that learns unmasking strategies for efficient parallel decoding in Masked Diffusion Language Models (MDLMs).

Existing acceleration methods for MDLMs often rely on fixed heuristics or distillation. dUltra instead introduces an unmasking planner head that predicts per-token unmasking likelihoods, allowing the model to learn task-specific unmasking trajectories. This approach achieves superior accuracy-efficiency trade-offs on mathematical reasoning and code generation tasks.

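The core decoding idea can be illustrated with a minimal, self-contained sketch (this is not the paper's implementation; random scores stand in for the planner head's predicted unmasking likelihoods, and the number of tokens unmasked per step is a made-up parameter):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sequence: True = position still masked. A planner head would assign each
# masked position an unmasking likelihood; random scores stand in for it here.
seq_len, tokens_per_step = 8, 3
masked = np.ones(seq_len, dtype=bool)

steps = 0
while masked.any():
    scores = rng.random(seq_len)      # stand-in for planner predictions
    scores[~masked] = -np.inf         # already-unmasked positions are ineligible
    k = min(tokens_per_step, int(masked.sum()))
    picks = np.argsort(scores)[-k:]   # unmask the k most likely positions
    masked[picks] = False
    steps += 1

print(steps)  # 3 decoding steps (ceil(8 / 3)) instead of 8 sequential ones
```

Unmasking several tokens per step is what makes decoding parallel; learning *which* tokens to unmask, rather than using a fixed confidence heuristic, is the part dUltra trains with GRPO.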
- **Paper:** [dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning](https://huggingface.co/papers/2512.21446)
- **GitHub Repository:** [chinsengi/dUltra-os](https://github.com/chinsengi/dUltra-os)

## Sample Usage

To use a trained dUltra model, run the following snippet from the official repository. Note that `trust_remote_code=True` is required for the custom architecture.

```python
import torch
from model.llada.lladou import LLaDOUModelLM
from transformers import AutoTokenizer

model = LLaDOUModelLM.from_pretrained(
    "sengi/dUltra-math",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("sengi/dUltra-math")
```

## Citation

```bibtex
@misc{chen2025dultraultrafastdiffusionlanguage,
  title={dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning},
  author={Shirui Chen and Jiantao Jiao and Lillian J. Ratliff and Banghua Zhu},
  year={2025},
  eprint={2512.21446},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2512.21446},
}
```