---
license: mit
language:
- en
tags:
- DLLM
- diffusion-language-model
- on-policy-distillation
- post-training
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-0.6B
datasets:
- divelab/opdlm_train_data
arxiv: 2606.06712
---

# OPDLM-0.6B

OPDLM-0.6B is a block diffusion language model (DLM) obtained by post-training an autoregressive language model (ARLM) into a diffusion language model via **on-policy distillation**.

arXiv report: [arxiv.org/abs/2606.06712](https://arxiv.org/abs/2606.06712)

## Highlights

- **Converted, not pretrained from scratch:** built from a strong ARLM, reusing its prior.
- **Training-efficient:** orders of magnitude fewer tokens than from-scratch DLM training (same base ARLM).
- **Inference-efficient:** parallel token decoding via block diffusion.

## Model Details

- **Developed by:** DIVE Lab, Texas A&M University
- **Base model:** [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Model type:** Block diffusion language model (decoder-based)
- **Block size:** 4
- **Parameters:** ~0.6B
- **Language:** English
- **License:** MIT

## Training

- **Method:** On-policy distillation from a frozen ARLM teacher into a block DLM student.
- **Conversion budget:** ~B tokens
- **Data:** [opdlm_train_data](https://huggingface.co/datasets/divelab/opdlm_train_data)

## Results

For detailed results and benchmarks, please refer to our paper: [arxiv.org/abs/2606.06712](https://arxiv.org/abs/2606.06712)

## Citation

```bibtex
@misc{su2026dataefficientautoregressivetodiffusionlanguagemodels,
  title={Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation},
  author={Xingyu Su and Jacob Helwig and Shubham Parashar and Atharv Chagi and Lakshmi Jotsna and Degui Zhi and James Caverlee and Dileep Kalathil and Shuiwang Ji},
  year={2026},
  eprint={2606.06712},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2606.06712},
}
```
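
## Illustrative Sketch: Block-Wise Decoding

The "parallel token decoding via block diffusion" highlight and the block size of 4 listed under Model Details can be pictured with a toy loop. This is only a minimal sketch of the general block-wise idea, not the OPDLM inference code: `decode_in_blocks`, `toy_fill`, and the `<mask>` placeholder are hypothetical names introduced here, and the stand-in denoiser fills positions with dummy strings instead of calling a model. It assumes blocks are generated left to right and that each block starts fully masked and is refined until no mask remains.

```python
from typing import Callable, List

MASK = "<mask>"  # hypothetical placeholder, not the model's real mask token


def decode_in_blocks(
    prompt: List[str],
    num_new_tokens: int,
    fill_block: Callable[[List[str], List[str]], List[str]],
    block_size: int = 4,
) -> List[str]:
    """Generate tokens block by block, left to right.

    Each block starts fully masked and is refined by `fill_block`, which
    sees the context so far and may resolve several positions per call.
    """
    output = list(prompt)
    remaining = num_new_tokens
    while remaining > 0:
        size = min(block_size, remaining)
        block = [MASK] * size
        # Keep refining until no masked position is left in this block.
        while MASK in block:
            new_block = fill_block(output, block)
            if new_block.count(MASK) >= block.count(MASK):
                raise RuntimeError("fill_block made no progress")
            block = new_block
        output.extend(block)
        remaining -= size
    return output


def toy_fill(context: List[str], block: List[str]) -> List[str]:
    """Stand-in denoiser: unmasks up to two positions per call."""
    filled = list(block)
    budget = 2
    for i, tok in enumerate(filled):
        if tok == MASK and budget > 0:
            filled[i] = f"t{len(context) + i}"  # dummy token label
            budget -= 1
    return filled


tokens = decode_in_blocks(["<bos>"], num_new_tokens=6, fill_block=toy_fill)
print(tokens)
```

In this toy setting each call to `toy_fill` resolves two positions at once, so a block of 4 takes two refinement calls rather than four sequential steps. How many positions a real denoiser commits per step is a property of the model and sampler, which this sketch does not attempt to reproduce.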