Commit e6d44e2 by shubhamprshr (verified) · 1 parent: f89d485

Update README.md

  1. README.md +41 -0
README.md CHANGED
@@ -12,4 +12,45 @@ pipeline_tag: text-generation
  base_model: Qwen/Qwen3-0.6B
  datasets:
  - divelab/opdlm_train_data
+ arxiv: 2606.06712
  ---
+ # OPDLM-0.6B
+
+ OPDLM-0.6B is a block diffusion language model (DLM) obtained by post-training an
+ autoregressive language model (ARLM) into a diffusion language model via
+ **on-policy distillation**. arXiv report: [arxiv.org/abs/2606.06712](https://arxiv.org/abs/2606.06712)
+
+ ## Highlights
+ - **Converted, not pretrained from scratch:** built from a strong ARLM, reusing its prior.
+ - **Training-efficient:** orders of magnitude fewer tokens than from-scratch DLM training (same base ARLM).
+ - **Inference-efficient:** parallel token decoding via block diffusion.
+
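The "parallel token decoding via block diffusion" highlight can be pictured with a toy loop: blocks are produced left to right, and inside each block several masked positions are committed per denoising step. This is only a sketch under loud assumptions. `toy_denoiser` is a hypothetical random stand-in for the model, and the confidence-based commit rule and the `steps` parameter are illustrative choices, not the sampler shipped with OPDLM-0.6B. Only the block size of 4 comes from the model card.

```python
import random

MASK = None  # marks a position in the block that has not been decoded yet


def toy_denoiser(context, block, rng):
    """Hypothetical stand-in for the model (assumption, not OPDLM's API).

    Returns {position: (token_id, confidence)} for every masked position.
    """
    return {
        i: (rng.randrange(100), rng.random())
        for i, tok in enumerate(block)
        if tok is MASK
    }


def decode_block(context, block_size, steps, rng):
    """Fill one block, committing the most confident proposals each step."""
    block = [MASK] * block_size
    per_step = -(-block_size // steps)  # ceiling division
    while any(tok is MASK for tok in block):
        proposals = toy_denoiser(context, block, rng)
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        # Several positions are committed in one step: the parallel part.
        for i in ranked[:per_step]:
            block[i] = proposals[i][0]
    return block


def generate(prompt, num_blocks, block_size=4, steps=2, seed=0):
    """Append blocks left to right; each block conditions on all earlier tokens."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(num_blocks):
        tokens.extend(decode_block(tokens, block_size, steps, rng))
    return tokens


if __name__ == "__main__":
    print(generate([1, 2, 3], num_blocks=3))
```

With `block_size=4` and `steps=2`, each block needs two model calls instead of the four a token-by-token decoder would need, which is where the inference saving would come from in this simplified picture.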
+ ## Model Details
+ - **Developed by:** DIVE Lab, Texas A&M University
+ - **Base model:** [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
+ - **Model type:** Block diffusion language model (decoder-based)
+ - **Block size:** 4
+ - **Parameters:** ~0.6B
+ - **Language:** English
+ - **License:** MIT
+
+ ## Training
+ - **Method:** On-policy distillation from a frozen ARLM teacher into a block DLM student.
+ - **Conversion budget:** ~<fill in>B tokens
+ - **Data:** [opdlm_train_data](https://huggingface.co/datasets/divelab/opdlm_train_data)
+
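As a rough illustration of the training method named above: in on-policy distillation the student generates its own sequences, the frozen teacher scores those same positions, and the student is pushed toward the teacher's per-token distribution. The sketch below assumes a reverse KL divergence, KL(student || teacher), averaged over positions. That is one common choice in the distillation literature, and the model card does not state which divergence OPDLM uses. The function names and the plain-list logits are hypothetical and chosen for readability.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position."""
    log_s = log_softmax(student_logits)
    log_t = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(log_s, log_t))


def on_policy_loss(student_logits_seq, teacher_logits_seq):
    """Mean per-token reverse KL over a sequence the student itself sampled.

    Assumption: both arguments hold one logit vector per position, computed
    on the same student-generated tokens. The teacher is frozen, so only the
    student's logits would receive gradients in a real training loop.
    """
    kls = [
        reverse_kl(s, t)
        for s, t in zip(student_logits_seq, teacher_logits_seq)
    ]
    return sum(kls) / len(kls)


if __name__ == "__main__":
    student = [[0.0, 0.0], [1.0, 2.0]]
    teacher = [[2.0, 0.0], [1.0, 2.0]]
    print(on_policy_loss(student, teacher))
```

The loss is zero when the student already matches the teacher at every position and positive otherwise, so it gives the student a training signal on exactly the tokens it tends to produce.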
+ ## Results
+ For detailed results and benchmarks, please refer to our paper: [arxiv.org/abs/2606.06712](https://arxiv.org/abs/2606.06712)
+
+ ## Citation
+ ```bibtex
+ @misc{su2026dataefficientautoregressivetodiffusionlanguagemodels,
+   title={Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation},
+   author={Xingyu Su and Jacob Helwig and Shubham Parashar and Atharv Chagi and Lakshmi Jotsna and Degui Zhi and James Caverlee and Dileep Kalathil and Shuiwang Ji},
+   year={2026},
+   eprint={2606.06712},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2606.06712},
+ }
+ ```