divelab/OPDLM-8B


shubhamprshr committed 6 days ago

Commit bf0939c · verified · 1 Parent(s): c2aa296

Update README.md

Files changed (1)

README.md +58 -1

@@ -4,4 +4,61 @@ language:
 - en
 tags:
 - DLLM
----

+- diffusion-language-model
+- on-policy-distillation
+- post-training
+library_name: transformers
+pipeline_tag: text-generation
+base_model: Qwen/Qwen3-8B
+datasets:
+- divelab/opdlm_train_data
+---
+# OPDLM-8B
+OPDLM-8B is a block diffusion language model (DLM). It is obtained by post-training an
+autoregressive language model (ARLM), Qwen3-8B, into a DLM via
+**on-policy distillation**.
+## Highlights
+- **Converted, not pretrained from scratch:** built from a strong ARLM, reusing its prior.
+- **Training-efficient:** ~75M tokens of conversion vs. ~50B tokens for from-scratch DLM training (same base ARLM).
+- **Inference-efficient:** parallel token decoding via block diffusion.
+## Model Details
+- **Developed by:** [FILL: DIVE Lab, Texas A&M University]
+- **Base model:** [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
+- **Model type:** Block diffusion language model (decoder-based)
+- **Block size:** [FILL: e.g. 4]
+- **Parameters:** ~8B
+- **Language:** English
+- **License:** MIT
+## Training
+- **Method:** On-policy distillation from a frozen ARLM teacher into a block DLM student.
+- **Conversion budget:** [CONFIRM: ~75M tokens]
+- **Data:** [opdlm_train_data](https://huggingface.co/datasets/divelab/opdlm_train_data)
+## Evaluation
+[CONFIRM all numbers — these are from our table for OPDLM-8B (non-thinking);
+fill the thinking variant separately if releasing it]
+| Benchmark    | Score |
+|--------------|-------|
+| MMLU         | 70.9  |
+| MMLU-Pro     | 53.7  |
+| GPQA-Diamond | 36.1  |
+| IFEval       | 50.1  |
+| GSM8K        | 87.1  |
+| MATH500      | 71.2  |
+| AIME-24      | 14.7  |
+| AIME-25      | 12.4  |
+| HumanEval    | 59.8  |
+| MBPP         | 48.7  |
+
+Decoding: [FILL: static one-token-per-step / dynamic sampling — state which these are]
+## Citation
+```bibtex
+[FILL: BibTeX once the paper/arXiv is up]
+```
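
The "Training-efficient" highlight in the README above compares two approximate token budgets. As a quick sanity check on the scale of that claim, the arithmetic can be sketched as follows. This assumes the quoted figures (~75M and ~50B) are taken at face value; the 75M number is still marked for confirmation in the Training section, and the variable and function names are illustrative only.

```python
# Token budgets quoted in the Highlights list. Both are approximate, and the
# 75M figure is still flagged for confirmation in the Training section.
conversion_tokens = 75e6      # ~75M tokens for ARLM -> DLM conversion
from_scratch_tokens = 50e9    # ~50B tokens for from-scratch DLM training


def budget_ratio(small, large):
    """Return how many times larger `large` is than `small`."""
    return large / small


# How many times more tokens from-scratch training uses.
ratio = budget_ratio(conversion_tokens, from_scratch_tokens)

# Conversion budget expressed as a share of the from-scratch budget.
fraction = conversion_tokens / from_scratch_tokens

print(f"from-scratch / conversion: ~{ratio:.0f}x")
print(f"conversion as share of from-scratch: {fraction:.2%}")
```

Under these assumptions the conversion budget is roughly 667 times smaller, about 0.15% of the from-scratch budget. If the confirmed token count differs, only the first constant needs to change.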