Commit bf0939c by shubhamprshr (verified) · 1 parent: c2aa296

Update README.md

README.md (+58 -1)
@@ -4,4 +4,61 @@ language:
  - en
  tags:
  - DLLM
- ---
+ - diffusion-language-model
+ - on-policy-distillation
+ - post-training
+ library_name: transformers
+ pipeline_tag: text-generation
+ base_model: Qwen/Qwen3-8B
+ datasets:
+ - divelab/opdlm_train_data
+ ---
16
+
17
+ # OPDLM-8B
18
+
19
+ OPDLM-8B is a block diffusion language model (DLM) obtained by post-training an
20
+ autoregressive language model (ARLM) into a diffusion language model via
21
+ **on-policy distillation**.
22
+
+ ## Highlights
+ - **Converted, not pretrained from scratch:** built from a strong ARLM, reusing its prior.
+ - **Training-efficient:** conversion uses ~75M tokens, vs. ~50B tokens for from-scratch DLM training (same base ARLM).
+ - **Inference-efficient:** parallel token decoding via block diffusion.
+
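
To make "parallel token decoding via block diffusion" concrete, here is a minimal toy sketch in plain Python. It is a generic illustration of the idea, not the OPDLM-8B decoder: `toy_denoiser`, the block size of 4, and the "commit the most confident positions" rule are all assumptions made for this example.

```python
import random
from typing import Callable, List, Optional, Tuple

MASK = None  # placeholder for a position that has not been decoded yet

# A denoiser returns one (token, confidence) proposal per position in the block.
Denoiser = Callable[[List[str], List[Optional[str]]], List[Tuple[str, float]]]


def toy_denoiser(prefix: List[str], block: List[Optional[str]]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for the model: random tokens with random confidences."""
    filled = sum(tok is not MASK for tok in block)
    rng = random.Random(len(prefix) * 31 + filled)
    vocab = ["a", "b", "c", "d"]
    return [(rng.choice(vocab), rng.random()) for _ in block]


def decode_block(prefix: List[str], block_size: int, tokens_per_step: int,
                 denoiser: Denoiser) -> Tuple[List[str], int]:
    """Fill one block, committing several masked positions per denoising step."""
    block: List[Optional[str]] = [MASK] * block_size
    steps = 0
    while any(tok is MASK for tok in block):
        proposals = denoiser(prefix, block)
        masked = [i for i, tok in enumerate(block) if tok is MASK]
        # Commit the most confident masked positions in the same step.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:tokens_per_step]:
            block[i] = proposals[i][0]
        steps += 1
    return [tok for tok in block if tok is not MASK], steps


def generate(num_blocks: int, block_size: int = 4, tokens_per_step: int = 2,
             denoiser: Denoiser = toy_denoiser) -> Tuple[List[str], int]:
    """Blocks are produced left to right; positions inside a block are filled in parallel."""
    sequence: List[str] = []
    total_steps = 0
    for _ in range(num_blocks):
        block, steps = decode_block(sequence, block_size, tokens_per_step, denoiser)
        sequence.extend(block)
        total_steps += steps
    return sequence, total_steps


if __name__ == "__main__":
    seq, steps = generate(num_blocks=3)
    print(len(seq), "tokens in", steps, "denoising steps")
```

With these toy settings, 12 tokens are produced in 6 denoising steps; setting `tokens_per_step=1` gives one token per step, which is the point of comparison for any parallel-decoding speedup.
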
+ ## Model Details
+ - **Developed by:** [FILL: DIVE Lab, Texas A&M University]
+ - **Base model:** [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
+ - **Model type:** Block diffusion language model (decoder-based)
+ - **Block size:** [FILL: e.g. 4]
+ - **Parameters:** ~8B
+ - **Language:** English
+ - **License:** MIT
+
+ ## Training
+ - **Method:** On-policy distillation from a frozen ARLM teacher into a block DLM student.
+ - **Conversion budget:** [CONFIRM: ~75M tokens]
+ - **Data:** [opdlm_train_data](https://huggingface.co/datasets/divelab/opdlm_train_data)
+
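
The README does not spell out the distillation objective, so the following is only a generic sketch of on-policy distillation under stated assumptions: the student generates its own rollout, the frozen teacher is queried on the prefixes the student actually visits, and the loss is the reverse KL divergence KL(student || teacher), a common choice for this setup. The policies `toy_student` and `toy_teacher` are hypothetical fixed distributions, and the sketch ignores how a block DLM student would be aligned with an autoregressive teacher.

```python
import math
import random
from typing import Callable, Dict, List

Dist = Dict[str, float]
Policy = Callable[[List[str]], Dist]  # maps a prefix to a next-token distribution


def reverse_kl(student: Dist, teacher: Dist) -> float:
    """KL(student || teacher) over a shared vocabulary."""
    return sum(p * math.log(p / teacher[tok]) for tok, p in student.items() if p > 0.0)


def sample(dist: Dist, rng: random.Random) -> str:
    """Draw one token from a categorical distribution."""
    tokens, probs = zip(*dist.items())
    return rng.choices(tokens, weights=probs, k=1)[0]


def on_policy_distill_loss(student: Policy, teacher: Policy,
                           length: int, rng: random.Random) -> float:
    """Average reverse KL along a rollout sampled from the student itself."""
    prefix: List[str] = []
    total = 0.0
    for _ in range(length):
        s_dist = student(prefix)
        t_dist = teacher(prefix)  # the teacher is only queried, never updated
        total += reverse_kl(s_dist, t_dist)
        prefix.append(sample(s_dist, rng))  # the next state comes from the student
    return total / length


def toy_teacher(prefix: List[str]) -> Dist:
    """Hypothetical frozen teacher with a fixed next-token distribution."""
    return {"a": 0.7, "b": 0.2, "c": 0.1}


def toy_student(prefix: List[str]) -> Dist:
    """Hypothetical student that has not yet matched the teacher."""
    return {"a": 0.4, "b": 0.4, "c": 0.2}


if __name__ == "__main__":
    loss = on_policy_distill_loss(toy_student, toy_teacher, length=8, rng=random.Random(0))
    print(f"mean reverse KL on the student rollout: {loss:.4f}")
```

In a real training loop the student would be updated to reduce this loss. The "on-policy" part is that the prefixes come from the student's own samples rather than from a fixed dataset of teacher outputs.
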
+ ## Evaluation
+ [CONFIRM all numbers: these are from our table for OPDLM-8B (non-thinking);
+ fill the thinking variant separately if releasing it]
+
+ | Benchmark    | Score |
+ |--------------|-------|
+ | MMLU         | 70.9  |
+ | MMLU-Pro     | 53.7  |
+ | GPQA-Diamond | 36.1  |
+ | IFEval       | 50.1  |
+ | GSM8K        | 87.1  |
+ | MATH500      | 71.2  |
+ | AIME-24      | 14.7  |
+ | AIME-25      | 12.4  |
+ | HumanEval    | 59.8  |
+ | MBPP         | 48.7  |
+
+ Decoding: [FILL: static one-token-per-step / dynamic sampling; state which these are]
+
+ ## Citation
+ ```bibtex
+ [FILL: BibTeX once the paper/arXiv is up]
+ ```