shubhamprshr committed · Commit fab2ee0 · verified · 1 Parent(s): bf0939c

Update README.md
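
This commit restates the conversion budget as ~0.066B tokens where the previous text said ~75M. As a rough sanity check on what the quoted figures imply, the sketch below converts them to a common unit and computes the ratio against the ~50B from-scratch figure. It uses only the approximate numbers that appear in the diff; the constant and function names are illustrative assumptions and do not come from the authors' code.

```python
# Figures quoted in the README diff. All are approximate (the README marks them with "~").
OLD_BUDGET_TOKENS = 75e6       # "~75M tokens" (removed line)
NEW_BUDGET_TOKENS = 0.066e9    # "~0.066B tokens" (added line)
FROM_SCRATCH_TOKENS = 50e9     # "~50B tokens" for from-scratch DLM training


def to_millions(tokens: float) -> float:
    """Express a token count in millions of tokens."""
    return tokens / 1e6


def budget_ratio(from_scratch: float, conversion: float) -> float:
    """How many times larger the from-scratch budget is than the conversion budget."""
    return from_scratch / conversion


if __name__ == "__main__":
    print(f"new budget: {to_millions(NEW_BUDGET_TOKENS):.0f}M tokens")
    print(f"ratio with old figure: {budget_ratio(FROM_SCRATCH_TOKENS, OLD_BUDGET_TOKENS):.0f}x")
    print(f"ratio with new figure: {budget_ratio(FROM_SCRATCH_TOKENS, NEW_BUDGET_TOKENS):.0f}x")
```

Taken at face value, ~0.066B is 66M tokens, and the from-scratch budget is roughly 758 times larger (roughly 667 times with the earlier ~75M figure). Since every input is approximate, these ratios are order-of-magnitude guides only.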

Files changed (1)
  1. README.md +5 -7
README.md CHANGED
@@ -22,26 +22,24 @@ autoregressive language model (ARLM) into a diffusion language model via

  ## Highlights
  - **Converted, not pretrained from scratch:** built from a strong ARLM, reusing its prior.
- - **Training-efficient:** ~75M tokens of conversion vs. ~50B tokens for from-scratch DLM training (same base ARLM).
  - **Inference-efficient:** parallel token decoding via block diffusion.

  ## Model Details
- - **Developed by:** [FILL: DIVE Lab, Texas A&M University]
  - **Base model:** [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
  - **Model type:** Block diffusion language model (decoder-based)
- - **Block size:** [FILL: e.g. 4]
  - **Parameters:** ~8B
  - **Language:** English
  - **License:** MIT

  ## Training
  - **Method:** On-policy distillation from a frozen ARLM teacher into a block DLM student.
- - **Conversion budget:** [CONFIRM: ~75M tokens]
  - **Data:** [opdlm_train_data](https://huggingface.co/datasets/divelab/opdlm_train_data)

  ## Evaluation
- [CONFIRM all numbers — these are from our table for OPDLM-8B (non-thinking);
- fill the thinking variant separately if releasing it]

  | Benchmark | Score |
  |-------------|-------|
@@ -56,7 +54,7 @@ fill the thinking variant separately if releasing it]
  | HumanEval | 59.8 |
  | MBPP | 48.7 |

- Decoding: [FILL: static one-token-per-step / dynamic sampling — state which these are]

  ## Citation
  ```bibtex
 
  ## Highlights
  - **Converted, not pretrained from scratch:** built from a strong ARLM, reusing its prior.
+ - **Training-efficient:** ~0.066B tokens of conversion vs. ~50B tokens for from-scratch DLM training (same base ARLM).
  - **Inference-efficient:** parallel token decoding via block diffusion.

  ## Model Details
+ - **Developed by:** DIVE Lab, Texas A&M University
  - **Base model:** [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
  - **Model type:** Block diffusion language model (decoder-based)
+ - **Block size:** 4
  - **Parameters:** ~8B
  - **Language:** English
  - **License:** MIT

  ## Training
  - **Method:** On-policy distillation from a frozen ARLM teacher into a block DLM student.
+ - **Conversion budget:** ~0.066B tokens
  - **Data:** [opdlm_train_data](https://huggingface.co/datasets/divelab/opdlm_train_data)

  ## Evaluation

  | Benchmark | Score |
  |-------------|-------|

  | HumanEval | 59.8 |
  | MBPP | 48.7 |

+ Decoding: static (one token per step)

  ## Citation
  ```bibtex