iLLaDA-8B-Instruct

iLLaDA is an 8B fully bidirectional masked diffusion language model trained from scratch on 12T pre-training tokens. It supports an 8192-token context length and variable-length generation, and uses confidence-based scoring for multiple-choice evaluation. Inference and evaluation code: https://github.com/ML-GSAI/LLaDA.
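As a masked diffusion model, iLLaDA generates by repeatedly filling in masked tokens across the whole sequence rather than strictly left to right. This card does not specify iLLaDA's sampler, so the following is only a minimal sketch of the confidence-based unmasking loop used in the LLaDA line of work (low-confidence remasking). At every step, all masked positions are predicted at once and only the most confident predictions are committed. `diffusion_decode`, the `None` mask marker, and `toy_predict` are hypothetical stand-ins, not the repository's API.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: a position holding None is still masked.
Sequence = List[Optional[str]]
# Hypothetical predictor: for every position, returns (predicted_token, confidence).
Predictor = Callable[[Sequence], List[Tuple[str, float]]]


def diffusion_decode(length: int, steps: int, predict: Predictor) -> Sequence:
    """Start fully masked; each step commits the most confident predictions."""
    seq: Sequence = [None] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is None]
        if not masked:
            break
        # The model sees the whole (partially masked) sequence bidirectionally.
        preds = predict(seq)
        # Assumed schedule: spread remaining masked positions evenly over remaining steps.
        k = -(-len(masked) // (steps - step))  # ceiling division
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:k]:
            seq[i] = preds[i][0]  # commit; lower-confidence positions stay masked
    return seq


# Toy predictor standing in for the network: it always "knows" the target
# but is more confident about earlier positions.
TARGET = "the cat sat on the mat".split()


def toy_predict(seq: Sequence) -> List[Tuple[str, float]]:
    return [(TARGET[i], 1.0 / (1 + i)) for i in range(len(seq))]


print(diffusion_decode(len(TARGET), 3, toy_predict))
```

With 6 positions and 3 steps, each step commits two tokens. A real model returns a token distribution per position and its confidences change as context is filled in; the even per-step split used here is a simplifying assumption.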

Architecture

                          iLLaDA 8B    LLaDA 8B
Layers                    32           32
Model dimension           4096         4096
Attention heads           32           32
Key/Value heads           8            32
FFN dimension             14,336       12,288
Vocabulary size           155,136      126,464
Maximum sequence length   8192         4096
Embedding and LM-head     Tied         Untied
Total parameters          7.62B        8.02B
Non-embedding parameters  6.98B        6.98B
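The parameter rows can be cross-checked against the other rows. The sketch below assumes a LLaMA-style block (SwiGLU FFN with three weight matrices, no bias terms, two RMSNorm weight vectors per layer plus one final norm) and a head dimension of model dimension / attention heads. These are assumptions, not details stated in this card; under them the counts reproduce the table to two decimals.

```python
def count_params(layers, d_model, n_heads, n_kv_heads, d_ffn, vocab, tied):
    """Estimate (non_embedding, total) parameters under LLaMA-style assumptions."""
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim  # Q and O, plus K and V
    ffn = 3 * d_model * d_ffn  # gate, up, and down projections (SwiGLU assumption)
    norms = 2 * d_model  # two RMSNorm weight vectors per layer (assumption)
    non_embedding = layers * (attn + ffn + norms) + d_model  # + final norm
    # A tied model shares one matrix between token embedding and LM head.
    embedding = vocab * d_model * (1 if tied else 2)
    return non_embedding, non_embedding + embedding


configs = {
    "iLLaDA 8B": dict(layers=32, d_model=4096, n_heads=32, n_kv_heads=8,
                      d_ffn=14_336, vocab=155_136, tied=True),
    "LLaDA 8B": dict(layers=32, d_model=4096, n_heads=32, n_kv_heads=32,
                     d_ffn=12_288, vocab=126_464, tied=False),
}

for name, cfg in configs.items():
    non_emb, total = count_params(**cfg)
    print(f"{name}: non-embedding {non_emb / 1e9:.2f}B, total {total / 1e9:.2f}B")
# iLLaDA 8B: non-embedding 6.98B, total 7.62B
# LLaDA 8B: non-embedding 6.98B, total 8.02B
```

Both models land on the same non-embedding count because the attention weights iLLaDA saves with 8 key/value heads (2 × 4096 × 3072 = 25,165,824 per layer) exactly match the extra FFN weights it adds (3 × 4096 × 2048 = 25,165,824 per layer). The totals differ only through vocabulary size and embedding tying.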

Benchmark Results of Instruct Models

              iLLaDA 8B   LLaDA 8B    Dream 7B    Qwen2.5 7B
Type          Diffusion   Diffusion   Diffusion   Autoregressive
MMLU          71.6        65.5        67.0        76.6
MMLU-Pro      52.3        37.0        43.3        56.3
MMLU-Redux    76.4        68.9        76.3        75.7
GSM8K         89.0        77.5        81.0        91.6
MATH          56.7        42.2        39.2        75.5
HumanEval     65.9        49.4        55.5        84.8
MBPP          58.0        41.0        58.8        79.2
Average       67.1        54.5        60.2        77.1
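The Average row is consistent with an unweighted mean over the seven benchmarks; that reading is an inference from the numbers, not something the card states. This sketch recomputes it from the table so the row can be checked, or extended if a benchmark is added.

```python
# Scores copied from the table above, in row order:
# MMLU, MMLU-Pro, MMLU-Redux, GSM8K, MATH, HumanEval, MBPP.
SCORES = {
    "iLLaDA 8B": [71.6, 52.3, 76.4, 89.0, 56.7, 65.9, 58.0],
    "LLaDA 8B": [65.5, 37.0, 68.9, 77.5, 42.2, 49.4, 41.0],
    "Dream 7B": [67.0, 43.3, 76.3, 81.0, 39.2, 55.5, 58.8],
    "Qwen2.5 7B": [76.6, 56.3, 75.7, 91.6, 75.5, 84.8, 79.2],
}
REPORTED_AVERAGE = {"iLLaDA 8B": 67.1, "LLaDA 8B": 54.5, "Dream 7B": 60.2, "Qwen2.5 7B": 77.1}


def mean(values):
    """Unweighted arithmetic mean: every benchmark counts equally."""
    return sum(values) / len(values)


for model, values in SCORES.items():
    print(f"{model}: recomputed {mean(values):.1f}, reported {REPORTED_AVERAGE[model]}")
```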
Weights: Safetensors format, model size 8B params, tensor type BF16.
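BF16 stores each parameter in 2 bytes, so the weight footprint follows directly from the parameter count. The sketch below uses the 7.62B total from the architecture table; `weight_memory_gb` is a hypothetical helper, and the estimate covers weights only, not activations or framework overhead.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in GB (1e9 bytes); BF16 uses 2 bytes per parameter."""
    return n_params * bytes_per_param / 1e9


print(f"iLLaDA 8B weights: {weight_memory_gb(7.62e9):.2f} GB")  # 15.24 GB
```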