iLLaDA-8B-Instruct

iLLaDA is an 8B-parameter, fully bidirectional masked diffusion language model trained from scratch on 12T pre-training tokens. It supports an 8192-token context length and variable-length generation, and uses confidence-based scoring for multiple-choice evaluation. Inference and evaluation code: https://github.com/ML-GSAI/LLaDA.

Architecture

	iLLaDA 8B	LLaDA 8B
Layers	32	32
Model dimension	4096	4096
Attention heads	32	32
Key/Value heads	8	32
FFN dimension	14,336	12,288
Vocabulary size	155,136	126,464
Maximum sequence length	8192	4096
Embedding and LM-head	Tied	Untied
Total parameters	7.62B	8.02B
Non-embedding parameters	6.98B	6.98B
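
The parameter counts in the table can be roughly reproduced from the other rows. The sketch below is an illustration, not the authors' accounting: it assumes a gated (SwiGLU-style) FFN with three weight matrices, bias-free linear layers, and a head dimension of model dimension / attention heads, and it ignores normalization weights. The names `Config`, `non_embedding_params`, and `total_params` are hypothetical helpers introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    layers: int
    d_model: int
    n_heads: int
    n_kv_heads: int
    d_ffn: int
    vocab: int
    tied: bool  # True if the embedding and LM head share one matrix


def non_embedding_params(c: Config) -> int:
    """Attention + FFN weights over all layers (assumed: no biases, no norm weights)."""
    head_dim = c.d_model // c.n_heads
    kv_dim = head_dim * c.n_kv_heads
    # Q and output projections are d_model x d_model; K and V are d_model x kv_dim.
    attn = 2 * c.d_model * c.d_model + 2 * c.d_model * kv_dim
    # Assumed gated FFN: gate, up, and down projections.
    ffn = 3 * c.d_model * c.d_ffn
    return c.layers * (attn + ffn)


def total_params(c: Config) -> int:
    """Non-embedding weights plus one (tied) or two (untied) vocab x d_model matrices."""
    emb = c.vocab * c.d_model
    return non_embedding_params(c) + (emb if c.tied else 2 * emb)


# Values copied from the architecture table above.
illada = Config(layers=32, d_model=4096, n_heads=32, n_kv_heads=8,
                d_ffn=14336, vocab=155136, tied=True)
llada = Config(layers=32, d_model=4096, n_heads=32, n_kv_heads=32,
               d_ffn=12288, vocab=126464, tied=False)

for name, cfg in (("iLLaDA 8B", illada), ("LLaDA 8B", llada)):
    print(f"{name}: non-embedding {non_embedding_params(cfg) / 1e9:.2f}B, "
          f"total {total_params(cfg) / 1e9:.2f}B")
```

Under these assumptions, the parameters saved by using 8 key/value heads instead of 32 are offset by the wider FFN, which is consistent with both models reporting the same non-embedding count. The totals differ mainly through vocabulary size and embedding tying. Because normalization weights are left out, the computed totals may differ from the table in the last reported digit.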

Benchmark Results of Instruct Models

	iLLaDA 8B	LLaDA 8B	Dream 7B	Qwen2.5 7B
Model type	Diffusion	Diffusion	Diffusion	AR
MMLU	71.6	65.5	67.0	76.6
MMLU-Pro	52.3	37.0	43.3	56.3
MMLU-Redux	76.4	68.9	76.3	75.7
GSM8K	89.0	77.5	81.0	91.6
MATH	56.7	42.2	39.2	75.5
HumanEval	65.9	49.4	55.5	84.8
MBPP	58.0	41.0	58.8	79.2
Average	67.1	54.5	60.2	77.1
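
The Average row appears to be the unweighted mean of the seven benchmark scores. The short check below assumes that reading; the scores are copied from the table, and the `scores` dictionary and `mean_score` helper are hypothetical names used only for this illustration.

```python
# Scores per model in table order:
# MMLU, MMLU-Pro, MMLU-Redux, GSM8K, MATH, HumanEval, MBPP.
scores = {
    "iLLaDA 8B": [71.6, 52.3, 76.4, 89.0, 56.7, 65.9, 58.0],
    "LLaDA 8B": [65.5, 37.0, 68.9, 77.5, 42.2, 49.4, 41.0],
    "Dream 7B": [67.0, 43.3, 76.3, 81.0, 39.2, 55.5, 58.8],
    "Qwen2.5 7B": [76.6, 56.3, 75.7, 91.6, 75.5, 84.8, 79.2],
}


def mean_score(values):
    """Unweighted mean, rounded to one decimal place as in the table."""
    return round(sum(values) / len(values), 1)


averages = {model: mean_score(vals) for model, vals in scores.items()}
for model, avg in averages.items():
    print(f"{model}: {avg}")
```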

Safetensors

Model size: 8B params
Tensor type: BF16
