---
license: apache-2.0
library_name: transformers
tags:
- dllm
- diffusion
- llm
- text_generation
model-index:
  - name: LLaDA2.0-flash
    results:
      - task:
          name: Text Generation
          type: text-generation
        dataset:
          name: Benchmarks
          type: benchmarks
        metrics:
          - name: Average
            type: average
            value: 79.32

          # Knowledge
          - name: MMLU
            type: mmlu
            value: 87.69
          - name: MMLU-Pro
            type: mmlu-pro
            value: 73.36
          - name: GPQA
            type: gpqa
            value: 61.98
          - name: ARC-C
            type: arc-c
            value: 95.93
          - name: CMMLU
            type: cmmlu
            value: 85.13
          - name: C-EVAL
            type: c-eval
            value: 86.75
          - name: GAOKAO-Bench
            type: gaokao-bench
            value: 93.90

          # Reasoning
          - name: SQuAD 2.0
            type: squad-v2
            value: 90.00
          - name: DROP
            type: drop
            value: 87.90
          - name: KOR-Bench
            type: kor-bench
            value: 64.24
          - name: HellaSwag
            type: hellaswag
            value: 84.97

          # Coding
          - name: CRUXEval-O
            type: cruxeval-o
            value: 85.12
          - name: MBPP
            type: mbpp
            value: 88.29
          - name: MultiPL-E
            type: multipl-e
            value: 74.87
          - name: HumanEval
            type: humaneval
            value: 94.51
          - name: Bigcodebench-Full
            type: bigcodebench-full
            value: 41.58
          - name: LiveCodeBench
            type: livecodebench
            value: 42.29
          - name: Spider
            type: spider
            value: 82.49

          # Math
          - name: GSM8K
            type: gsm8k
            value: 96.06
          - name: MATH
            type: math
            value: 95.44
          - name: OlympiadBench
            type: olympiadbench
            value: 74.07
          - name: AIME 2025
            type: aime-2025
            value: 60.00

          # Agent & Alignment
          - name: BFCL_Live
            type: bfcl_live
            value: 75.43
          - name: IFEval-strict-prompt
            type: ifeval-strict
            value: 81.70
---
# LLaDA2.0-flash

**LLaDA2.0-flash** is a diffusion language model with a 100BA6B Mixture-of-Experts (MoE) architecture (about 100B total parameters, of which roughly 6B are activated during inference). As an enhanced, instruction-tuned iteration of the LLaDA2.0 series, it is optimized for practical applications.

<div align="center">
  <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*uOo8QKQMiBwAAAAAgNAAAAgAemJ7AQ/original" width="800" />
</div>

---

| Benchmark | Qwen3-30B-A3B-Instruct-2507 | Ling-flash-2.0 | LLaDA2.0-flash-preview | LLaDA2.0-flash |
| :---: | :---: | :---: | :---: | :---: |
| **Average** | 79.47 | 78.03 | 71.92 | 79.32 |
| **Knowledge** | | | | |
| MMLU | 87.13 | 87.98 | 83.15 | 87.69 |
| MMLU-Pro | 74.23 | 76.84 | 49.22 | 73.36 |
| GPQA | 57.34 | 67.12 | 46.59 | 61.98 |
| ARC-C | 95.81 | 95.08 | 93.90 | 95.93 |
| CMMLU | 86.36 | 86.59 | 67.53 | 85.13 |
| C-EVAL | 88.17 | 88.03 | 66.54 | 86.75 |
| GAOKAO-Bench | 94.53 | 93.24 | 86.12 | 93.90 |
| **Reasoning** | | | | |
| SQuAD 2.0 | 89.51 | 81.32 | 85.61 | 90.00 |
| DROP | 87.57 | 88.32 | 79.49 | 87.90 |
| KOR-Bench | 68.00 | 68.96 | 37.26 | 64.24 |
| HellaSwag | 86.31 | 81.59 | 86.00 | 84.97 |
| **Coding** | | | | |
| CRUXEval-O | 86.75 | 82.75 | 61.88 | 85.12 |
| MBPP | 86.65 | 85.01 | 77.75 | 88.29 |
| MultiPL-E | 70.67 | 65.76 | 62.43 | 74.87 |
| HumanEval | 93.29 | 85.98 | 80.49 | 94.51 |
| Bigcodebench-Full | 41.49 | 40.70 | 30.44 | 41.58 |
| LiveCodeBench | 41.63 | 44.11 | 28.58 | 42.29 |
| Spider | 81.79 | 80.58 | 81.37 | 82.49 |
| **Math** | | | | |
| GSM8K | 96.36 | 95.45 | 89.01 | 96.06 |
| MATH | 96.70 | 96.10 | 73.50 | 95.44 |
| OlympiadBench | 77.59 | 76.19 | 47.78 | 74.07 |
| AIME 2025 | 61.88 | 55.89 | 23.33 | 60.00 |
| **Agent & Alignment** | | | | |
| BFCL_Live | 73.19 | 67.57 | 74.11 | 75.43 |
| IFEval-strict-prompt | 84.29 | 81.52 | 62.50 | 81.70 |
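
The **Average** row appears to be the unweighted mean of the 24 per-benchmark scores. The short check below reproduces it for the LLaDA2.0-flash column; the scores are copied from the table above, and the variable names are purely illustrative.

```python
# Scores for the LLaDA2.0-flash column, copied from the table above.
llada2_flash_scores = {
    # Knowledge
    "MMLU": 87.69, "MMLU-Pro": 73.36, "GPQA": 61.98, "ARC-C": 95.93,
    "CMMLU": 85.13, "C-EVAL": 86.75, "GAOKAO-Bench": 93.90,
    # Reasoning
    "SQuAD 2.0": 90.00, "DROP": 87.90, "KOR-Bench": 64.24, "HellaSwag": 84.97,
    # Coding
    "CRUXEval-O": 85.12, "MBPP": 88.29, "MultiPL-E": 74.87, "HumanEval": 94.51,
    "Bigcodebench-Full": 41.58, "LiveCodeBench": 42.29, "Spider": 82.49,
    # Math
    "GSM8K": 96.06, "MATH": 95.44, "OlympiadBench": 74.07, "AIME 2025": 60.00,
    # Agent & Alignment
    "BFCL_Live": 75.43, "IFEval-strict-prompt": 81.70,
}

# Unweighted mean over all benchmarks.
mean_score = sum(llada2_flash_scores.values()) / len(llada2_flash_scores)
print(f"{len(llada2_flash_scores)} benchmarks, mean = {mean_score:.2f}")
# prints "24 benchmarks, mean = 79.32"
```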



## 🚀 Performance Highlights
+ **Leading MoE Architecture**:
An open-source **Mixture-of-Experts (MoE) diffusion large language model**, continually trained on top of the Ling2.0 series with approximately **20 trillion tokens**.
+ **Efficient Inference**:
Of its **100 billion total parameters**, only **6.1 billion** are activated during inference. LLaDA2.0-flash significantly reduces computational costs while outperforming open-source dense models of similar scale.
+ **Impressive Performance on Code & Complex Reasoning**:
Excels in tasks such as **code generation** and **advanced mathematical reasoning**, demonstrating strong reasoning capabilities.
+ **Tool Use**:
Supports **tool calling** and achieves excellent performance in complex agent-based tasks.
+ **Open & Extensible**:
Fully open-source, with a commitment to transparency. We plan to release a **leading inference framework** in the future and continue investing in cutting-edge areas like **diffusion LLMs (dLLM)** to drive disruptive innovation.
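
As a rough illustration of the efficiency claim, the sketch below computes the share of parameters that are active during inference from the two figures quoted above. The FLOPs estimate relies on a common rule of thumb (about 2 FLOPs per active parameter per token for a forward pass), which is an assumption and not a number from this model card.

```python
# Figures quoted in the highlights above.
total_params = 100e9   # total parameters
active_params = 6.1e9  # parameters activated during inference

# Share of the model that participates in each forward pass.
active_fraction = active_params / total_params

# Assumption (rule of thumb, not from the model card): a forward pass costs
# roughly 2 FLOPs per active parameter per token.
approx_flops_per_token = 2 * active_params

print(f"active fraction: {active_fraction:.1%}")
print(f"approx. forward FLOPs per token: {approx_flops_per_token:.2e}")
```

Under these assumptions, only about 6% of the weights contribute to each token's forward pass.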

## 🗺️ What's Next

+ **Supercharged Reasoning with LLaDA 2.0:** The LLaDA 2.0 series will be fine-tuned with **Reinforcement Learning**, unlocking a new level of sophisticated reasoning and problem-solving abilities.
+ **Tools for Innovators:** The model was fine-tuned with the [dFactory](https://github.com/inclusionAI/dFactory) framework using Fully Sharded Data Parallel (FSDP2). We have begun open-sourcing dFactory and will continuously release our advanced post-training technologies. Whether you want to master the current model or build your own customized versions, you'll have the tools you need. Stay tuned for more updates!

---

## 📦 Model Variants
| Model ID | Description | Hugging Face Link |
| --- | --- | --- |
| `inclusionAI/LLaDA2.0-mini` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-mini) |
| `inclusionAI/LLaDA2.0-flash` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-flash) |


---

## 🔍 Model Overview
**LLaDA2.0-flash** has the following specifications:

+ **Type**: Mixture-of-Experts (MoE) Diffusion Language Model
+ **Total Parameters (Non-Embedding)**: 100B
+ **Number of Layers**: 32
+ **Attention Heads**: 32
+ **Context Length**: 32,768 tokens
+ **Position Embedding**: Rotary (RoPE)
+ **Vocabulary Size**: 157,184

---

### 🤗 Hugging Face Transformers
Make sure you have `transformers` and its dependencies installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Local path to the downloaded checkpoint.
model_path = "/path/to/LLaDA2.0-flash"
device = "auto"

# trust_remote_code=True loads the custom modeling code shipped with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, device_map=device
)
model = model.to(torch.bfloat16)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Wrap the prompt in the chat template and tokenize it.
prompt = "Why does Camus think that Sisyphus is happy?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
)

# Generate with the recommended settings (see Best Practices below).
generated_tokens = model.generate(
    inputs=input_ids,
    eos_early_stop=True,
    gen_length=512,
    block_length=32,
    steps=32,
    temperature=0.0,
)

# Decode the generated token IDs back to text.
generated_answer = tokenizer.decode(
    generated_tokens[0],
    skip_special_tokens=True,
)
print(generated_answer)
```

### Best Practices
To achieve optimal performance, we recommend the following settings:

1. **Sampling Parameters**:
   We suggest using `temperature=0.0`, `block_length=32`, and `steps=32`. A higher temperature may occasionally cause language mixing and a slight decrease in model performance.

2. **Adequate Output Length**:
   We recommend using an output length of 32768 tokens for most queries.
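
To get a feel for how the output length relates to `block_length`, the sketch below counts how many blocks a given `gen_length` spans. It assumes, based only on the parameter names, that generation proceeds block by block; this card does not confirm that detail, and `num_blocks` is a hypothetical helper, not part of the model's API.

```python
import math


def num_blocks(gen_length: int, block_length: int = 32) -> int:
    """Blocks needed to cover gen_length tokens (assumes block-wise decoding)."""
    if gen_length <= 0 or block_length <= 0:
        raise ValueError("gen_length and block_length must be positive")
    return math.ceil(gen_length / block_length)


# Compare the quick-start setting with the recommended output length.
for gen_length in (512, 32768):
    blocks = num_blocks(gen_length)
    print(f"gen_length={gen_length}: {blocks} blocks of 32 tokens")
```

With the recommended `block_length=32`, the quick-start value of 512 covers 16 blocks, while an output length of 32768 covers 1024.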

---

## 🌐 License
This project is licensed under the terms of the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

---

## 🀝 Contact & Collaboration
For questions, collaborations, or feedback, please reach out via [Hugging Face](https://huggingface.co/inclusionAI/LLaDA2.0-flash) or open an issue in the [repository](https://github.com/inclusionAI).

👉 Join us in advancing open, efficient, and intelligent language models!