---
license: apache-2.0
library_name: transformers
tags:
- dllm
- diffusion
- llm
- text_generation
---

# LLaDA2.1-flash

**LLaDA2.1-flash** is a diffusion language model in the LLaDA series featuring an editing enhancement. It significantly improves inference speed while delivering strong task performance.

<div align="center">
  <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*uOo8QKQMiBwAAAAAgNAAAAgAemJ7AQ/original" width="800" />
</div>

<div align="center">
  <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*biwvQpCmKjEAAAAAULAAAAgAemJ7AQ/original" width="800" />
</div>

---

<table>
  <thead>
    <tr>
      <th align="left"><b>Benchmark</b></th>
      <th align="center"><b>Qwen3-30B-<br>A3B-Inst-2507</b><br><sub>(Score)</sub></th>
      <th align="center"><b>Ling-flash-2.0</b><br><sub>(Score)</sub></th>
      <th align="center"><b>LLaDA2.0-flash</b><br><sub>(Score | TPF)</sub></th>
      <th align="center"><b>LLaDA2.1-flash<br>(Speed Mode)</b><br><sub>(Score | TPF)</sub></th>
      <th align="center"><b>LLaDA2.1-flash<br>(Quality Mode)</b><br><sub>(Score | TPF)</sub></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td align="left"><b>Average</b></td>
      <td align="center">73.09</td>
      <td align="center">71.52</td>
      <td align="center">72.43 | 3.08</td>
      <td align="center">72.34 | 5.93</td>
      <td align="center">73.54 | 3.64</td>
    </tr>
    <tr>
      <td colspan="6" align="center"><b>Knowledge</b></td>
    </tr>
    <tr>
      <td align="left">GPQA</td>
      <td align="center">54.14</td>
      <td align="center">69.16</td>
      <td align="center">62.31 | 3.29</td>
      <td align="center">66.67 | 3.95</td>
      <td align="center">67.30 | 2.37</td>
    </tr>
    <tr>
      <td align="left">MMLU-Pro</td>
      <td align="center">74.21</td>
      <td align="center">77.55</td>
      <td align="center">74.79 | 2.36</td>
      <td align="center">75.31 | 4.43</td>
      <td align="center">76.59 | 2.62</td>
    </tr>
    <tr>
      <td align="left">C-EVAL</td>
      <td align="center">88.12</td>
      <td align="center">87.54</td>
      <td align="center">85.21 | 1.90</td>
      <td align="center">86.93 | 2.71</td>
      <td align="center">86.71 | 1.75</td>
    </tr>
    <tr>
      <td align="left">PHYBench</td>
      <td align="center">29.84</td>
      <td align="center">27.67</td>
      <td align="center">30.06 | 2.70</td>
      <td align="center">26.04 | 4.10</td>
      <td align="center">28.23 | 2.66</td>
    </tr>
    <tr>
      <td align="left">TriviaQA</td>
      <td align="center">65.61</td>
      <td align="center">69.76</td>
      <td align="center">66.88 | 1.94</td>
      <td align="center">72.55 | 4.30</td>
      <td align="center">72.93 | 2.92</td>
    </tr>
    <tr>
      <td colspan="6" align="center"><b>Reasoning</b></td>
    </tr>
    <tr>
      <td align="left">BIG-Bench Hard</td>
      <td align="center">85.54</td>
      <td align="center">89.36</td>
      <td align="center">86.75 | 2.66</td>
      <td align="center">87.82 | 5.61</td>
      <td align="center">88.69 | 3.28</td>
    </tr>
    <tr>
      <td align="left">BIG-Bench Extra Hard</td>
      <td align="center">37.80</td>
      <td align="center">23.24</td>
      <td align="center">27.86 | 4.60</td>
      <td align="center">33.51 | 5.04</td>
      <td align="center">35.77 | 3.17</td>
    </tr>
    <tr>
      <td align="left">bbh-zh</td>
      <td align="center">86.18</td>
      <td align="center">75.09</td>
      <td align="center">87.52 | 3.21</td>
      <td align="center">82.55 | 5.78</td>
      <td align="center">86.23 | 3.77</td>
    </tr>
    <tr>
      <td align="left">MuSR</td>
      <td align="center">79.15</td>
      <td align="center">82.72</td>
      <td align="center">82.72 | 1.70</td>
      <td align="center">80.10 | 2.90</td>
      <td align="center">79.84 | 1.85</td>
    </tr>
    <tr>
      <td align="left">ZebraLogic</td>
      <td align="center">90.97</td>
      <td align="center">87.60</td>
      <td align="center">82.30 | 2.74</td>
      <td align="center">84.20 | 5.80</td>
      <td align="center">88.90 | 3.26</td>
    </tr>
    <tr>
      <td align="left">PrOntoQA</td>
      <td align="center">97.12</td>
      <td align="center">97.88</td>
      <td align="center">96.50 | 2.64</td>
      <td align="center">95.00 | 9.23</td>
      <td align="center">97.00 | 5.73</td>
    </tr>
    <tr>
      <td align="left">PIQA</td>
      <td align="center">91.57</td>
      <td align="center">91.95</td>
      <td align="center">96.50 | 1.43</td>
      <td align="center">92.44 | 2.38</td>
      <td align="center">92.17 | 1.44</td>
    </tr>
    <tr>
      <td align="left">OCNLI</td>
      <td align="center">71.59</td>
      <td align="center">65.36</td>
      <td align="center">71.63 | 1.09</td>
      <td align="center">72.17 | 1.83</td>
      <td align="center">72.75 | 1.32</td>
    </tr>
    <tr>
      <td align="left">HellaSwag</td>
      <td align="center">86.31</td>
      <td align="center">81.59</td>
      <td align="center">84.97 | 1.26</td>
      <td align="center">85.60 | 2.31</td>
      <td align="center">85.31 | 1.51</td>
    </tr>
    <tr>
      <td align="left">KOR-Bench</td>
      <td align="center">69.20</td>
      <td align="center">69.44</td>
      <td align="center">63.04 | 3.44</td>
      <td align="center">62.80 | 4.97</td>
      <td align="center">65.12 | 2.77</td>
    </tr>
    <tr>
      <td align="left">DROP</td>
      <td align="center">87.57</td>
      <td align="center">88.32</td>
      <td align="center">87.90 | 2.26</td>
      <td align="center">87.55 | 5.40</td>
      <td align="center">87.86 | 2.53</td>
    </tr>
    <tr>
      <td align="left">SQuAD 2.0</td>
      <td align="center">89.51</td>
      <td align="center">81.32</td>
      <td align="center">90.00 | 3.10</td>
      <td align="center">90.65 | 5.01</td>
      <td align="center">90.80 | 3.90</td>
    </tr>
    <tr>
      <td colspan="6" align="center"><b>Coding</b></td>
    </tr>
    <tr>
      <td align="left">LiveCodeBench</td>
      <td align="center">46.42</td>
      <td align="center">52.48</td>
      <td align="center">42.51 | 4.23</td>
      <td align="center">44.05 | 6.48</td>
      <td align="center">45.37 | 3.80</td>
    </tr>
    <tr>
      <td align="left">CRUXEval-O</td>
      <td align="center">86.75</td>
      <td align="center">82.75</td>
      <td align="center">85.12 | 3.21</td>
      <td align="center">85.25 | 6.54</td>
      <td align="center">87.50 | 3.80</td>
    </tr>
    <tr>
      <td align="left">MBPP+</td>
      <td align="center">78.21</td>
      <td align="center">80.89</td>
      <td align="center">79.37 | 4.02</td>
      <td align="center">76.72 | 10.43</td>
      <td align="center">77.25 | 5.96</td>
    </tr>
    <tr>
      <td align="left">HumanEval+</td>
      <td align="center">87.88</td>
      <td align="center">87.58</td>
      <td align="center">88.41 | 6.45</td>
      <td align="center">89.63 | 13.81</td>
      <td align="center">89.63 | 9.18</td>
    </tr>
    <tr>
      <td align="left">MultiPL-E</td>
      <td align="center">70.67</td>
      <td align="center">65.76</td>
      <td align="center">74.87 | 3.14</td>
      <td align="center">70.89 | 7.77</td>
      <td align="center">73.34 | 4.33</td>
    </tr>
    <tr>
      <td align="left">BigCodeBench-Full</td>
      <td align="center">41.49</td>
      <td align="center">40.70</td>
      <td align="center">41.58 | 3.33</td>
      <td align="center">37.11 | 8.51</td>
      <td align="center">39.21 | 4.70</td>
    </tr>
    <tr>
      <td align="left">BIRD-SQL</td>
      <td align="center">47.75</td>
      <td align="center">47.49</td>
      <td align="center">45.76 | 2.16</td>
      <td align="center">42.18 | 5.09</td>
      <td align="center">44.04 | 2.95</td>
    </tr>
    <tr>
      <td align="left">Spider</td>
      <td align="center">81.79</td>
      <td align="center">80.58</td>
      <td align="center">82.49 | 4.42</td>
      <td align="center">79.18 | 8.74</td>
      <td align="center">81.04 | 5.70</td>
    </tr>
    <tr>
      <td colspan="6" align="center"><b>Math</b></td>
    </tr>
    <tr>
      <td align="left">AIME 2025</td>
      <td align="center">61.88</td>
      <td align="center">55.89</td>
      <td align="center">60.00 | 4.57</td>
      <td align="center">63.33 | 5.36</td>
      <td align="center">63.33 | 3.46</td>
    </tr>
    <tr>
      <td align="left">OlympiadBench</td>
      <td align="center">77.59</td>
      <td align="center">76.19</td>
      <td align="center">74.07 | 3.70</td>
      <td align="center">75.85 | 6.46</td>
      <td align="center">76.59 | 3.81</td>
    </tr>
    <tr>
      <td align="left">GSM-Plus</td>
      <td align="center">89.41</td>
      <td align="center">89.71</td>
      <td align="center">89.74 | 2.68</td>
      <td align="center">89.23 | 7.14</td>
      <td align="center">89.69 | 3.83</td>
    </tr>
    <tr>
      <td align="left">CMATH</td>
      <td align="center">96.58</td>
      <td align="center">96.52</td>
      <td align="center">96.90 | 2.17</td>
      <td align="center">96.54 | 4.84</td>
      <td align="center">96.63 | 2.65</td>
    </tr>
    <tr>
      <td align="left">Omni-MATH</td>
      <td align="center">54.00</td>
      <td align="center">53.00</td>
      <td align="center">50.30 | 3.39</td>
      <td align="center">52.30 | 6.01</td>
      <td align="center">54.10 | 3.50</td>
    </tr>
    <tr>
      <td colspan="6" align="center"><b>Agent & Alignment</b></td>
    </tr>
    <tr>
      <td align="left">IFEval-strict-prompt</td>
      <td align="center">83.73</td>
      <td align="center">81.15</td>
      <td align="center">82.62 | 1.47</td>
      <td align="center">83.36 | 2.24</td>
      <td align="center">83.55 | 1.41</td>
    </tr>
    <tr>
      <td align="left">BFCL v3</td>
      <td align="center">73.41</td>
      <td align="center">67.69</td>
      <td align="center">74.94 | 4.87</td>
      <td align="center">74.86 | 9.24</td>
      <td align="center">75.61 | 6.76</td>
    </tr>
    <tr>
      <td align="left">Nexus FC</td>
      <td align="center">49.93</td>
      <td align="center">36.25</td>
      <td align="center">50.45 | 5.53</td>
      <td align="center">44.83 | 11.29</td>
      <td align="center">47.65 | 7.38</td>
    </tr>
  </tbody>
</table>

---

## πŸš€ Highlights

+ **Error-Correcting Editing:** A structural innovation that brings editable, error-correcting generation to dLLMs.
+ **Speed vs. Quality Modes:** In Speed Mode, the 100B flash model achieves ultra-fast inference; in Quality Mode, it remains competitive across a wide range of tasks.
+ **Reinforcement Learning on a 100B-scale dLLM:** A tailored algorithm and framework enable reinforcement learning for large dLLMs.

## πŸ—ΊοΈ What's Next

+ **Powerful Agentic/Tool-Use Capability with LLaDA:** The next update will add powerful **agentic** and long-horizon tool-use capabilities.
+ **Extreme Editing:** The next update will feature stronger and more extensive editing capabilities, aimed at correcting more errors in parallel reasoning.
+ **More Training Paradigms:** We plan to explore training paradigms beyond SFT and RL for dLLMs.

---

## πŸ“¦ Model Variants

| Model ID | Description | Hugging Face Link |
| --- | --- | --- |
| `inclusionAI/LLaDA2.1-mini` | Instruction-tuned model, ready for downstream applications. | [πŸ€— Model Card](https://huggingface.co/inclusionAI/LLaDA2.1-mini) |
| `inclusionAI/LLaDA2.1-flash` | Instruction-tuned model, ready for downstream applications. | [πŸ€— Model Card](https://huggingface.co/inclusionAI/LLaDA2.1-flash) |

---

## πŸ” Model Overview

**LLaDA2.1-flash** has the following specifications:

+ **Type**: Mixture-of-Experts (MoE) Diffusion Language Model
+ **Total Parameters (Non-Embedding)**: 100B
+ **Number of Layers**: 32
+ **Attention Heads**: 32
+ **Context Length**: 32,768 tokens
+ **Position Embedding**: Rotary (RoPE)
+ **Vocabulary Size**: 157,184
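
To sanity-check these specifications against the released checkpoint, you can inspect its config. The sketch below assumes the remote-code config exposes the standard `transformers` field names, which may differ for this model:

```python
# Minimal sketch: inspect the checkpoint's config to confirm the specs above.
# Assumes standard field names (num_hidden_layers, num_attention_heads,
# vocab_size); a remote-code config may use different attribute names.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "inclusionAI/LLaDA2.1-flash", trust_remote_code=True
)
print(getattr(config, "num_hidden_layers", None))    # expected: 32
print(getattr(config, "num_attention_heads", None))  # expected: 32
print(getattr(config, "vocab_size", None))           # expected: 157184
```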

---

### πŸ€— Hugging Face Transformers

Make sure you have `transformers` and its dependencies installed (for example, `pip install -U transformers torch`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Local checkpoint path, or the Hugging Face id "inclusionAI/LLaDA2.1-flash".
model_path = "/path/to/LLaDA2.1-flash"
device = "auto"
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, device_map=device,
)
model = model.to(torch.bfloat16)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Build the chat-formatted prompt and move it to the model's device.
prompt = """Calculate 1+5-28*0.5-200=?"""
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
).to(model.device)

# Diffusion decoding: `threshold` and `editing_threshold` control the
# denoising and editing behavior (Speed Mode settings shown here).
generated_tokens = model.generate(
    inputs=input_ids,
    eos_early_stop=True,
    gen_length=512,
    block_length=32,
    threshold=0.5,
    editing_threshold=0,
    temperature=0.0,
)
generated_answer = tokenizer.decode(
    generated_tokens[0],
    skip_special_tokens=True,
)
print(generated_answer)
```

Multi-block editing inference is coming soon.

### Best Practices

To achieve optimal performance, we recommend the following settings:

1. **Sampling Parameters**:
   We recommend the following general sampling parameters: `block_length=32`, `temperature=0.0`, `top_p=None`, and `top_k=None`. We are currently exploring more diverse sampling configurations.

2. **Denoising Thresholds**:
   There are three denoising parameters: `threshold`, `editing_threshold`, and `max_post_steps`. We recommend `threshold=0.7`, `editing_threshold=0.5` for **Quality Mode** and `threshold=0.5`, `editing_threshold=0.0` for **Speed Mode** (see the sketch after this list). For both modes, we suggest setting `max_post_steps` to a value greater than 5; we recommend 16 as a balanced default, which was used for most of our internal testing.

   Note: a low `threshold` may cause stuttering as a trade-off for faster inference.

3. **Adequate Output Length**:
   We recommend an output length of 16,384 tokens for most scenarios.
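
As an illustration, here is a minimal sketch of the two recommended configurations, assuming `model` and `input_ids` are prepared as in the Transformers example above; passing `max_post_steps` as a `generate` keyword is our assumption, since it is not shown in that example:

```python
# Minimal sketch of the two recommended decoding modes.
# Assumes `model` and `input_ids` are prepared as in the example above;
# `max_post_steps` as a generate() keyword is an assumption.

common = dict(
    inputs=input_ids,
    eos_early_stop=True,
    gen_length=16384,   # adequate output length for most scenarios
    block_length=32,
    temperature=0.0,
    max_post_steps=16,  # balanced default used in most internal testing
)

# Quality Mode: higher thresholds, more conservative denoising and editing.
quality_tokens = model.generate(**common, threshold=0.7, editing_threshold=0.5)

# Speed Mode: lower thresholds for faster inference (may cause stuttering).
speed_tokens = model.generate(**common, threshold=0.5, editing_threshold=0.0)
```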

---

## Deployment

### SGLang

SGLang enables dLLM inference either through offline batching or by launching an HTTP server for online requests. You can start the SGLang server with the following command:

```bash
python3 -m sglang.launch_server \
    --model-path inclusionAI/LLaDA2.1-flash \
    --dllm-algorithm JointThreshold \
    --tp-size 4 \
    --trust-remote-code \
    --mem-fraction-static 0.8 \
    --max-running-requests 1 \
    --attention-backend flashinfer
```
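
Once the server is running, it can be queried through SGLang's OpenAI-compatible API. Below is a minimal sketch using the `openai` Python client; it assumes SGLang's default port 30000 and that the served model name matches the `--model-path` above:

```python
# Minimal sketch: query the SGLang server via its OpenAI-compatible endpoint.
# Assumes the launch command above and SGLang's default port 30000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="inclusionAI/LLaDA2.1-flash",
    messages=[{"role": "user", "content": "Calculate 1+5-28*0.5-200=?"}],
    temperature=0.0,
    max_tokens=512,
)
print(response.choices[0].message.content)
```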

### Environment Preparation

Our pull request (PR) has been submitted and merged into the SGLang community; please prepare your environment with the latest SGLang version.

---

## 🌐 License

This project is licensed under the terms of the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

---

## 🀝 Contact & Collaboration

For questions, collaborations, or feedback, please reach out via [Hugging Face](https://huggingface.co/inclusionAI/LLaDA2.1-flash) or open an issue in the [repository](https://github.com/inclusionAI).

πŸ‘‰ Join us in advancing open, efficient, and intelligent language models!