utdawn committed on
Commit
c8ebb85
·
verified ·
1 Parent(s): c0b959e

Create README.md

  1. README.md +459 -0
README.md ADDED
@@ -0,0 +1,459 @@
1
+ ---
2
+ license: apache-2.0
3
+ library_name: transformers
4
+ tags:
5
+ - dllm
6
+ - diffusion
7
+ - llm
8
+ - text_generation
9
+ ---
10
+ # LLaDA2.1-mini
11
+
12
+ **LLaDA2.1-mini** is a diffusion language model in the LLaDA series that adds an editing enhancement. It significantly improves inference speed over LLaDA2.0-mini (average TPF of 5.34 in S Mode versus 2.60) while delivering strong task performance.
13
+
14
+ <div align="center">
15
+ <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*uOo8QKQMiBwAAAAAgNAAAAgAemJ7AQ/original" width="800" />
16
+ </div>
17
+
18
+
19
+ <div align="center">
20
+ <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*biwvQpCmKjEAAAAAULAAAAgAemJ7AQ/original" width="800" />
21
+ </div>
22
+
23
+ ---
24
+ ## Model Performance
25
+
26
+ <table>
27
+ <thead>
28
+ <tr>
29
+ <th align="left"><b>Benchmark</b></th>
30
+ <th align="center"><b>Qwen3-8B<br>(no_think)</b><br><sub>(Score)</sub></th>
31
+ <th align="center"><b>Ling-mini-2.0</b><br><br><sub>(Score)</sub></th>
32
+ <th align="center"><b>LLaDA2.0-mini</b><br><br><sub>(Score | TPF)</sub></th>
33
+ <th align="center"><b>LLaDA2.1-mini<br>(S Mode)</b><br><sub>(Score | TPF)</sub></th>
34
+ <th align="center"><b>LLaDA2.1-mini<br>(Q Mode)</b><br><sub>(Score | TPF)</sub></th>
35
+ </tr>
36
+ </thead>
37
+ <tbody>
38
+ <tr>
39
+ <td align="left"><b>Average</b></td>
40
+ <td align="center">61.59</td>
41
+ <td align="center">64.72</td>
42
+ <td align="center">63.39 | 2.60</td>
43
+ <td align="center">62.24 | 5.34</td>
44
+ <td align="center">63.90 | 3.12</td>
45
+ </tr>
46
+ <tr><td colspan="6" align="center"><b>Knowledge</b></td></tr>
47
+ <tr>
48
+ <td align="left">GPQA</td>
49
+ <td align="center">48.01</td>
50
+ <td align="center">59.41</td>
51
+ <td align="center">47.76 | 2.73</td>
52
+ <td align="center">48.36 | 3.62</td>
53
+ <td align="center">53.28 | 2.12</td>
54
+ </tr>
55
+ <tr>
56
+ <td align="left">MMLU-Pro</td>
57
+ <td align="center">65.83</td>
58
+ <td align="center">67.18</td>
59
+ <td align="center">64.27 | 2.15</td>
60
+ <td align="center">63.42 | 4.22</td>
61
+ <td align="center">64.84 | 2.41</td>
62
+ </tr>
63
+ <tr>
64
+ <td align="left">C-EVAL</td>
65
+ <td align="center">80.60</td>
66
+ <td align="center">82.17</td>
67
+ <td align="center">81.80 | 1.78</td>
68
+ <td align="center">78.40 | 3.39</td>
69
+ <td align="center">78.59 | 1.91</td>
70
+ </tr>
71
+ <tr>
72
+ <td align="left">PHYBench</td>
73
+ <td align="center">9.76</td>
74
+ <td align="center">14.59</td>
75
+ <td align="center">11.70 | 2.48</td>
76
+ <td align="center">12.75 | 4.41</td>
77
+ <td align="center">13.05 | 2.52</td>
78
+ </tr>
79
+ <tr>
80
+ <td align="left">TriviaQA</td>
81
+ <td align="center">52.51</td>
82
+ <td align="center">55.63</td>
83
+ <td align="center">51.33 | 1.54</td>
84
+ <td align="center">53.33 | 3.21</td>
85
+ <td align="center">54.24 | 2.02</td>
86
+ </tr>
87
+ <tr><td colspan="6" align="center"><b>Reasoning</b></td></tr>
88
+ <tr>
89
+ <td align="left">BIG-Bench Hard</td>
90
+ <td align="center">79.48</td>
91
+ <td align="center">83.70</td>
92
+ <td align="center">78.21 | 2.36</td>
93
+ <td align="center">78.42 | 5.02</td>
94
+ <td align="center">80.58 | 2.86</td>
95
+ </tr>
96
+ <tr>
97
+ <td align="left">BIG-Bench Extra Hard</td>
98
+ <td align="center">18.27</td>
99
+ <td align="center">14.81</td>
100
+ <td align="center">16.47 | 2.03</td>
101
+ <td align="center">15.30 | 3.19</td>
102
+ <td align="center">15.78 | 1.66</td>
103
+ </tr>
104
+ <tr>
105
+ <td align="left">bbh-zh</td>
106
+ <td align="center">80.09</td>
107
+ <td align="center">66.11</td>
108
+ <td align="center">75.75 | 2.77</td>
109
+ <td align="center">67.65 | 3.89</td>
110
+ <td align="center">70.40 | 2.35</td>
111
+ </tr>
112
+ <tr>
113
+ <td align="left">MuSR</td>
114
+ <td align="center">70.02</td>
115
+ <td align="center">71.36</td>
116
+ <td align="center">71.48 | 1.45</td>
117
+ <td align="center">70.43 | 2.48</td>
118
+ <td align="center">71.89 | 1.56</td>
119
+ </tr>
120
+ <tr>
121
+ <td align="left">ZebraLogic</td>
122
+ <td align="center">37.48</td>
123
+ <td align="center">79.85</td>
124
+ <td align="center">64.20 | 2.30</td>
125
+ <td align="center">68.50 | 5.38</td>
126
+ <td align="center">77.10 | 2.93</td>
127
+ </tr>
128
+ <tr>
129
+ <td align="left">PrOntoQA</td>
130
+ <td align="center">93.12</td>
131
+ <td align="center">96.06</td>
132
+ <td align="center">86.00 | 2.36</td>
133
+ <td align="center">87.50 | 4.86</td>
134
+ <td align="center">84.50 | 2.73</td>
135
+ </tr>
136
+ <tr>
137
+ <td align="left">PIQA</td>
138
+ <td align="center">88.30</td>
139
+ <td align="center">87.54</td>
140
+ <td align="center">86.51 | 1.45</td>
141
+ <td align="center">84.87 | 2.59</td>
142
+ <td align="center">86.89 | 1.45</td>
143
+ </tr>
144
+ <tr>
145
+ <td align="left">OCNLI</td>
146
+ <td align="center">61.49</td>
147
+ <td align="center">60.17</td>
148
+ <td align="center">64.51 | 4.06</td>
149
+ <td align="center">61.02 | 1.78</td>
150
+ <td align="center">61.59 | 1.23</td>
151
+ </tr>
152
+ <tr>
153
+ <td align="left">HellaSwag</td>
154
+ <td align="center">79.56</td>
155
+ <td align="center">69.02</td>
156
+ <td align="center">79.01 | 1.50</td>
157
+ <td align="center">75.71 | 2.39</td>
158
+ <td align="center">76.19 | 1.49</td>
159
+ </tr>
160
+ <tr>
161
+ <td align="left">KOR-Bench</td>
162
+ <td align="center">54.96</td>
163
+ <td align="center">63.20</td>
164
+ <td align="center">49.92 | 2.45</td>
165
+ <td align="center">46.64 | 4.28</td>
166
+ <td align="center">48.00 | 2.35</td>
167
+ </tr>
168
+ <tr>
169
+ <td align="left">DROP</td>
170
+ <td align="center">84.56</td>
171
+ <td align="center">78.80</td>
172
+ <td align="center">81.89 | 2.02</td>
173
+ <td align="center">81.55 | 5.84</td>
174
+ <td align="center">82.37 | 2.87</td>
175
+ </tr>
176
+ <tr>
177
+ <td align="left">SQuAD 2.0</td>
178
+ <td align="center">85.21</td>
179
+ <td align="center">75.56</td>
180
+ <td align="center">86.50 | 2.47</td>
181
+ <td align="center">84.51 | 4.33</td>
182
+ <td align="center">85.13 | 3.09</td>
183
+ </tr>
184
+ <tr><td colspan="6" align="center"><b>Coding</b></td></tr>
185
+ <tr>
186
+ <td align="left">LiveCodeBench</td>
187
+ <td align="center">26.76</td>
188
+ <td align="center">42.29</td>
189
+ <td align="center">31.83 | 3.34</td>
190
+ <td align="center">28.85 | 6.42</td>
191
+ <td align="center">30.40 | 3.63</td>
192
+ </tr>
193
+ <tr>
194
+ <td align="left">CRUXEval-O</td>
195
+ <td align="center">74.06</td>
196
+ <td align="center">76.12</td>
197
+ <td align="center">71.62 | 2.78</td>
198
+ <td align="center">70.62 | 5.85</td>
199
+ <td align="center">73.75 | 3.35</td>
200
+ </tr>
201
+ <tr>
202
+ <td align="left">MBPP+</td>
203
+ <td align="center">72.69</td>
204
+ <td align="center">77.25</td>
205
+ <td align="center">78.24 | 3.43</td>
206
+ <td align="center">78.84 | 10.59</td>
207
+ <td align="center">74.07 | 6.30</td>
208
+ </tr>
209
+ <tr>
210
+ <td align="left">HumanEval+</td>
211
+ <td align="center">79.50</td>
212
+ <td align="center">80.03</td>
213
+ <td align="center">81.71 | 5.16</td>
214
+ <td align="center">80.49 | 12.32</td>
215
+ <td align="center">82.93 | 7.77</td>
216
+ </tr>
217
+ <tr>
218
+ <td align="left">MultiPL-E</td>
219
+ <td align="center">61.70</td>
220
+ <td align="center">67.09</td>
221
+ <td align="center">67.46 | 2.78</td>
222
+ <td align="center">64.16 | 7.23</td>
223
+ <td align="center">67.17 | 4.01</td>
224
+ </tr>
225
+ <tr>
226
+ <td align="left">BigCodeBench-Full</td>
227
+ <td align="center">36.05</td>
228
+ <td align="center">35.00</td>
229
+ <td align="center">32.89 | 2.87</td>
230
+ <td align="center">30.18 | 7.33</td>
231
+ <td align="center">34.39 | 4.09</td>
232
+ </tr>
233
+ <tr>
234
+ <td align="left">Aider</td>
235
+ <td align="center">55.64</td>
236
+ <td align="center">49.62</td>
237
+ <td align="center">39.85 | 3.57</td>
238
+ <td align="center">43.61 | 8.11</td>
239
+ <td align="center">45.11 | 4.85</td>
240
+ </tr>
241
+ <tr>
242
+ <td align="left">BIRD-SQL</td>
243
+ <td align="center">36.11</td>
244
+ <td align="center">39.67</td>
245
+ <td align="center">39.34 | 1.96</td>
246
+ <td align="center">37.32 | 4.48</td>
247
+ <td align="center">38.40 | 2.42</td>
248
+ </tr>
249
+ <tr>
250
+ <td align="left">Spider</td>
251
+ <td align="center">72.80</td>
252
+ <td align="center">76.43</td>
253
+ <td align="center">76.76 | 3.93</td>
254
+ <td align="center">75.78 | 7.98</td>
255
+ <td align="center">77.55 | 5.48</td>
256
+ </tr>
257
+ <tr><td colspan="6" align="center"><b>Math</b></td></tr>
258
+ <tr>
259
+ <td align="left">AIME 2025</td>
260
+ <td align="center">22.08</td>
261
+ <td align="center">47.66</td>
262
+ <td align="center">36.67 | 2.41</td>
263
+ <td align="center">36.67 | 6.34</td>
264
+ <td align="center">43.33 | 3.29</td>
265
+ </tr>
266
+ <tr>
267
+ <td align="left">OlympiadBench</td>
268
+ <td align="center">55.33</td>
269
+ <td align="center">72.30</td>
270
+ <td align="center">67.70 | 2.63</td>
271
+ <td align="center">64.30 | 7.08</td>
272
+ <td align="center">66.67 | 3.99</td>
273
+ </tr>
274
+ <tr>
275
+ <td align="left">GSM-Plus</td>
276
+ <td align="center">85.56</td>
277
+ <td align="center">87.18</td>
278
+ <td align="center">86.50 | 2.41</td>
279
+ <td align="center">85.88 | 6.82</td>
280
+ <td align="center">86.55 | 3.69</td>
281
+ </tr>
282
+ <tr>
283
+ <td align="left">CMATH</td>
284
+ <td align="center">95.42</td>
285
+ <td align="center">96.40</td>
286
+ <td align="center">95.72 | 1.98</td>
287
+ <td align="center">95.63 | 4.94</td>
288
+ <td align="center">94.99 | 2.56</td>
289
+ </tr>
290
+ <tr>
291
+ <td align="left">Omni-MATH</td>
292
+ <td align="center">33.20</td>
293
+ <td align="center">48.80</td>
294
+ <td align="center">41.70 | 2.57</td>
295
+ <td align="center">41.70 | 6.41</td>
296
+ <td align="center">43.60 | 3.56</td>
297
+ </tr>
298
+ <tr><td colspan="6" align="center"><b>Agent &amp; Alignment</b></td></tr>
299
+ <tr>
300
+ <td align="left">IFEval-strict-prompt</td>
301
+ <td align="center">84.29</td>
302
+ <td align="center">76.16</td>
303
+ <td align="center">80.78 | 1.24</td>
304
+ <td align="center">81.33 | 1.83</td>
305
+ <td align="center">83.18 | 1.25</td>
306
+ </tr>
307
+ <tr>
308
+ <td align="left">BFCL v3</td>
309
+ <td align="center">70.12</td>
310
+ <td align="center">53.75</td>
311
+ <td align="center">70.72 | 4.26</td>
312
+ <td align="center">72.06 | 7.39</td>
313
+ <td align="center">73.61 | 5.14</td>
314
+ </tr>
315
+ <tr>
316
+ <td align="left">CodeIF-Bench</td>
317
+ <td align="center">50.00</td>
318
+ <td align="center">46.00</td>
319
+ <td align="center">46.00 | 2.62</td>
320
+ <td align="center">42.00 | 6.68</td>
321
+ <td align="center">48.00 | 3.62</td>
322
+ </tr>
323
+ <tr>
324
+ <td align="left">Nexus FC</td>
325
+ <td align="center">37.71</td>
326
+ <td align="center">34.38</td>
327
+ <td align="center">35.18 | 4.06</td>
328
+ <td align="center">31.59 | 8.27</td>
329
+ <td align="center">33.69 | 4.91</td>
330
+ </tr>
331
+ </tbody>
332
+ </table>
333
+
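As a rough reading aid for the Average row, the sketch below compares each LLaDA2.1-mini mode against LLaDA2.0-mini. It assumes TPF is a throughput-style metric where higher is faster (the table itself does not define it). The figures are copied from the table, and the helper name `compare` is purely illustrative.

```python
# Average-row figures copied from the table above: (name, score, TPF).
baseline = ("LLaDA2.0-mini", 63.39, 2.60)
modes = [
    ("LLaDA2.1-mini (S Mode)", 62.24, 5.34),
    ("LLaDA2.1-mini (Q Mode)", 63.90, 3.12),
]


def compare(mode, base):
    """Return (score delta, TPF ratio) of a mode relative to the baseline."""
    _, score, tpf = mode
    _, base_score, base_tpf = base
    return round(score - base_score, 2), round(tpf / base_tpf, 2)


for mode in modes:
    delta, ratio = compare(mode, baseline)
    print(f"{mode[0]}: score {delta:+.2f}, TPF x{ratio:.2f}")
```

On these averages, S Mode roughly doubles TPF at a cost of about 1.15 points, while Q Mode gains 0.51 points at 1.2 times the TPF.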
334
+ ---
335
+
336
+ ## 🚀 Highlights
337
+ + **Error-Correcting Editing:** A structural innovation that makes generation editable for dLLMs.
338
+ + **Speed vs. Quality Mode:** The 16B mini model achieves ultra-fast inference in Speed Mode while remaining competitive across various tasks, and Quality Mode trades some of that speed for higher scores.
339
+ + **Reinforcement Learning on 100B-scale dLLM:** A tailored algorithm and framework enable reinforcement learning for large dLLMs.
340
+
341
+ ## 🗺️ What's Next
342
+
343
+ + **Powerful Agentic/Tool Use Capability with LLaDA:** The next update will add powerful **agentic** and long-distance tool-use capabilities.
344
+ + **Extreme Editing:** The next update will feature stronger and more extensive editing capabilities, aimed at correcting more errors in parallel reasoning.
345
+ + **Explore More Training Paradigms:** We plan to explore training paradigms beyond SFT and RL for dLLMs.
346
+
347
+ ---
348
+
349
+ ## 📦 Model Variants
350
+
351
+ | Model ID | Description | Hugging Face Link |
352
+ | --- | --- | --- |
353
+ | `inclusionAI/LLaDA2.1-mini` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.1-mini) |
354
+ | `inclusionAI/LLaDA2.1-flash` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.1-flash) |
355
+
356
+
357
+ ---
358
+
359
+ ## 🔍 Model Overview
360
+ **LLaDA2.1-mini** has the following specifications:
361
+
362
+ + **Type**: Mixture-of-Experts (MoE) Diffusion Language Model
363
+ + **Total Parameters (Non-Embedding)**: 16B
364
+ + **Number of Layers**: 20
365
+ + **Attention Heads**: 16
366
+ + **Context Length**: 32,768 tokens
367
+ + **Position Embedding**: Rotary (RoPE)
368
+ + **Vocabulary Size**: 157,184
369
+
370
+ ---
371
+
372
+ ### 🤗 Hugging Face Transformers
373
+ Make sure `transformers` and its dependencies are installed, then load the model and generate:
374
+
375
+ ```python
376
+ import torch
377
+ import torch.nn.functional as F
378
+ from transformers import AutoModelForCausalLM, AutoTokenizer
379
+
380
+ model_path = "/path/to/LLaDA2.1-mini"
381
+ device = "auto"
382
+ model = AutoModelForCausalLM.from_pretrained(
383
+     model_path, trust_remote_code=True, device_map=device,
384
+ )
385
+ model = model.to(torch.bfloat16)
386
+ model.eval()
387
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
388
+
389
+ prompt = """Calculate 1+5-28*0.5-200=?"""
390
+ input_ids = tokenizer.apply_chat_template(
391
+     [{"role": "user", "content": prompt}],
392
+     add_generation_prompt=True,
393
+     tokenize=True,
394
+     return_tensors="pt",
395
+ )
396
+ generated_tokens = model.generate(
397
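+     inputs=input_ids,
398
+     eos_early_stop=True,
399
+     gen_length=512,
400
+     block_length=32,  # recommended block size
401
+     threshold=0.5,  # Speed Mode value; Quality Mode uses 0.7
402
+     editing_threshold=0,  # Speed Mode value; Quality Mode uses 0.5
403
+     temperature=0.0,  # recommended sampling temperature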
404
+ )
405
+ generated_answer = tokenizer.decode(
406
+     generated_tokens[0],
407
+     skip_special_tokens=True,
408
+ )
409
+ print(generated_answer)
410
+ ```
411
+
412
+ ### Best Practices
413
+ To achieve optimal performance, we recommend the following settings:
414
+
415
+ 1. **Sampling Parameters**:
416
+ We recommend the following general sampling parameters: `block_length=32`, `temperature=0.0`, `top_p=None` and `top_k=None`. We are currently exploring more diverse sampling configurations.
417
+
418
+ 2. **Denoising Thresholds**:
419
+ There are two denoising parameters: `threshold` and `editing_threshold`. We recommend `threshold=0.7`, `editing_threshold=0.5` for **Quality Mode** and `threshold=0.5`, `editing_threshold=0.0` for **Speed Mode**.
420
+
421
+ Note: A low `threshold` may cause stuttering as a trade-off for faster inference.
422
+
423
+ 3. **Adequate Output Length**:
424
+ We recommend using an output length of 16384 tokens for most scenarios.
425
+
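The recommended settings above can be bundled into reusable presets. This is a minimal sketch, not part of the model's API: `generation_kwargs`, `COMMON`, and `PRESETS` are hypothetical names, and mapping the recommended output length onto the `gen_length` argument from the Transformers example is an assumption. `top_p` and `top_k` are left out because the recommendation is to leave them as `None`.

```python
# Hypothetical helper: collects the recommended values into keyword
# arguments that could be passed to model.generate().
COMMON = {
    "block_length": 32,
    "temperature": 0.0,
    "gen_length": 16384,  # assumed to correspond to the output length
}
PRESETS = {
    "quality": {"threshold": 0.7, "editing_threshold": 0.5},
    "speed": {"threshold": 0.5, "editing_threshold": 0.0},
}


def generation_kwargs(mode="quality", **overrides):
    """Merge the common settings, the mode preset, and caller overrides."""
    if mode not in PRESETS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(PRESETS)}")
    return {**COMMON, **PRESETS[mode], **overrides}


print(generation_kwargs("speed", gen_length=512))
```

With the objects from the Transformers example, a call might then look like `model.generate(inputs=input_ids, eos_early_stop=True, **generation_kwargs("speed"))`.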
426
+ ---
427
+
428
+ ## 🤖 ModelScope
429
+ If you're in mainland China, we strongly recommend using our model from 🤖 [ModelScope](https://modelscope.cn/models/inclusionAI/LLaDA2.1-mini).
430
+
431
+ ---
432
+
433
+ ## Deployment
434
+ ### SGLang
435
+ SGLang enables dLLM inference either through offline batching or by launching an HTTP server for online requests. You can start the SGLang dLLM server with the following command:
436
+
437
+ ```bash
438
+ python3 -m sglang.launch_server \
439
+     --model-path inclusionAI/LLaDA2.1-mini \
440
+     --dllm-algorithm JointThreshold \
441
+     --tp-size 1 \
442
+     --trust-remote-code \
443
+     --mem-fraction-static 0.8 \
444
+     --max-running-requests 1 \
445
+     --attention-backend flashinfer
446
+ ```
447
+
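Once the server is running, it can be queried over HTTP. The sketch below assumes the server listens locally on SGLang's default port (30000) and that the OpenAI-compatible `/v1/chat/completions` route is available for this model; neither is stated in this README, so adjust `BASE_URL` and the route to your setup. The helper names `build_chat_request` and `ask` are illustrative.

```python
import json
import urllib.request

# Assumptions: local server on SGLang's default port, OpenAI-compatible route.
BASE_URL = "http://localhost:30000"
MODEL = "inclusionAI/LLaDA2.1-mini"


def build_chat_request(prompt, max_tokens=512):
    """Build a POST request carrying a single-turn chat completion payload."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running server):
#   print(ask("Calculate 1+5-28*0.5-200=?"))
```

Only the standard library is used here, so the client carries no extra dependencies; any OpenAI-compatible client should work the same way if the route is enabled.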
448
+ ### Environment Preparation
449
+ A pull request (PR) has been submitted and merged into SGLang, so please prepare your environment with the latest SGLang version.
450
+ ___
451
+ ## 🌐 License
452
+ This project is licensed under the terms of the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
453
+
454
+ ---
455
+
456
+ ## 🤝 Contact & Collaboration
457
+ For questions, collaborations, or feedback, please reach out via [Hugging Face](https://huggingface.co/inclusionAI/LLaDA2.1-mini) or open an issue in the [repository](https://github.com/inclusionAI).
458
+
459
+ 👉 Join us in advancing open, efficient, and intelligent language models!