---
license: apache-2.0
library_name: transformers
tags:
- dllm
- diffusion
- llm
- text_generation
---
# LLaDA2.0-mini
**LLaDA2.0-mini** is a diffusion language model with a 16BA1B Mixture-of-Experts (MoE) architecture: 16B total parameters, of which about 1.4B are activated during inference. As an enhanced, instruction-tuned iteration of the LLaDA series, it is optimized for practical applications.
<div align="center">
<img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*uOo8QKQMiBwAAAAAgNAAAAgAemJ7AQ/original" width="800" />
</div>
---
| Benchmark | Qwen3-8B (no thinking) | Ling-mini-2.0 | LLaDA2.0-mini-preview | LLaDA2.0-mini |
| :---: | :---: | :---: | :---: | :---: |
| **Average** | 70.19 | 72.13 | 61.75 | 71.67 |
| **Knowledge** | | | | |
| MMLU | 80.94 | 82.15 | 72.49 | 80.53 |
| MMLU-Pro | 65.48 | 63.72 | 49.22 | 63.22 |
| GPQA | 46.59 | 56.80 | 31.82 | 47.98 |
| ARC-C | 93.35 | 93.09 | 89.15 | 93.56 |
| CMMLU | 79.17 | 80.84 | 67.53 | 79.50 |
| C-EVAL | 81.36 | 82.10 | 66.54 | 81.38 |
| GAOKAO-Bench | 84.94 | 87.23 | 74.46 | 84.30 |
| **Reasoning** | | | | |
| SQuAD 2.0 | 85.21 | 75.56 | 85.61 | 86.50 |
| DROP | 84.56 | 78.80 | 79.49 | 81.91 |
| KOR-Bench | 54.48 | 62.72 | 37.26 | 50.40 |
| HellaSwag | 79.56 | 69.02 | 74.01 | 79.01 |
| **Coding** | | | | |
| CRUXEval-O | 74.06 | 76.12 | 61.88 | 71.62 |
| MBPP | 78.92 | 84.07 | 77.75 | 81.50 |
| MultiPL-E | 61.70 | 67.09 | 62.43 | 67.46 |
| HumanEval | 84.76 | 85.98 | 80.49 | 86.59 |
| BigCodeBench-Full | 36.05 | 35.00 | 30.44 | 32.89 |
| LiveCodeBench | 26.38 | 34.97 | 19.93 | 31.50 |
| Spider | 72.80 | 76.43 | 75.64 | 76.76 |
| **Math** | | | | |
| GSM8K | 93.63 | 94.62 | 89.01 | 94.24 |
| MATH | 86.28 | 94.66 | 73.50 | 93.22 |
| OlympiadBench | 55.33 | 72.30 | 36.67 | 67.70 |
| AIME 2025 | 22.08 | 47.66 | 10.00 | 36.67 |
| **Agent & Alignment** | | | | |
| BFCL_Live | 70.08 | 53.98 | 74.11 | 70.90 |
| IFEval-strict-prompt | 86.90 | 76.16 | 62.50 | 80.78 |
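
The **Average** row appears to be the unweighted mean of the 24 benchmark scores. That is an inference from the numbers rather than something the table states, so treat the following as a quick sanity check, shown here for the LLaDA2.0-mini column only:

```python
# Scores copied from the LLaDA2.0-mini column of the table above.
llada2_mini_scores = {
    "MMLU": 80.53, "MMLU-Pro": 63.22, "GPQA": 47.98, "ARC-C": 93.56,
    "CMMLU": 79.50, "C-EVAL": 81.38, "GAOKAO-Bench": 84.30,
    "SQuAD 2.0": 86.50, "DROP": 81.91, "KOR-Bench": 50.40, "HellaSwag": 79.01,
    "CRUXEval-O": 71.62, "MBPP": 81.50, "MultiPL-E": 67.46, "HumanEval": 86.59,
    "BigCodeBench-Full": 32.89, "LiveCodeBench": 31.50, "Spider": 76.76,
    "GSM8K": 94.24, "MATH": 93.22, "OlympiadBench": 67.70, "AIME 2025": 36.67,
    "BFCL_Live": 70.90, "IFEval-strict-prompt": 80.78,
}

# Assumption: every benchmark carries equal weight in the reported average.
average = round(sum(llada2_mini_scores.values()) / len(llada2_mini_scores), 2)
print(average)  # 71.67, matching the Average row
```
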
## Performance Highlights
+ **Leading MoE Architecture**:
An open-source **Mixture-of-Experts (MoE) diffusion large language model**, continually trained on the Ling2.0 series with approximately **20 trillion tokens**.
+ **Efficient Inference**:
Of its **16 billion total parameters**, only **1.4 billion** are activated during inference. This significantly reduces computational cost, while LLaDA2.0-mini still outperforms open-source dense models of similar scale.
+ **Impressive Performance on Code & Complex Reasoning**:
Performs strongly on **code generation** and **advanced mathematical reasoning**, scoring 86.59 on HumanEval and 93.22 on MATH in the table above.
+ **Tool Use**:
Supports **tool calling** and performs well on complex agent-based tasks, scoring 70.90 on BFCL_Live.
+ **Open & Extensible**:
Fully open-source, with a commitment to transparency. We plan to release a **leading inference framework** and will continue investing in cutting-edge areas such as **diffusion LLMs (dLLM)** to drive disruptive innovation.
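
To put the efficiency claim in numbers, the share of parameters that is active during inference follows directly from the two figures quoted above. This is a back-of-the-envelope sketch using the rounded values from this card, not a measurement of compute or latency:

```python
# Rounded figures quoted in this model card.
TOTAL_PARAMS = 16e9    # total parameters
ACTIVE_PARAMS = 1.4e9  # parameters activated during inference

# Fraction of the model that participates in a forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"{active_fraction:.2%}")  # 8.75%
```
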
## 🗺️ What's Next
+ **Supercharged Reasoning with LLaDA 2.0:** The LLaDA 2.0 series will be fine-tuned with **Reinforcement Learning**, unlocking a new level of sophisticated reasoning and problem-solving abilities.
+ **Tools for Innovators:** The model was fine-tuned with the [dFactory](https://github.com/inclusionAI/dFactory) framework using Fully Sharded Data Parallel (FSDP2). We have begun open-sourcing dFactory and will continue to release our advanced post-training technologies. Whether you want to master the current model or build your own customized versions, you'll have the tools you need. Stay tuned for more updates!
---
## Model Variants
| Model ID | Description | Hugging Face Link |
| --- | --- | --- |
| `inclusionAI/LLaDA2.0-mini` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-mini) |
| `inclusionAI/LLaDA2.0-flash` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-flash) |
---
## Model Overview
**LLaDA2.0-mini** has the following specifications:
+ **Type**: Mixture-of-Experts (MoE) Diffusion Language Model
+ **Total Parameters (Non-Embedding)**: 16B
+ **Number of Layers**: 20
+ **Attention Heads**: 16
+ **Context Length**: 32,768 tokens
+ **Position Embedding**: Rotary (RoPE)
+ **Vocabulary Size**: 157,184
---
### 🤗 Hugging Face Transformers
Make sure `transformers` and its dependencies are installed, then load the model and generate a response:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/LLaDA2.0-mini"
device = "cuda:0"

# trust_remote_code=True lets transformers load the custom code shipped with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, device_map=device
)
model = model.to(torch.bfloat16)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "Why does Camus think that Sisyphus is happy?"
# Wrap the prompt in the chat template and tokenize it to input IDs.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
)

# gen_length, block_length, and steps are diffusion-specific generation arguments.
generated_tokens = model.generate(
    inputs=input_ids,
    eos_early_stop=True,
    gen_length=512,
    block_length=32,
    steps=32,
    temperature=0.0,
)

# Decode the first (and only) sequence in the batch back to text.
generated_answer = tokenizer.decode(
    generated_tokens[0],
    skip_special_tokens=True,
)
print(generated_answer)
```
### Best Practices
To achieve optimal performance, we recommend the following settings:
1. **Sampling Parameters**:
We suggest using `temperature=0.0`, `block_length=32`, and `steps=32`. A higher temperature may occasionally cause language mixing and a slight drop in model performance.
2. **Adequate Output Length**:
We recommend an output length of 32,768 tokens for most queries.
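
One way to keep these settings consistent across calls is to bundle them in a small helper. The sketch below is illustrative only: `recommended_generate_kwargs` and `num_blocks` are hypothetical names, not part of the model's API, and the block count assumes that output is produced in consecutive blocks of `block_length` tokens and that output length is controlled by `gen_length`, neither of which this card states explicitly.

```python
def recommended_generate_kwargs(gen_length: int = 512) -> dict:
    """Return the decoding settings recommended above as keyword arguments.

    Hypothetical helper: the keys mirror the arguments used in the
    Transformers example in this card.
    """
    if gen_length <= 0:
        raise ValueError("gen_length must be positive")
    return {
        "eos_early_stop": True,
        "gen_length": gen_length,
        "block_length": 32,
        "steps": 32,
        "temperature": 0.0,
    }


def num_blocks(gen_length: int, block_length: int) -> int:
    """Number of blocks needed to cover gen_length tokens (ceiling division).

    Assumption: generation proceeds block by block, block_length tokens each.
    """
    return -(-gen_length // block_length)


kwargs = recommended_generate_kwargs(gen_length=512)
print(num_blocks(kwargs["gen_length"], kwargs["block_length"]))  # 16

# With `model` and `input_ids` from the Transformers example, usage would be:
# generated_tokens = model.generate(inputs=input_ids, **kwargs)
```
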
---
## License
This project is licensed under the terms of the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
---
## Contact & Collaboration
For questions, collaborations, or feedback, please reach out via [Hugging Face](https://huggingface.co/inclusionAI/LLaDA2.0-mini) or open an issue in the [repository](https://github.com/inclusionAI).
Join us in advancing open, efficient, and intelligent language models!