Improve model card: Add pipeline tag, paper link, GitHub & citation
This PR enhances the model card for LLaDA-MoE by:
- Adding the `pipeline_tag: text-generation` to improve model discoverability on the Hub.
- Adding a direct link to the paper, [dInfer: An Efficient Inference Framework for Diffusion Language Models](https://huggingface.co/papers/2510.08666), in the model description.
- Including a prominent link to the GitHub repository: https://github.com/inclusionAI/dInfer.
- Updating the "Citation (Coming Soon)" section with the correct BibTeX entry from the paper's GitHub repository.
- Removing the redundant `text_generation` tag, as `pipeline_tag` now covers this.
These changes ensure the model card is complete, well-organized, and provides users with all necessary information at a glance.
**README.md** (changed):

````diff
@@ -1,15 +1,16 @@
 ---
+library_name: transformers
 license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - dllm
 - diffusion
 - llm
-- text_generation
-library_name: transformers
 ---
+
 # LLaDA-MoE
 
-**LLaDA-MoE** is a new and upgraded series of the LLaDA diffusion language model. This pre-release includes two cutting-edge models:
+**LLaDA-MoE** is a new and upgraded series of the LLaDA diffusion language model, developed as part of the `dInfer` framework presented in the paper [dInfer: An Efficient Inference Framework for Diffusion Language Models](https://huggingface.co/papers/2510.08666). This pre-release includes two cutting-edge models:
 
 - `LLaDA-MoE-7B-A1B-Base`: A base pre-trained model designed for research and secondary development.
 - `LLaDA-MoE-7B-A1B-Instruct`: An instruction-tuned model optimized for practical applications.
@@ -20,8 +21,9 @@ library_name: transformers
 <img src="https://raw.githubusercontent.com/Ulov888/LLaDA_Assets/main/benchmarks_details_table.png" width="800" />
 </div>
 
-
-
+## GitHub Repository
+For the complete codebase, training scripts, and more details on the `dInfer` framework, please visit the official GitHub repository:
+[https://github.com/inclusionAI/dInfer](https://github.com/inclusionAI/dInfer)
 
 ## 🚀 Performance Highlights
 
@@ -175,17 +177,20 @@ input_ids = torch.tensor(input_ids).to(device).unsqueeze(0)
 
 text = generate(model, input_ids, steps=128, gen_length=128, block_length=32, temperature=0., cfg_scale=0., remasking='low_confidence')
 print(tokenizer.batch_decode(text[:, input_ids.shape[1]:], skip_special_tokens=False)[0])
-
-
-
-
 ```
 
-
-
-
+## 📖 Citation
+
+If you find `dInfer` and LLaDA-MoE useful in your research or applications, please cite our paper:
+
+```bibtex
+@article{dinfer,
+  title={dInfer: An Efficient Inference Framework for Diffusion Language Models},
+  author={Yuxin Ma and Lun Du and Lanning Wei and Kun Chen and Qian Xu and Kangyu Wang and Guofeng Feng and Guoshan Lu and Lin Liu and Xiaojing Qi and Xinyuan Zhang and Zhen Tao and Haibo Feng and Ziyun Jiang and Ying Xu and Zenan Huang and Yihong Zhuang and Haokai Xu and Jiaqi Hu and Zhenzhong Lan and Junbo Zhao and Jianguo Li and Da Zheng},
+  year={2025},
+  journal={arXiv preprint arXiv:2510.08666}
+}
+```
 
 ---
@@ -197,6 +202,6 @@ This project is licensed under the terms of the [Apache License 2.0](https://www
 
 ## 🤝 Contact & Collaboration
 
-For questions, collaborations, or feedback, please reach out via [Hugging Face](https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Base) or open an issue in the [repository](https://github.com/inclusionAI).
+For questions, collaborations, or feedback, please reach out via [Hugging Face](https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Base) or open an issue in the [repository](https://github.com/inclusionAI/dInfer).
 
 🌟 Join us in advancing open, efficient, and intelligent language models!
````
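The `generate(...)` call in the diff above exposes the knobs of block-wise diffusion decoding: `gen_length` is split into `gen_length // block_length` blocks decoded left to right, and the step budget is spread across them. The sketch below is an illustrative stdlib-only simulation of that schedule, not the model's actual sampler: the even split of `steps` across blocks mirrors the reference LLaDA sampler (an assumption here), and random numbers stand in for the model's token confidences used by low-confidence remasking.

```python
import math
import random

def simulate_block_decoding(gen_length=128, block_length=32, steps=128, seed=0):
    """Simulate the block-wise unmasking schedule of a diffusion LM sampler:
    blocks are decoded left to right, and each step commits the highest-
    'confidence' masked tokens so the block finishes within its step budget."""
    assert gen_length % block_length == 0
    num_blocks = gen_length // block_length
    assert steps % num_blocks == 0
    steps_per_block = steps // num_blocks  # assumed even split across blocks

    rng = random.Random(seed)
    committed = [False] * gen_length
    trace = []  # number of tokens committed at each step

    for b in range(num_blocks):
        lo, hi = b * block_length, (b + 1) * block_length
        for step in range(steps_per_block):
            masked = [i for i in range(lo, hi) if not committed[i]]
            # Commit enough tokens per step for the block to finish on time.
            k = math.ceil(len(masked) / (steps_per_block - step))
            # Stand-in for model confidences; a real sampler ranks tokens by
            # the model's predicted probabilities here.
            conf = {i: rng.random() for i in masked}
            for i in sorted(masked, key=lambda i: conf[i], reverse=True)[:k]:
                committed[i] = True
            trace.append(k)
    return num_blocks, steps_per_block, committed, trace

num_blocks, steps_per_block, committed, trace = simulate_block_decoding()
print(num_blocks, steps_per_block, all(committed), len(trace))  # 4 32 True 128
```

With the README's settings (`steps=128`, `gen_length=128`, `block_length=32`) the budget works out to 4 blocks of 32 steps each, i.e. one token committed per step; raising `block_length` or lowering `steps` makes each step commit several tokens at once.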