Instructions to use inclusionAI/LLaDA2.0-mini-CAP with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use inclusionAI/LLaDA2.0-mini-CAP with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="inclusionAI/LLaDA2.0-mini-CAP", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("inclusionAI/LLaDA2.0-mini-CAP", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use inclusionAI/LLaDA2.0-mini-CAP with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "inclusionAI/LLaDA2.0-mini-CAP"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "inclusionAI/LLaDA2.0-mini-CAP",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
docker model run hf.co/inclusionAI/LLaDA2.0-mini-CAP
- SGLang
How to use inclusionAI/LLaDA2.0-mini-CAP with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "inclusionAI/LLaDA2.0-mini-CAP" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "inclusionAI/LLaDA2.0-mini-CAP",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "inclusionAI/LLaDA2.0-mini-CAP" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "inclusionAI/LLaDA2.0-mini-CAP",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use inclusionAI/LLaDA2.0-mini-CAP with Docker Model Runner:
docker model run hf.co/inclusionAI/LLaDA2.0-mini-CAP
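The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes a server started as shown is listening on `localhost:8000` (the vLLM default used above; pass port 30000 for the SGLang setup) and that the response follows the OpenAI chat completions format. The helper names `build_chat_request` and `ask` are hypothetical and not part of any library.

```python
import json
import urllib.request

MODEL_ID = "inclusionAI/LLaDA2.0-mini-CAP"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the request and return the assistant's reply text.

    Assumption: the server answers in the OpenAI chat completions format,
    i.e. the reply is at choices[0].message.content.
    """
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` mirrors the vLLM curl call, and `ask("What is the capital of France?", base_url="http://localhost:30000")` mirrors the SGLang one. Building the request separately from sending it keeps the payload easy to inspect without a live server.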
Update README.md
README.md
CHANGED
@@ -139,3 +139,15 @@ For questions, collaborations, or feedback, please reach out via [Hugging Face](
 
 ---
 
+## Citation
+```bibtex
+@misc{bie2025llada20scalingdiffusionlanguage,
+      title={LLaDA2.0: Scaling Up Diffusion Language Models to 100B},
+      author={Tiwei Bie and Maosong Cao and Kun Chen and Lun Du and Mingliang Gong and Zhuochen Gong and Yanmei Gu and Jiaqi Hu and Zenan Huang and Zhenzhong Lan and Chengxi Li and Chongxuan Li and Jianguo Li and Zehuan Li and Huabin Liu and Ling Liu and Guoshan Lu and Xiaocheng Lu and Yuxin Ma and Jianfeng Tan and Lanning Wei and Ji-Rong Wen and Yipeng Xing and Xiaolu Zhang and Junbo Zhao and Da Zheng and Jun Zhou and Junlin Zhou and Zhanchao Zhou and Liwang Zhu and Yihong Zhuang},
+      year={2025},
+      eprint={2512.15745},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2512.15745},
+}
+```