Commit History

90c63c7  Use create_bidirectional_mask for backend-agnostic attention mask handling (SDPA, FA2, flex)  (kashif, HF Staff, verified)
92a5aa6  fix: align _init_weights with Qwen2Moe using nn.init API  (kashif, HF Staff, verified)
1f0d143  fix: call super()._init_weights() to match Qwen2Moe convention  (kashif, HF Staff, verified)
a0f450f  fix: align RotaryEmbedding with Qwen2Moe pattern for transformers compat  (kashif, HF Staff, verified)
220496f  fix: add default factor=1.0 for linear rope compat with newer transformers  (kashif, HF Staff, verified)
0435edb  fix: remap default rope_type to linear for newer transformers compat  (kashif, HF Staff, verified)
52747ba  fix: use linear rope_type instead of removed default for transformers compat  (kashif, HF Staff, verified)
bbb5715  Update README.md  (utdawn, verified)
c38df8b  Update README.md  (utdawn, verified)
c60e761  Update modeling_llada2_moe.py  (utdawn, verified)
6d07958  Update README.md  (utdawn, verified)
c8ebb85  Create README.md  (utdawn, verified)
c0b959e  Update modeling_llada2_moe.py  (utdawn, verified)
5430aaf  Create modeling_llada2_moe.py  (utdawn, verified)
276abc9  Update configuration_llada2_moe.py  (utdawn, verified)
a8af0fb  Add files using upload-large-folder tool  (m1ngcheng, verified)
54d0b65  initial commit  (m1ngcheng, verified)