Fix _init_weights and RotaryEmbedding for transformers v5.x compatibility

#10 by apsys, opened Mar 3

base: refs/heads/main ← from: refs/pr/10

Files changed: +19 -7

apsys, Mar 3 (edited Mar 3)

Fix _init_weights and RotaryEmbedding initialization (for transformers 5.x)

_init_weights was using .data.normal_() directly on tensors, which bypasses the _is_hf_initialized guard in transformers v5.x. Since v5.x loads on meta device first then calls initialize_weights() post-checkpoint, this was silently re-randomizing every Linear and Embedding after from_pretrained. Model loads fine, outputs garbage. Switched to torch.nn.init.normal_() / zeros_() so the guard works.
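A minimal sketch of the pattern described above, not the PR's actual diff. The function name `init_weights_sketch` and the default `std=0.02` are assumptions for illustration; the real method lives on the model class and reads its standard deviation from the config. The point is only that initialization goes through `torch.nn.init` functions rather than in-place calls on `.data`.

```python
import torch
from torch import nn


def init_weights_sketch(module: nn.Module, std: float = 0.02) -> None:
    """Initialize Linear/Embedding weights through torch.nn.init.

    Hypothetical stand-in for a model's _init_weights. The old pattern was
    `module.weight.data.normal_(mean=0.0, std=std)`, which writes to the
    tensor directly and, per the description above, skips the
    _is_hf_initialized guard in transformers v5.x.
    """
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
        nn.init.normal_(module.weight, mean=0.0, std=std)


# Usage on a freshly constructed layer (no checkpoint involved here).
layer = nn.Linear(8, 8)
init_weights_sketch(layer)
```

Padding-index handling for embeddings is left out of this sketch on purpose; it would need the same treatment as the weights.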

Also, RotaryEmbedding.__init__ raises a KeyError for the "default" rope type: ROPE_INIT_FUNCTIONS doesn't have that key, and Ring-mini-2.0 has rope_scaling=None, so it always hits this path. Handled the default case inline. While at it, forced float32 for the inv_freq computation because rope_theta=600k is too large for half precision: it overflows fp16 and is not exactly representable in bf16.
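For reference, the default (unscaled) RoPE frequencies can be computed inline along these lines. This is a sketch under assumptions, not the code from this PR: the helper name `default_inv_freq` is hypothetical, and `dim=128` in the usage line is an illustrative value rather than one read from the Ring-mini-2.0 config.

```python
import torch


def default_inv_freq(rope_theta: float, dim: int, device=None) -> torch.Tensor:
    """Return inv_freq[i] = 1 / rope_theta ** (2i / dim) in float32.

    The exponents are built as a float32 tensor regardless of the model's
    dtype, so the large base is never handled in half precision.
    """
    exponents = torch.arange(0, dim, 2, dtype=torch.float32, device=device) / dim
    return 1.0 / (float(rope_theta) ** exponents)


# Usage: rope_theta from the description above, dim chosen for illustration.
inv_freq = default_inv_freq(600_000.0, dim=128)
```

The result has `dim // 2` entries and can be registered as a non-persistent buffer in the usual way.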

Fix _init_weights and RotaryEmbedding initialization (commit b0c5f624)

apsys changed pull request status to open Mar 3

zhanghanxiao (inclusionAI org), Mar 10

@apsys Thanks for your attention and for sharing the code. 🤝
I noticed that the partial_rotary_factor parameter doesn't seem to be handled. Was this intentionally omitted?

# code in transformers v4.56
def _compute_default_rope_parameters(
...
    partial_rotary_factor = config.partial_rotary_factor if hasattr(config, "partial_rotary_factor") else 1.0
    head_dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads
    dim = int(head_dim * partial_rotary_factor)
...

If you have any before/after comparison results for the change, it would be great if you could share them as well. Thanks again.
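To make the question about partial_rotary_factor concrete, here is a small sketch of how the quoted v4.56 logic changes the rotary dimension. The function name `rotary_dim` and the config values are hypothetical and chosen only for illustration; they are not taken from the Ring-mini-2.0 config.

```python
from types import SimpleNamespace


def rotary_dim(config) -> int:
    """Mirror the dimension logic from the quoted excerpt."""
    # Fall back to 1.0 (rotate the full head) when the attribute is absent.
    partial_rotary_factor = (
        config.partial_rotary_factor if hasattr(config, "partial_rotary_factor") else 1.0
    )
    head_dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads
    return int(head_dim * partial_rotary_factor)


# Illustrative configs: same head size, with and without a partial factor.
full = SimpleNamespace(hidden_size=2048, num_attention_heads=16)
half = SimpleNamespace(hidden_size=2048, num_attention_heads=16, partial_rotary_factor=0.5)

print(rotary_dim(full))  # 128
print(rotary_dim(half))  # 64
```

If an inline default path uses head_dim directly, it matches this logic only when the factor is absent or equal to 1.0.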

Ready to merge

This branch is ready to get merged automatically.