Instructions to use moonshotai/Kimi-K2.5 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use moonshotai/Kimi-K2.5 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="moonshotai/Kimi-K2.5", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("moonshotai/Kimi-K2.5", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use moonshotai/Kimi-K2.5 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "moonshotai/Kimi-K2.5"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moonshotai/Kimi-K2.5",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
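
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `post_chat` are made up for illustration, and the base URL assumes the server started above is listening on its default port 8000.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, image_url: str) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url: str, payload: dict, timeout: float = 300.0) -> dict:
    """POST the payload to the OpenAI-compatible chat endpoint and parse the reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


payload = build_chat_request(
    model="moonshotai/Kimi-K2.5",
    prompt="Describe this image in one sentence.",
    image_url="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)

# Uncomment once a server is actually running (assumed address):
# reply = post_chat("http://localhost:8000", payload)
# print(reply["choices"][0]["message"]["content"])
```

Pointing `post_chat` at `http://localhost:30000` should work the same way for the SGLang server, since both expose the same chat completions route.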

Use Docker

docker model run hf.co/moonshotai/Kimi-K2.5

SGLang

How to use moonshotai/Kimi-K2.5 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "moonshotai/Kimi-K2.5" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moonshotai/Kimi-K2.5",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "moonshotai/Kimi-K2.5" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use moonshotai/Kimi-K2.5 with Docker Model Runner:
```
docker model run hf.co/moonshotai/Kimi-K2.5
```

performance of quantized models

#61

by darvec - opened Feb 5

Discussion

darvec

Feb 5 (edited Feb 6)

I am looking at various quantized versions of K2.5 from unsloth, since my hardware can only hold UD-Q2_K_XL, UD-IQ3_XXS, or Q3_K_S. Is there any comparison between the quantized versions of Kimi K2.5 and other good open-source models like Qwen3? For example, Qwen3-235B-A22B-Instruct-2507 is about 470 GB, which is roughly the same size as Q3_K_S, but I am not sure whether Q3_K_S performs better than Qwen.
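
The 470 GB figure is consistent with 235B parameters stored at 16 bits each. A rough sketch of that arithmetic follows, assuming size is simply parameters times bits per weight (it ignores file metadata and any tensors kept at a different precision); the function name is illustrative only.

```python
def estimate_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Qwen3-235B at 16 bits per weight (assumed unquantized 16-bit checkpoint).
qwen_gb = estimate_size_gb(235e9, 16)
print(qwen_gb)  # 470.0
```

The same function can be run the other way around in your head: for a fixed memory budget, a model with more parameters has to use fewer bits per weight, which is the trade-off this thread is asking about.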

ubergarm

Feb 5

I've been out for a week due to life stuff, but I'm hoping to run some perplexity measurements on quants soon.

The best quant available is the "full size" Q4_X by AesSedai here: https://huggingface.co/AesSedai/Kimi-K2.5-GGUF which seems too big for your desired size. Aes' smaller quants should be quite good as well with compatibility with mainline llama.cpp

Hopefully I'll get some smaller ik_llama.cpp quants out soon, which will likely offer the best quality for a given footprint. You can see some earlier perplexity graphs for Kimi-K2-Thinking here: https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF#quant-collection
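
For readers unfamiliar with the metric: perplexity is the exponential of the average negative log-likelihood per token, so lower is better, and a quant whose perplexity stays close to the unquantized model has lost little. A minimal sketch, assuming you already have per-token natural-log probabilities (the input list is made up, and this is not the llama.cpp perplexity tool itself):

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-probability over the evaluated tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Toy input: the model assigned probability 0.5 to each of four tokens.
toy_logprobs = [math.log(0.5)] * 4
ppl = perplexity(toy_logprobs)
```

With every token at probability 0.5 the result is 2 (up to floating-point rounding), which matches the intuition of the model being "as uncertain as a coin flip" at each step.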

In my own anecdotal experience, I prefer to use a more quantized larger model (e.g. DeepSeek-V3.2-Speciale or Kimi-K2.5) over a less quantized smaller model (e.g. Qwen3-235B).

gghfez

Feb 6

> Aes' smaller quants should be quite good as well with compatibility with mainline llama.cpp

I had issues with the IQ2_XXS there, very similar to what this user is seeing with the IQ3_S: https://huggingface.co/AesSedai/Kimi-K2.5-GGUF/discussions/4

The unsloth UD-IQ2_XXS has been stable (with the q8_0 mmproj from Aes' repo), but the embedded template is dodgy, so I have to run it with the jukofyork fix here: https://huggingface.co/AesSedai/Kimi-K2.5-GGUF/discussions/1

> "full size" Q4_X by AesSedai here: https://huggingface.co/AesSedai/Kimi-K2.5-GGUF

This is probably the best way to run it right now, and it has the fixed template baked in. I can't run it though -_-!

> some smaller ik_llama.cpp quants

😃

ubergarm

Feb 6

@gghfez

Thanks for the links, I'm trying to catch up on everything that happened in the past week hah...

I have some small quants trickling in now here: https://huggingface.co/ubergarm/Kimi-K2.5-GGUF

I'll get a perplexity graph going soon.

I haven't tried the mmproj stuff at all. I did use the most recently updated official chat template, but was having some issues with pydantic-ai tool use. It works fine with the old Kimi-K2-Thinking chat template jinja, so I still need to test that more.

All my Kimi-K2.5 quants keep the active weights at full q8_0 and only smash the routed exps, so hopefully no looping etc.
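
The idea in that last sentence can be pictured as a per-tensor type lookup: tensors used on every token stay at q8_0, and only the routed expert tensors get a smaller type. The sketch below is an illustration under stated assumptions, not the actual recipe used for these quants: it assumes routed expert tensors are named with the `ffn_gate_exps` / `ffn_up_exps` / `ffn_down_exps` pattern, and the `iq2_xxs` default is a placeholder chosen only for the example.

```python
import re

# Assumed naming pattern for routed expert tensors in a GGUF file.
ROUTED_EXPERT = re.compile(r"\.ffn_(gate|up|down)_exps\.")


def pick_quant_type(
    tensor_name: str,
    expert_type: str = "iq2_xxs",  # placeholder type, not taken from the thread
    default_type: str = "q8_0",
) -> str:
    """Return the small type for routed experts and q8_0 for everything else."""
    if ROUTED_EXPERT.search(tensor_name):
        return expert_type
    return default_type


# Hypothetical tensor names, for illustration only.
names = [
    "blk.10.attn_output.weight",
    "blk.10.ffn_down_shexp.weight",
    "blk.10.ffn_down_exps.weight",
]
plan = {name: pick_quant_type(name) for name in names}
```

In this toy plan only `blk.10.ffn_down_exps.weight` is assigned the smaller type; the attention and shared-expert tensors keep q8_0. Because only a few experts are routed to per token while the routed experts hold most of the parameters, this kind of split shrinks the file substantially while leaving the always-active path at high precision.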
