Instructions to use moonshotai/Kimi-K2-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use moonshotai/Kimi-K2-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="moonshotai/Kimi-K2-Instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("moonshotai/Kimi-K2-Instruct", trust_remote_code=True, dtype="auto")


vLLM

How to use moonshotai/Kimi-K2-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "moonshotai/Kimi-K2-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moonshotai/Kimi-K2-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
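
The same call can be made from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on localhost:8000, and the helper names `build_chat_request` and `ask` are illustrative, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="moonshotai/Kimi-K2-Instruct"):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (needs a running server):
# print(ask("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should target a server on another port in the same way, since the request body does not change.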

Use Docker

docker model run hf.co/moonshotai/Kimi-K2-Instruct

SGLang

How to use moonshotai/Kimi-K2-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "moonshotai/Kimi-K2-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moonshotai/Kimi-K2-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "moonshotai/Kimi-K2-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moonshotai/Kimi-K2-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use moonshotai/Kimi-K2-Instruct with Docker Model Runner:
```
docker model run hf.co/moonshotai/Kimi-K2-Instruct
```

Update tokenizer_config.json

#13 (pinned)

by bchenfireworks, opened Jul 14, 2025

base: refs/heads/main ← from: refs/pr/13


bchenfireworks

Jul 14, 2025

This adds tool-call support to the chat template. At temperature 0, tau2-airline went from 14% to 50%. The old template ignored all previous tool calls, which broke every multi-step tool-call conversation.

Update tokenizer_config.json (c8430fc9)

doramonk

Jul 14, 2025

Thanks. Are you using SGLang or vLLM to host? I tested the template, and multi-step tool calls are still not working in SGLang.

bigeagle

Moonshot AI org, Jul 14, 2025 (edited Jul 14, 2025)

Thank you so much. We forgot to put that in the chat template.

In addition, a role=tool message has a tool_call_id attribute that should be encoded into the content. The format is:

"<|im_system|>tool<|im_middle|>" +
"## Return of {{tool_call_id}}\n" + "{{content}}" +
"<|im_end|>"

Can you update this part?

BTW, did you decipher this template from raw tokens?
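
To make the tool-response format quoted above concrete, here is a minimal Python sketch that renders one role=tool message by plain string concatenation. The function name `render_tool_message` and the sample id and content are hypothetical; the actual chat template is written in Jinja and may differ in detail.

```python
def render_tool_message(message):
    """Render one role="tool" message in the format quoted in the comment above."""
    return (
        "<|im_system|>tool<|im_middle|>"
        + f"## Return of {message['tool_call_id']}\n"
        + message["content"]
        + "<|im_end|>"
    )


# Hypothetical tool result; the id and content are made up for illustration.
tool_message = {
    "role": "tool",
    "tool_call_id": "call_0",
    "content": '{"status": "confirmed"}',
}

rendered = render_tool_message(tool_message)
print(rendered)
```

The tool_call_id ends up inside the rendered text, which is presumably how the model can match each result to the call that produced it in a multi-step conversation.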

lsw825 pinned discussion Jul 14, 2025

update tokenizer to include tool response format as well (25bc044c)

upload (ab298929)

bigmoyan

Moonshot AI org Jul 15, 2025

@bchenfireworks thank you very much!

Also, I would like to suggest the following updates:
(1) Use {{ tool_call['function']['arguments'] | tojson }} instead, to make sure the arguments are valid JSON.
(2) Add more - whitespace-control markers to the control statements (for example {%- ... -%}) to strip blanks, so that the rendered string is compact.
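
Suggestion (1) can be illustrated outside the template. The sketch below assumes the arguments reach the template as a Python dict; the tool name and arguments are hypothetical, and `json.dumps` stands in for the tojson filter, whose exact output formatting may differ.

```python
import json

# Hypothetical tool call; assumes "arguments" is a dict, not a pre-serialized string.
tool_call = {
    "function": {
        "name": "get_weather",
        "arguments": {"city": "Paris", "metric": True},
    }
}
arguments = tool_call["function"]["arguments"]

as_repr = str(arguments)         # what a bare {{ ... }} renders for a dict
as_json = json.dumps(arguments)  # stand-in for the tojson filter


def is_legal_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(as_repr, is_legal_json(as_repr))
print(as_json, is_legal_json(as_json))
```

The plain rendering uses single quotes and Python's True, so it does not parse as JSON, while the serialized form does. If the arguments are already a JSON string, serializing them again would wrap them in an extra layer of quotes, so the template has to match whichever form the serving stack passes in.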

benjibc

Jul 15, 2025

Thanks! Let me update the template following your suggestion.

bigmoyan

Moonshot AI org Jul 15, 2025

> Thanks! Let me update the template following your suggestion.

Since the template string is too long to review, could you please also provide us with an example (a chat history and the rendered string) so we can verify it locally? Thank you very much, really.

bigmoyan changed pull request status to merged Jul 15, 2025
