Tool-call argument JSON malformations on long-content payloads (Kimi K2.6 on vLLM 0.19.1)
Reporting a reproducible failure mode in K2.6 tool-call generation for the Moonshot team.
### ENV
args: [
  'serve', 'moonshotai/Kimi-K2.6',
  '--host', '0.0.0.0',
  '--port', '8000',
  '--tensor-parallel-size', '8',
  '--max-model-len', '262144',
  '--kv-cache-dtype', 'auto',
  '--gpu-memory-utilization', '0.9',
  '--max-num-batched-tokens', '16384',
  '--chat-template', './chat_template.jinja',
  '--enable-prefix-caching',
  '--override-generation-config', '{"temperature": 1.0, "top_p": 0.95, "max_new_tokens": 131000, "repetition_penalty": 1.05}',
  '--served-model-name', '', '',
  '--decode-context-parallel-size', '8',
  '--mm-encoder-tp-mode', 'data',
  '--mm-processor-cache-type', 'shm',
  '--trust-remote-code',
  '--tokenizer-mode', 'hf',
  '--default-chat-template-kwargs', '{"thinking": true}',
  '--enable-chunked-prefill',
  '--max-num-seqs', '64',
  '--max-cudagraph-capture-size', '128',
  '--compilation-config', '{"cudagraph_mode": "FULL_AND_PIECEWISE"}',
  '--enable-auto-tool-choice',
  '--tool-call-parser', 'kimi_k2',
  '--reasoning-parser', 'kimi_k2',
  '--kv-cache-metrics',
  '--kv-cache-metrics-sample', '0.05'
]
### What we observe
The model produces structurally malformed JSON between `<|tool_call_argument_begin|>` and `<|tool_call_end|>`. Two patterns:
Pattern A (long string + missing outer close): `create_google_doc`-style tool calls with ~20-25KB of markdown content in a single string field consistently miss the final `}` of the outer object. The model emits the closing `"` of the string and the closing `}` of the inner object, but then stops and treats the call as complete.
Concrete sample (truncated):

    {"title": "...", "content": "## Section 1\n\n- bullet\n- bullet\npython\ndef x():\n pass\n\n\n...7KB later...\n## Conclusion\n\nReady for review.\n"

Generation ends (EOF) right after the closing quote; the final `}` is missing.
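For anyone who wants to check their own logs for Pattern A, here is a minimal diagnostic sketch. It is not part of vLLM or the `kimi_k2` parser; `missing_closers` and `diagnose` are hypothetical helper names, and it assumes you already have the raw argument text as a string. It reports which closing delimiters would be needed to balance the payload, ignoring braces that appear inside string values.

```python
import json


def missing_closers(raw: str) -> str:
    """Return the closers needed to balance `raw` ("" if already balanced).

    Tracks string and escape state so that braces or brackets inside
    string values (e.g. markdown or code in a "content" field) are ignored.
    """
    stack = []
    in_string = False
    escaped = False
    for ch in raw:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            if stack and stack[-1] == ch:
                stack.pop()
    closers = "".join(reversed(stack))
    return ('"' if in_string else "") + closers


def diagnose(raw: str) -> dict:
    """Classify a raw tool-call argument payload (hypothetical helper)."""
    try:
        json.loads(raw)
        return {"valid": True, "missing": ""}
    except json.JSONDecodeError as err:
        return {
            "valid": False,
            "missing": missing_closers(raw),
            "error_pos": err.pos,
        }
```

A payload shaped like the sample above should come back with `missing` equal to `}`. Appending the reported closers is only useful for triage and counting; silently repairing arguments in production could mask the underlying generation problem.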
Pattern B (doubly-stringified JSON): this affects tools whose schemas accept a stringified-JSON input field (an anti-pattern, but common in agent frameworks). The model produces `{"input": "{\"k\": \"v\", \"k2\": ...}"}` and, roughly 500-700 characters in, drops a delimiter inside the inner JSON. Outer parse succeeds up to that point, then fails.
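To tell apart a failure in the outer object from one in the stringified inner payload, a two-stage parse can help. The sketch below is an assumption-laden illustration, not the parser's actual behavior: `locate_failure` is a hypothetical helper, and the default field name `input` is taken from the example above.

```python
import json


def locate_failure(raw: str, field: str = "input") -> dict:
    """Report which JSON layer fails to parse, and at what character offset.

    Stage 1 parses the outer arguments object; stage 2 parses the
    stringified JSON held in `field`. Offsets are relative to the layer
    that failed.
    """
    try:
        outer = json.loads(raw)
    except json.JSONDecodeError as err:
        return {"layer": "outer", "pos": err.pos, "msg": err.msg}

    inner = outer.get(field) if isinstance(outer, dict) else None
    if not isinstance(inner, str):
        return {"layer": "none", "pos": None, "msg": "no stringified field"}

    try:
        json.loads(inner)
    except json.JSONDecodeError as err:
        return {"layer": "inner", "pos": err.pos, "msg": err.msg}
    return {"layer": "ok", "pos": None, "msg": ""}
```

Logging the returned `layer` and `pos` per failed call would show whether the dropped delimiter clusters around the 500-700 character range described above.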
We've also seen high rates of the model omitting required schema fields (e.g. dropping a required `graph` field on tools where `path` alone "feels sufficient"), but that's a separate report.
Happy to share more telemetry if useful.