Instructions to use unsloth/Kimi-K2.7-Code with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use unsloth/Kimi-K2.7-Code with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="unsloth/Kimi-K2.7-Code", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("unsloth/Kimi-K2.7-Code", trust_remote_code=True, dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use unsloth/Kimi-K2.7-Code with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "unsloth/Kimi-K2.7-Code"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "unsloth/Kimi-K2.7-Code",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
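
The same request can also be sent from Python. The sketch below uses only the standard library and mirrors the curl body above; the helper names `build_chat_payload` and `send_chat` are made up for this example, and the base URL assumes the server is listening on port 8000 as in the curl call.

```python
import json
import urllib.request

# Assumed to match the curl example: server on localhost, port 8000.
BASE_URL = "http://localhost:8000/v1"
MODEL = "unsloth/Kimi-K2.7-Code"


def build_chat_payload(prompt: str, image_url: str, model: str = MODEL) -> dict:
    """Build the same JSON body that the curl command sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat(payload: dict, base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """POST the payload to /chat/completions and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Once the server is running, `send_chat(build_chat_payload(prompt, image_url))["choices"][0]["message"]["content"]` should hold the model's reply. For the SGLang server below, pass `base_url="http://localhost:30000/v1"` instead.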

Use Docker

docker model run hf.co/unsloth/Kimi-K2.7-Code

SGLang

How to use unsloth/Kimi-K2.7-Code with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "unsloth/Kimi-K2.7-Code" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "unsloth/Kimi-K2.7-Code",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "unsloth/Kimi-K2.7-Code" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "unsloth/Kimi-K2.7-Code",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use unsloth/Kimi-K2.7-Code with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for unsloth/Kimi-K2.7-Code to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for unsloth/Kimi-K2.7-Code to start chatting

Use Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for unsloth/Kimi-K2.7-Code to start chatting

Load model with FastModel

# Install Unsloth first (run in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/Kimi-K2.7-Code",
    max_seq_length=2048,
)

Docker Model Runner
How to use unsloth/Kimi-K2.7-Code with Docker Model Runner:
```
docker model run hf.co/unsloth/Kimi-K2.7-Code
```

danielhanchen committed 6 days ago

Commit 3ce346d (verified) · 1 parent: 2d4d367

Upload folder using huggingface_hub

Files changed (4)

README.md +21 -3
config.json +44 -31
processor_config.json +38 -0
tokenizer_config.json +19 -12

README.md CHANGED

@@ -1,13 +1,31 @@
 ---
 tags:
 - compressed-tensors
 license: other
 license_name: modified-mit
 library_name: transformers
 pipeline_tag: image-text-to-text
-base_model:
-- moonshotai/Kimi-K2.7-Code
 ---
 <div align="center">
   <picture>
       <img src="figures/kimi-logo.png" width="30%" alt="Kimi K2.7 Code">
@@ -334,4 +352,4 @@ See [THIRD PARTY NOTICES](THIRD_PARTY_NOTICES.md)
 ## 9. Contact Us
-If you have any questions, please reach out at [support@moonshot.ai](mailto:support@moonshot.ai).

 ---
+base_model:
+- moonshotai/Kimi-K2.7-Code
 tags:
 - compressed-tensors
+- unsloth
 license: other
 license_name: modified-mit
 library_name: transformers
 pipeline_tag: image-text-to-text
 ---
+<div>
+<p style="margin-top: 0;margin-bottom: 0;">
+    <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
+  </p>
+  <div style="display: flex; gap: 5px; align-items: center; ">
+    <a href="https://github.com/unslothai/unsloth/">
+      <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
+    </a>
+    <a href="https://discord.gg/unsloth">
+      <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
+    </a>
+    <a href="https://docs.unsloth.ai/">
+      <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
+    </a>
+  </div>
+</div>
 <div align="center">
   <picture>
       <img src="figures/kimi-logo.png" width="30%" alt="Kimi K2.7 Code">
 ## 9. Contact Us
+If you have any questions, please reach out at [support@moonshot.ai](mailto:support@moonshot.ai).

config.json CHANGED

@@ -8,12 +8,47 @@
     "AutoModelForCausalLM": "modeling_kimi_k25.KimiK25ForConditionalGeneration"
   },
   "bos_token_id": 163584,
-  "dtype": "bfloat16",
   "eos_token_id": 163586,
   "ignore_index": -100,
   "media_placeholder_token_id": 163605,
   "model_type": "kimi_k25",
   "pad_token_id": 163839,
   "text_config": {
     "_name_or_path": "",
     "add_cross_attention": false,
@@ -28,24 +63,15 @@
       "AutoModelForCausalLM": "modeling_deepseek.DeepseekV3ForCausalLM"
     },
     "aux_loss_alpha": 0.001,
-    "bad_words_ids": null,
-    "begin_suppress_tokens": null,
     "bos_token_id": 163584,
     "chunk_size_feed_forward": 0,
     "cross_attention_hidden_size": null,
     "decoder_start_token_id": null,
-    "diversity_penalty": 0.0,
-    "do_sample": false,
-    "dtype": "bfloat16",
-    "early_stopping": false,
-    "encoder_no_repeat_ngram_size": 0,
     "eos_token_id": 163586,
     "ep_size": 1,
-    "exponential_decay_length_penalty": null,
     "finetuning_task": null,
     "first_k_dense_replace": 1,
-    "forced_bos_token_id": null,
-    "forced_eos_token_id": null,
     "hidden_act": "silu",
     "hidden_size": 7168,
     "id2label": {
@@ -61,29 +87,21 @@
       "LABEL_0": 0,
       "LABEL_1": 1
     },
-    "length_penalty": 1.0,
-    "max_length": 20,
     "max_position_embeddings": 262144,
-    "min_length": 0,
-    "model_type": "kimi_k2",
     "moe_intermediate_size": 2048,
     "moe_layer_freq": 1,
     "n_group": 1,
     "n_routed_experts": 384,
     "n_shared_experts": 1,
-    "no_repeat_ngram_size": 0,
     "norm_topk_prob": true,
     "num_attention_heads": 64,
-    "num_beam_groups": 1,
-    "num_beams": 1,
     "num_experts_per_tok": 8,
     "num_hidden_layers": 61,
     "num_key_value_heads": 64,
     "num_nextn_predict_layers": 0,
-    "num_return_sequences": 1,
     "output_attentions": false,
     "output_hidden_states": false,
-    "output_scores": false,
     "pad_token_id": 163839,
     "prefix": null,
     "pretraining_tp": 1,
@@ -127,18 +145,17 @@
       "quant_method": "compressed-tensors",
       "quantization_status": "compressed"
     },
-    "remove_invalid_values": false,
-    "repetition_penalty": 1.0,
     "return_dict": true,
-    "return_dict_in_generate": false,
     "rms_norm_eps": 1e-05,
-    "rope_scaling": {
       "beta_fast": 32.0,
       "beta_slow": 1.0,
       "factor": 64.0,
       "mscale": 1.0,
       "mscale_all_dim": 1.0,
       "original_max_position_embeddings": 4096,
       "type": "yarn"
     },
     "rope_theta": 50000.0,
@@ -146,30 +163,25 @@
     "scoring_func": "sigmoid",
     "sep_token_id": null,
     "seq_aux": true,
-    "suppress_tokens": null,
     "task_specific_params": null,
-    "temperature": 1.0,
     "tf_legacy_loss": false,
     "tie_encoder_decoder": false,
     "tie_word_embeddings": false,
     "tokenizer_class": null,
-    "top_k": 50,
-    "top_p": 1.0,
     "topk_group": 1,
     "topk_method": "noaux_tc",
     "torchscript": false,
-    "transformers_version": "4.56.2",
-    "typical_p": 1.0,
     "use_bfloat16": false,
     "use_cache": true,
     "v_head_dim": 128,
     "vocab_size": 163840
   },
   "tie_word_embeddings": false,
   "use_unified_vision_chunk": true,
   "video_placeholder": "<|kimi_k25_video_placeholder|>",
   "vision_config": {
-    "_attn_implementation": "flash_attention_2",
     "init_pos_emb_height": 64,
     "init_pos_emb_time": 4,
     "init_pos_emb_width": 64,
@@ -180,6 +192,7 @@
     "merge_type": "sd2_tpool",
     "mm_hidden_size": 1152,
     "mm_projector_type": "patchmerger",
     "patch_size": 14,
     "pos_emb_type": "divided_fixed",
     "projector_hidden_act": "gelu",

     "AutoModelForCausalLM": "modeling_kimi_k25.KimiK25ForConditionalGeneration"
   },
   "bos_token_id": 163584,
+  "torch_dtype": "bfloat16",
   "eos_token_id": 163586,
   "ignore_index": -100,
   "media_placeholder_token_id": 163605,
   "model_type": "kimi_k25",
   "pad_token_id": 163839,
+  "quantization_config": {
+    "config_groups": {
+      "group_0": {
+        "input_activations": null,
+        "output_activations": null,
+        "targets": [
+          "Linear"
+        ],
+        "weights": {
+          "actorder": null,
+          "block_structure": null,
+          "dynamic": false,
+          "group_size": 32,
+          "num_bits": 4,
+          "observer": "minmax",
+          "observer_kwargs": {},
+          "strategy": "group",
+          "symmetric": true,
+          "type": "int"
+        }
+      }
+    },
+    "format": "pack-quantized",
+    "ignore": [
+      "re:.*self_attn.*",
+      "re:.*shared_experts.*",
+      "re:.*mlp\\.(gate|up|gate_up|down)_proj.*",
+      "re:.*lm_head.*",
+      "re:.*vision_tower.*",
+      "re:.*mm_projector.*"
+    ],
+    "kv_cache_scheme": null,
+    "quant_method": "compressed-tensors",
+    "quantization_status": "compressed"
+  },
   "text_config": {
     "_name_or_path": "",
     "add_cross_attention": false,
       "AutoModelForCausalLM": "modeling_deepseek.DeepseekV3ForCausalLM"
     },
     "aux_loss_alpha": 0.001,
     "bos_token_id": 163584,
     "chunk_size_feed_forward": 0,
     "cross_attention_hidden_size": null,
     "decoder_start_token_id": null,
+    "torch_dtype": "bfloat16",
     "eos_token_id": 163586,
     "ep_size": 1,
     "finetuning_task": null,
     "first_k_dense_replace": 1,
     "hidden_act": "silu",
     "hidden_size": 7168,
     "id2label": {
       "LABEL_0": 0,
       "LABEL_1": 1
     },
     "max_position_embeddings": 262144,
+    "model_type": "deepseek_v3",
     "moe_intermediate_size": 2048,
     "moe_layer_freq": 1,
     "n_group": 1,
     "n_routed_experts": 384,
     "n_shared_experts": 1,
     "norm_topk_prob": true,
     "num_attention_heads": 64,
     "num_experts_per_tok": 8,
     "num_hidden_layers": 61,
     "num_key_value_heads": 64,
     "num_nextn_predict_layers": 0,
     "output_attentions": false,
     "output_hidden_states": false,
     "pad_token_id": 163839,
     "prefix": null,
     "pretraining_tp": 1,
       "quant_method": "compressed-tensors",
       "quantization_status": "compressed"
     },
     "return_dict": true,
     "rms_norm_eps": 1e-05,
+    "rope_parameters": {
       "beta_fast": 32.0,
       "beta_slow": 1.0,
       "factor": 64.0,
       "mscale": 1.0,
       "mscale_all_dim": 1.0,
       "original_max_position_embeddings": 4096,
+      "rope_theta": 50000.0,
+      "rope_type": "yarn",
       "type": "yarn"
     },
     "rope_theta": 50000.0,
     "scoring_func": "sigmoid",
     "sep_token_id": null,
     "seq_aux": true,
     "task_specific_params": null,
     "tf_legacy_loss": false,
     "tie_encoder_decoder": false,
     "tie_word_embeddings": false,
     "tokenizer_class": null,
     "topk_group": 1,
     "topk_method": "noaux_tc",
     "torchscript": false,
     "use_bfloat16": false,
     "use_cache": true,
     "v_head_dim": 128,
     "vocab_size": 163840
   },
   "tie_word_embeddings": false,
+  "transformers_version": "5.13.0.dev0",
+  "unsloth_fixed": true,
   "use_unified_vision_chunk": true,
   "video_placeholder": "<|kimi_k25_video_placeholder|>",
   "vision_config": {
     "init_pos_emb_height": 64,
     "init_pos_emb_time": 4,
     "init_pos_emb_width": 64,
     "merge_type": "sd2_tpool",
     "mm_hidden_size": 1152,
     "mm_projector_type": "patchmerger",
+    "model_type": "",
     "patch_size": 14,
     "pos_emb_type": "divided_fixed",
     "projector_hidden_act": "gelu",
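
Two details of the new config.json can be sanity-checked with a few lines of Python: which modules the `ignore` list keeps out of int4 quantization, and whether the YaRN settings agree with `max_position_embeddings`. This is a minimal sketch, not the compressed-tensors implementation. It assumes that a `re:` prefix marks a regular expression matched from the start of the module name, and the module names in the loop are hypothetical examples rather than names read from the checkpoint.

```python
import re

# Values copied from the config.json diff above.
IGNORE_PATTERNS = [
    "re:.*self_attn.*",
    "re:.*shared_experts.*",
    r"re:.*mlp\.(gate|up|gate_up|down)_proj.*",
    "re:.*lm_head.*",
    "re:.*vision_tower.*",
    "re:.*mm_projector.*",
]
ROPE = {"factor": 64.0, "original_max_position_embeddings": 4096}
MAX_POSITION_EMBEDDINGS = 262144


def is_ignored(module_name: str, patterns=IGNORE_PATTERNS) -> bool:
    """Return True if any pattern excludes the module from quantization.

    Assumption: "re:" marks a regex matched from the start of the name;
    any other entry is compared as an exact module name.
    """
    for pattern in patterns:
        if pattern.startswith("re:"):
            if re.match(pattern[3:], module_name):
                return True
        elif pattern == module_name:
            return True
    return False


def yarn_context_length(rope: dict) -> int:
    """Scaled context length: original window times the YaRN factor."""
    return int(rope["factor"] * rope["original_max_position_embeddings"])


# Hypothetical module names, for illustration only.
for name in [
    "language_model.model.layers.3.self_attn.o_proj",
    "language_model.model.layers.3.mlp.shared_experts.up_proj",
    "language_model.model.layers.0.mlp.down_proj",
    "language_model.model.layers.3.mlp.experts.7.down_proj",
]:
    status = "ignored" if is_ignored(name) else "quantized (int4, group size 32)"
    print(f"{name}: {status}")

# 64.0 * 4096 should equal the advertised 262144-token window.
print(yarn_context_length(ROPE) == MAX_POSITION_EMBEDDINGS)
```

Under these assumptions only the routed-expert projection falls through the ignore list, which is consistent with a scheme that quantizes the routed experts and leaves attention, shared experts, dense MLP layers, the LM head, and the vision stack at their original precision.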

processor_config.json ADDED

@@ -0,0 +1,38 @@
+{
+  "auto_map": {
+    "AutoProcessor": "kimi_k25_processor.KimiK25Processor"
+  },
+  "image_processor": {
+    "auto_map": {
+      "AutoImageProcessor": "kimi_k25_vision_processing.KimiK25VisionProcessor",
+      "AutoProcessor": "kimi_k25_processor.KimiK25Processor"
+    },
+    "image_processor_type": "KimiK25VisionProcessor",
+    "media_proc_cfg": {
+      "config_type": "media_proc.processors.moonvit.MoonViTMediaProcessorConfig",
+      "fixed_output_tokens": null,
+      "image_mean": [
+        0.5,
+        0.5,
+        0.5
+      ],
+      "image_std": [
+        0.5,
+        0.5,
+        0.5
+      ],
+      "in_patch_limit": 16384,
+      "in_patch_limit_each_frame": 4096,
+      "in_patch_limit_video": 655360,
+      "max_num_frames_each_video": null,
+      "merge_kernel_size": 2,
+      "patch_limit_on_one_side": 512,
+      "patch_size": 14,
+      "sample_fps": 8.0,
+      "temporal_merge_kernel_size": 4,
+      "timestamp_mode": "hh:mm:ss.fff"
+    },
+    "num_frames_per_chunk": 4
+  },
+  "processor_class": "KimiK25Processor"
+}
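
To make the numbers in `media_proc_cfg` more concrete, here is a rough sketch of what they imply for a single image. It is not the `KimiK25VisionProcessor` code, which is not part of this diff. It assumes pixels are rescaled to [0, 1] before the mean/std normalization, that both image sides are already multiples of `patch_size`, and that `merge_kernel_size` describes a square spatial merge. The function names are invented for the example.

```python
# Values copied from the "media_proc_cfg" block above.
IMAGE_MEAN = (0.5, 0.5, 0.5)
IMAGE_STD = (0.5, 0.5, 0.5)
PATCH_SIZE = 14
MERGE_KERNEL_SIZE = 2
IN_PATCH_LIMIT = 16384


def normalize_pixel(rgb):
    """Map an 8-bit RGB pixel to normalized floats: (x / 255 - mean) / std."""
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    )


def patch_budget(height, width):
    """Return (patches, merged_tokens, within_limit) for one image.

    Assumptions: both sides are multiples of PATCH_SIZE, and the merge
    kernel combines MERGE_KERNEL_SIZE x MERGE_KERNEL_SIZE patches into one token.
    """
    rows, cols = height // PATCH_SIZE, width // PATCH_SIZE
    patches = rows * cols
    tokens = (rows // MERGE_KERNEL_SIZE) * (cols // MERGE_KERNEL_SIZE)
    return patches, tokens, patches <= IN_PATCH_LIMIT


print(normalize_pixel((0, 255, 255)))  # (-1.0, 1.0, 1.0)
print(patch_budget(448, 448))          # (1024, 256, True)
```

With a mean and standard deviation of 0.5, normalization maps the [0, 1] range onto [-1, 1]; the patch arithmetic is only as reliable as the assumptions listed above.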

tokenizer_config.json CHANGED

@@ -185,7 +185,18 @@
       "special": true
     }
   },
-  "additional_special_tokens": [
     "<|im_end|>",
     "<|im_user|>",
     "<|im_assistant|>",
@@ -199,17 +210,13 @@
     "<|media_end|>",
     "<|media_pad|>"
   ],
-  "bos_token": "[BOS]",
-  "clean_up_tokenization_spaces": false,
-  "eos_token": "[EOS]",
-  "model_max_length": 1000000000000000019884624838656,
   "pad_token": "[PAD]",
-  "unk_token": "[UNK]",
   "tokenizer_class": "TikTokenTokenizer",
-  "auto_map": {
-    "AutoTokenizer": [
-      "tokenization_kimi.TikTokenTokenizer",
-      null
-    ]
-  }
 }

       "special": true
     }
   },
+  "auto_map": {
+    "AutoProcessor": "kimi_k25_processor.KimiK25Processor",
+    "AutoTokenizer": [
+      "tokenization_kimi.TikTokenTokenizer",
+      null
+    ]
+  },
+  "backend": "custom",
+  "bos_token": "[BOS]",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "[EOS]",
+  "extra_special_tokens": [
     "<|im_end|>",
     "<|im_user|>",
     "<|im_assistant|>",
     "<|media_end|>",
     "<|media_pad|>"
   ],
+  "is_local": true,
+  "local_files_only": false,
+  "model_max_length": 262144,
   "pad_token": "[PAD]",
+  "padding_side": "left",
+  "processor_class": "KimiK25Processor",
   "tokenizer_class": "TikTokenTokenizer",
+  "unk_token": "[UNK]",
+  "chat_template": "{%- macro render_content(msg) -%}\n    {%- set c = msg.get('content') -%}\n    {%- if c is string -%}\n      {{ c }}\n    {%- elif c is not none -%}\n      {% for content in c -%}\n        {% if content['type'] == 'image' or content['type'] == 'image_url' -%}\n          <|media_begin|>image<|media_content|><|media_pad|><|media_end|>\n        {% elif content['type'] == 'video' or content['type']== 'video_url'-%}\n          <|kimi_k25_video_placeholder|>\n        {% else -%}\n          {{ content['text'] }}\n        {%- endif -%}\n      {%- endfor -%}\n    {%- endif -%}\n{%- endmacro -%}\n\n{% macro set_roles(message) -%}\n  {%- set role_name =  message.get('name') or  message['role'] -%}\n  {%- if message['role'] == 'user' -%}\n    <|im_user|>{{role_name}}<|im_middle|>\n  {%- elif message['role'] == 'assistant' -%}\n    <|im_assistant|>{{role_name}}<|im_middle|>\n  {%- else -%}\n    <|im_system|>{{role_name}}<|im_middle|>\n  {%- endif -%}\n{%- endmacro -%}\n\n\n{%- macro render_toolcalls(message) -%}\n  <|tool_calls_section_begin|>\n  {%- for tool_call in message['tool_calls'] -%}\n    {%- set formatted_id = tool_call['id'] -%}\n    <|tool_call_begin|>{{ formatted_id }}<|tool_call_argument_begin|>{% if tool_call['function']['arguments'] is string %}{{ tool_call['function']['arguments'] }}{% else %}{{ tool_call['function']['arguments'] | tojson }}{% endif %}<|tool_call_end|>\n  {%- endfor -%}\n  <|tool_calls_section_end|>\n{%- endmacro -%}\n\n\n{%- set preserve_thinking = preserve_thinking | default(true) -%}\n{# Find last non-tool-call assistant message. If preserve_thinking, keep -1 so hist is empty and all msgs use suffix (retain reasoning). 
#}\n{%- set ns = namespace(last_non_tool_call_assistant_msg=-1) -%}\n{%- if not preserve_thinking -%}\n{%- for idx in range(messages|length-1, -1, -1) -%}\n    {%- if messages[idx]['role'] == 'assistant' and not messages[idx].get('tool_calls') -%}\n        {%- set ns.last_non_tool_call_assistant_msg = idx -%}\n        {%- break -%}\n    {%- endif -%}\n{%- endfor -%}\n{%- endif -%}\n\n{# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}\n{%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg+1] -%}\n{%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg+1:] -%}\n\n{%- if tools -%}\n  {%- if tools_ts_str -%}\n    <|im_system|>tool_declare<|im_middle|>{{ tools_ts_str }}<|im_end|>\n  {%- else -%}\n    <|im_system|>tool_declare<|im_middle|>{{ tools | tojson(separators=(',', ':')) }}<|im_end|>\n  {%- endif -%}\n{%- endif -%}\n\n  \n{%- for message in hist_msgs -%}\n  {{set_roles(message)}}\n  {%- if message['role'] == 'assistant' -%}\n    <think></think>{{render_content(message)}}\n    {%- if message.get('tool_calls') -%}\n      {{render_toolcalls(message)}}\n    {%- endif -%}\n  {%- elif message['role'] == 'tool' -%}\n    {%- set tool_call_id = message.tool_call_id -%}\n    ## Return of {{ tool_call_id }}\n{{render_content(message)}}\n  {%- elif message['content'] is not none -%}\n    {{render_content(message)}}\n  {%- endif -%}\n  <|im_end|>\n{%- endfor -%}\n\n{%- for message in suffix_msgs -%}\n  {{set_roles(message)}}\n  {%- if message['role'] == 'assistant' -%}\n    {%- if thinking is defined and thinking is false and preserve_thinking is false -%}\n    <think></think>{{render_content(message)}}\n    {%- else -%}\n    {%- set rc = message.get('reasoning', message.get('reasoning_content', '')) -%}\n    <think>{{rc}}</think>{{render_content(message)}}\n    {%- endif -%}\n    {%- if message.get('tool_calls') -%}\n     {{render_toolcalls(message)}}\n    {%- endif -%}\n  {%- elif message['role'] == 'tool' 
-%}\n    {%- set tool_call_id = message.tool_call_id -%}\n    ## Return of {{ tool_call_id }}\n{{render_content(message)}}\n  {%- elif message['content'] is not none -%}\n    {{render_content(message)}}\n  {%- endif -%}\n  <|im_end|>\n{%- endfor -%}\n\n\n{%- if add_generation_prompt -%}\n  <|im_assistant|>assistant<|im_middle|>\n  {%- if thinking is defined and thinking is false -%}\n  <think></think>\n  {%- else -%}\n  <think>\n  {%- endif -%}\n{%- endif -%}"
 }
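
The `chat_template` above is hard to read as an escaped one-line string, so the following hand-written Python approximation shows the prompt layout it appears to produce for the simplest case. It is a sketch under stated assumptions, not a replacement for the Jinja template: it covers only text-only messages with string content, ignores tools and tool calls, leaves `preserve_thinking` at its default of true, and relies on my reading of the template's whitespace-trimming markers. `render_prompt` is a name invented for this example.

```python
def render_prompt(messages, thinking=True, add_generation_prompt=True):
    """Approximate the chat template for text-only messages without tools.

    Assumption: preserve_thinking keeps its default (true), so every
    assistant turn is rendered with its reasoning inside <think>...</think>.
    """
    role_tags = {"user": "<|im_user|>", "assistant": "<|im_assistant|>"}
    parts = []
    for message in messages:
        role = message["role"]
        # The template prefers an explicit "name" and falls back to the role.
        name = message.get("name") or role
        # Any role other than user/assistant is rendered with the system tag.
        parts.append(f"{role_tags.get(role, '<|im_system|>')}{name}<|im_middle|>")
        if role == "assistant":
            reasoning = message.get("reasoning", message.get("reasoning_content", ""))
            parts.append(f"<think>{reasoning}</think>")
        parts.append(message.get("content") or "")
        parts.append("<|im_end|>")
    if add_generation_prompt:
        parts.append("<|im_assistant|>assistant<|im_middle|>")
        # thinking=False closes the think block immediately.
        parts.append("<think>" if thinking else "<think></think>")
    return "".join(parts)


print(render_prompt([{"role": "user", "content": "Hello"}]))
# <|im_user|>user<|im_middle|>Hello<|im_end|><|im_assistant|>assistant<|im_middle|><think>
```

For real use, rendering through the tokenizer's own `apply_chat_template` is the safer route, since it also handles images, video placeholders, and tool calls that this sketch leaves out.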