Files changed (2)
  1. README.md +98 -2
  2. chat_template.jinja +97 -0
README.md CHANGED
@@ -2,8 +2,104 @@
  license: other
  license_name: modified-mit
  license_link: LICENSE
  ---

- # Disclaimer

- This model is provided for experimental purposes only. Its accuracy, stability, and suitability for deployment are not guaranteed. Users are advised to independently evaluate the model before any practical or production use.
  license: other
  license_name: modified-mit
  license_link: LICENSE
+ base_model:
+ - moonshotai/Kimi-K2-Thinking
  ---

+ # Model Overview

+ - **Model Architecture:** Kimi-K2-Thinking
+ - **Input:** Text
+ - **Output:** Text
+ - **Supported Hardware Microarchitecture:** AMD MI350/MI355
+ - **ROCm:** 7.0
+ - **Operating System(s):** Linux
+ - **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
+ - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
+ - **Weight quantization:** OCP MXFP4, Static
+ - **Activation quantization:** OCP MXFP4, Dynamic
+ - **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)
+
+ This model was built from the Kimi-K2-Thinking model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) MXFP4 quantization.
+
+ # Model Quantization
+
+ The model was quantized from [moonshotai/Kimi-K2-Thinking](https://huggingface.co/moonshotai/Kimi-K2-Thinking) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). The weights and activations are quantized to MXFP4.
+
+ **Quantization script:**
+ ```bash
+ cd Quark/examples/torch/language_modeling/llm_ptq/
+ exclude_layers="*self_attn* *mlp.gate *lm_head *mlp.gate_proj *mlp.up_proj *mlp.down_proj *shared_experts.*"
+
+ python3 quantize_quark.py \
+     --model_dir moonshotai/Kimi-K2-Thinking \
+     --quant_scheme mxfp4 \
+     --exclude_layers $exclude_layers \
+     --num_calib_data 128 \
+     --output_dir amd/Kimi-K2-Thinking-MXFP4 \
+     --model_export hf_format \
+     --device cpu
+ ```
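MXFP4, as defined in the OCP Microscaling (MX) specification, groups tensor values into blocks that share one power-of-two scale, with each element stored as FP4 (E2M1). A minimal, framework-free sketch of that encode/decode round trip (illustrative only, not Quark's actual implementation; the value grid and shared-scale rule follow the MX spec):

```python
import math

# FP4 E2M1 representable magnitudes per the OCP MX spec
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_mxfp4_block(block):
    """Quantize one block (nominally 32 floats) to a shared power-of-two
    scale plus per-element FP4 E2M1 values."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Shared scale: align amax's binade with FP4's max binade (6.0 = 1.5 * 2^2)
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    elems = []
    for v in block:
        mag = min(abs(v) / scale, FP4_GRID[-1])        # clamp to FP4 range
        q = min(FP4_GRID, key=lambda g: abs(g - mag))  # round to nearest grid point
        elems.append(math.copysign(q, v))
    return scale, elems

def dequantize_mxfp4_block(scale, elems):
    return [scale * e for e in elems]

scale, elems = quantize_mxfp4_block([1.0] * 32)
# scale = 0.25 and every element encodes as 4.0, so this block round-trips exactly
```

Values that fall between grid points (most of them, in practice) are rounded to the nearest representable magnitude, which is where the small accuracy loss measured below comes from.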
+
+ # Deployment
+
+ ## Use with vLLM
+
+ This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend.
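Once served (see the `vllm serve` command in the Reproduction section), the model is reachable through vLLM's OpenAI-compatible API. A sketch of a chat-completions request body; the sampling parameters are illustrative, and actually sending it requires a running server:

```python
import json

# Request body for the OpenAI-compatible /v1/chat/completions endpoint
# exposed by `vllm serve` (default port 8000).
payload = {
    "model": "amd/Kimi-K2-Thinking-MXFP4",
    "messages": [{"role": "user", "content": "Briefly explain MXFP4 quantization."}],
    "max_tokens": 512,
    "temperature": 0.6,
}
body = json.dumps(payload)
# POST with any HTTP client, e.g.:
#   requests.post("http://0.0.0.0:8000/v1/chat/completions",
#                 data=body, headers={"Content-Type": "application/json"})
```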
+
+ # Evaluation
+
+ The model was evaluated on the GSM8K benchmark.
+
+ ## Accuracy
+
+ | Benchmark | Kimi-K2-Thinking | Kimi-K2-Thinking-MXFP4 (this model) | Recovery |
+ |-----------|------------------|-------------------------------------|----------|
+ | GSM8K     | 94.16            | 93.48                               | 99.28%   |
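The recovery column is simply the quantized score expressed as a percentage of the baseline score:

```python
# Recovery = quantized accuracy / baseline accuracy, as a percentage
baseline, quantized = 94.16, 93.48
recovery = quantized / baseline * 100
print(f"{recovery:.2f}%")  # 99.28%
```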
+
+ ## Reproduction
+
+ The GSM8K results were obtained using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness.git) framework, based on the Docker image `rocm/vllm-dev:base`, with vLLM and lm-eval compiled and installed from source inside the container.
+
+ ### Launching the server
+ ```bash
+ export VLLM_ATTENTION_BACKEND="TRITON_MLA"
+ export VLLM_ROCM_USE_AITER=1
+ export VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=0
+
+ vllm serve amd/Kimi-K2-Thinking-MXFP4 \
+     --tensor-parallel-size 8 \
+     --enable-auto-tool-choice \
+     --tool-call-parser kimi_k2 \
+     --reasoning-parser kimi_k2 \
+     --trust-remote-code
+ ```
+
+ ### Evaluating the model in a new terminal
+ ```bash
+ lm_eval \
+     --model local-completions \
+     --model_args "model=amd/Kimi-K2-Thinking-MXFP4,base_url=http://0.0.0.0:8000/v1/completions,tokenized_requests=False,tokenizer_backend=None,num_concurrent=32" \
+     --tasks gsm8k \
+     --num_fewshot 5 \
+     --batch_size 1
+ ```
+
+ # License
+
+ Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
chat_template.jinja ADDED
@@ -0,0 +1,97 @@
+ {%- macro render_content(msg) -%}
+ {%- set c = msg.get('content') -%}
+ {%- if c is string -%}
+ {{ c }}
+ {%- elif c is not none -%}
+ {% for content in c -%}
+ {% if content['type'] == 'image' or 'image' in content or 'image_url' in content -%}
+ <|media_start|>image<|media_content|><|media_pad|><|media_end|>
+ {% else -%}
+ {{ content['text'] }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- endif -%}
+ {%- endmacro -%}
+
+ {% macro set_roles(message) -%}
+ {%- set role_name = message.get('name') or message['role'] -%}
+ {%- if message['role'] == 'user' -%}
+ <|im_user|>{{role_name}}<|im_middle|>
+ {%- elif message['role'] == 'assistant' -%}
+ <|im_assistant|>{{role_name}}<|im_middle|>
+ {%- else -%}
+ <|im_system|>{{role_name}}<|im_middle|>
+ {%- endif -%}
+ {%- endmacro -%}
+
+
+ {%- macro render_toolcalls(message) -%}
+ <|tool_calls_section_begin|>
+ {%- for tool_call in message['tool_calls'] -%}
+ {%- set formatted_id = tool_call['id'] -%}
+ <|tool_call_begin|>{{ formatted_id }}<|tool_call_argument_begin|>{% if tool_call['function']['arguments'] is string %}{{ tool_call['function']['arguments'] }}{% else %}{{ tool_call['function']['arguments'] | tojson }}{% endif %}<|tool_call_end|>
+ {%- endfor -%}
+ <|tool_calls_section_end|>
+ {%- endmacro -%}
+
+
+ {# Find the last non-tool-call assistant message #}
+ {%- set ns = namespace(last_non_tool_call_assistant_msg=-1) -%}
+ {%- for idx in range(messages|length-1, -1, -1) -%}
+ {%- if messages[idx]['role'] == 'assistant' and not messages[idx].get('tool_calls') -%}
+ {%- set ns.last_non_tool_call_assistant_msg = idx -%}
+ {%- break -%}
+ {%- endif -%}
+ {%- endfor -%}
+
+ {# Split all messages into history & suffix; reasoning_content in the suffix should be preserved. #}
+ {%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg+1] -%}
+ {%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg+1:] -%}
+
+ {%- if tools -%}
+ <|im_system|>tool_declare<|im_middle|>{{ tools | tojson(separators=(',', ':')) }}<|im_end|>
+ {%- endif -%}
+
+ {%- if messages|length == 0 or messages[0]['role'] != 'system' -%}
+ <|im_system|>system<|im_middle|>You are Kimi, an AI assistant created by Moonshot AI.<|im_end|>
+ {%- endif -%}
+
+ {%- for message in hist_msgs -%}
+ {{set_roles(message)}}
+ {%- if message['role'] == 'assistant' -%}
+ <think></think>{{render_content(message)}}
+ {%- if message.get('tool_calls') -%}
+ {{render_toolcalls(message)}}
+ {%- endif -%}
+ {%- elif message['role'] == 'tool' -%}
+ {%- set tool_call_id = message.tool_call_id -%}
+ ## Return of {{ tool_call_id }}
+ {{render_content(message)}}
+ {%- elif message['content'] is not none -%}
+ {{render_content(message)}}
+ {%- endif -%}
+ <|im_end|>
+ {%- endfor -%}
+
+ {%- for message in suffix_msgs -%}
+ {{set_roles(message)}}
+ {%- if message['role'] == 'assistant' -%}
+ {%- set rc = message.get('reasoning_content', '') -%}
+ <think>{{rc}}</think>{{render_content(message)}}
+ {%- if message.get('tool_calls') -%}
+ {{render_toolcalls(message)}}
+ {%- endif -%}
+ {%- elif message['role'] == 'tool' -%}
+ {%- set tool_call_id = message.tool_call_id -%}
+ ## Return of {{ tool_call_id }}
+ {{render_content(message)}}
+ {%- elif message['content'] is not none -%}
+ {{render_content(message)}}
+ {%- endif -%}
+ <|im_end|>
+ {%- endfor -%}
+
+
+ {%- if add_generation_prompt -%}
+ <|im_assistant|>assistant<|im_middle|>
+ {%- endif -%}
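The template's history/suffix split (drop chain-of-thought for older turns, keep it from the last plain assistant turn onward) can be mirrored in plain Python to see which messages land where; the message list below is illustrative:

```python
def split_messages(messages):
    """Find the last assistant message without tool_calls; everything up to
    and including it is 'history' (reasoning dropped, rendered as an empty
    <think></think>), while everything after it keeps its reasoning_content."""
    last = -1
    for idx in range(len(messages) - 1, -1, -1):
        m = messages[idx]
        if m["role"] == "assistant" and not m.get("tool_calls"):
            last = idx
            break
    return messages[:last + 1], messages[last + 1:]

msgs = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello", "reasoning_content": "greet the user"},
    {"role": "user", "content": "look this up"},
    {"role": "assistant", "content": "", "tool_calls": [{"id": "search.0"}],
     "reasoning_content": "need a tool"},
    {"role": "tool", "tool_call_id": "search.0", "content": "result"},
]
hist, suffix = split_messages(msgs)
# hist holds the first two turns; the in-flight tool-call exchange stays in
# suffix, so its reasoning_content survives into the rendered prompt.
```

This is why `--reasoning-parser kimi_k2` matters when serving: reasoning from the current tool-call loop is fed back to the model, while older reasoning is stripped to keep the context compact.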