amd/Kimi-K2-Thinking-MXFP4

@@ -3,12 +3,12 @@ license: other
 license_name: modified-mit
 license_link: LICENSE
 base_model:
-- moonshotai/Kimi-K2-Thinking
 ---
 # Model Overview
-- **Model Architecture:** Kimi-K2-Thinking
   - **Input:** Text
   - **Output:** Text
 - **Supported Hardware Microarchitecture:** AMD MI350/MI355
@@ -16,11 +16,12 @@ base_model:
 - **Operating System(s):** Linux
 - **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
 - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (V0.11.1)
   - **Weight quantization:** MOE-only, OCP MXFP4, Static
   - **Activation quantization:** MOE-only, OCP MXFP4, Dynamic
 - **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)
-This model was built with Kimi-K2-Thinking model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for MXFP4 quantization.
 # Model Quantization
@@ -29,7 +30,7 @@ The model was quantized from [unsloth/Kimi-K2-Thinking-BF16](https://huggingface
 **Quantization scripts:**
 ```
 cd Quark/examples/torch/language_modeling/llm_ptq/
-exclude_layers="*self_attn* *mlp.gate *lm_head *mlp.gate_proj *mlp.up_proj *mlp.down_proj *shared_experts*"
 python quantize_quark.py \
     --model_dir unsloth/Kimi-K2-Thinking-BF16 \
@@ -53,7 +54,7 @@ The model was evaluated on GSM8K benchmarks.
   <tr>
    <td><strong>Benchmark</strong>
    </td>
-   <td><strong>Kimi-K2-Thinking </strong>
    </td>
    <td><strong>Kimi-K2-Thinking-MXFP4(this model)</strong>
    </td>
@@ -61,13 +62,13 @@ The model was evaluated on GSM8K benchmarks.
    </td>
   </tr>
   <tr>
-   <td>GSM8K (strict-match)
    </td>
    <td>94.16
    </td>
-   <td>93.48
    </td>
-   <td>99.28%
    </td>
   </tr>
 </table>
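
The change to `exclude_layers` in this diff drops the `*shared_experts*` pattern, which matches the new "Quantized layers: Experts, Shared_experts" entry. A minimal sketch of the effect, assuming the patterns behave like shell-style globs matched against full module names (the exact matching rules in Quark are not shown in this diff). The module names below are hypothetical, modeled on DeepseekV3-style naming.

```python
from fnmatch import fnmatchcase

# Exclude lists copied from the removed (-) and added (+) lines of the diff.
OLD_EXCLUDE = ("*self_attn* *mlp.gate *lm_head *mlp.gate_proj "
               "*mlp.up_proj *mlp.down_proj *shared_experts*").split()
NEW_EXCLUDE = ("*self_attn* *mlp.gate *lm_head *mlp.gate_proj "
               "*mlp.up_proj *mlp.down_proj").split()

# Hypothetical module names, for illustration only.
LAYERS = [
    "model.layers.0.mlp.gate_proj",                 # dense MLP
    "model.layers.1.self_attn.q_a_proj",            # attention
    "model.layers.1.mlp.gate",                      # MoE router
    "model.layers.1.mlp.experts.0.gate_proj",       # routed expert
    "model.layers.1.mlp.shared_experts.gate_proj",  # shared expert
    "lm_head",
]


def is_excluded(name, patterns):
    """True if any glob pattern matches the whole module name."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def quantized_layers(patterns):
    """Names that survive the exclude list, i.e. that would be quantized."""
    return [name for name in LAYERS if not is_excluded(name, patterns)]


print("old:", quantized_layers(OLD_EXCLUDE))
print("new:", quantized_layers(NEW_EXCLUDE))
```

Under this assumption, `*mlp.gate_proj` only matches names that end in `mlp.gate_proj`, so it excludes the dense MLP but not `mlp.shared_experts.gate_proj`. With the old list the shared expert is still excluded by `*shared_experts*`. With the new list it is quantized alongside the routed experts.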

 license_name: modified-mit
 license_link: LICENSE
 base_model:
+- unsloth/Kimi-K2-Thinking-BF16
 ---
 # Model Overview
+- **Model Architecture:** DeepseekV3ForCausalLM
   - **Input:** Text
   - **Output:** Text
 - **Supported Hardware Microarchitecture:** AMD MI350/MI355
 - **Operating System(s):** Linux
 - **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
 - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (V0.11.1)
+  - **Quantized layers:** Experts, Shared_experts
   - **Weight quantization:** MOE-only, OCP MXFP4, Static
   - **Activation quantization:** MOE-only, OCP MXFP4, Dynamic
 - **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)
+This model was built from the Kimi-K2-Thinking-BF16 model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for MXFP4 quantization.
 # Model Quantization
 **Quantization scripts:**
 ```
 cd Quark/examples/torch/language_modeling/llm_ptq/
+exclude_layers="*self_attn* *mlp.gate *lm_head *mlp.gate_proj *mlp.up_proj *mlp.down_proj"
 python quantize_quark.py \
     --model_dir unsloth/Kimi-K2-Thinking-BF16 \
   <tr>
    <td><strong>Benchmark</strong>
    </td>
+   <td><strong>Kimi-K2-Thinking-BF16 </strong>
    </td>
    <td><strong>Kimi-K2-Thinking-MXFP4(this model)</strong>
    </td>
    </td>
   </tr>
   <tr>
+   <td>GSM8K (flexible-extract)
    </td>
    <td>94.16
    </td>
+   <td>93.03
    </td>
+   <td>98.80%
    </td>
   </tr>
 </table>
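
The last column of the table appears to be the quantized score expressed as a percentage of the baseline score. Its header is not visible in this hunk, so that reading is an assumption. A short sketch that reproduces the figures from the table under that assumption:

```python
def recovery_pct(quantized: float, baseline: float) -> float:
    """Quantized score as a percentage of the baseline score."""
    return 100.0 * quantized / baseline


# GSM8K scores copied from the table rows in this diff.
baseline_bf16 = 94.16
mxfp4_new = 93.03  # value on the added (+) line
mxfp4_old = 93.48  # value on the removed (-) line

print(f"new: {recovery_pct(mxfp4_new, baseline_bf16):.2f}%")  # new: 98.80%
print(f"old: {recovery_pct(mxfp4_old, baseline_bf16):.2f}%")  # old: 99.28%
```

Both results agree with the percentages in the respective table rows, so the third column is consistent with a simple ratio of the two scores.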