---
license: other
license_name: modified-mit
license_link: LICENSE
base_model:
- moonshotai/Kimi-K2-Instruct-0905
---

# Model Overview

- **Model Architecture:** Kimi-K2-Instruct
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0
- **Operating System(s):** Linux
- **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
- **Weight quantization:** MoE layers only, OCP MXFP4, static
- **Activation quantization:** MoE layers only, OCP MXFP4, dynamic
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model was built from the [Kimi-K2-Instruct-0905](https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905) model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for MXFP4 quantization.

# Model Quantization

The model was quantized from [unsloth/Kimi-K2-Instruct-0905-BF16](https://huggingface.co/unsloth/Kimi-K2-Instruct-0905-BF16) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). The MoE weights and activations are quantized to MXFP4.

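OCP MXFP4 is a microscaling format: blocks of 32 FP4 (E2M1) elements share a single power-of-two (E8M0) scale, per the OCP Microscaling Formats specification. The sketch below illustrates the numerics on one block in plain Python; it is a reference illustration of the format, not AMD-Quark's implementation.

```python
import math
import random

# Representable FP4 (E2M1) magnitudes; the sign bit is handled separately.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def mxfp4_quantize_block(block):
    """Quantize one 32-element block to MXFP4: a shared power-of-two
    scale (E8M0) plus per-element FP4 (E2M1) values."""
    assert len(block) == 32
    amax = max(abs(v) for v in block)
    if amax == 0:
        return 1.0, [0.0] * 32
    # Shared scale per the OCP spec: floor(log2(amax)) minus the E2M1
    # maximum exponent (2), so scaled elements land in FP4's range.
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    q = []
    for v in block:
        s = v / scale
        # Round the magnitude to the nearest FP4 grid point (this also
        # clips anything above 6.0, the largest FP4 value).
        nearest = min(FP4_GRID, key=lambda g: abs(abs(s) - g))
        q.append(math.copysign(nearest, s))
    return scale, q

def mxfp4_dequantize(scale, q):
    return [scale * v for v in q]

random.seed(0)
x = [random.gauss(0, 1) for _ in range(32)]
scale, q = mxfp4_quantize_block(x)
x_hat = mxfp4_dequantize(scale, q)
err = max(abs(a - b) for a, b in zip(x, x_hat))
print(f"shared scale = {scale}, max abs error = {err:.4f}")
```

Because the shared scale is a power of two, dequantization is an exact binary shift of the FP4 elements; the error comes only from rounding each element to the 8-point FP4 grid.
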
# Deployment

## Use with vLLM

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend.

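Once a server is running (see the launch command under Reproduction below), vLLM exposes an OpenAI-compatible API. As a minimal sketch, the payload below targets that launch configuration; the model name `kimi-k2-mxfp4` and port 8000 are assumptions that must match your `vllm serve` flags.

```python
import json

# Hypothetical chat-completion request for a local vLLM server.
# The model name and port mirror the `vllm serve` flags shown under
# Reproduction; adjust both if you launch the server differently.
payload = {
    "model": "kimi-k2-mxfp4",
    "messages": [
        {"role": "user", "content": "Summarize MXFP4 quantization in one sentence."}
    ],
    "max_tokens": 128,
    "temperature": 0.6,
}

body = json.dumps(payload)
# POST the body to the server, e.g.:
#   curl http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" -d "$body"
print(body)
```
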
# Evaluation

The model was evaluated on the GSM8K benchmark.

## Accuracy

<table>
  <tr>
    <td><strong>Benchmark</strong></td>
    <td><strong>Kimi-K2-Instruct-0905</strong></td>
    <td><strong>Kimi-K2-Instruct-0905-MXFP4 (this model)</strong></td>
    <td><strong>Recovery</strong></td>
  </tr>
  <tr>
    <td>GSM8K (strict-match)</td>
    <td>95.53</td>
    <td>94.47</td>
    <td>98.89%</td>
  </tr>
</table>

## Reproduction

The GSM8K results were obtained using the `lm-evaluation-harness` framework, based on the Docker image `rocm/vllm-private:vllm_dev_base_mxfp4_20260122`, with vLLM and lm-eval compiled and installed from source inside the image.

### Launching the server

```bash
export VLLM_ATTENTION_BACKEND="TRITON_MLA"
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=0

vllm serve amd/Kimi-K2-Instruct-0905-MXFP4 \
    --port 8000 \
    --served-model-name kimi-k2-mxfp4 \
    --trust-remote-code \
    --tensor-parallel-size 8 \
    --enable-auto-tool-choice \
    --tool-call-parser kimi_k2
```

### Evaluating the model in a new terminal

```bash
lm_eval \
    --model local-completions \
    --model_args "model=kimi-k2-mxfp4,base_url=http://0.0.0.0:8000/v1/completions,tokenized_requests=False,tokenizer_backend=None,num_concurrent=32" \
    --tasks gsm8k \
    --num_fewshot 5 \
    --batch_size 1
```

# License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.