---
license: other
license_name: modified-mit
library_name: transformers
base_model:
- moonshotai/Kimi-K2-Thinking
---

# Model Overview

- **Model Architecture:** Kimi-K2-Thinking
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI300/MI355
- **ROCm:** 7.0
- **PyTorch:** 2.8.0
- **Transformers:** 4.53.0
- **Operating System(s):** Linux
- **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (V0.10)
- **Weight quantization:** INT4 Per-Channel & FP8E4M3, Static
- **Activation quantization:** FP8E4M3, Dynamic

This model was built from the moonshotai Kimi-K2-Thinking model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for INT4-FP8 quantization.

# Model Quantization

The model was quantized from [moonshotai/Kimi-K2-Thinking](https://huggingface.co/moonshotai/Kimi-K2-Thinking) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html).

# Deployment

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend.

## Evaluation

The model was evaluated on the GSM8K benchmark using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework.

### Accuracy
| Benchmark | Kimi-K2-Thinking | Kimi-K2-Thinking-W4A8 (this model) | Recovery |
|-----------|------------------|------------------------------------|----------|
| GSM8K     | 93.93            | 93.4                               | 99.4%    |
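The recovery figure above is simply the quantized score expressed as a percentage of the baseline score; a quick sanity check:

```python
# Recovery = quantized accuracy / baseline accuracy, as a percentage.
baseline = 93.93   # GSM8K, moonshotai/Kimi-K2-Thinking
quantized = 93.4   # GSM8K, this W4A8 model

recovery = quantized / baseline * 100
print(f"{recovery:.1f}%")  # → 99.4%
```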
### Reproduction

The GSM8K results were obtained using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and the latest vLLM.

Launch vLLM:

```
MODEL_DIR=/data/amd/Kimi-K2-Thinking-W4A8

VLLM_ATTENTION_BACKEND="TRITON_MLA" \
VLLM_ROCM_USE_AITER=1 \
VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=0 \
VLLM_ROCM_USE_AITER_FP4BMM=0 \
vllm serve $MODEL_DIR \
    --port 8001 \
    --trust-remote-code \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 8 \
    --load-format "fastsafetensors"
```

Run the GSM8K evaluation:

```
MODEL_ARGS="model=/data/amd/Kimi-K2-Thinking-W4A8,base_url=http://localhost:8001/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=38768,temperature=0.6,top_p=0.95,add_bos_token=True,seed=$SEED,trust_remote_code=True"

lm_eval \
    --model local-completions \
    --model_args $MODEL_ARGS \
    --tasks gsm8k \
    --num_fewshot 8 \
    --batch_size auto
```

# License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.