metadata
license: other
license_name: modified-mit
license_link: LICENSE
base_model:
  - moonshotai/Kimi-K2-Thinking

Model Overview

  • Model Architecture: Kimi-K2-Thinking
    • Input: Text
    • Output: Text
  • Supported Hardware Microarchitecture: AMD MI350/MI355
  • ROCm: 7.0
  • Operating System(s): Linux
  • Inference Engine: vLLM
  • Model Optimizer: AMD-Quark
    • Weight quantization: MoE-only, OCP MXFP4, Static
    • Activation quantization: MoE-only, OCP MXFP4, Dynamic
  • Calibration Dataset: Pile

This model was built from the Kimi-K2-Thinking model by applying MXFP4 quantization with AMD-Quark.

Model Quantization

The model was quantized from moonshotai/Kimi-K2-Thinking using AMD-Quark. Only the MoE layers are quantized: their weights (static) and activations (dynamic) both use the OCP MXFP4 format.
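
For readers unfamiliar with the format, the block below is a short sketch of how MXFP4 represents numbers, based on the OCP Microscaling (MX) specification rather than on AMD-Quark's internals. The symbols x_i, e, and q_i are introduced here for illustration, and the scale rule shown is the one suggested in the specification; the exact scale selection and rounding used to produce this checkpoint may differ.

```latex
% One MXFP4 block: 32 values x_1..x_32 share a single power-of-two scale (8-bit E8M0),
% and each value is stored as a 4-bit E2M1 float.
\begin{aligned}
e &= \left\lfloor \log_2 \max_{1 \le i \le 32} |x_i| \right\rfloor - 2
  && \text{(shared exponent; 2 is the largest E2M1 exponent)} \\
q_i &= \operatorname{round}_{\mathrm{E2M1}}\!\left( x_i / 2^{e} \right),
  \quad q_i \in \{0, \pm 0.5, \pm 1, \pm 1.5, \pm 2, \pm 3, \pm 4, \pm 6\} \\
\hat{x}_i &= 2^{e} \, q_i
  && \text{(dequantized value)} \\
\text{bits per value} &= 4 + \tfrac{8}{32} = 4.25
\end{aligned}
% Worked example: if the largest magnitude in a block is 11, then e = 3 - 2 = 1.
% The value 11 scales to 5.5, rounds to 6, and dequantizes to 12;
% the value 1.3 scales to 0.65, rounds to 0.5, and dequantizes to 1.0.
```

Scaled values whose magnitude exceeds 6 are clamped to ±6. In the usual reading of the terms, "static" means the scales are computed offline (here, for the weights) and "dynamic" means they are computed at inference time from the actual activations.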

Deployment

Use with vLLM

This model can be served with the vLLM backend. The exact launch command used for evaluation is listed under Reproduction.

Evaluation

The model was evaluated on the GSM8K benchmark.

Accuracy

| Benchmark | Kimi-K2-Thinking | Kimi-K2-Thinking-MXFP4 (this model) | Recovery |
|---|---|---|---|
| GSM8K (strict-match) | 94.16 | 93.48 | 99.28% |
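
The Recovery column is consistent with the quantized score divided by the baseline score, expressed as a percentage. The following is a minimal sketch of that arithmetic; the function name `recovery` is hypothetical and only used for illustration.

```shell
# Hypothetical helper: recovery = quantized score / baseline score * 100.
recovery() {
  # $1 = baseline score, $2 = quantized score; prints the percentage with two decimals
  awk -v base="$1" -v quant="$2" 'BEGIN { printf "%.2f\n", quant / base * 100 }'
}

recovery 94.16 93.48   # prints 99.28
```

In absolute terms, the quantized model scores 0.68 points below the baseline on this task.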

Reproduction

The GSM8K results were obtained with the lm-evaluation-harness framework, running in the Docker image rocm/vllm-private:vllm_dev_base_mxfp4_20260122, with vLLM, lm-eval, and amd-quark built and installed from source inside the image.

Launching the server

export VLLM_ATTENTION_BACKEND="TRITON_MLA"
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=0

vllm serve amd/Kimi-K2-Thinking-MXFP4 \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k2 \
  --trust-remote-code
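
Before starting the evaluation, it can help to confirm that the server is up and answering. The sketch below assumes the server is reachable at `http://localhost:8000` (the port used in the lm_eval command) and exposes vLLM's OpenAI-compatible `/v1/models` and `/v1/completions` endpoints. The helper names `build_payload`, `wait_for_server`, and `send_completion` are hypothetical, and the payload builder does no JSON escaping, so it only suits simple prompts.

```shell
# Assumed defaults; adjust if the server runs elsewhere.
BASE_URL="${BASE_URL:-http://localhost:8000}"
MODEL="amd/Kimi-K2-Thinking-MXFP4"

# Build a minimal completion request body.
# $1 = prompt (no quotes or backslashes in this sketch), $2 = max tokens (default 64)
build_payload() {
  printf '{"model": "%s", "prompt": "%s", "max_tokens": %d, "temperature": 0}' \
    "$MODEL" "$1" "${2:-64}"
}

# Poll until the server lists its models; loading a model of this size can take a while.
wait_for_server() {
  until curl -sf "$BASE_URL/v1/models" > /dev/null; do
    sleep 10
  done
}

# Send one completion request and print the raw JSON response.
send_completion() {
  curl -s "$BASE_URL/v1/completions" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1" "${2:-64}")"
}
```

A typical smoke test would be `wait_for_server && send_completion "The capital of France is" 16`, run in a second terminal while the server is starting.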

Evaluating the model in a new terminal

lm_eval \
  --model local-completions \
  --model_args "model=amd/Kimi-K2-Thinking-MXFP4,base_url=http://0.0.0.0:8000/v1/completions,tokenized_requests=False,tokenizer_backend=None,num_concurrent=32" \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size 1

License

Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.