Commit eb49a2b (verified) by jiaxwang, parent 1159846: Create README.md
---
license: other
license_name: modified-mit
license_link: LICENSE
base_model:
- moonshotai/Kimi-K2-Instruct-0905
---

# Model Overview

- **Model Architecture:** Kimi-K2-Instruct
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0
- **Operating System(s):** Linux
- **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
- **Weight quantization:** MoE-only, OCP MXFP4, Static
- **Activation quantization:** MoE-only, OCP MXFP4, Dynamic
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model was built from the [moonshotai/Kimi-K2-Instruct-0905](https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905) model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for MXFP4 quantization.

# Model Quantization

The model was quantized from [moonshotai/Kimi-K2-Instruct-0905](https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). The weights and activations of the MoE layers are quantized to MXFP4.
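MXFP4 refers to the OCP Microscaling FP4 format: blocks of 32 values encoded as 4-bit E2M1 elements that share a single power-of-two (E8M0) scale. As an illustration of the numerics only (this is not the AMD-Quark implementation), a minimal fake-quantization round trip for one block might look like:

```python
import math

# Representable magnitudes of the FP4 E2M1 element format (sign stored separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 32  # MX block size: 32 elements share one power-of-two scale

def mxfp4_fake_quant(block):
    """Quantize one block of floats to MXFP4 and dequantize back."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return list(block)
    # Power-of-two scale chosen so the largest magnitude maps near 6.0,
    # the top of the E2M1 range (values beyond it are clipped).
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    out = []
    for v in block:
        mag = min(abs(v) / scale, 6.0)
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))  # round to nearest code
        out.append(math.copysign(q * scale, v))
    return out
```

With only 8 magnitudes per block, accuracy hinges on the shared scale tracking each block's dynamic range, which is why MX formats keep blocks small (32 elements).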

# Deployment

## Use with vLLM

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend.

## Evaluation

The model was evaluated on the GSM8K benchmark.

### Accuracy

| Benchmark | Kimi-K2-Instruct-0905 | Kimi-K2-Instruct-0905-MXFP4 (this model) | Recovery |
|---|---|---|---|
| GSM8K | 95.53 | 94.47 | 98.89% |
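Recovery is the quantized model's score expressed as a percentage of the baseline score; checking the table's value:

```python
# Recovery = quantized score / baseline score (GSM8K values from the table above).
baseline = 95.53   # Kimi-K2-Instruct-0905
quantized = 94.47  # Kimi-K2-Instruct-0905-MXFP4

recovery = quantized / baseline * 100
print(f"{recovery:.2f}%")  # 98.89%
```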

### Reproduction

The GSM8K results were obtained using the `lm-evaluation-harness` framework, based on the Docker image `rocm/vllm-private:vllm_dev_base_mxfp4_20260122`, with vLLM and lm-eval compiled and installed from source inside the image.

#### Launching the server

```bash
export VLLM_ATTENTION_BACKEND="TRITON_MLA"
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=0

vllm serve amd/Kimi-K2-Instruct-0905-MXFP4 \
  --port 8000 \
  --served-model-name kimi-k2-mxfp4 \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice \
  --tool-call-parser kimi_k2
```
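Once launched, the server exposes an OpenAI-compatible API on port 8000 under the served model name `kimi-k2-mxfp4`. A quick smoke test (the prompt is illustrative) can be sketched as follows; the payload construction is shown separately so it can be checked without a running server:

```python
import json

def build_chat_request(model, prompt, max_tokens=64):
    """Build the JSON body for the OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_chat_request("kimi-k2-mxfp4", "What is 2 + 2?")
payload = json.dumps(body).encode()

# With the server from the step above running, the request could be sent with:
# import urllib.request
# req = urllib.request.Request(
#     "http://0.0.0.0:8000/v1/chat/completions",
#     data=payload,
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```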

#### Evaluating the model in a new terminal

```bash
lm_eval \
  --model local-completions \
  --model_args "model=kimi-k2-mxfp4,base_url=http://0.0.0.0:8000/v1/completions,tokenized_requests=False,tokenizer_backend=None,num_concurrent=32" \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size 1
```

# License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.