Files changed (1)
  1. README.md +9 -8
README.md CHANGED
@@ -3,12 +3,12 @@ license: other
  license_name: modified-mit
  license_link: LICENSE
  base_model:
- - moonshotai/Kimi-K2-Thinking
  ---

  # Model Overview

- - **Model Architecture:** Kimi-K2-Thinking
  - **Input:** Text
  - **Output:** Text
  - **Supported Hardware Microarchitecture:** AMD MI350/MI355
@@ -16,11 +16,12 @@ base_model:
  - **Operating System(s):** Linux
  - **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
  - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (V0.11.1)
  - **Weight quantization:** MOE-only, OCP MXFP4, Static
  - **Activation quantization:** MOE-only, OCP MXFP4, Dynamic
  - **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

- This model was built with Kimi-K2-Thinking model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for MXFP4 quantization.

  # Model Quantization

@@ -29,7 +30,7 @@ The model was quantized from [unsloth/Kimi-K2-Thinking-BF16](https://huggingface
  **Quantization scripts:**
  ```
  cd Quark/examples/torch/language_modeling/llm_ptq/
- exclude_layers="*self_attn* *mlp.gate *lm_head *mlp.gate_proj *mlp.up_proj *mlp.down_proj *shared_experts*"

  python quantize_quark.py \
  --model_dir unsloth/Kimi-K2-Thinking-BF16 \
@@ -53,7 +54,7 @@ The model was evaluated on GSM8K benchmarks.
  <tr>
  <td><strong>Benchmark</strong>
  </td>
- <td><strong>Kimi-K2-Thinking </strong>
  </td>
  <td><strong>Kimi-K2-Thinking-MXFP4(this model)</strong>
  </td>
@@ -61,13 +62,13 @@ The model was evaluated on GSM8K benchmarks.
  </td>
  </tr>
  <tr>
- <td>GSM8K (strict-match)
  </td>
  <td>94.16
  </td>
- <td>93.48
  </td>
- <td>99.28%
  </td>
  </tr>
  </table>
 
  license_name: modified-mit
  license_link: LICENSE
  base_model:
+ - unsloth/Kimi-K2-Thinking-BF16
  ---

  # Model Overview

+ - **Model Architecture:** DeepseekV3ForCausalLM
  - **Input:** Text
  - **Output:** Text
  - **Supported Hardware Microarchitecture:** AMD MI350/MI355
 
  - **Operating System(s):** Linux
  - **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
  - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (V0.11.1)
+ - **Quantized layers:** Experts, Shared_experts
  - **Weight quantization:** MOE-only, OCP MXFP4, Static
  - **Activation quantization:** MOE-only, OCP MXFP4, Dynamic
  - **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

+ This model was built with Kimi-K2-Thinking-BF16 model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for MXFP4 quantization.

  # Model Quantization

 
  **Quantization scripts:**
  ```
  cd Quark/examples/torch/language_modeling/llm_ptq/
+ exclude_layers="*self_attn* *mlp.gate *lm_head *mlp.gate_proj *mlp.up_proj *mlp.down_proj"

  python quantize_quark.py \
  --model_dir unsloth/Kimi-K2-Thinking-BF16 \
 
  <tr>
  <td><strong>Benchmark</strong>
  </td>
+ <td><strong>Kimi-K2-Thinking-BF16 </strong>
  </td>
  <td><strong>Kimi-K2-Thinking-MXFP4(this model)</strong>
  </td>
 
  </td>
  </tr>
  <tr>
+ <td>GSM8K (flexible-extract)
  </td>
  <td>94.16
  </td>
+ <td>93.03
  </td>
+ <td>98.80%
  </td>
  </tr>
  </table>
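
Note on the percentage column: its header is not visible in this diff, but both the removed value (99.28%) and the added value (98.80%) are consistent with the quantized GSM8K score divided by the BF16 baseline score. A minimal sketch under that assumption follows; the function name `recovery_percent` is hypothetical and does not come from the README or from any evaluation tool.

```python
def recovery_percent(quantized_score: float, baseline_score: float) -> float:
    """Return the quantized score as a percentage of the baseline score.

    Assumption: the table's last column is quantized / baseline * 100.
    """
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return 100.0 * quantized_score / baseline_score


# Scores copied from the GSM8K rows shown in this diff.
baseline_bf16 = 94.16   # reference column, unchanged by the diff
old_mxfp4 = 93.48       # removed row, GSM8K (strict-match)
new_mxfp4 = 93.03       # added row, GSM8K (flexible-extract)

print(f"old: {recovery_percent(old_mxfp4, baseline_bf16):.2f}%")
print(f"new: {recovery_percent(new_mxfp4, baseline_bf16):.2f}%")
```

Under this reading, the change in the last column follows entirely from the new MXFP4 score, since the baseline value of 94.16 is the same on both sides of the diff.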