mendaxia committed on
Commit
bf2eca9
·
verified ·
1 Parent(s): 21a9638

Update README.md

Files changed (1)
  1. README.md +191 -3
README.md CHANGED
@@ -1,3 +1,191 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ library_name: transformers
3
+ ---
4
+ <!-- markdownlint-disable first-line-h1 -->
5
+ <!-- markdownlint-disable html -->
6
+ <!-- markdownlint-disable no-duplicate-header -->
7
+
8
+ <div align="center">
9
+ <div style="display: flex; align-items: center; justify-content: center;">
10
+ <img src="figures/logo.png" style="height: 1.8em; margin-right: 10px;" alt="Kimi K2: Open Agentic Intelligence"/>
11
+ <p style="margin: 0;">Kimi K2: Open Agentic Intelligence</p>
12
+ </div>
13
+ </div>
14
+ <hr>
15
+ <div align="center" style="line-height: 1;">
16
+ <a href="https://www.moonshot.ai" target="_blank" style="margin: 2px;">
17
+ <img alt="Homepage" src="https://img.shields.io/badge/Homepage-Kimi%20K2-blue?logo=K&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
18
+ </a>
19
+ <a href="https://kimi.com/" target="_blank" style="margin: 2px;">
20
+ <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-Kimi%20K2-ff6b6b?color=ff6b6b&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
21
+ </a>
22
+ <a href="https://huggingface.co/kimi-ai" target="_blank" style="margin: 2px;">
23
+ <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Kimi%20K2-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
24
+ </a>
25
+ </div>
26
+
27
+ <div align="center" style="line-height: 1;">
28
+ <a href="https://github.com/kimi-ai/Kimi-V1/blob/main/assets/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
29
+ <img alt="Wechat" src="https://img.shields.io/badge/WeChat-Kimi%20K2-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
30
+ </a>
31
+ <a href="https://x.com/kimi_moonshot" target="_blank" style="margin: 2px;">
32
+ <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-Kimi.AI-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
33
+ </a>
34
+ </div>
35
+
36
+ <div align="center" style="line-height: 1;">
37
+ <a href="https://github.com/moonshotai/Kimi-K2/blob/main/LICENSE" style="margin: 2px;">
38
+ <img alt="Code License" src="https://img.shields.io/badge/License-Modified&nbsp;MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
39
+ </a>
40
+ </div>
41
+
42
+ <p align="center">
43
+ <b>Paper Link (coming soon)</b>👁️
44
+ </p>
45
+
46
+
47
+ ## 1. Model Introduction
48
+ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.
49
+ ### Key Features
50
+ - Advanced Architecture: Based on DeepSeek-V3 with enhanced MoE sparsity and long-context efficiency
51
+ - MuonClip Optimizer: Novel training optimization technique that prevents attention logit explosions while maintaining performance
52
+ - Agentic Intelligence: Specifically designed for tool use, reasoning, and autonomous problem-solving
53
+ - Large-Scale Training: Pre-trained on 15.5T tokens with zero training instability
54
+ ### Model Variants
55
+ #### Kimi-K2-Base
56
+ The foundation model optimized for researchers and developers seeking full control for fine-tuning and custom solutions. Demonstrates strong performance across knowledge-intensive and reasoning benchmarks.
57
+ #### Kimi-K2-Instruct
58
+ The post-trained model optimized for general-purpose chat and agentic experiences. Features reflex-grade responses without long thinking delays, making it ideal for real-time applications.
59
+
60
+ ### Technical Innovations
61
+ - **MuonClip Optimizer**: Kimi K2 introduces the MuonClip optimizer, which addresses training instability through the qk-clip technique. This method directly rescales query and key projection weight matrices after Muon updates, controlling attention logit scales at the source:
62
+   - **Stabilization**: Prevents logit explosions while maintaining downstream performance
63
+   - **Efficiency**: Enables stable training at unprecedented scale
64
+   - **Generality**: Applicable to other stabilization use cases
65
+ - **Agentic Capabilities**
66
+   - Large-Scale Agentic Data Synthesis: Comprehensive pipeline for simulating real-world tool-using scenarios across hundreds of domains with thousands of tools
67
+   - General Reinforcement Learning: Self-judging mechanism for both verifiable and non-verifiable tasks, enabling scalable rubric-based feedback
68
+
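The qk-clip idea described under MuonClip Optimizer can be illustrated with a minimal sketch. This is not the training code: the threshold value, the even square-root split of the shrink factor between the two matrices, and all function names below are assumptions made for illustration. It relies only on the fact that attention logits are bilinear in the query and key projection weights, so shrinking both matrices shrinks the logits by the product of the two factors.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]

def max_attention_logit(x, w_q, w_k):
    """Largest |q . k| / sqrt(d) over all query/key pairs for inputs x."""
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    d = len(q[0])
    logits = matmul(q, transpose(k))
    return max(abs(v) for row in logits for v in row) / math.sqrt(d)

def qk_clip(w_q, w_k, max_logit, threshold):
    """Rescale W_q and W_k so the observed max logit drops to `threshold`.

    Assumption: the shrink factor is split evenly, sqrt on each matrix.
    """
    if max_logit <= threshold:
        return w_q, w_k  # logits already within bounds; weights unchanged
    scale = math.sqrt(threshold / max_logit)
    return ([[v * scale for v in row] for row in w_q],
            [[v * scale for v in row] for row in w_k])

# Toy data: two tokens, two-dimensional projections (all values hypothetical).
x = [[1.0, 2.0], [3.0, -1.0]]
w_q = [[4.0, 0.0], [0.0, 4.0]]
w_k = [[5.0, 0.0], [0.0, 5.0]]
THRESHOLD = 10.0  # hypothetical cap on the attention logit

before = max_attention_logit(x, w_q, w_k)
w_q, w_k = qk_clip(w_q, w_k, before, THRESHOLD)  # applied after the optimizer step
after = max_attention_logit(x, w_q, w_k)
print(f"max logit before: {before:.2f}, after: {after:.2f}")
```

Because the rescaling acts on the weights rather than on the logits at run time, the forward pass itself is unchanged; only the parameters that produce oversized logits are adjusted.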
69
+
70
+ <p align="center">
71
+ <img width="80%" src="figures/benchmark.png">
72
+ </p>
73
+
74
+ ## 2. Model Summary
75
+
76
+
77
+ <div align="center">
78
+
79
+ | | **Kimi-K2-Base** | **Kimi-K2-Instruct** |
80
+ | :--- | :---: | :---: |
81
+ | **Architecture** | Mixture-of-Experts (MoE) | Mixture-of-Experts (MoE) |
82
+ | **Total Parameters** | 1T | 1T |
83
+ | **Activated Parameters** | 32B | 32B |
84
+ | **Number of Layers** | 80 | 80 |
85
+ | **Hidden Dimension** | 8192 | 8192 |
86
+ | **Number of Attention Heads** | 64 | 64 |
87
+ | **Number of Experts** | 64 | 64 |
88
+ | **Experts per Token** | 8 | 8 |
89
+ | **Vocabulary Size** | 128K | 128K |
90
+ | **Context Length** | 64K | 64K |
91
+ | **Attention Mechanism** | MLA | MLA |
92
+ | **Position Encoding** | RoPE | RoPE |
93
+ | **Activation Function** | SwiGLU | SwiGLU |
94
+ </div>
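To make the "Number of Experts" and "Experts per Token" rows concrete, here is a minimal sketch of top-k expert routing for a single token. The expert counts are taken from the table above; the random router logits, the softmax over only the selected experts, and the function name `route_token` are assumptions for illustration, not the model's actual gating code.

```python
import math
import random

NUM_EXPERTS = 64        # "Number of Experts" row in the table
EXPERTS_PER_TOKEN = 8   # "Experts per Token" row in the table

def route_token(router_logits, k=EXPERTS_PER_TOKEN):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Assumption: mixing weights are a softmax over the selected logits only.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Hypothetical router scores for one token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)

print(f"active experts: {len(selected)} of {NUM_EXPERTS}")
# Share of parameters used per token, from the 32B activated / 1T total figures.
print(f"activated parameter share: {32e9 / 1e12:.1%}")
```

Only the selected experts run for a given token, which is why the activated parameter count is a small fraction of the total.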
95
+
96
+
97
+ For deployment, please refer to Section 5: [How to Run Locally](#5-how-to-run-locally) for detailed instructions.
98
+
99
+ ## 3. Evaluation Results
100
+
101
+ ### Base Model
102
+
103
+ <div align="center">
104
+
105
+ | Benchmark | Shots | Qwen2.5-72B | Llama4-Maverick | DeepSeek-V3-Base | Kimi-K2-Base |
106
+ |-----------|-------|-------------|---------------------|------------------|--------------|
107
+ | MMLU | 5 | 86.08 | 84.87 | 87.1 | 87.79 |
108
+ | MMLU-Pro | 5 | 62.8 | 63.47 | 60.59 | 69.17 |
109
+ | MMLU-Redux-2.0 | 5 | 87.77 | 88.18 | 89.53 | 90.17 |
110
+ | SimpleQA | 5 | 10.31 | 23.74 | 26.49 | 35.25 |
111
+ | TriviaQA | 5 | 76.03 | 79.25 | 84.11 | 85.09 |
112
+ | SuperGPQA | 5 | 34.23 | 38.84 | 39.2 | 44.67 |
113
+ | C-Eval | 5 | 90.86 | 80.91 | 90.04 | 92.5 |
114
+ | CSimpleQA | 5 | 50.53 | 53.47 | 72.13 | 77.57 |
115
+ | LiveCodeBench | 1 | 22.29 | 25.14 | 24.57 | 26.29 |
116
+ | EvalPlus | - | 66.04 | 65.48 | 65.61 | 80.33 |
117
+ | MATH | 4 | 62.68 | 63.02 | 61.7 | 70.22 |
118
+ | GSM8K | 8 | 90.37 | 86.35 | 91.66 | 92.12 |
119
+
120
+
121
+ </div>
122
+
123
+ ### Chat Model
124
+ <div align="center">
125
+
126
+ | | **Benchmark (Metric)** | **Kimi-K2-Instruct** |
127
+ |---|---------------------|-------------|
128
+ | English | MMLU (EM) | [TBD] |
129
+ | | MMLU-Redux (EM) | [TBD] |
130
+ | | MMLU-Pro (EM) | [TBD] |
131
+ | | DROP (3-shot F1) | [TBD] |
132
+ | | IF-Eval (Prompt Strict) | [TBD] |
133
+ | | GPQA-Diamond (Pass@1) | [TBD] |
134
+ | | SimpleQA (Correct) | [TBD] |
135
+ | | FRAMES (Acc.) | [TBD] |
136
+ | | LongBench v2 (Acc.) | [TBD] |
137
+ | Code | HumanEval-Mul (Pass@1) | [TBD] |
138
+ | | LiveCodeBench (Pass@1-COT) | [TBD] |
139
+ | | LiveCodeBench (Pass@1) | [TBD] |
140
+ | | Codeforces (Percentile) | [TBD] |
141
+ | | SWE Verified (Resolved) | [TBD] |
142
+ | | Aider-Edit (Acc.) | [TBD] |
143
+ | | Aider-Polyglot (Acc.) | [TBD] |
144
+ | Math | AIME 2024 (Pass@1) | [TBD] |
145
+ | | MATH-500 (EM) | [TBD] |
146
+ | | CNMO 2024 (Pass@1) | [TBD] |
147
+ | Chinese | CLUEWSC (EM) | [TBD] |
148
+ | | C-Eval (EM) | [TBD] |
149
+ | | C-SimpleQA (Correct) | [TBD] |
150
+
151
+ </div>
152
+ Evaluation details can be found in our technical report.
153
+
154
+ #### Open-Ended Generation Evaluation
155
+
156
+ <div align="center">
157
+
158
+ | Model | Arena-Hard | AlpacaEval 2.0 |
159
+ |-------|------------|----------------|
160
+ | Kimi-K2 | [TBD] | [TBD] |
161
+
162
+ Note: Open-ended conversation evaluations demonstrate Kimi-K2's capabilities in natural dialogue and creative tasks.
163
+ </div>
164
+
165
+ ## 4. Chat Website & API Platform
166
+ You can chat with Kimi-K2 on Kimi's official website: [kimi.com](https://kimi.com)
167
+
168
+
169
+ ## 5. How to Run Locally
170
+
171
+ Start chatting with Kimi-K2:
172
+
173
+ ```shell
174
+ python generate.py --ckpt-path /path/to/Kimi-V1-Demo --config configs/config_45B.json --interactive --temperature 0.7 --max-new-tokens 200
175
+ ```
176
+
177
+ Or perform batch inference:
178
+
179
+ ```shell
180
+ python generate.py --ckpt-path /path/to/Kimi-V1-Demo --config configs/config_45B.json --input-file $FILE
181
+ ```
182
+
183
+ ### Inference with vLLM (recommended)
184
+
185
+ [vLLM](https://github.com/vllm-project/vllm) provides efficient inference support for Kimi-K2 with advanced parallelization techniques.
186
+
187
+ ## 6. License
188
+ This code repository is licensed under [the MIT License](LICENSE-CODE). Use of the Kimi-K2 Base/Instruct models is subject to [the Model License](LICENSE-MODEL). The Kimi-K2 series supports commercial use under the specified terms.
189
+
190
+ ## 7. Contact
191
+ If you have any questions, please raise an issue or contact us at [service@kimi.com](mailto:service@kimi.com).