+---
+library_name: transformers
+---
+<!-- markdownlint-disable first-line-h1 -->
+<!-- markdownlint-disable html -->
+<!-- markdownlint-disable no-duplicate-header -->
+<div align="center">
+  <div style="display: flex; align-items: center; justify-content: center;">
+    <img src="figures/logo.png" style="height: 1.8em; margin-right: 10px;" alt="Kimi K2: Open Agentic Intelligence"/>
+    <p style="margin: 0;">Kimi K2: Open Agentic Intelligence</p>
+  </div>
+</div>
+<hr>
+<div align="center" style="line-height: 1;">
+  <a href="https://www.moonshot.ai" target="_blank" style="margin: 2px;">
+    <img alt="Homepage" src="https://img.shields.io/badge/Homepage-Kimi%20K2-blue?logo=K&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+  </a>
+  <a href="https://kimi.com/" target="_blank" style="margin: 2px;">
+    <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-Kimi%20K2-ff6b6b?color=ff6b6b&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+  </a>
+  <a href="https://huggingface.co/kimi-ai" target="_blank" style="margin: 2px;">
+    <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Kimi%20K2-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+  </a>
+</div>
+<div align="center" style="line-height: 1;">
+  <a href="https://github.com/kimi-ai/Kimi-V1/blob/main/assets/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
+    <img alt="Wechat" src="https://img.shields.io/badge/WeChat-Kimi%20K2-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+  </a>
+  <a href="https://x.com/kimi_moonshot" target="_blank" style="margin: 2px;">
+    <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-Kimi.AI-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+  </a>
+</div>
+<div align="center" style="line-height: 1;">
+  <a href="https://github.com/moonshotai/Kimi-K2/blob/main/LICENSE" style="margin: 2px;">
+    <img alt="Code License" src="https://img.shields.io/badge/License-Modified&nbsp;MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
+  </a>
+</div>
+<p align="center">
+  <b>Paper Link (coming soon)</b>👁️
+</p>
+## 1. Model Introduction
+Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.
+#### Key Features
+- Advanced Architecture: Based on DeepSeek-V3 with enhanced MoE sparsity and long-context efficiency
+- MuonClip Optimizer: Novel training optimization technique that prevents attention logit explosions while maintaining performance
+- Agentic Intelligence: Specifically designed for tool use, reasoning, and autonomous problem-solving
+- Large-Scale Training: Pre-trained on 15.5T tokens with zero training instability
+### Model Variants
+#### Kimi-K2-Base
+The foundation model optimized for researchers and developers seeking full control for fine-tuning and custom solutions. Demonstrates strong performance across knowledge-intensive and reasoning benchmarks.
+#### Kimi-K2-Instruct
+The post-trained model optimized for general-purpose chat and agentic experiences. Features reflex-grade responses without long thinking delays, making it ideal for real-time applications.
+### Technical Innovations
+- **MuonClip Optimizer**: Kimi K2 introduces the MuonClip optimizer, which addresses training instability through the qk-clip technique. This method directly rescales query and key projection weight matrices after Muon updates, controlling attention logit scales at the source:
+    - **Stabilization**: Prevents logit explosions while maintaining downstream performance
+    - **Efficiency**: Enables stable training at unprecedented scale
+    - **Generality**: Applicable to other stabilization use cases
+- **Agentic Capabilities**
+    - Large-Scale Agentic Data Synthesis: Comprehensive pipeline for simulating real-world tool-using scenarios across hundreds of domains with thousands of tools
+    - General Reinforcement Learning: Self-judging mechanism for both verifiable and non-verifiable tasks, enabling scalable rubric-based feedback
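
The qk-clip rescaling described above can be illustrated with a toy example. This is a minimal sketch, not the Kimi K2 training code: it assumes a single attention head, a hypothetical logit threshold `tau`, and an even split of the shrink factor between the query and key matrices. None of these details are specified in this README, and the function names are illustrative only.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def max_attention_logit(x, w_q, w_k):
    """Largest |q_i . k_j| / sqrt(d) over all token pairs for one head."""
    q = matmul(x, w_q)                      # (tokens, d) queries
    k = matmul(x, w_k)                      # (tokens, d) keys
    k_t = [list(col) for col in zip(*k)]    # (d, tokens) transposed keys
    logits = matmul(q, k_t)                 # (tokens, tokens) raw scores
    d = len(w_q[0])
    return max(abs(v) for row in logits for v in row) / math.sqrt(d)


def qk_clip(x, w_q, w_k, tau):
    """Return rescaled copies of w_q and w_k so the max logit on x is at most tau."""
    s_max = max_attention_logit(x, w_q, w_k)
    if s_max <= tau:
        return w_q, w_k, 1.0                # already within bounds, no change
    gamma = tau / s_max                     # total shrink factor for the logits
    root = math.sqrt(gamma)                 # assumption: split evenly across Q and K
    w_q_new = [[v * root for v in row] for row in w_q]
    w_k_new = [[v * root for v in row] for row in w_k]
    return w_q_new, w_k_new, gamma


# Toy example: 2 tokens, model width 2, head width 2 (all values made up).
x = [[1.0, 2.0], [3.0, -1.0]]
w_q = [[4.0, 0.0], [0.0, 4.0]]
w_k = [[5.0, 0.0], [0.0, 5.0]]
tau = 10.0  # hypothetical threshold

before = max_attention_logit(x, w_q, w_k)
w_q_clipped, w_k_clipped, gamma = qk_clip(x, w_q, w_k, tau)
after = max_attention_logit(x, w_q_clipped, w_k_clipped)
print(f"max logit before: {before:.3f}, after: {after:.3f}, gamma: {gamma:.4f}")
```

Because the logits are bilinear in the two projection matrices, scaling each by `sqrt(gamma)` shrinks every logit by `gamma`, so the largest one lands on `tau` while the relative ordering of attention scores is unchanged.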
+<p align="center">
+  <img width="80%" src="figures/benchmark.png">
+</p>
+## 2. Model Summary
+<div align="center">
+|  | **Kimi-K2-Base** | **Kimi-K2-Instruct** |
+| :--- | :---: | :---: |
+| **Architecture** | Mixture-of-Experts (MoE) | Mixture-of-Experts (MoE) |
+| **Total Parameters** | 1T | 1T |
+| **Activated Parameters** | 32B | 32B |
+| **Number of Layers** | 80 | 80 |
+| **Hidden Dimension** | 8192 | 8192 |
+| **Number of Attention Heads** | 64 | 64 |
+| **Number of Experts** | 64 | 64 |
+| **Experts per Token** | 8 | 8 |
+| **Vocabulary Size** | 128K | 128K |
+| **Context Length** | 64K | 64K |
+| **Attention Mechanism** | MLA | MLA |
+| **Position Encoding** | RoPE | RoPE |
+| **Activation Function** | SwiGLU | SwiGLU |
+</div>
+For deployment instructions, see Section 5: [How to Run Locally](#5-how-to-run-locally).
+## 3. Evaluation Results
+### Base Model
+<div align="center">
+| Benchmark | Shots | Qwen2.5-72B | Llama4-Maverick | DeepSeek-V3-Base | Kimi-K2-Base |
+|-----------|-------|-------------|-----------------|------------------|--------------|
+| MMLU | 5 | 86.08 | 84.87 | 87.1 | 87.79 |
+| MMLU-Pro | 5 | 62.8 | 63.47 | 60.59 | 69.17 |
+| MMLU-Redux-2.0 | 5 | 87.77 | 88.18 | 89.53 | 90.17 |
+| SimpleQA | 5 | 10.31 | 23.74 | 26.49 | 35.25 |
+| TriviaQA | 5 | 76.03 | 79.25 | 84.11 | 85.09 |
+| SuperGPQA | 5 | 34.23 | 38.84 | 39.2 | 44.67 |
+| C-Eval | 5 | 90.86 | 80.91 | 90.04 | 92.5 |
+| C-SimpleQA | 5 | 50.53 | 53.47 | 72.13 | 77.57 |
+| LiveCodeBench | 1 | 22.29 | 25.14 | 24.57 | 26.29 |
+| EvalPlus | - | 66.04 | 65.48 | 65.61 | 80.33 |
+| MATH | 4 | 62.68 | 63.02 | 61.7 | 70.22 |
+| GSM8K | 8 | 90.37 | 86.35 | 91.66 | 92.12 |
+</div>
+### Chat Model
+<div align="center">
+| | **Benchmark (Metric)** | **Kimi-K2-Instruct** |
+|---|---------------------|-------------|
+| English | MMLU (EM) | [TBD] |
+| | MMLU-Redux (EM) | [TBD] |
+| | MMLU-Pro (EM) | [TBD] |
+| | DROP (3-shot F1) | [TBD] |
+| | IF-Eval (Prompt Strict) | [TBD] |
+| | GPQA-Diamond (Pass@1) | [TBD] |
+| | SimpleQA (Correct) | [TBD] |
+| | FRAMES (Acc.) | [TBD] |
+| | LongBench v2 (Acc.) | [TBD] |
+| Code | HumanEval-Mul (Pass@1) | [TBD] |
+| | LiveCodeBench (Pass@1-COT) | [TBD] |
+| | LiveCodeBench (Pass@1) | [TBD] |
+| | Codeforces (Percentile) | [TBD] |
+| | SWE Verified (Resolved) | [TBD] |
+| | Aider-Edit (Acc.) | [TBD] |
+| | Aider-Polyglot (Acc.) | [TBD] |
+| Math | AIME 2024 (Pass@1) | [TBD] |
+| | MATH-500 (EM) | [TBD] |
+| | CNMO 2024 (Pass@1) | [TBD] |
+| Chinese | CLUEWSC (EM) | [TBD] |
+| | C-Eval (EM) | [TBD] |
+| | C-SimpleQA (Correct) | [TBD] |
+</div>
+Evaluation details can be found in our technical report.
+#### Open-Ended Generation Evaluation
+<div align="center">
+| Model | Arena-Hard | AlpacaEval 2.0 |
+|-------|------------|----------------|
+| Kimi-K2 | [TBD] | [TBD] |
+</div>
+Note: Open-ended conversation evaluations demonstrate Kimi-K2's capabilities in natural dialogue and creative tasks.
+## 4. Chat Website & API Platform
+You can chat with Kimi-K2 on Kimi's official website: [kimi.com](https://kimi.com)
+## 5. How to Run Locally
+Start chatting with Kimi-K2:
+```shell
+python generate.py --ckpt-path /path/to/Kimi-V1-Demo --config configs/config_45B.json --interactive --temperature 0.7 --max-new-tokens 200
+```
+Or perform batch inference:
+```shell
+python generate.py --ckpt-path /path/to/Kimi-V1-Demo --config configs/config_45B.json --input-file $FILE
+```
+### Inference with vLLM (recommended)
+[vLLM](https://github.com/vllm-project/vllm) provides efficient inference support for Kimi-K2 with advanced parallelization techniques.
+## 6. License
+This code repository is licensed under [the MIT License](LICENSE-CODE). Use of the Kimi-K2 Base/Instruct models is subject to [the Model License](LICENSE-MODEL). The Kimi-K2 series supports commercial use under the specified terms.
+## 7. Contact
+If you have any questions, please raise an issue or contact us at [service@kimi.com](mailto:service@kimi.com).