---
license: mit
---

## Model Overview

**Kimi-K2-Instruct-eagle3** is a specialized draft model designed to accelerate inference for the Kimi-K2-Instruct ecosystem using the **EAGLE3 (Extrapolation Algorithm for Greater Language-model Efficiency)** framework.

Kimi-K2-Instruct with EAGLE3 achieves up to **1.8× peak throughput** over the base model, accelerating generation across all seven benchmarks, from +24% on MT-Bench to +80% on Math500 (configured with bs=8, steps=3, topk=1, num_draft_tokens=4).

Built upon the **Llama architecture**, this model acts as a highly efficient drafter. It was trained on **1.4 million high-quality samples** from the **Open-PerfectBlend** dataset, ensuring strict alignment with the teacher model's distribution.

This model serves as a general-purpose English instruction follower with strong capabilities in:

* **Conversation**
* **Mathematical Reasoning**
* **Code Generation**

## Efficient Download Guide

To minimize download time and storage usage, note the role of each file in this repository:

**For Inference**: You only need `config.json` and `model.safetensors`.

**For Continued Training**: `training_state.pt` contains optimizer states for resuming training. If you only intend to use the model for inference, you can skip this file.
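
As a sketch of a selective download using the `huggingface_hub` library (the repository id is taken from the launch command below; adjust paths to your setup):

```python
# Sketch: download only the files needed for inference,
# skipping the optimizer states in training_state.pt.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="AQ-MedAI/Kimi-K2-Instruct-eagle3",
    allow_patterns=["config.json", "model.safetensors"],
)
print(f"Inference files downloaded to: {local_dir}")
```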

## Performance & Acceleration

The core value of this EAGLE model is its ability to propose multiple future tokens that the base model then verifies in a single forward pass. Higher average acceptance lengths mean more tokens committed per verification step, and therefore lower latency.
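
To make the mechanism concrete, here is a minimal, self-contained sketch of the draft-and-verify control flow using toy stand-in "models". Everything here (the token rules and function names) is a hypothetical illustration of the general speculative-decoding loop, not the actual EAGLE3 implementation:

```python
# Toy draft-and-verify loop: the drafter proposes num_draft_tokens tokens,
# the target checks them, and the longest matching prefix is accepted.

def target_next_token(context):
    """Hypothetical target model: the true next token is previous + 1 (mod 100)."""
    return (context[-1] + 1) % 100

def draft_next_tokens(context, n):
    """Hypothetical drafter: agrees with the target except that it
    systematically guesses wrong on multiples of 7."""
    out, last = [], context[-1]
    for _ in range(n):
        nxt = (last + 1) % 100
        if nxt % 7 == 0:
            nxt = (nxt + 1) % 100  # deliberate drafting error
        out.append(nxt)
        last = nxt
    return out

def speculative_step(context, num_draft_tokens=4):
    proposed = draft_next_tokens(context, num_draft_tokens)
    accepted = []
    for tok in proposed:
        # The real system scores all draft positions in one target forward
        # pass; checking positions sequentially applies the same rule.
        if target_next_token(context + accepted) == tok:
            accepted.append(tok)
        else:
            break
    # On a mismatch (or after all drafts pass), the target's own next token
    # is appended, so every step commits at least one token.
    accepted.append(target_next_token(context + accepted))
    return accepted

context = [0]
for _ in range(4):
    new_tokens = speculative_step(context)
    context += new_tokens
    print(f"committed {len(new_tokens)} tokens -> {context}")
```

Because every committed token is checked against the target, the verified output is identical to what the base model would have generated on its own; acceptance length only changes how fast it is produced.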

**Average Token Acceptance Lengths (MLA):**

| Benchmark | Average Acceptance Length |
| :--- | :--- |
| **HumanEval** (Code) | **3.372** |
| **GSM8K** (Math) | **3.165** |
| **Math500** (Complex Math) | **3.490** |

These metrics demonstrate robust acceleration across diverse and complex domains.
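
As a rough back-of-the-envelope illustration of why acceptance length matters: if each target forward pass commits `acceptance_length` tokens on average, and drafting adds a relative per-step overhead, the expected speedup is roughly `acceptance_length / (1 + overhead)`. The 15% overhead below is an assumed placeholder, not a measured value:

```python
# Back-of-the-envelope speedup estimate from average acceptance length.
# draft_overhead=0.15 is an illustrative assumption, not a measurement.
def estimated_speedup(acceptance_length, draft_overhead=0.15):
    # Baseline: one target forward pass per token.
    # Speculative: one target pass (plus draft cost) per acceptance_length tokens.
    return acceptance_length / (1.0 + draft_overhead)

for name, acc in [("HumanEval", 3.372), ("GSM8K", 3.165), ("Math500", 3.490)]:
    print(f"{name}: ~{estimated_speedup(acc):.2f}x (assumed 15% draft overhead)")
```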

## Quick Start

### Requirements

- NVIDIA GPU
- CUDA 12.0+
- PyTorch 2.0+
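
You can sanity-check these requirements with a short snippet (a sketch assuming a standard PyTorch installation):

```python
# Optional sanity check for the requirements above.
import torch

print(f"PyTorch version: {torch.__version__}")          # expect 2.0+
print(f"CUDA available:  {torch.cuda.is_available()}")  # expect True on an NVIDIA GPU
print(f"CUDA version:    {torch.version.cuda}")         # expect 12.0+
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```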

### Installation

```bash
pip install sglang==0.5.6
```

### Inference with SGLang

```bash
python3 -m sglang.launch_server \
  --model-path /models/Kimi-K2-Instruct \
  --host 0.0.0.0 --port 30012 \
  --trust-remote-code \
  --attention-backend flashinfer \
  --mem-fraction-static 0.9 \
  --tp-size 8 \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path AQ-MedAI/Kimi-K2-Instruct-eagle3 \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4
```
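
Once the server is up, you can query it through SGLang's OpenAI-compatible endpoint. A minimal sketch, assuming the host and port from the launch command above (the `model` field is illustrative; the server responds with whichever model it was launched with):

```python
# Query the running SGLang server via its OpenAI-compatible chat API.
import requests

response = requests.post(
    "http://localhost:30012/v1/chat/completions",
    json={
        "model": "Kimi-K2-Instruct",  # illustrative; passed through by the server
        "messages": [{"role": "user", "content": "What is speculative decoding?"}],
        "max_tokens": 256,
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```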

## Training Data

The model was trained on **1.4 million samples** sourced from the **Open-PerfectBlend** dataset. The data selection prioritizes high-quality instruction-following scenarios to maximize the draft model's predictive accuracy relative to the base model.

## Citation

If you use this model in your research or applications, please cite the following:

```bibtex
@misc{kimik2eagle3,
  title={Kimi-K2-Instruct-eagle3: Accelerating Instruction Following with EAGLE},
  author={Ant AQ Team},
  year={2025},
}
```