---
license: mit
---

## Model Overview

**Kimi-K2-Instruct-eagle3** is a specialized draft model designed to accelerate inference for the Kimi-K2-Instruct ecosystem using the **EAGLE3 (Extrapolation Algorithm for Greater Language-model Efficiency)** framework.

Kimi-K2-Instruct with EAGLE3 achieves up to **1.8× peak throughput** over the base model, accelerating generation across all seven benchmarks, from +24% on MT-Bench to +80% on Math500 (configured with bs=8, steps=3, topk=1, num_draft_tokens=4).

Built upon the **Llama architecture**, this model acts as a highly efficient drafter. It was trained on **1.4 million high-quality samples** from the **Open-PerfectBlend** dataset, ensuring strict alignment with the teacher model's distribution.

This model serves as a general-purpose English instruction follower with strong capabilities in:

* **Conversation**
* **Mathematical Reasoning**
* **Code Generation**

## Efficient Download Guide

To minimize download time and storage usage, note the role of each file in this repository:

**For Inference**: You only need `config.json` and `model.safetensors`.

**For Continued Training**: `training_state.pt` contains optimizer states for resuming training. If you only intend to use the model for inference, you can skip this file.
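
As a sketch of a selective download using the `huggingface_hub` library (the repository id is taken from the launch command below; adjust paths to your setup):

```python
# Sketch: download only the files needed for inference,
# skipping the optimizer states in training_state.pt.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="AQ-MedAI/Kimi-K2-Instruct-eagle3",
    allow_patterns=["config.json", "model.safetensors"],
)
print(f"Inference files downloaded to: {local_dir}")
```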

## Performance & Acceleration

The core value of this EAGLE model is its ability to propose multiple future tokens that the base model then verifies in a single forward pass. Higher average acceptance lengths mean more tokens committed per verification step, and therefore lower latency.
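
To make the mechanism concrete, here is a minimal, self-contained sketch of the draft-and-verify control flow using toy stand-in "models". Everything here (the token rules and function names) is a hypothetical illustration of the general speculative-decoding loop, not the actual EAGLE3 implementation:

```python
# Toy draft-and-verify loop: the drafter proposes num_draft_tokens tokens,
# the target checks them, and the longest matching prefix is accepted.

def target_next_token(context):
    """Hypothetical target model: the true next token is previous + 1 (mod 100)."""
    return (context[-1] + 1) % 100

def draft_next_tokens(context, n):
    """Hypothetical drafter: agrees with the target except that it
    systematically guesses wrong on multiples of 7."""
    out, last = [], context[-1]
    for _ in range(n):
        nxt = (last + 1) % 100
        if nxt % 7 == 0:
            nxt = (nxt + 1) % 100  # deliberate drafting error
        out.append(nxt)
        last = nxt
    return out

def speculative_step(context, num_draft_tokens=4):
    proposed = draft_next_tokens(context, num_draft_tokens)
    accepted = []
    for tok in proposed:
        # The real system scores all draft positions in one target forward
        # pass; checking positions sequentially applies the same rule.
        if target_next_token(context + accepted) == tok:
            accepted.append(tok)
        else:
            break
    # On a mismatch (or after all drafts pass), the target's own next token
    # is appended, so every step commits at least one token.
    accepted.append(target_next_token(context + accepted))
    return accepted

context = [0]
for _ in range(4):
    new_tokens = speculative_step(context)
    context += new_tokens
    print(f"committed {len(new_tokens)} tokens -> {context}")
```

Because every committed token is checked against the target, the verified output is identical to what the base model would have generated on its own; acceptance length only changes how fast it is produced.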

**Average Token Acceptance Lengths (MLA):**

| Benchmark | Average Acceptance Length |
| :--- | :--- |
| **HumanEval** (Code) | **3.372** |
| **GSM8K** (Math) | **3.165** |
| **Math500** (Complex Math) | **3.490** |

These metrics demonstrate robust acceleration across diverse and complex domains.
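
As a rough back-of-the-envelope illustration of why acceptance length matters: if each target forward pass commits `acceptance_length` tokens on average, and drafting adds a relative per-step overhead, the expected speedup is roughly `acceptance_length / (1 + overhead)`. The 15% overhead below is an assumed placeholder, not a measured value:

```python
# Back-of-the-envelope speedup estimate from average acceptance length.
# draft_overhead=0.15 is an illustrative assumption, not a measurement.
def estimated_speedup(acceptance_length, draft_overhead=0.15):
    # Baseline: one target forward pass per token.
    # Speculative: one target pass (plus draft cost) per acceptance_length tokens.
    return acceptance_length / (1.0 + draft_overhead)

for name, acc in [("HumanEval", 3.372), ("GSM8K", 3.165), ("Math500", 3.490)]:
    print(f"{name}: ~{estimated_speedup(acc):.2f}x (assumed 15% draft overhead)")
```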

## Quick Start

### Requirements

- NVIDIA GPU
- CUDA 12.0+
- PyTorch 2.0+
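
You can sanity-check these requirements with a short snippet (a sketch assuming a standard PyTorch installation):

```python
# Optional sanity check for the requirements above.
import torch

print(f"PyTorch version: {torch.__version__}")          # expect 2.0+
print(f"CUDA available:  {torch.cuda.is_available()}")  # expect True on an NVIDIA GPU
print(f"CUDA version:    {torch.version.cuda}")         # expect 12.0+
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```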

### Installation

```bash
pip install sglang==0.5.6
```

### Inference with SGLang

```bash
python3 -m sglang.launch_server \
  --model-path /models/Kimi-K2-Instruct \
  --host 0.0.0.0 --port 30012 \
  --trust-remote-code \
  --attention-backend flashinfer \
  --mem-fraction-static 0.9 \
  --tp-size 8 \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path AQ-MedAI/Kimi-K2-Instruct-eagle3 \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4
```
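
Once the server is up, you can query it through SGLang's OpenAI-compatible endpoint. A minimal sketch, assuming the host and port from the launch command above (the `model` field is illustrative; the server responds with whichever model it was launched with):

```python
# Query the running SGLang server via its OpenAI-compatible chat API.
import requests

response = requests.post(
    "http://localhost:30012/v1/chat/completions",
    json={
        "model": "Kimi-K2-Instruct",  # illustrative; passed through by the server
        "messages": [{"role": "user", "content": "What is speculative decoding?"}],
        "max_tokens": 256,
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```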

## Training Data

The model was trained on **1.4 million samples** sourced from the **Open-PerfectBlend** dataset. The data selection prioritizes high-quality instruction-following scenarios to maximize the draft model's predictive accuracy relative to the base model.

## Citation

If you use this model in your research or applications, please cite the following:

```bibtex
@misc{kimik2eagle3,
  title={Kimi-K2-Instruct-eagle3: Accelerating Instruction Following with EAGLE},
  author={Ant AQ Team},
  year={2025},
}
```