---
license: mit
---

**Eagle3 v2 Optimized Release (Build 20260329)**

Thanks to the **SpecForge** framework for its foundational contributions. Stay tuned for further updates.

## Model Overview

**Kimi-K25-eagle3** is a specialized draft model engineered to accelerate inference for the Kimi-K25 ecosystem using the **EAGLE3** framework. Built on the **Llama architecture**, it serves as a highly efficient drafter. It was trained on **2 million high-quality samples** drawn from the **EagleChat** and **open-perfectblend** datasets, plus additional multimodal data. This training aligns the draft model closely with the teacher model's output distribution, which is what yields high token-acceptance rates.

## Performance & Acceleration

The core value of this EAGLE3 model is its ability to predict multiple future tokens that are then verified by the base model. Higher average acceptance lengths translate directly into lower latency. Further iterations are planned.

**Speculative Decoding Configuration:**

* `--speculative-num-steps 3`: Number of speculative decoding steps.
* `--speculative-eagle-topk 1`: `top-k` value used by the Eagle draft model during speculative decoding.
* `--speculative-num-draft-tokens 4`: Number of draft tokens generated per speculative step.

**Average Token Acceptance Lengths:**

| Benchmark | Eagle3 v1 (0302) | Eagle3 v2 (0329) |
| :---- | :---- | :---- |
| **HumanEval** (Code) | 2.625 | 3.163 |
| **GSM8K** (Math) | 2.746 | 3.005 |
| **Math500** (Complex Math) | 2.596 | 3.234 |
| **MMStar** (Vision and Text) | 2.219 | 2.627 |

These metrics demonstrate robust acceleration across diverse and complex domains.
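As a rough illustration (not from the model card), the average acceptance length gives a first-order estimate of attainable speedup, assuming the base model's forward pass dominates latency and the draft model adds a small relative overhead per verification step. The `draft_cost` parameter below is a hypothetical placeholder, not a measured value:

```python
# First-order speedup estimate for speculative decoding.
# Assumption: each verification step emits `avg_acceptance_length` tokens
# on average, at the cost of one base-model forward pass plus a relative
# draft-model overhead `draft_cost` (fraction of a base forward pass).

def estimated_speedup(avg_acceptance_length: float, draft_cost: float = 0.1) -> float:
    return avg_acceptance_length / (1.0 + draft_cost)

# Acceptance lengths from the benchmark table above (Eagle3 v2).
benchmarks = {
    "HumanEval": 3.163,
    "GSM8K": 3.005,
    "Math500": 3.234,
    "MMStar": 2.627,
}

for name, tau in benchmarks.items():
    print(f"{name}: ~{estimated_speedup(tau):.2f}x")
```

Real-world speedup also depends on batch size, kernel overheads, and the verification tree shape, so treat this only as a back-of-the-envelope guide.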
![image](https://cdn-uploads.huggingface.co/production/uploads/656fd2cdfdd27aeb99ca54b3/K-_nkMtXBpHVpqpnybSgx.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/656fd2cdfdd27aeb99ca54b3/ZNa5iIiFYTP_m_bx_sHgx.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/656fd2cdfdd27aeb99ca54b3/2zUSql9kWH1L8WBL7cMcL.png)

## Quick Start

### Requirements

- NVIDIA GPU
- CUDA 12.0+
- PyTorch 2.0+

### Installation

```bash
pip install sglang==0.5.9
```

You also need the changes from PR https://github.com/sgl-project/sglang/pull/19689.

### Inference with SGLang

```bash
python3 -m sglang.launch_server \
  --model-path /models/Kimi-K25 \
  --host 0.0.0.0 --port 30012 \
  --trust-remote-code \
  --mem-fraction-static 0.9 \
  --tp-size 8 \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path AQ-MedAI/Kimi-K25-eagle3 \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4
```

## Citation

If you use this model in your research or application, please cite the following:

```bibtex
@misc{kimik25eagle3,
  title={Kimi-K25-eagle3: Accelerating Instruction Following with EAGLE},
  author={Ant AQ Team},
  year={2026},
}
```
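As a quick smoke test (a sketch, not part of the official instructions), the server launched above can be queried through SGLang's OpenAI-compatible chat endpoint. The host, port, and model path below mirror the launch command; adjust them to your deployment:

```python
# Minimal client sketch for an SGLang server with an OpenAI-compatible
# API at http://localhost:30012/v1/chat/completions (matching the launch
# command above). Uses only the standard library.
import json
import urllib.request


def build_request(prompt: str, model: str = "/models/Kimi-K25", max_tokens: int = 256) -> dict:
    """Build the JSON payload for an OpenAI-style chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def query(prompt: str, host: str = "localhost", port: int = 30012) -> str:
    """Send a chat completion request and return the generated text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# With the server running:
# print(query("Explain speculative decoding in one sentence."))
```

Speculative decoding is transparent to clients: responses are identical in format to non-speculative serving, only faster.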