Model Overview
kimi-k2.5-eagle3 is an EAGLE3 MTP (multi-token prediction) draft model for accelerating inference of Kimi-K2.5, trained with TorchSpec, an online speculative decoding training framework that runs FSDP training and inference concurrently. If you find this draft model useful, please give TorchSpec a ⭐ on GitHub.
Training data is available at `lightseekorg/kimi-mtp-dataset`.
Training Setup
- Cluster: 4 nodes × 8 H200 GPUs (32 GPUs total)
- Training: 2 nodes (16 GPUs), FSDP
- Inference: 2 nodes (16 GPUs), Engine (TP=8 per node)
- Duration: ~14 hours per phase
Training ran in two phases, each 20k steps (~300k samples):
- Phase 1: Regenerated open-perfectblend dataset
- Phase 2: Mixed dataset (English, VL, Chinese, function-call, agent, creative writing)
All training responses were regenerated by Kimi-K2.5 via Engine to match the base model's exact token distribution.
Training Curves
The plots show loss, token acceptance accuracy, and simulated accept_length during training. Both eval sets contain 256 samples drawn from each phase's own training corpus.
Phase 1 (steps 0–20k):
Phase 2 (steps 20k–40k):
Performance
The primary metric is accept_length: the average number of tokens accepted per speculation step with `topk=1`, `num_steps=3`, `num_draft_tokens=4`. Higher is better.
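As rough intuition (this is an illustrative back-of-envelope estimate, not a formula from TorchSpec or SpecForge), accept_length translates into wall-clock speedup as tokens produced per target-model forward pass, discounted by the cost of running the draft model. The overhead factor below is an assumption, not a measured value:

```python
def rough_speedup(accept_length: float, draft_overhead: float = 0.15) -> float:
    """Back-of-envelope speedup vs. plain autoregressive decoding.

    accept_length: average tokens accepted per verification step.
    draft_overhead: assumed relative cost of the draft model's forward
                    passes per step (illustrative, not measured).
    """
    return accept_length / (1.0 + draft_overhead)

# e.g. the GSM8K Phase 2 accept_length of 3.201:
print(round(rough_speedup(3.201), 2))  # → 2.78
```

Under these assumptions, an accept_length around 3.2 suggests a speedup in the high-2× range; the real figure depends on draft/target model size ratio and batch size.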
Benchmarks were run using SpecForge's bench_eagle3.py. BFCL v3 benchmarks (†) use a custom extension to the original script.
| Category | Dataset | n | Phase 1 (20k steps) | Phase 2 (40k steps) |
|---|---|---|---|---|
| Dialogue | MTBench | 80 | 2.624 | 2.687 |
| Chinese | CEval | 212 | 1.482 | 2.295 |
| Math | GSM8K | 500 | 3.123 | 3.201 |
| Code | HumanEval | 164 | 3.242 | 3.285 |
| Math | MATH500 | 500 | 3.323 | 3.342 |
| Math | AIME | 30 | 2.972 | 3.033 |
| VL | MMStar | 200 | 2.566 | 2.787 |
| Function Call † | BFCL v3 simple | 400 | 3.729 | 3.798 |
| Function Call † | BFCL v3 multiple | 200 | 3.745 | 3.809 |
| Function Call † | BFCL v3 parallel | 200 | 3.596 | 3.669 |
| Function Call † | BFCL v3 parallel_multiple | 200 | 3.525 | 3.601 |
| Function Call † | BFCL v3 live_simple | 1547 | 3.515 | 3.667 |
| Function Call † | BFCL v3 live_multiple | 1030 | 3.407 | 3.453 |
| Function Call † | BFCL v3 live_parallel | 97 | 3.303 | 3.410 |
| Function Call † | BFCL v3 live_parallel_multiple | 170 | 3.070 | 3.159 |
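To put the Phase 2 gains in perspective, a quick script over the non-BFCL rows of the table computes the relative improvement per dataset (values are copied from the table above):

```python
# (phase1, phase2) accept_length values from the results table
results = {
    "MTBench": (2.624, 2.687),
    "CEval": (1.482, 2.295),
    "GSM8K": (3.123, 3.201),
    "HumanEval": (3.242, 3.285),
    "MATH500": (3.323, 3.342),
    "AIME": (2.972, 3.033),
    "MMStar": (2.566, 2.787),
}

gains = {name: 100 * (p2 - p1) / p1 for name, (p1, p2) in results.items()}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {gain:+5.1f}%")
```

The largest jump is on CEval (about +55%), consistent with Phase 2 adding Chinese data to the training mix; the English math and code sets improve only modestly.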
Quick Start
Requirements
- NVIDIA GPU with CUDA 12.0+
- SGLang ≥ 0.5.8
Launch Server
```bash
python -m sglang.launch_server \
    --model-path /path/to/Kimi-K2.5 \
    --tp 8 \
    --trust-remote-code \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path lightseekorg/kimi-k2.5-eagle3 \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 \
    --mem-fraction-static 0.75 \
    --dtype bfloat16
```
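Once the server is up, it exposes an OpenAI-compatible API (port 30000 by default), and speculative decoding is transparent to clients. A minimal request sketch using only the standard library; the model path and prompt are placeholders:

```python
import json
import urllib.request

# Placeholder model path matching the launch command above.
payload = {
    "model": "/path/to/Kimi-K2.5",
    "messages": [
        {"role": "user", "content": "Explain speculative decoding in one sentence."}
    ],
    "max_tokens": 128,
}

# SGLang serves an OpenAI-compatible chat completions endpoint.
req = urllib.request.Request(
    "http://localhost:30000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server launched above is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```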
Run Benchmarks
```bash
python bench_eagle3.py \
    --model-path /path/to/Kimi-K2.5 \
    --port 30000 \
    --config-list 1,3,1,4 \
    --benchmark-list <benchmark_name> \
    --skip-launch-server
```
`--config-list` format: `topk,num_steps,topk,num_draft_tokens` (so `1,3,1,4` matches the server flags above).
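For clarity, the four comma-separated values map onto the speculative-decoding parameters in the documented order (note that `topk` appears twice in this format):

```python
# Parse a --config-list value per the documented field order.
cfg = "1,3,1,4"
topk, num_steps, topk_again, num_draft_tokens = (int(v) for v in cfg.split(","))

print(topk, num_steps, num_draft_tokens)  # → 1 3 4
```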
Base model: moonshotai/Kimi-K2.5