Model Overview

kimi-k2.5-eagle3 is an Eagle3 MTP draft model for accelerating inference of Kimi-K2.5, trained with TorchSpec, an online speculative decoding training framework that runs FSDP training and inference concurrently. If you find this draft model useful, please give our project TorchSpec a 🌟 on GitHub.

Training data is available at lightseekorg/kimi-mtp-dataset.

Training Setup

  • Cluster: 4 nodes × 8 H200 (32 GPUs total)
  • Training: 2 nodes (16 GPUs), FSDP
  • Inference: 2 nodes (16 GPUs), Engine (TP=8 per node)
  • Duration: ~14 hours per phase

Training ran in two phases, each 20k steps (~300k samples):

  • Phase 1: Regenerated open-perfectblend dataset
  • Phase 2: Mixed dataset (English, VL, Chinese, function-call, agent, creative writing)

All training responses were regenerated by Kimi-K2.5 via Engine to match the base model's exact token distribution.
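This regeneration step can be sketched as a sampling call against an OpenAI-compatible chat-completions endpoint. The endpoint URL, sampling parameters, and helper names below are illustrative assumptions, not the exact settings used for this model:

```python
import json
from urllib import request

def build_regen_payload(messages, model="Kimi-K2.5", temperature=1.0, max_tokens=4096):
    """Build a chat-completions payload that samples a fresh response from
    the base model, so the draft model trains on the base model's own
    token distribution. Parameter values here are assumptions."""
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,  # sample rather than greedy-decode
        "max_tokens": max_tokens,
    }

def regenerate(messages, url="http://localhost:30000/v1/chat/completions"):
    """Send the payload to a hypothetical OpenAI-compatible server and
    return the regenerated assistant response."""
    body = json.dumps(build_regen_payload(messages)).encode()
    req = request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

payload = build_regen_payload([{"role": "user", "content": "Explain FSDP."}])
print(payload["model"])
```

Looping this over every prompt in the source dataset yields responses whose token statistics match the verifier, which is what makes the drafted tokens acceptable at high rates.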

Training Curves

The plots show loss, token acceptance accuracy, and simulated accept_length during training. Both eval sets contain 256 samples drawn from each phase's own training corpus.

Phase 1 (steps 0 → 20k):

Phase 1 training curves

Phase 2 (steps 20k → 40k):

Phase 2 training curves


Performance

The primary metric is accept_length: the average number of tokens accepted per speculation step with topk=1, num_steps=3, num_draft_tokens=4. Higher is better.

Benchmarks were run using SpecForge's bench_eagle3.py. BFCL v3 benchmarks (†) use a custom extension to the original script.
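As a rough mental model, accept_length can be approximated with a toy chain-acceptance calculation. This is an illustrative assumption, not how Eagle3 actually scores tokens (real acceptance is position- and context-dependent): with topk=1 and num_steps=3, up to three drafted tokens plus one token the target model always contributes can be accepted per step, capping accept_length at 4.

```python
def simulated_accept_length(accept_probs):
    """Expected tokens accepted per verification step under a toy model
    where drafted token i is accepted with probability accept_probs[i]
    and verification stops at the first rejection. The target model
    always contributes one token, so the result is at least 1.0."""
    expected, chain = 1.0, 1.0
    for p in accept_probs:
        chain *= p           # probability the chain survives to token i
        expected += chain    # that token's contribution to the expectation
    return expected

# Three drafted tokens at a flat 80% per-token acceptance rate
# (an illustrative number, not a measured one):
print(round(simulated_accept_length([0.8, 0.8, 0.8]), 3))
```

Under this toy model, the accept_length values near 3.3 to 3.8 in the table below correspond to per-token acceptance rates well above 80%.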

accept_length by dataset and method

| Category | Dataset | n | Phase 1 (20k steps) | Phase 2 (40k steps) |
|---|---|---:|---:|---:|
| Dialogue | MTBench | 80 | 2.624 | 2.687 |
| Chinese | CEval | 212 | 1.482 | 2.295 |
| Math | GSM8K | 500 | 3.123 | 3.201 |
| Code | HumanEval | 164 | 3.242 | 3.285 |
| Math | MATH500 | 500 | 3.323 | 3.342 |
| Math | AIME | 30 | 2.972 | 3.033 |
| VL | MMStar | 200 | 2.566 | 2.787 |
| Function Call † | BFCL v3 simple | 400 | 3.729 | 3.798 |
| Function Call † | BFCL v3 multiple | 200 | 3.745 | 3.809 |
| Function Call † | BFCL v3 parallel | 200 | 3.596 | 3.669 |
| Function Call † | BFCL v3 parallel_multiple | 200 | 3.525 | 3.601 |
| Function Call † | BFCL v3 live_simple | 1547 | 3.515 | 3.667 |
| Function Call † | BFCL v3 live_multiple | 1030 | 3.407 | 3.453 |
| Function Call † | BFCL v3 live_parallel | 97 | 3.303 | 3.410 |
| Function Call † | BFCL v3 live_parallel_multiple | 170 | 3.070 | 3.159 |

Quick Start

Requirements

  • NVIDIA GPU with CUDA 12.0+
  • SGLang ≥ 0.5.8

Launch Server

python -m sglang.launch_server \
    --model-path /path/to/Kimi-K2.5 \
    --tp 8 \
    --trust-remote-code \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path lightseekorg/kimi-k2.5-eagle3 \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 \
    --mem-fraction-static 0.75 \
    --dtype bfloat16
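The three speculative flags are coupled. Inferring only from the values in this card (an assumption, not documented behavior): with topk=1 the draft is a linear chain, and num_draft_tokens = num_steps + 1, which is why 3 steps pair with 4 draft tokens above. A small sanity check:

```python
def check_spec_flags(num_steps, topk, num_draft_tokens):
    """Sanity-check speculative decoding flags. Assumption inferred from
    this card's values: with topk=1 (linear chain), num_draft_tokens
    should equal num_steps + 1; with topk > 1 (tree drafting), it
    should not exceed num_steps * topk + 1."""
    if topk == 1:
        return num_draft_tokens == num_steps + 1
    return 1 < num_draft_tokens <= num_steps * topk + 1

print(check_spec_flags(3, 1, 4))  # the flags used in the launch command
```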

Run Benchmarks

python bench_eagle3.py \
    --model-path /path/to/Kimi-K2.5 \
    --port 30000 \
    --config-list 1,3,1,4 \
    --benchmark-list <benchmark_name> \
    --skip-launch-server

`--config-list` format: `topk,num_steps,topk,num_draft_tokens`.
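For scripting sweeps over several configurations, the `--config-list` value can be unpacked into named fields following the format above (the helper name is hypothetical, not part of bench_eagle3.py):

```python
def parse_config_list(value):
    """Unpack a --config-list string such as "1,3,1,4" using the
    documented field order: topk, num_steps, topk, num_draft_tokens."""
    topk, num_steps, topk_again, num_draft_tokens = (int(x) for x in value.split(","))
    assert topk == topk_again, "the two topk fields should match"
    return {"topk": topk, "num_steps": num_steps, "num_draft_tokens": num_draft_tokens}

print(parse_config_list("1,3,1,4"))
```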

Model size: 3B params (Safetensors, BF16 / F16)