Model Overview

Description

This EAGLE-3 head has been trained for microsoft/Phi-4-mini-instruct using SpecForge.

Training Dataset

Datasets: ultrachat_200k and Magpie-Llama-3.1-Pro-300K-Filtered

Only the prompts from these datasets were used for data regeneration (the original responses were discarded). The responses synthesized by the target model were then used to train the EAGLE modules.
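The prompt-only setup above can be sketched as follows. This is an illustrative helper, not part of SpecForge; the record layout mirrors the `messages` field used by chat datasets such as ultrachat_200k:

```python
# Hedged sketch of "prompt-only regeneration": keep the first user turn of
# each conversation and drop the original assistant responses. The kept
# prompts would then be fed to the target model to synthesize training data.

def extract_prompts(records):
    """Return the first user turn of each record; discard all responses."""
    prompts = []
    for rec in records:
        for msg in rec["messages"]:
            if msg["role"] == "user":
                prompts.append(msg["content"])
                break  # only the prompt is kept; responses are regenerated
    return prompts

records = [
    {"messages": [
        {"role": "user", "content": "Explain speculative decoding."},
        {"role": "assistant", "content": "<original response, unused>"},
    ]},
]
print(extract_prompts(records))
```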

Total Size: 503.3K data points

Evaluation Dataset

Dataset: MT-Bench
Properties: 80 multi-turn questions spanning eight categories; the accompanying dataset provides roughly 3.3K expert preference votes on model responses.

Inference

Engine: SGLang v0.5.7
Test Hardware: B200

EAGLE Speculative Decoding

This model is an EAGLE-3 head trained for Phi-4-mini-instruct and is ready for inference with SGLang in EAGLE speculative decoding mode.

Usage

To serve the model with SGLang:

```shell
python3 -m sglang.launch_server \
  --model-path microsoft/Phi-4-mini-instruct \
  --trust-remote-code \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path b8zhong/Phi-4-mini-instruct-EAGLE3 \
  --speculative-num-steps 6 \
  --speculative-eagle-topk 10 \
  --speculative-num-draft-tokens 32
```
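Once the server is up, it can be queried through SGLang's OpenAI-compatible endpoint. A minimal sketch, assuming the default port 30000 (adjust if you pass `--port`); the prompt text is just an example:

```python
import json

# Build a standard chat-completions payload for the served target model.
payload = {
    "model": "microsoft/Phi-4-mini-instruct",
    "messages": [
        {"role": "user", "content": "Explain speculative decoding in one sentence."}
    ],
    "max_tokens": 128,
}
print(json.dumps(payload, indent=2))

# To actually send it (requires the running server from the command above):
# import requests
# r = requests.post("http://localhost:30000/v1/chat/completions", json=payload)
# print(r.json()["choices"][0]["message"]["content"])
```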

Performance Note: EAGLE-3 provides significant speedups at low batch sizes (2.31x at batch size 1), but the benefit diminishes as batch size grows and can become a slowdown (0.88x at batch size 32). For batch sizes ≥16, consider reducing --speculative-num-steps to 3-4 and --speculative-eagle-topk to 4-6.
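A possible high-batch serving configuration following this note, with a smaller draft tree. The flag values below are illustrative (within the suggested ranges) and have not been benchmarked:

```shell
python3 -m sglang.launch_server \
  --model-path microsoft/Phi-4-mini-instruct \
  --trust-remote-code \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path b8zhong/Phi-4-mini-instruct-EAGLE3 \
  --speculative-num-steps 4 \
  --speculative-eagle-topk 4 \
  --speculative-num-draft-tokens 16
```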

Evaluation

EAGLE-3 acceptance length and throughput benchmark results on MT-Bench:

| Parallel | Questions | Baseline Throughput (token/s) | EAGLE-3 Throughput (token/s) | EAGLE-3 Acceptance Length | Speedup |
|---------:|----------:|------------------------------:|-----------------------------:|--------------------------:|--------:|
| 1        | 4         | 209.43                        | 483.95                       | 5.02                      | 2.31x   |
| 2        | 8         | 368.99                        | 742.86                       | 4.74                      | 2.01x   |
| 4        | 16        | 720.34                        | 1328.63                      | 4.67                      | 1.84x   |
| 8        | 32        | 1138.43                       | 1934.14                      | 4.41                      | 1.70x   |
| 16       | 64        | 2221.75                       | 2631.79                      | 3.93                      | 1.18x   |
| 32       | 80        | 3584.29                       | 3154.70                      | 3.88                      | 0.88x   |
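The Speedup column is simply EAGLE-3 throughput divided by baseline throughput at the same parallelism, which can be verified from the table's numbers:

```python
# Recompute the Speedup column from the benchmark throughputs above.
rows = [
    (1, 209.43, 483.95),
    (2, 368.99, 742.86),
    (4, 720.34, 1328.63),
    (8, 1138.43, 1934.14),
    (16, 2221.75, 2631.79),
    (32, 3584.29, 3154.70),
]
for parallel, baseline, eagle in rows:
    print(f"parallel={parallel}: speedup={eagle / baseline:.2f}x")
```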
Model Size: 0.2B parameters (BF16)