Model Overview

Description

This EAGLE-3 head has been trained for microsoft/Phi-4-mini-instruct using SpecForge.

Training Dataset

Datasets: ultrachat_200k and Magpie-Llama-3.1-Pro-300K-Filtered

Only the prompts from these datasets were used for data regeneration (the original responses were discarded). The responses synthesized by the target model were then used to train the EAGLE modules.
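The prompt-only setup above can be sketched as follows. This is an illustrative helper, not part of SpecForge; the record layout mirrors the `messages` field used by chat datasets such as ultrachat_200k:

```python
# Hedged sketch of "prompt-only regeneration": keep the first user turn of
# each conversation and drop the original assistant responses. The kept
# prompts would then be fed to the target model to synthesize training data.

def extract_prompts(records):
    """Return the first user turn of each record; discard all responses."""
    prompts = []
    for rec in records:
        for msg in rec["messages"]:
            if msg["role"] == "user":
                prompts.append(msg["content"])
                break  # only the prompt is kept; responses are regenerated
    return prompts

records = [
    {"messages": [
        {"role": "user", "content": "Explain speculative decoding."},
        {"role": "assistant", "content": "<original response, unused>"},
    ]},
]
print(extract_prompts(records))
```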

Total Size: 503.3K data points

Evaluation Dataset

Dataset: MT-Bench
Properties: 80 multi-turn questions spanning eight categories; the accompanying dataset provides roughly 3.3K expert preference votes on model responses.

Inference

Engine: SGLang v0.5.7
Test Hardware: B200

EAGLE Speculative Decoding

This model is an EAGLE-3 head trained for Phi-4-mini-instruct and is ready for inference with SGLang in EAGLE speculative decoding mode.

Usage

To serve the model with SGLang:

```shell
python3 -m sglang.launch_server \
  --model-path microsoft/Phi-4-mini-instruct \
  --trust-remote-code \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path b8zhong/Phi-4-mini-instruct-EAGLE3 \
  --speculative-num-steps 6 \
  --speculative-eagle-topk 10 \
  --speculative-num-draft-tokens 32
```
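Once the server is up, it can be queried through SGLang's OpenAI-compatible endpoint. A minimal sketch, assuming the default port 30000 (adjust if you pass `--port`); the prompt text is just an example:

```python
import json

# Build a standard chat-completions payload for the served target model.
payload = {
    "model": "microsoft/Phi-4-mini-instruct",
    "messages": [
        {"role": "user", "content": "Explain speculative decoding in one sentence."}
    ],
    "max_tokens": 128,
}
print(json.dumps(payload, indent=2))

# To actually send it (requires the running server from the command above):
# import requests
# r = requests.post("http://localhost:30000/v1/chat/completions", json=payload)
# print(r.json()["choices"][0]["message"]["content"])
```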

Performance Note: EAGLE-3 provides significant speedups at low batch sizes (2.31x at batch size 1), but the benefit diminishes as batch size grows and can become a slowdown (0.88x at batch size 32). For batch sizes ≥16, consider reducing --speculative-num-steps to 3-4 and --speculative-eagle-topk to 4-6.
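A possible high-batch serving configuration following this note, with a smaller draft tree. The flag values below are illustrative (within the suggested ranges) and have not been benchmarked:

```shell
python3 -m sglang.launch_server \
  --model-path microsoft/Phi-4-mini-instruct \
  --trust-remote-code \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path b8zhong/Phi-4-mini-instruct-EAGLE3 \
  --speculative-num-steps 4 \
  --speculative-eagle-topk 4 \
  --speculative-num-draft-tokens 16
```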

Evaluation

EAGLE-3 acceptance length and throughput benchmark results on MT-Bench:

| Parallel | Questions | Baseline Throughput (token/s) | EAGLE-3 Throughput (token/s) | EAGLE-3 Acceptance Length | Speedup |
|---------:|----------:|------------------------------:|-----------------------------:|--------------------------:|--------:|
| 1        | 4         | 209.43                        | 483.95                       | 5.02                      | 2.31x   |
| 2        | 8         | 368.99                        | 742.86                       | 4.74                      | 2.01x   |
| 4        | 16        | 720.34                        | 1328.63                      | 4.67                      | 1.84x   |
| 8        | 32        | 1138.43                       | 1934.14                      | 4.41                      | 1.70x   |
| 16       | 64        | 2221.75                       | 2631.79                      | 3.93                      | 1.18x   |
| 32       | 80        | 3584.29                       | 3154.70                      | 3.88                      | 0.88x   |
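The Speedup column is simply EAGLE-3 throughput divided by baseline throughput at the same parallelism, which can be verified from the table's numbers:

```python
# Recompute the Speedup column from the benchmark throughputs above.
rows = [
    (1, 209.43, 483.95),
    (2, 368.99, 742.86),
    (4, 720.34, 1328.63),
    (8, 1138.43, 1934.14),
    (16, 2221.75, 2631.79),
    (32, 3584.29, 3154.70),
]
for parallel, baseline, eagle in rows:
    print(f"parallel={parallel}: speedup={eagle / baseline:.2f}x")
```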
Model Size: 0.2B parameters (BF16)