# Model Overview

## Description
This EAGLE-3 head has been trained for microsoft/Phi-4-mini-instruct using SpecForge.
## Training Dataset

- **Datasets:** ultrachat_200k and Magpie-Llama-3.1-Pro-300K-Filtered
- **Regeneration:** only the prompts from these datasets were used for data regeneration (the original responses were discarded), and the synthesized data was used to train the EAGLE modules; see the sketch after this list.
- **Total size:** 503.3K data points
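A minimal sketch of what prompt-only regeneration can look like, assuming responses are re-sampled from the target model through an OpenAI-compatible SGLang endpoint. This is not the actual SpecForge pipeline; the dataset split, field names, and sampling settings are assumptions for illustration.

```python
# Hypothetical sketch (not the actual SpecForge pipeline): regenerate
# responses for dataset prompts with the target model via an
# OpenAI-compatible SGLang endpoint, keeping only the prompts.
from datasets import load_dataset
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# The "prompt" column matches HuggingFaceH4/ultrachat_200k; other
# datasets may need different column handling.
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

regenerated = []
for row in ds.select(range(100)):  # small slice for illustration
    prompt = row["prompt"]  # the dataset's original response is deliberately ignored
    resp = client.chat.completions.create(
        model="microsoft/Phi-4-mini-instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
    )
    regenerated.append(
        {"prompt": prompt, "response": resp.choices[0].message.content}
    )
```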
## Evaluation Dataset

- **Dataset:** MT-Bench
- **Properties:** 80 multi-turn questions spanning 8 categories; the benchmark release also includes roughly 3.3K expert-level pairwise human preference votes on model responses.
## Inference

- **Engine:** SGLang v0.5.7
- **Test hardware:** NVIDIA B200
## Eagle Speculative Decoding

This model is an EAGLE-3 draft head trained for Phi-4-mini-instruct and is ready for inference with SGLang in EAGLE speculative decoding mode.
## Usage

To serve the model with SGLang:

```bash
python3 -m sglang.launch_server \
  --model-path microsoft/Phi-4-mini-instruct \
  --trust-remote-code \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path b8zhong/Phi-4-mini-instruct-EAGLE3 \
  --speculative-num-steps 6 \
  --speculative-eagle-topk 10 \
  --speculative-num-draft-tokens 32
```
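Once the server is up, speculative decoding is transparent to callers: requests go through SGLang's OpenAI-compatible API as usual. A minimal client sketch, assuming the default port 30000 (adjust if you pass `--port`):

```python
# Minimal client sketch: the EAGLE-3 setup is invisible to the client,
# which talks to SGLang's OpenAI-compatible endpoint as usual.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="microsoft/Phi-4-mini-instruct",
    messages=[{"role": "user", "content": "Explain speculative decoding briefly."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```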
**Performance note:** EAGLE-3 provides significant speedups at low batch sizes (2.31x at BS=1), but the benefit diminishes as batch size grows and becomes a slowdown at BS=32 (0.88x in the table below). For batch sizes ≥16, consider reducing `--speculative-num-steps` to 3-4 and `--speculative-eagle-topk` to 4-6; one possible configuration is sketched below.
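For example, a lighter speculative configuration for higher-concurrency serving might look like the following. The exact values (steps 4, top-k 4, 8 draft tokens) are illustrative, not benchmarked settings from this card; tune them for your workload.

```bash
# Illustrative lighter speculative settings for larger batch sizes.
python3 -m sglang.launch_server \
  --model-path microsoft/Phi-4-mini-instruct \
  --trust-remote-code \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path b8zhong/Phi-4-mini-instruct-EAGLE3 \
  --speculative-num-steps 4 \
  --speculative-eagle-topk 4 \
  --speculative-num-draft-tokens 8
```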
## Evaluation

EAGLE-3 acceptance length and throughput benchmark results on MT-Bench:
| Batch Size | Questions | Baseline Throughput (token/s) | EAGLE-3 Throughput (token/s) | EAGLE-3 Acceptance Length | Speedup |
|---|---|---|---|---|---|
| 1 | 4 | 209.43 | 483.95 | 5.02 | 2.31x |
| 2 | 8 | 368.99 | 742.86 | 4.74 | 2.01x |
| 4 | 16 | 720.34 | 1328.63 | 4.67 | 1.84x |
| 8 | 32 | 1138.43 | 1934.14 | 4.41 | 1.70x |
| 16 | 64 | 2221.75 | 2631.79 | 3.93 | 1.18x |
| 32 | 80 | 3584.29 | 3154.70 | 3.88 | 0.88x |
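As a rough client-side way to probe the throughput side of this table, you can time a handful of completions and divide emitted tokens by wall-clock seconds, once against the EAGLE-3 server and once against a plain server. This is a simplified sketch, not the MT-Bench harness used above; the prompt set, request count, and `max_tokens` are illustrative.

```python
# Rough client-side throughput probe (not the MT-Bench harness above).
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
prompts = ["Write a short story about a robot learning to paint."] * 4

start = time.perf_counter()
out_tokens = 0
for p in prompts:
    resp = client.chat.completions.create(
        model="microsoft/Phi-4-mini-instruct",
        messages=[{"role": "user", "content": p}],
        max_tokens=256,
    )
    out_tokens += resp.usage.completion_tokens
elapsed = time.perf_counter() - start
print(f"{out_tokens / elapsed:.2f} output tokens/s")
```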