EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Paper: arXiv:2503.01840
Post-trained EAGLE3 draft model for moonshotai/Kimi-K2.5, based on lightseekorg/kimi-k2.5-eagle3.
This model is a speculative decoding draft model trained with the EAGLE3 architecture. It accelerates inference of Kimi-K2.5 by drafting multiple candidate tokens per step, which the target model then verifies in a single parallel forward pass.
Starting from the base EAGLE3 draft model lightseekorg/kimi-k2.5-eagle3, this checkpoint was further post-trained on open-source coding datasets.
This model is intended to be used as a draft model with EAGLE3-compatible inference engines such as vLLM or SGLang.
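As a sketch of how a draft model like this is typically wired up, the launch command below starts an SGLang server with EAGLE3 speculative decoding. The flag names follow recent SGLang releases and may differ across versions; `<this-repo>` is a placeholder for this model's repository id, and the speculative-decoding step/top-k values are illustrative assumptions, not tuned recommendations.

```shell
# Serve Kimi-K2.5 with this EAGLE3 draft model via SGLang.
# NOTE: flag names may vary by SGLang version; values below are illustrative.
python -m sglang.launch_server \
  --model-path moonshotai/Kimi-K2.5 \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path <this-repo> \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 4 \
  --speculative-num-draft-tokens 8
```

vLLM exposes an analogous `speculative_config` option for EAGLE3 draft models; consult the engine's documentation for the exact parameter names in your installed version.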