EAGLE-3 Draft Head for Qwen/Qwen2.5-0.5B

An EAGLE-3 speculative-decoding draft head trained for Qwen/Qwen2.5-0.5B using the speculators library.

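EAGLE-3 attaches a lightweight decoder layer to the target model. The layer consumes hidden states taken from the target model and drafts several candidate tokens, which the target model then verifies in a single forward pass. Because verification accepts only drafts that are consistent with the target model's own predictions, output quality is preserved; the throughput gain depends on how often the drafts are accepted.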
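The draft-then-verify idea described above can be illustrated with a toy sketch of one greedy speculative-decoding step. This is not the EAGLE-3 or vLLM implementation: `toy_target`, `toy_draft`, and `speculative_step` are hypothetical names, the "models" are plain functions over integer tokens, and the verification loop calls the target once per position where a real engine would score all drafted positions in one batched forward pass.

```python
from typing import Callable, List

# Hypothetical stand-in: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def toy_target(prefix: List[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    """Toy draft model: agrees with the target except after token 3."""
    return 0 if prefix[-1] == 3 else (prefix[-1] + 1) % 10


def speculative_step(
    prefix: List[int],
    draft: NextToken,
    target: NextToken,
    num_draft_tokens: int = 4,
) -> List[int]:
    """Run one draft-then-verify step and return the tokens to append."""
    # 1. Draft: the cheap model proposes tokens autoregressively.
    proposed: List[int] = []
    for _ in range(num_draft_tokens):
        proposed.append(draft(prefix + proposed))

    # 2. Verify: compare each drafted token with the target's greedy choice.
    accepted: List[int] = []
    for token in proposed:
        expected = target(prefix + accepted)
        if token != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(token)

    # 3. Every draft matched, so the target's pass also yields a bonus token.
    accepted.append(target(prefix + accepted))
    return accepted
```

With these toy models, a step starting from `[5]` accepts all four drafts plus one bonus token, while a step starting from `[0]` keeps three drafts and replaces the fourth with the target's token. In both cases the result equals what the target alone would have generated greedily, which is why the method does not change the output.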

Training Details

Detail                 Value
Base / target model    Qwen/Qwen2.5-0.5B
Draft architecture     1 LLaMA-style decoder layer (~30 M trainable params)
TTT steps              8
Draft tokens           4
Training samples       100 000 (random-token prompts + greedy completions)
Epochs                 5
Learning rate          5 × 10⁻⁵ (cosine schedule)
Sequence length        1 024
Framework              speculators ≥ 0.5.0
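For readers unfamiliar with the cosine schedule listed above, the sketch below shows one common form of it together with a rough token budget. It rests on assumptions the card does not state: no warmup, decay all the way to zero, and a `total_steps` value that depends on a batch size not given here. The names `cosine_lr` and `MAX_TOKENS_SEEN` are hypothetical, and the actual schedule used by the speculators trainer may differ.

```python
import math

PEAK_LR = 5e-5           # learning rate from the table above
NUM_SAMPLES = 100_000    # training samples from the table above
SEQ_LEN = 1_024          # sequence length from the table above
EPOCHS = 5               # epochs from the table above


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr at step 0 to 0 at total_steps.

    Assumes no warmup and a final learning rate of zero.
    """
    progress = min(max(step, 0), total_steps) / total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Upper bound on tokens processed, assuming every sample fills the
# full sequence length (shorter samples would lower this number).
MAX_TOKENS_SEEN = NUM_SAMPLES * SEQ_LEN * EPOCHS
```

Under these assumptions the learning rate starts at 5 × 10⁻⁵, passes through half that value at the midpoint of training, and the run processes at most 512 million tokens.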

Usage with vLLM

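Serve the target model with this draft head for speculative decoding. Flag names vary across vLLM releases (newer ones take these settings as a JSON object through --speculative-config), so check vllm serve --help for your installed version: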

vllm serve Qwen/Qwen2.5-0.5B \
  --speculative-model BalajiAI/qwen2.5-0.5b.eagle3 \
  --num-speculative-tokens 4 \
  --dtype bfloat16
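Once the server is running, it can be queried like any other vLLM deployment, since speculative decoding is transparent to clients. The following is a minimal sketch using only the Python standard library. It assumes the server listens on vLLM's default address (localhost, port 8000) and uses the plain completions endpoint because Qwen/Qwen2.5-0.5B is a base model; `build_completion_request` and `complete` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the server was started with default host and port settings.
BASE_URL = "http://localhost:8000/v1"


def build_completion_request(
    prompt: str,
    max_tokens: int = 64,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible completions endpoint."""
    payload = {
        "model": "Qwen/Qwen2.5-0.5B",  # the target model name passed to vllm serve
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding
    }
    return urllib.request.Request(
        url=f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (needs a live server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Calling `complete("The capital of France is")` against a live server returns the generated continuation. The request names the target model, not the draft head, because the draft head is attached on the server side.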

License

This draft head is released under the Apache 2.0 license.
