metadata
tags:
- qwen3
- eagle3
- speculative-decoding
- draft-model
base_model:
- Qwen/Qwen3-8B
- AngelSlim/Qwen3-8B_eagle3
license: apache-2.0
AdaSPEC — EAGLE3 Draft Model for Qwen3-8B
A reimplementation of the AdaSPEC (adaptive speculative decoding) training objective, included as a baseline for comparison.
Part of a course project evaluating per-step weighted loss functions for training EAGLE3 draft models. Full pipeline and source: https://github.com/XLOverflow/anlp_course_project
Collection: Qwen3 EAGLE3 — Weighted Loss Variants
Training
- Framework: SpecForge (our fork: https://github.com/XLOverflow/SpecForge)
- Target model: Qwen/Qwen3-8B
- Draft init: AngelSlim/Qwen3-8B_eagle3
- Data: ShareGPT-style reasoning traces (see scripts/data/ in project repo)
- AdaSPEC adaptive loss (see paper)
- Initialized from: baseline-uniform/epoch_4_step_82000
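To illustrate what a per-step weighted loss means in general, here is a minimal sketch in plain Python. It is not the AdaSPEC objective and not the code used in this project: the function names, the toy distributions, and the weight values are all hypothetical. It only shows how per-step cross-entropy terms can be combined with different weights.

```python
import math

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target token under one predicted distribution."""
    return -math.log(probs[target_index])

def weighted_step_loss(step_probs, step_targets, step_weights):
    """Weighted average of per-step cross-entropy losses.

    step_probs[k]   : draft distribution at draft step k (toy values here)
    step_targets[k] : index of the target model's token at step k
    step_weights[k] : hypothetical weight assigned to step k
    """
    total = sum(
        w * cross_entropy(p, t)
        for p, t, w in zip(step_probs, step_targets, step_weights)
    )
    return total / sum(step_weights)

# Toy example: three draft steps over a three-token vocabulary.
probs = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]
targets = [0, 1, 0]

uniform = weighted_step_loss(probs, targets, [1.0, 1.0, 1.0])   # every step counts equally
decayed = weighted_step_loss(probs, targets, [1.0, 0.5, 0.25])  # later steps count less
```

With uniform weights every draft step contributes equally. The decayed weights shift emphasis toward the first step, which is one of many possible weighting choices.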
Evaluation (Qwen3-8B target)
| Dataset | τ (acceptance length) | Speedup | Accuracy |
|---|---|---|---|
| GSM8K | 6.856 | 4.289× | 95.15% |
| MATH500 | 6.678 | 4.206× | 94.40% |
Baselines for reference: Vanilla ≈ 1× speedup, EAGLE-orig ≈ 2× speedup.
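As a reading aid for the table, the sketch below assumes τ is the mean number of tokens committed per draft-and-verify cycle, which is the usual convention in EAGLE-style evaluation. The per-cycle counts are made up for illustration, and the relative speedup uses the approximate 2× EAGLE-orig figure quoted above, so the ratio is itself approximate.

```python
def mean_acceptance_length(tokens_per_cycle):
    """Average number of tokens committed per draft-and-verify cycle."""
    return sum(tokens_per_cycle) / len(tokens_per_cycle)

def relative_speedup(speedup, reference_speedup):
    """How many times faster one configuration is than a reference configuration."""
    return speedup / reference_speedup

# Hypothetical per-cycle counts, not taken from the actual evaluation logs.
cycles = [7, 6, 8, 7]
tau = mean_acceptance_length(cycles)

# GSM8K speedup from the table versus the approximate EAGLE-orig baseline.
gsm8k_vs_eagle_orig = relative_speedup(4.289, 2.0)
```

Under these assumptions, the GSM8K result is roughly twice as fast as the EAGLE-orig baseline.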
Files
- model.safetensors: draft model weights (~763 MB)
- config.json: model config
- Corresponds to: outputs/eagle3-adaspec/epoch_0_step_17026 in the original training output
Optimizer state (~3 GB) is not uploaded; use the project repo's training scripts to retrain from scratch if needed.
Usage
from huggingface_hub import snapshot_download
draft_path = snapshot_download(repo_id="XLOverflow/qwen3-eagle3-adaspec")
# Then load with EAGLE's EaModel — see scripts/eval/eval_combined.py in the project repo.
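Before handing the download to the evaluation script, it can help to confirm that the files listed under Files are present. The helper below is a small sketch using only the standard library; `missing_files` and `EXPECTED_FILES` are hypothetical names and not part of the project repo or of huggingface_hub.

```python
from pathlib import Path

# File names taken from the Files section of this model card.
EXPECTED_FILES = ("model.safetensors", "config.json")

def missing_files(snapshot_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from snapshot_dir."""
    root = Path(snapshot_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_files(draft_path)` on the directory returned by `snapshot_download` should give an empty list if the download completed.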