---
tags:
- qwen3
- eagle3
- speculative-decoding
- draft-model
base_model:
- Qwen/Qwen3-8B
- AngelSlim/Qwen3-8B_eagle3
license: apache-2.0
---

# AdaSPEC: EAGLE3 Draft Model for Qwen3-8B

A reimplementation of the AdaSPEC (adaptive speculative decoding) training objective, included as a baseline for comparison.

Part of a course project evaluating per-step weighted loss functions for training EAGLE3 draft models. Full pipeline and source: **https://github.com/XLOverflow/anlp_course_project**

Collection: [Qwen3 EAGLE3: Weighted Loss Variants](https://huggingface.co/collections/XLOverflow/qwen3-eagle3-weighted-loss-variants)

## Training

- **Framework:** [SpecForge](https://github.com/sgl-project/SpecForge) (our fork: https://github.com/XLOverflow/SpecForge)
- **Target model:** `Qwen/Qwen3-8B`
- **Draft init:** `AngelSlim/Qwen3-8B_eagle3`
- **Data:** ShareGPT-style reasoning traces (see `scripts/data/` in project repo)
- **Loss:** AdaSPEC adaptive loss (see paper)
- **Initialized from:** `baseline-uniform/epoch_4_step_82000`

## Evaluation (Qwen3-8B target)

| Dataset | τ (acceptance length) | Speedup | Accuracy |
|---|---|---|---|
| GSM8K | 6.856 | 4.289× | 95.15% |
| MATH500 | 6.678 | 4.206× | 94.40% |

Baselines for reference: Vanilla ≈ 1× speedup, EAGLE-orig ≈ 2× speedup.

## Files

- `model.safetensors`: draft model weights (~763 MB)
- `config.json`: model config

These files correspond to `outputs/eagle3-adaspec/epoch_0_step_17026` in the original training output.

Optimizer state (~3 GB) is not uploaded, so training cannot be resumed from this checkpoint with its original optimizer state. Use the project repo's training scripts to retrain from scratch if needed.

## Usage

```python
from huggingface_hub import snapshot_download

draft_path = snapshot_download(repo_id="XLOverflow/qwen3-eagle3-adaspec")
# Then load with EAGLE's EaModel; see scripts/eval/eval_combined.py in the project repo.
```
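
Before handing the download to an evaluation script, it can help to confirm that the two files listed under Files are present and that the checkpoint header is readable. The following is a minimal sketch using only the standard library. The helper names `read_safetensors_header` and `inspect_draft_dir` are hypothetical and not part of the project repo. It relies on the documented safetensors layout: an 8-byte little-endian header length followed by a JSON header.

```python
import json
import struct
from pathlib import Path

# File names taken from the Files section of this card.
EXPECTED_FILES = ("model.safetensors", "config.json")


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading any tensors."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def inspect_draft_dir(draft_path):
    """Hypothetical helper: check the expected files exist and summarize them."""
    root = Path(draft_path)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing files in {root}: {missing}")

    config = json.loads((root / "config.json").read_text(encoding="utf-8"))
    header = read_safetensors_header(root / "model.safetensors")
    # "__metadata__" is an optional non-tensor entry in the header.
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}

    return {
        "config_keys": sorted(config),
        "num_tensors": len(tensors),
        "size_mb": (root / "model.safetensors").stat().st_size / 1e6,
    }
```

Calling `inspect_draft_dir(draft_path)` on the directory returned by `snapshot_download` should report a `size_mb` close to the ~763 MB noted under Files. A much smaller value may indicate an incomplete download.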