---
tags:
- qwen3
- eagle3
- speculative-decoding
- draft-model
base_model:
- Qwen/Qwen3-8B
- AngelSlim/Qwen3-8B_eagle3
license: apache-2.0
---

# AdaSPEC: EAGLE3 Draft Model for Qwen3-8B

An EAGLE3 draft model for Qwen3-8B trained with a reimplementation of the AdaSPEC (adaptive speculative decoding) training objective. It is included as a baseline for comparison.

Part of a course project evaluating per-step weighted loss functions for training
EAGLE3 draft models. Full pipeline and source:
**https://github.com/XLOverflow/anlp_course_project**

Collection: [Qwen3 EAGLE3 - Weighted Loss Variants](https://huggingface.co/collections/XLOverflow/qwen3-eagle3-weighted-loss-variants)

## Training

- **Framework:** [SpecForge](https://github.com/sgl-project/SpecForge) (our fork: https://github.com/XLOverflow/SpecForge)
- **Target model:** `Qwen/Qwen3-8B`
- **Draft init:** `AngelSlim/Qwen3-8B_eagle3`
- **Data:** ShareGPT-style reasoning traces (see `scripts/data/` in project repo)
- **Loss:** AdaSPEC adaptive loss (see paper)
- **Initialized from:** `baseline-uniform/epoch_4_step_82000`
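
As a rough illustration of what a per-step weighted loss means for a multi-step draft model, the sketch below combines one loss value per unrolled draft step into a single scalar. It is not the AdaSPEC objective itself (see the paper for that). The function name `weighted_step_loss`, the example loss values, and the 0.8 decay factor are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def weighted_step_loss(step_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-draft-step losses into one scalar using normalized weights."""
    if len(step_losses) != len(weights):
        raise ValueError("need exactly one weight per draft step")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * loss for w, loss in zip(weights, step_losses))
    return weighted_sum / total_weight


# Hypothetical losses for three unrolled draft steps (later steps are harder).
losses = [0.9, 1.2, 1.6]

# Uniform weighting: every draft step counts equally.
uniform = weighted_step_loss(losses, [1.0, 1.0, 1.0])

# Geometric decay (assumed factor 0.8): earlier steps count more.
decayed = weighted_step_loss(losses, [0.8 ** k for k in range(3)])
```

With these made-up numbers the decayed variant yields a smaller scalar than the uniform one, because the higher losses of later steps are down-weighted.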

## Evaluation (Qwen3-8B target)

| Dataset | τ (acceptance length) | Speedup | Accuracy |
|---|---|---|---|
| GSM8K | 6.856 | 4.289× | 95.15% |
| MATH500 | 6.678 | 4.206× | 94.40% |

Baselines for reference: Vanilla ≈ 1× speedup, EAGLE-orig ≈ 2× speedup.
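
To put the table in context, the sketch below divides each measured speedup by the EAGLE-orig baseline quoted above, giving the gain over that baseline rather than over vanilla decoding. The dictionary layout and the `relative_gain` helper are assumptions for illustration; the numbers are copied from the table.

```python
# Numbers copied from the evaluation table in this card.
results = {
    "GSM8K": {"tau": 6.856, "speedup": 4.289},
    "MATH500": {"tau": 6.678, "speedup": 4.206},
}

# Baseline speedup for EAGLE-orig as quoted in this card (a round reference figure).
EAGLE_ORIG_SPEEDUP = 2.0


def relative_gain(speedup: float, baseline: float = EAGLE_ORIG_SPEEDUP) -> float:
    """Return the speedup relative to a baseline draft model."""
    return speedup / baseline


for name, row in results.items():
    print(f"{name}: {relative_gain(row['speedup']):.2f}x over EAGLE-orig")
# GSM8K: 2.14x over EAGLE-orig
# MATH500: 2.10x over EAGLE-orig
```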

## Files

- `model.safetensors`: draft model weights (~763 MB)
- `config.json`: model config
- Both correspond to `outputs/eagle3-adaspec/epoch_0_step_17026` in the original training output

Optimizer state (~3 GB) is not uploaded, so training cannot be resumed from this exact state. Use the project repo's training scripts to restart training if needed.

## Usage

```python
from huggingface_hub import snapshot_download

# Download the draft model files (weights and config) into the local Hugging Face cache.
draft_path = snapshot_download(repo_id="XLOverflow/qwen3-eagle3-adaspec")
# Then load with EAGLE's EaModel; see scripts/eval/eval_combined.py in the project repo.
```
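
Before handing the snapshot to a loader, it can help to confirm that the download contains the two files listed under Files. The helper below is a minimal sketch using only the standard library; `check_draft_dir` is a hypothetical name and not part of the project repo or of `huggingface_hub`.

```python
import json
from pathlib import Path

# File names taken from the Files section of this card.
EXPECTED_FILES = ("model.safetensors", "config.json")


def check_draft_dir(draft_path: str) -> dict:
    """Verify that the expected files exist and return the parsed config."""
    root = Path(draft_path)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing files in {root}: {missing}")
    with open(root / "config.json", encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `config = check_draft_dir(draft_path)` returns the parsed `config.json` as a dictionary, or raises `FileNotFoundError` naming whatever is missing.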