---
tags:
- qwen3
- eagle3
- speculative-decoding
- draft-model
base_model:
- Qwen/Qwen3-8B
- AngelSlim/Qwen3-8B_eagle3
license: apache-2.0
---
# AdaSPEC: EAGLE3 Draft Model for Qwen3-8B
A reimplementation of the AdaSPEC (adaptive speculative decoding) training objective, included as a baseline for comparison.
Part of a course project evaluating per-step weighted loss functions for training
EAGLE3 draft models. Full pipeline and source:
**https://github.com/XLOverflow/anlp_course_project**
Collection: [Qwen3 EAGLE3 - Weighted Loss Variants](https://huggingface.co/collections/XLOverflow/qwen3-eagle3-weighted-loss-variants)
## Training
- **Framework:** [SpecForge](https://github.com/sgl-project/SpecForge) (our fork: https://github.com/XLOverflow/SpecForge)
- **Target model:** `Qwen/Qwen3-8B`
- **Draft init:** `AngelSlim/Qwen3-8B_eagle3`
- **Data:** ShareGPT-style reasoning traces (see `scripts/data/` in project repo)
- **Loss:** AdaSPEC adaptive loss (see the AdaSPEC paper)
- **Initialized from:** `baseline-uniform/epoch_4_step_82000` (a checkpoint from the project's own training runs)
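This card belongs to a set of weighted-loss variants, and the project compares how per-step draft losses are weighted. As a hedged illustration only (not the project's code and not the AdaSPEC objective itself), assume the draft is unrolled for several steps during training, as in EAGLE-3's training-time test, yielding one mean loss per step. A per-step weighted loss then reduces to a normalized weighted average. The helper names, the example losses, and the 0.8 decay are all hypothetical.

```python
from typing import List, Sequence


def weighted_step_loss(step_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-step draft losses into one scalar (hypothetical helper).

    step_losses[k] is the mean loss at unrolled draft step k.
    Weights are normalized to sum to 1 so different schemes share one scale.
    """
    if len(step_losses) != len(weights):
        raise ValueError("need exactly one weight per draft step")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * loss for w, loss in zip(weights, step_losses)) / total


def uniform_weights(num_steps: int) -> List[float]:
    # Every draft step counts equally.
    return [1.0] * num_steps


def geometric_weights(num_steps: int, decay: float = 0.8) -> List[float]:
    # Earlier steps count more; decay=0.8 is illustrative, not a project setting.
    return [decay ** k for k in range(num_steps)]


# Made-up per-step losses, for illustration only.
losses = [1.2, 1.5, 1.9, 2.4]
print("uniform:  ", weighted_step_loss(losses, uniform_weights(len(losses))))
print("geometric:", weighted_step_loss(losses, geometric_weights(len(losses))))
```

Normalizing by the weight sum keeps the combined loss on a comparable scale across weighting schemes, so optimizer settings need not be retuned just because the weights change.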
## Evaluation (Qwen3-8B target)
| Dataset | τ (mean acceptance length) | Speedup (vs. vanilla) | Accuracy |
|---|---|---|---|
| GSM8K | 6.856 | 4.289× | 95.15% |
| MATH500 | 6.678 | 4.206× | 94.40% |

Reference points: vanilla (non-speculative) decoding ≈ 1× speedup; EAGLE-orig ≈ 2× speedup.
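τ is the average number of tokens committed per target-model forward pass. Under the idealized assumption that drafting is free and a verification pass costs the same as one vanilla decoding step, speedup would equal τ; the ratio speedup/τ is therefore a rough gauge of how much of that ideal survives real overhead. The sketch below computes this ratio, plus the speedup relative to EAGLE-orig, using only numbers from the table. The 2.0 value is the approximate "≈ 2×" figure quoted above, not an exact measurement, and `efficiency` is a hypothetical helper name.

```python
# Numbers copied from the evaluation table above.
results = {
    "GSM8K": {"tau": 6.856, "speedup": 4.289},
    "MATH500": {"tau": 6.678, "speedup": 4.206},
}

# Approximate reference point ("≈ 2×"), not an exact measurement.
EAGLE_ORIG_SPEEDUP = 2.0


def efficiency(tau: float, speedup: float) -> float:
    """Fraction of the idealized speedup (tau) realized in wall-clock terms."""
    return speedup / tau


for name, r in results.items():
    eff = efficiency(r["tau"], r["speedup"])
    rel = r["speedup"] / EAGLE_ORIG_SPEEDUP
    print(f"{name}: speedup/tau = {eff:.3f}, speedup vs. EAGLE-orig ≈ {rel:.2f}x")
```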
## Files
- `model.safetensors`: draft model weights (~763 MB)
- `config.json` β€” model config
- Corresponds to: `outputs/eagle3-adaspec/epoch_0_step_17026` in the original training output
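Optimizer state (~3 GB) is not uploaded, so training cannot be resumed exactly from this checkpoint. To continue training, use the project repo's training scripts to start from these weights with a fresh optimizer state.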
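To sanity-check a downloaded `model.safetensors` (for example, against the ~763 MB figure above) without loading it into PyTorch, the header can be read directly. In the safetensors format, a file begins with an 8-byte little-endian unsigned integer giving the header length, followed by a JSON header that maps each tensor name to its `dtype`, `shape`, and `data_offsets`, plus an optional `__metadata__` entry. `summarize_safetensors` is a hypothetical helper, not part of the project repo or the `safetensors` library.

```python
import json
import struct
from math import prod

# Bytes per element for common safetensors dtypes (not an exhaustive list).
DTYPE_BYTES = {
    "F64": 8, "F32": 4, "F16": 2, "BF16": 2,
    "I64": 8, "I32": 4, "I16": 2, "I8": 1, "U8": 1, "BOOL": 1,
}


def summarize_safetensors(path: str) -> dict:
    """Read only the JSON header of a .safetensors file and summarize its tensors."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional string-to-string metadata
    tensors = header.values()
    num_params = sum(prod(t["shape"]) for t in tensors)  # prod([]) == 1 for scalars
    num_bytes = sum(prod(t["shape"]) * DTYPE_BYTES[t["dtype"]] for t in tensors)
    return {
        "tensors": len(header),
        "params": num_params,
        "bytes": num_bytes,
        "dtypes": sorted({t["dtype"] for t in tensors}),
    }
```

Point it at the `model.safetensors` file inside the directory returned by `snapshot_download`; the `bytes` field counts tensor data only, so it should come out slightly below the on-disk file size.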
## Usage
```python
from huggingface_hub import snapshot_download
# Download the draft weights and config; returns the local snapshot directory.
draft_path = snapshot_download(repo_id="XLOverflow/qwen3-eagle3-adaspec")
# Then load with EAGLE's EaModel (see scripts/eval/eval_combined.py in the project repo).
```
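Before handing `draft_path` to EaModel, a quick config check can catch a mismatched target. EAGLE-style drafts consume the target model's hidden states, so the draft's `hidden_size` is expected to equal the target's. The sketch below assumes `config.json` uses the standard Hugging Face `hidden_size` and `architectures` keys; `check_hidden_size` is a hypothetical helper, and the commented example lines require network access plus `huggingface_hub` and `transformers`.

```python
import json
from pathlib import Path


def check_hidden_size(draft_dir: str, target_hidden_size: int) -> dict:
    """Load the draft's config.json and compare its hidden_size with the target's.

    Assumes the standard Hugging Face `hidden_size` / `architectures` keys.
    """
    config = json.loads((Path(draft_dir) / "config.json").read_text())
    draft_hidden = config.get("hidden_size")
    return {
        "architectures": config.get("architectures"),
        "hidden_size": draft_hidden,
        "matches_target": draft_hidden == target_hidden_size,
    }


# Example (requires network access, huggingface_hub, and transformers):
# from huggingface_hub import snapshot_download
# from transformers import AutoConfig
# draft_path = snapshot_download(repo_id="XLOverflow/qwen3-eagle3-adaspec")
# target = AutoConfig.from_pretrained("Qwen/Qwen3-8B")
# print(check_hidden_size(draft_path, target.hidden_size))
```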