	---
	tags:
	- qwen3
	- eagle3
	- speculative-decoding
	- draft-model
	base_model:
	- Qwen/Qwen3-8B
	- AngelSlim/Qwen3-8B_eagle3
	license: apache-2.0
	---

	# AdaSPEC — EAGLE3 Draft Model for Qwen3-8B

	This is a reimplementation of the AdaSPEC (adaptive speculative decoding) training objective, included as a baseline for comparison with the other loss variants.

	Part of a course project evaluating per-step weighted loss functions for training
	EAGLE3 draft models. Full pipeline and source:
	https://github.com/XLOverflow/anlp_course_project

	Collection: [Qwen3 EAGLE3 — Weighted Loss Variants](https://huggingface.co/collections/XLOverflow/qwen3-eagle3-weighted-loss-variants)

	## Training

	- Framework: [SpecForge](https://github.com/sgl-project/SpecForge) (our fork: https://github.com/XLOverflow/SpecForge)
	- Target model: `Qwen/Qwen3-8B`
	- Draft init: `AngelSlim/Qwen3-8B_eagle3`
	- Data: ShareGPT-style reasoning traces (see `scripts/data/` in project repo)
	- Loss: AdaSPEC adaptive loss (see the AdaSPEC paper)
	- Initialized from: `baseline-uniform/epoch_4_step_82000`
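To illustrate what "per-step weighted loss" means in this project's framing, here is a minimal sketch of combining per-draft-step losses with a weight vector. It is not the AdaSPEC objective (see the paper for that) and not SpecForge's actual code: the function name, the normalization by the weight sum, the example loss values, and the geometric decay are all assumptions chosen for illustration.

```python
def weighted_step_loss(step_losses, weights):
    """Combine per-step draft losses into one scalar via a normalized weighted sum.

    Hypothetical helper for illustration; the real training code may
    weight or normalize differently.
    """
    if len(step_losses) != len(weights):
        raise ValueError("need exactly one weight per draft step")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * l for w, l in zip(weights, step_losses)) / total_weight


# Made-up per-step cross-entropy values for a 3-step draft rollout.
step_losses = [1.2, 1.5, 1.9]

# Uniform weighting: every draft step contributes equally.
uniform = weighted_step_loss(step_losses, [1.0, 1.0, 1.0])

# Assumed geometric decay that down-weights later draft steps.
decayed = weighted_step_loss(step_losses, [0.8 ** i for i in range(3)])

print(f"uniform={uniform:.3f} decayed={decayed:.3f}")
```

With these made-up numbers the decayed variant yields a smaller scalar than the uniform one, because the later (higher-loss) steps count for less.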

	## Evaluation (Qwen3-8B target)

	| Dataset | τ (acceptance length) | Speedup | Accuracy |
	|---|---|---|---|
	| GSM8K | 6.856 | 4.289× | 95.15% |
	| MATH500 | 6.678 | 4.206× | 94.40% |

	Baselines for reference: Vanilla ≈ 1× speedup, EAGLE-orig ≈ 2× speedup.
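As a quick sanity check on the numbers above, the sketch below computes each dataset's speedup relative to the approximate EAGLE-orig figure and the ratio of speedup to τ. Reading speedup/τ as a rough efficiency measure assumes that τ is an upper bound on achievable speedup (one target forward pass per τ accepted tokens, with drafting cost ignored); that interpretation is an assumption here, not something the evaluation reports.

```python
# Values copied from the evaluation table above.
results = {
    "GSM8K": {"tau": 6.856, "speedup": 4.289},
    "MATH500": {"tau": 6.678, "speedup": 4.206},
}

# Approximate reference figure quoted above for EAGLE-orig.
EAGLE_ORIG_SPEEDUP = 2.0


def summarize(results, reference_speedup):
    """Return speedup relative to a reference and speedup per unit of tau."""
    summary = {}
    for name, r in results.items():
        summary[name] = {
            "vs_reference": r["speedup"] / reference_speedup,
            "speedup_per_tau": r["speedup"] / r["tau"],
        }
    return summary


summary = summarize(results, EAGLE_ORIG_SPEEDUP)
for name, s in summary.items():
    print(
        f"{name}: {s['vs_reference']:.2f}x over EAGLE-orig, "
        f"speedup/tau = {s['speedup_per_tau']:.2f}"
    )
```

On both datasets this model is a little over 2× faster than the EAGLE-orig reference, and roughly 63% of the acceptance length shows up as wall-clock speedup.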


	## Files

	- `model.safetensors` — draft model weights (~763 MB)
	- `config.json` — model config
	- Corresponds to: `outputs/eagle3-adaspec/epoch_0_step_17026` in the original training output

	Optimizer state (~3 GB) is not uploaded, so an exact resume is not possible; use the project repo's training scripts to retrain from scratch if needed.
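To check what `model.safetensors` contains without loading the weights, the header can be read with the standard library alone. The sketch below relies on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header); the helper names `read_safetensors_header` and `count_parameters` are hypothetical and not part of any library.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # The first 8 bytes store the header length as a little-endian unsigned 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def count_parameters(header):
    """Sum the element counts of every tensor described in the header."""
    total = 0
    for info in header.values():
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n
    return total
```

For example, calling `read_safetensors_header` on the `model.safetensors` file inside the directory returned by `snapshot_download` (see Usage) gives a dict mapping tensor names to their `dtype`, `shape`, and `data_offsets`, and `count_parameters` on that dict gives the draft model's parameter count.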

	## Usage

	```python
	from huggingface_hub import snapshot_download

	# Download the repo files (config.json, model.safetensors) into the local Hugging Face cache.
	draft_path = snapshot_download(repo_id="XLOverflow/qwen3-eagle3-adaspec")
	# Then load the draft with EAGLE's EaModel; see scripts/eval/eval_combined.py in the project repo.
	```