metadata
tags:
  - qwen3
  - eagle3
  - speculative-decoding
  - draft-model
base_model:
  - Qwen/Qwen3-8B
  - AngelSlim/Qwen3-8B_eagle3
license: apache-2.0

AdaSPEC — EAGLE3 Draft Model for Qwen3-8B

A reimplementation of the AdaSPEC (adaptive speculative decoding) training objective, included as a baseline for comparison.

Part of a course project evaluating per-step weighted loss functions for training EAGLE3 draft models. Full pipeline and source: https://github.com/XLOverflow/anlp_course_project

Collection: Qwen3 EAGLE3 — Weighted Loss Variants

Training

  • Framework: SpecForge (our fork: https://github.com/XLOverflow/SpecForge)
  • Target model: Qwen/Qwen3-8B
  • Draft init: AngelSlim/Qwen3-8B_eagle3
  • Data: ShareGPT-style reasoning traces (see scripts/data/ in project repo)
  • Loss: AdaSPEC adaptive loss (see the AdaSPEC paper)
  • Initialized from: baseline-uniform/epoch_4_step_82000
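
To make the idea of a per-step weighted loss concrete, here is a minimal sketch. It assumes that training yields one loss value per unrolled draft step and that the final objective is a normalized weighted sum of those values. The function name `weighted_step_loss` and all numbers are hypothetical. The actual AdaSPEC weighting rule is defined in the paper and the project repo and is not reproduced here.

```python
from typing import Sequence


def weighted_step_loss(step_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-step draft losses into one scalar using normalized weights.

    step_losses[k] is the loss of the k-th unrolled draft step (hypothetical input).
    weights[k] is the weight given to that step. How AdaSPEC chooses the weights
    is NOT shown here; this only illustrates the weighted combination.
    """
    if len(step_losses) != len(weights):
        raise ValueError("step_losses and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * l for w, l in zip(weights, step_losses)) / total_weight


# Illustrative numbers only, not taken from any training run.
losses = [2.0, 2.5, 3.0]
uniform = weighted_step_loss(losses, [1.0, 1.0, 1.0])       # plain mean over steps
front_loaded = weighted_step_loss(losses, [0.5, 0.3, 0.2])  # emphasizes early steps
```

With uniform weights the result is the plain mean of the step losses. Any non-uniform scheme shifts the emphasis between early and late draft steps.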

Evaluation (Qwen3-8B target)

| Dataset | τ (accept. length) | Speedup | Accuracy |
|---------|--------------------|---------|----------|
| GSM8K   | 6.856              | 4.289×  | 95.15%   |
| MATH500 | 6.678              | 4.206×  | 94.40%   |

Baselines for reference: Vanilla ≈ 1× speedup, EAGLE-orig ≈ 2× speedup.
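
The card does not say how τ and speedup were measured. The sketch below assumes the usual definitions: τ as the mean number of tokens produced per draft-verify cycle, and speedup as baseline wall-clock time divided by speculative-decoding wall-clock time. The helper names are hypothetical. The last lines divide the reported speedups by the approximate 2× EAGLE-orig figure, so those ratios are only rough.

```python
from typing import Sequence


def mean_accept_length(tokens_per_cycle: Sequence[int]) -> float:
    """Mean tokens produced per draft-verify cycle (assumed definition of tau)."""
    return sum(tokens_per_cycle) / len(tokens_per_cycle)


def speedup(baseline_seconds: float, speculative_seconds: float) -> float:
    """Wall-clock speedup over plain decoding (assumed definition)."""
    return baseline_seconds / speculative_seconds


# Speedups reported in the table above.
reported_speedup = {"GSM8K": 4.289, "MATH500": 4.206}

# Rough ratio against the approximate 2x EAGLE-orig baseline quoted above.
EAGLE_ORIG_SPEEDUP = 2.0
relative_to_eagle_orig = {
    name: value / EAGLE_ORIG_SPEEDUP for name, value in reported_speedup.items()
}
```

Under these assumptions, the reported numbers correspond to roughly a 2.1× gain over the EAGLE-orig baseline on both datasets.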

Files

  • model.safetensors — draft model weights (~763 MB)
  • config.json — model config
  • Corresponds to: outputs/eagle3-adaspec/epoch_0_step_17026 in the original training output

Optimizer state (~3 GB) is not uploaded. Use the project repo's training scripts to retrain from scratch if needed.

Usage

from huggingface_hub import snapshot_download
draft_path = snapshot_download(repo_id="XLOverflow/qwen3-eagle3-adaspec")
# Then load with EAGLE's EaModel — see scripts/eval/eval_combined.py in the project repo.
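
Before handing the path to the evaluation script, it may help to confirm that the snapshot contains the two files listed under Files. The sketch below uses only the standard library. `check_draft_snapshot` is a hypothetical helper, not part of EAGLE, SpecForge, or huggingface_hub. The demo runs on a throwaway directory so it works offline.

```python
import json
import tempfile
from pathlib import Path

# File names taken from the Files section of this card.
EXPECTED_FILES = ("model.safetensors", "config.json")


def check_draft_snapshot(draft_path: str) -> dict:
    """Verify the expected files exist and return the parsed config.json."""
    root = Path(draft_path)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from {root}: {', '.join(missing)}")
    with open(root / "config.json", encoding="utf-8") as f:
        return json.load(f)


# Offline demo with placeholder files. With a real download, pass the
# draft_path returned by snapshot_download instead of tmp.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.safetensors").write_bytes(b"")
    (Path(tmp) / "config.json").write_text('{"demo": true}', encoding="utf-8")
    demo_config = check_draft_snapshot(tmp)
```

A `FileNotFoundError` here usually points to an interrupted or partial download rather than a problem with the checkpoint itself.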