license: apache-2.0
language:
- en
tags:
- speculative-decoding
- sparse-kv-cache
- long-context
base_model: JackFram/llama-68m
BudgetDraft — Released Checkpoints
Drafter checkpoints for the paper "BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding".
All three checkpoints are fine-tuned from JackFram/llama-68m (68M parameters, fp32) and are meant to serve as the drafter alongside the NousResearch/Yarn-Llama-2-7b-128k verifier, with a sparse KV cache on the drafter side.
Layout
| Subfolder | Variant | Training loss | Use in paper |
|---|---|---|---|
| main/ | A + 0.5·C (multi-view, λ=0.5) | full-cache + sparse-cache | main checkpoint reported in Table 1 / Fig 3 |
| aonly/ | A only | full-cache only (no sparse branch) | ablation: without the sparse-cache loss |
| ac/ | A + C (λ=1.0) | full-cache + sparse-cache | λ-sensitivity ablation |
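As a reading aid for the Variant column, the three objectives can be sketched as one weighted sum. This assumes A denotes the full-cache loss and C the sparse-cache loss, as the Training loss column suggests, and it treats the A-only variant as λ=0. The function name `multi_view_loss` and the numeric loss values are hypothetical; the actual training code lives in the paper's repository.

```python
def multi_view_loss(loss_full, loss_sparse, lam):
    """Combine the full-cache loss (A) with the sparse-cache loss (C).

    Assumption: total = A + lam * C, matching the "A + 0.5·C" notation in the table.
    Works with plain floats or with tensors that support + and *.
    """
    return loss_full + lam * loss_sparse


# Lambda per released subfolder; "aonly" is modelled as lam = 0 (no sparse branch).
VARIANT_LAMBDA = {"main": 0.5, "aonly": 0.0, "ac": 1.0}

# Illustrative loss values only, not numbers from the paper.
example_a, example_c = 2.0, 3.0
totals = {
    name: multi_view_loss(example_a, example_c, lam)
    for name, lam in VARIANT_LAMBDA.items()
}
print(totals)  # {'main': 3.5, 'aonly': 2.0, 'ac': 5.0}
```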
Each subfolder is a standard HuggingFace checkpoint (config.json, model.safetensors, tokenizer files); load it with AutoModelForCausalLM.from_pretrained(..., subfolder=...).
Download
# Whole repo (~786 MB):
hf download qwe123wjb/BudgetDraft-checkpoints --local-dir ./ckpts
# Just the main checkpoint:
hf download qwe123wjb/BudgetDraft-checkpoints --include "main/*" --local-dir ./ckpts
Reproducing the paper
Pair these checkpoints with the evaluation code at https://github.com/ANTI-Tony/BudgetDraft:
git clone https://github.com/ANTI-Tony/BudgetDraft.git
cd BudgetDraft
pip install -r requirements.txt
hf download qwe123wjb/BudgetDraft-checkpoints --local-dir ./ckpts
make eval-from-release CHECKPOINTS=./ckpts
The eval script orchestrates 96 configurations (main + ablation + λ-sensitivity) across three datasets (PG-19 / LongBench QMSum / NarrativeQA) and three context lengths (4K / 8K / 16K). Headline result: 6.55× speedup at 4K with 79.37% acceptance on NarrativeQA.
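To give a feel for how acceptance and speedup relate, the sketch below uses the standard analysis of speculative decoding with an i.i.d. per-token acceptance rate. It is not taken from this repository: the draft length `gamma`, the drafter-to-verifier cost ratio `cost_ratio`, and both function names are assumptions for illustration, and the paper's acceptance metric may be defined differently.

```python
def expected_tokens_per_step(alpha, gamma):
    """Expected tokens produced per verifier call.

    alpha: per-token acceptance probability (assumed i.i.d.), in [0, 1].
    gamma: number of tokens drafted before each verification.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        # Every drafted token is accepted, plus one token from the verifier.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha, gamma, cost_ratio):
    """Expected wall-clock speedup over plain autoregressive decoding.

    cost_ratio: time of one drafter step divided by time of one verifier step.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1.0)
```

Under this model, a cheaper drafter (smaller `cost_ratio`) or a higher acceptance rate both raise the achievable speedup, which is why a 68M drafter paired with a 7B verifier is an attractive setup.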
Quick load (Python)
from transformers import AutoModelForCausalLM, AutoTokenizer

drafter = AutoModelForCausalLM.from_pretrained(
    "qwe123wjb/BudgetDraft-checkpoints",
    subfolder="main",
    torch_dtype="float32",
)
tokenizer = AutoTokenizer.from_pretrained(
    "qwe123wjb/BudgetDraft-checkpoints",
    subfolder="main",
)
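To switch between the three released variants, a small wrapper around the same call may be convenient. This is a minimal sketch: `drafter_kwargs` and `load_drafter` are hypothetical helper names, and the variant names are simply the subfolders from the Layout table.

```python
REPO_ID = "qwe123wjb/BudgetDraft-checkpoints"
VARIANTS = ("main", "aonly", "ac")  # subfolders listed in the Layout table


def drafter_kwargs(variant="main"):
    """Build the keyword arguments for from_pretrained for one variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return {"subfolder": variant, "torch_dtype": "float32"}


def load_drafter(variant="main"):
    """Download (if needed) and load one drafter variant."""
    # Imported lazily so drafter_kwargs works without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(REPO_ID, **drafter_kwargs(variant))
```

For example, `load_drafter("aonly")` would load the ablation checkpoint trained without the sparse-cache loss.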
Pair with the verifier:
verifier = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Yarn-Llama-2-7b-128k",
    torch_dtype="float16",
)
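For a quick functional check of the pairing, one option is the assisted-generation path built into transformers, where `generate` accepts an `assistant_model`. The sketch below assumes the drafter and verifier sit on the same device and share a tokenizer. It uses the regular full KV cache, so it does not exercise the sparse drafter-side cache and will not reproduce the paper's numbers; the evaluation code in the GitHub repo remains the reference. The helper name `assisted_generate` is hypothetical.

```python
def assisted_generate(verifier, drafter, tokenizer, prompt, max_new_tokens=64):
    """Greedy assisted generation: the drafter proposes tokens, the verifier checks them."""
    # Tokenize the prompt and move the tensors to the verifier's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(verifier.device)
    output_ids = verifier.generate(
        **inputs,
        assistant_model=drafter,  # enables assisted (speculative) decoding
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding
    )
    # Decode the first (and only) sequence in the batch.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the objects loaded above, a call could look like `assisted_generate(verifier, drafter, tokenizer, "Once upon a time")`.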
Citation
Citation TBA after acceptance. See the GitHub repo for the latest.