metadata
license: apache-2.0
language:
  - en
tags:
  - speculative-decoding
  - sparse-kv-cache
  - long-context
base_model: JackFram/llama-68m

BudgetDraft — Released Checkpoints

Drafter checkpoints for the paper "BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding".

All three checkpoints fine-tune JackFram/llama-68m (68M parameters, fp32) for use as a drafter alongside the NousResearch/Yarn-Llama-2-7b-128k verifier under a sparse drafter-side KV cache.

Layout

| Subfolder | Variant | Training loss | Use in paper |
|---|---|---|---|
| main/ | A + 0.5·C (multi-view, λ=0.5) | full-cache + sparse-cache | main checkpoint reported in Table 1 / Fig 3 |
| aonly/ | A only | full-cache only (no sparse branch) | ablation: without the sparse-cache loss |
| ac/ | A + C (λ=1.0) | full-cache + sparse-cache | λ-sensitivity ablation |
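
Read row by row, the three variants appear to differ only in the weight λ placed on the sparse-cache term. One hedged way to write this, assuming A denotes the full-cache loss and C the sparse-cache loss (the symbols below are notation introduced here for illustration, not taken from the paper, and treating aonly/ as λ = 0 is an inference from "no sparse branch"):

```latex
\mathcal{L}(\lambda) = \mathcal{L}_{A} + \lambda\,\mathcal{L}_{C},
\qquad
\lambda = 0.5 \ (\texttt{main/}), \quad
\lambda = 0 \ (\texttt{aonly/}), \quad
\lambda = 1.0 \ (\texttt{ac/})
```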

Each subfolder is a standard HuggingFace checkpoint (config.json, model.safetensors, tokenizer files). Load it with AutoModelForCausalLM.from_pretrained(...), passing the subfolder name through the subfolder argument.

Download

# Whole repo (~786 MB):
hf download qwe123wjb/BudgetDraft-checkpoints --local-dir ./ckpts

# Just the main checkpoint:
hf download qwe123wjb/BudgetDraft-checkpoints --include "main/*" --local-dir ./ckpts
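
The same downloads can likely be scripted from Python. The sketch below assumes the huggingface_hub package is installed and uses its snapshot_download function; download_kwargs and fetch are hypothetical helper names introduced here, not part of the BudgetDraft repo.

```python
# Sketch: Python counterpart of the `hf download` commands above.
# Assumes `huggingface_hub` is installed. `download_kwargs` and `fetch`
# are hypothetical helpers, not part of the BudgetDraft code base.

REPO_ID = "qwe123wjb/BudgetDraft-checkpoints"
VARIANTS = ("main", "aonly", "ac")  # subfolders listed in the Layout table


def download_kwargs(variant=None, local_dir="./ckpts"):
    """Build snapshot_download arguments; variant=None selects the whole repo."""
    if variant is not None and variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    kwargs = {"repo_id": REPO_ID, "local_dir": local_dir}
    if variant is not None:
        # Restrict the download to one subfolder, like --include "main/*".
        kwargs["allow_patterns"] = [f"{variant}/*"]
    return kwargs


def fetch(variant=None, local_dir="./ckpts"):
    """Download the checkpoints and return the local directory path."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(**download_kwargs(variant, local_dir))
```

Calling fetch("main") mirrors the second command; fetch() with no argument mirrors the first.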

Reproducing the paper

Pair these checkpoints with the evaluation code at https://github.com/ANTI-Tony/BudgetDraft:

git clone https://github.com/ANTI-Tony/BudgetDraft.git
cd BudgetDraft
pip install -r requirements.txt

hf download qwe123wjb/BudgetDraft-checkpoints --local-dir ./ckpts
make eval-from-release CHECKPOINTS=./ckpts

The eval script orchestrates 96 configurations (main + ablation + λ-sensitivity) across three datasets (PG-19 / LongBench QMSum / NarrativeQA) and three context lengths (4K / 8K / 16K). Headline result: 6.55× speedup at 4K with 79.37% acceptance on NarrativeQA.

Quick load (Python)

from transformers import AutoModelForCausalLM, AutoTokenizer

drafter = AutoModelForCausalLM.from_pretrained(
    "qwe123wjb/BudgetDraft-checkpoints",
    subfolder="main",
    torch_dtype="float32",
)
tokenizer = AutoTokenizer.from_pretrained(
    "qwe123wjb/BudgetDraft-checkpoints",
    subfolder="main",
)

Pair with the verifier:

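# Same import as in the drafter snippet; repeated so this block runs on its own.
from transformers import AutoModelForCausalLM

# fp16 keeps the 7B verifier at roughly half the memory of fp32.
verifier = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Yarn-Llama-2-7b-128k",
    torch_dtype="float16",
)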
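
To make the drafter/verifier roles and the notion of acceptance concrete, here is a toy sketch of generic greedy speculative decoding. It is not the BudgetDraft pipeline: there is no sparse KV cache and no real model, and draft_next, verify_next, and draft_len are hypothetical stand-ins introduced for illustration. For the actual evaluation, use the code in the GitHub repo.

```python
# Toy sketch of greedy speculative decoding (generic draft-then-verify).
# NOT the BudgetDraft method: no sparse KV cache, no real models.
# `draft_next` and `verify_next` are hypothetical callables that map a
# list of token ids to the next token id.

def speculative_decode(prompt, draft_next, verify_next, max_new_tokens, draft_len=4):
    tokens = list(prompt)
    proposed = accepted = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) The drafter proposes draft_len tokens autoregressively.
        draft = []
        for _ in range(draft_len):
            draft.append(draft_next(tokens + draft))
        # 2) The verifier checks each drafted position in order. A real
        #    verifier would score all positions in one forward pass.
        n_ok = 0
        for i, tok in enumerate(draft):
            if verify_next(tokens + draft[:i]) != tok:
                break
            n_ok += 1
        proposed += len(draft)
        accepted += n_ok
        # 3) Keep the accepted prefix plus one verifier token: the
        #    correction after a mismatch, or a bonus token otherwise.
        tokens += draft[:n_ok]
        tokens.append(verify_next(tokens))
    tokens = tokens[: len(prompt) + max_new_tokens]
    rate = accepted / proposed if proposed else 0.0
    return tokens, rate
```

Because every kept token is either confirmed or produced by the verifier, the output matches what the verifier would generate greedily on its own; the drafter only affects how many verifier steps are needed. The returned rate is the fraction of drafted tokens the verifier accepted in this toy loop, which may be defined differently from the acceptance figure reported in the paper.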

Citation

Citation TBA after acceptance. See the GitHub repo for the latest.