---
license: apache-2.0
language:
- en
tags:
- speculative-decoding
- sparse-kv-cache
- long-context
base_model: JackFram/llama-68m
---

# BudgetDraft: Released Checkpoints

Drafter checkpoints for the paper **"BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding"**.

All three checkpoints fine-tune `JackFram/llama-68m` (68M parameters, fp32) to serve as the drafter for the `NousResearch/Yarn-Llama-2-7b-128k` verifier, with a sparse KV cache on the drafter side.

## Layout

| Subfolder | Variant | Training loss | Use in paper |
|---|---|---|---|
| `main/` | **A + 0.5·C** (multi-view, λ=0.5) | full-cache + sparse-cache | main checkpoint reported in Table 1 / Fig. 3 |
| `aonly/` | **A only** | full-cache only (no sparse branch) | ablation: without the sparse-cache loss |
| `ac/` | **A + C** (λ=1.0) | full-cache + sparse-cache | λ-sensitivity ablation |
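
Reading the table, each variant's training objective is the full-cache term A plus λ times the sparse-cache term C, with `aonly/` corresponding to λ = 0 and no sparse branch at all. The sketch below only illustrates that weighting. It assumes A and C are already-reduced scalar losses (Python floats or 0-d tensors) and does not reproduce how the paper computes either term; `VARIANT_LAMBDA` and `multi_view_loss` are hypothetical names, not part of the BudgetDraft code.

```python
# Hypothetical: sparse-cache weight λ for each released subfolder,
# read off the Layout table. "aonly" trains without the sparse branch.
VARIANT_LAMBDA = {"main": 0.5, "aonly": 0.0, "ac": 1.0}


def multi_view_loss(loss_full, loss_sparse=None, lam=0.5):
    """Illustrative A + λ·C combination of two precomputed scalar losses.

    loss_full   -- full-cache loss (term A)
    loss_sparse -- sparse-cache loss (term C); may be None when lam == 0
    lam         -- weight λ on the sparse-cache term
    """
    if lam == 0.0:
        # "A only": the sparse-cache branch is skipped entirely.
        return loss_full
    if loss_sparse is None:
        raise ValueError("lam > 0 requires a sparse-cache loss")
    return loss_full + lam * loss_sparse
```

For example, with A = 2.0 and C = 1.0, the `main` weighting gives 2.5 and the `ac` weighting gives 3.0.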

Each subfolder is a standard Hugging Face checkpoint (`config.json`, `model.safetensors`, and tokenizer files), so it loads directly with `AutoModelForCausalLM.from_pretrained(...)`.
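
After downloading, a quick local check can confirm that a subfolder contains these files before it is handed to `from_pretrained`. This is a minimal standard-library sketch. The card names `config.json` and `model.safetensors` explicitly but not the individual tokenizer files, so the helper assumes any file whose name starts with `tokenizer` counts. `missing_checkpoint_files` is a hypothetical helper.

```python
from pathlib import Path


def missing_checkpoint_files(folder):
    """Return the expected files that are absent from a local checkpoint subfolder.

    Assumption: tokenizer files are matched loosely by the "tokenizer" name
    prefix, since this card does not list them individually.
    """
    folder = Path(folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"not a directory: {folder}")
    missing = [
        name
        for name in ("config.json", "model.safetensors")
        if not (folder / name).is_file()
    ]
    has_tokenizer = any(
        p.is_file() and p.name.startswith("tokenizer") for p in folder.iterdir()
    )
    if not has_tokenizer:
        missing.append("tokenizer*")
    return missing
```

With the download layout used on this card, `missing_checkpoint_files("./ckpts/main")` should return an empty list when the subfolder is complete.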

## Download

```bash
# Whole repo (~786 MB):
hf download qwe123wjb/BudgetDraft-checkpoints --local-dir ./ckpts

# Just the main checkpoint:
hf download qwe123wjb/BudgetDraft-checkpoints --include "main/*" --local-dir ./ckpts
```
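
The same downloads can be scripted from Python with `huggingface_hub.snapshot_download`, where `allow_patterns` plays the role of `--include`. This is a minimal sketch: `download_variants` and `allow_patterns_for` are hypothetical helper names, and the accepted variant names are the three subfolders listed under Layout.

```python
REPO_ID = "qwe123wjb/BudgetDraft-checkpoints"
VARIANTS = ("main", "aonly", "ac")  # subfolders listed under Layout


def allow_patterns_for(variants):
    """Glob patterns selecting whole subfolders, like --include "main/*"."""
    if isinstance(variants, str):
        variants = (variants,)
    unknown = sorted(set(variants) - set(VARIANTS))
    if unknown:
        raise ValueError(f"unknown variant(s): {unknown}")
    return [f"{v}/*" for v in variants]


def download_variants(variants=("main",), local_dir="./ckpts"):
    """Download the requested subfolders and return the local snapshot path."""
    # Imported lazily so allow_patterns_for works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow_patterns_for(variants),
        local_dir=local_dir,
    )
```

`download_variants("main")` mirrors the second CLI command, and `download_variants(VARIANTS)` fetches all three subfolders.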

## Reproducing the paper

Pair these checkpoints with the evaluation code at <https://github.com/ANTI-Tony/BudgetDraft>:

```bash
git clone https://github.com/ANTI-Tony/BudgetDraft.git
cd BudgetDraft
pip install -r requirements.txt

hf download qwe123wjb/BudgetDraft-checkpoints --local-dir ./ckpts
make eval-from-release CHECKPOINTS=./ckpts
```

The eval script runs 96 configurations (main, ablation, and λ-sensitivity) across three datasets (PG-19, LongBench QMSum, and NarrativeQA) and three context lengths (4K, 8K, and 16K). Headline result: **6.55× speedup at 4K with 79.37% acceptance on NarrativeQA**.
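
The headline result combines two metrics. As a rough guide to reading them, the sketch below uses definitions common in the speculative-decoding literature: acceptance rate is the fraction of drafted tokens the verifier accepts, and speedup is verifier-only wall time divided by speculative wall time. These are assumptions; the eval script may define either metric differently (for example, per verification step or in tokens per second), so check the repo before comparing against the reported numbers. `DecodeStats` and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DecodeStats:
    """Counters from one speculative-decoding run (hypothetical field names)."""

    proposed: int           # draft tokens proposed by the drafter
    accepted: int           # how many of those the verifier accepted
    spec_time_s: float      # wall time of the speculative run
    baseline_time_s: float  # wall time of verifier-only decoding for the same output length

    @property
    def acceptance_rate(self) -> float:
        """Fraction of drafted tokens accepted (0.0 if nothing was drafted)."""
        return self.accepted / self.proposed if self.proposed else 0.0

    @property
    def speedup(self) -> float:
        """Verifier-only wall time divided by speculative wall time."""
        return self.baseline_time_s / self.spec_time_s
```

For instance, 80 accepted out of 100 drafted tokens is an acceptance rate of 0.8, and a run taking 2 s against a 10 s verifier-only baseline is a 5× speedup.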

## Quick load (Python)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

drafter = AutoModelForCausalLM.from_pretrained(
    "qwe123wjb/BudgetDraft-checkpoints",
    subfolder="main",
    torch_dtype="float32",
)
tokenizer = AutoTokenizer.from_pretrained(
    "qwe123wjb/BudgetDraft-checkpoints",
    subfolder="main",
)
```

Pair with the verifier:

```python
verifier = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Yarn-Llama-2-7b-128k",
    torch_dtype="float16",
)
```
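
For a quick end-to-end check, Hugging Face `transformers` provides assisted generation: passing the drafter as `assistant_model` to the verifier's `generate` lets the small model propose tokens that the large one verifies. This is a stand-in, not BudgetDraft's pipeline. It keeps the drafter's full KV cache rather than the sparse drafter-side cache these checkpoints were trained for, so it should not be expected to reproduce the paper's numbers; use the eval script for that. The sketch assumes the `drafter`, `tokenizer`, and `verifier` objects loaded in the snippets above, both models placed on the same device, a batch size of 1, and greedy decoding. `speculative_generate` is a hypothetical helper name.

```python
def speculative_generate(verifier, drafter, tokenizer, prompt, max_new_tokens=64):
    """Greedy assisted generation: `drafter` drafts tokens, `verifier` checks them.

    Both models must already sit on the same device; transformers' assisted
    generation supports a batch size of 1 only.
    """
    inputs = tokenizer(prompt, return_tensors="pt").to(verifier.device)
    output_ids = verifier.generate(
        **inputs,
        assistant_model=drafter,  # enables transformers' assisted (speculative) decoding
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy: in principle the drafter changes speed, not the chosen tokens
    )
    # Drop the prompt tokens and decode only the newly generated continuation.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0, prompt_len:], skip_special_tokens=True)
```

On a single GPU, a call might look like `speculative_generate(verifier.to("cuda"), drafter.to("cuda"), tokenizer, "Once upon a time")`.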

## Citation

Citation TBA after acceptance. See the GitHub repo for updates.