---
license: apache-2.0
language:
- en
tags:
- speculative-decoding
- sparse-kv-cache
- long-context
base_model: JackFram/llama-68m
---

# BudgetDraft: Released Checkpoints

Drafter checkpoints for the paper **"BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding"**.

All three checkpoints fine-tune `JackFram/llama-68m` (68M parameters, fp32) for use as a drafter alongside the `NousResearch/Yarn-Llama-2-7b-128k` verifier under a sparse drafter-side KV cache.

## Layout

| Subfolder | Variant | Training loss | Use in paper |
|---|---|---|---|
| `main/` | **A + 0.5·C** (multi-view, λ=0.5) | full-cache + sparse-cache | main checkpoint reported in Table 1 / Fig 3 |
| `aonly/` | **A only** | full-cache only (no sparse branch) | ablation: without the sparse-cache loss |
| `ac/` | **A + C** (λ=1.0) | full-cache + sparse-cache | λ-sensitivity ablation |

Each subfolder is a standard Hugging Face checkpoint (`config.json`, `model.safetensors`, tokenizer files); load it with `AutoModelForCausalLM.from_pretrained(...)`.

## Download

```bash
# Whole repo (~786 MB):
hf download qwe123wjb/BudgetDraft-checkpoints --local-dir ./ckpts

# Just the main checkpoint:
hf download qwe123wjb/BudgetDraft-checkpoints --include "main/*" --local-dir ./ckpts
```

## Reproducing the paper

Pair these checkpoints with the evaluation code from the GitHub repository cloned in the first step:

```bash
git clone https://github.com/ANTI-Tony/BudgetDraft.git
cd BudgetDraft
pip install -r requirements.txt
hf download qwe123wjb/BudgetDraft-checkpoints --local-dir ./ckpts
make eval-from-release CHECKPOINTS=./ckpts
```

The eval script orchestrates 96 configurations (main + ablation + λ-sensitivity) across three datasets (PG-19 / LongBench QMSum / NarrativeQA) and three context lengths (4K / 8K / 16K).

Headline result: **6.55× speedup at 4K with 79.37% acceptance on NarrativeQA**.
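For scripted setups, the selective download from the Download section can probably also be driven from Python. The following is a minimal sketch, not part of the released code: it assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function with `allow_patterns`. The helper names `patterns_for` and `fetch` are hypothetical and introduced only for illustration; the subfolder names come from the Layout table.

```python
from typing import Iterable, List, Optional

REPO_ID = "qwe123wjb/BudgetDraft-checkpoints"
# Subfolder names as listed in the Layout table.
KNOWN_VARIANTS = ("main", "aonly", "ac")


def patterns_for(variants: Optional[Iterable[str]] = None) -> Optional[List[str]]:
    """Map variant names to glob patterns; None means 'download the whole repo'.

    Hypothetical helper, not part of the BudgetDraft repository.
    """
    if variants is None:
        return None
    names = list(variants)
    unknown = [v for v in names if v not in KNOWN_VARIANTS]
    if unknown:
        raise ValueError(f"unknown variant(s): {unknown}")
    return [f"{v}/*" for v in names]


def fetch(variants: Optional[Iterable[str]] = None, local_dir: str = "./ckpts") -> str:
    """Download the selected subfolders and return the local directory path.

    Hypothetical helper; requires network access when called.
    """
    # Imported lazily so patterns_for stays usable without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=patterns_for(variants),
        local_dir=local_dir,
    )


# Example call (commented out because it hits the network):
# fetch(["main"])  # fetches only main/ into ./ckpts
```

After such a download, the `main/` directory under `./ckpts` can be passed to `from_pretrained` as a local path instead of the repo id plus `subfolder`.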
## Quick load (Python)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

drafter = AutoModelForCausalLM.from_pretrained(
    "qwe123wjb/BudgetDraft-checkpoints",
    subfolder="main",
    torch_dtype="float32",
)
tokenizer = AutoTokenizer.from_pretrained(
    "qwe123wjb/BudgetDraft-checkpoints",
    subfolder="main",
)
```

Pair with the verifier:

```python
verifier = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Yarn-Llama-2-7b-128k",
    torch_dtype="float16",
)
```

## Citation

Citation TBA after acceptance. See the GitHub repo for the latest.