---
license: apache-2.0
language:
- en
tags:
- speculative-decoding
- sparse-kv-cache
- long-context
base_model: JackFram/llama-68m
---
# BudgetDraft — Released Checkpoints
Drafter checkpoints for the paper **"BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding"**.
All three checkpoints fine-tune `JackFram/llama-68m` (68M parameters, fp32) for use as a drafter alongside the `NousResearch/Yarn-Llama-2-7b-128k` verifier under a sparse drafter-side KV cache.
## Layout
| Subfolder | Variant | Training loss | Use in paper |
|---|---|---|---|
| `main/` | **A + 0.5·C** (multi-view, λ=0.5) | full-cache + sparse-cache | main checkpoint reported in Table 1 / Fig 3 |
| `aonly/` | **A only** | full-cache only (no sparse branch) | ablation: without the sparse-cache loss |
| `ac/` | **A + C** (λ=1.0) | full-cache + sparse-cache | λ-sensitivity ablation |
Each subfolder is a standard Hugging Face checkpoint (`config.json`, `model.safetensors`, tokenizer files). Load one by passing its name as `subfolder=` to `AutoModelForCausalLM.from_pretrained(...)`.
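The "Variant" column can be read as a single weighted objective: a full-cache term A plus λ times a sparse-cache term C. The sketch below only illustrates that combination. The names `multi_view_loss` and `LAMBDA_BY_VARIANT` are hypothetical, and the actual definitions of A and C are in the paper and training code, not here.

```python
# Hypothetical illustration of how the three variants relate to one another.
# A-only is modeled as lambda = 0.0 here; per the table, the real A-only run
# has no sparse branch at all, which gives the same loss value.
LAMBDA_BY_VARIANT = {"main": 0.5, "aonly": 0.0, "ac": 1.0}


def multi_view_loss(loss_full, loss_sparse, lam):
    """Combine the full-cache loss (A) and the sparse-cache loss (C)."""
    return loss_full + lam * loss_sparse


# Example with made-up scalar losses, purely to show the weighting.
example_full, example_sparse = 2.0, 1.0
combined = {
    name: multi_view_loss(example_full, example_sparse, lam)
    for name, lam in LAMBDA_BY_VARIANT.items()
}
```

With the made-up inputs above, `main` weights the sparse term at half strength, `ac` at full strength, and `aonly` ignores it.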
## Download
```bash
# Whole repo (~786 MB):
hf download qwe123wjb/BudgetDraft-checkpoints --local-dir ./ckpts
# Just the main checkpoint:
hf download qwe123wjb/BudgetDraft-checkpoints --include "main/*" --local-dir ./ckpts
```
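The same selective download can likely be done from Python with `huggingface_hub.snapshot_download` and its `allow_patterns` argument. This is a minimal sketch, not part of the released tooling: the helper names `allow_patterns_for` and `download_variants` are hypothetical, and the subfolder names come from the Layout table.

```python
REPO_ID = "qwe123wjb/BudgetDraft-checkpoints"
VARIANTS = ("main", "aonly", "ac")  # subfolders listed in the Layout table


def allow_patterns_for(variants):
    """Turn variant names into glob patterns such as 'main/*'."""
    unknown = sorted(set(variants) - set(VARIANTS))
    if unknown:
        raise ValueError(f"unknown variant(s): {unknown}")
    return [f"{name}/*" for name in variants]


def download_variants(variants, local_dir="./ckpts"):
    """Download only the requested subfolders; returns the local path."""
    # Imported lazily so the helpers above work without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow_patterns_for(variants),
        local_dir=local_dir,
    )


# Usage (requires network access):
# download_variants(["main"])
```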
## Reproducing the paper
Pair these checkpoints with the evaluation code at <https://github.com/ANTI-Tony/BudgetDraft>:
```bash
git clone https://github.com/ANTI-Tony/BudgetDraft.git
cd BudgetDraft
pip install -r requirements.txt
hf download qwe123wjb/BudgetDraft-checkpoints --local-dir ./ckpts
make eval-from-release CHECKPOINTS=./ckpts
```
The eval script orchestrates 96 configurations (main + ablation + λ-sensitivity) across three datasets (PG-19 / LongBench QMSum / NarrativeQA) and three context lengths (4K / 8K / 16K). Headline result: **6.55× speedup at 4K with 79.37% acceptance on NarrativeQA**.
## Quick load (Python)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the main drafter checkpoint (A + 0.5·C) in fp32.
drafter = AutoModelForCausalLM.from_pretrained(
    "qwe123wjb/BudgetDraft-checkpoints",
    subfolder="main",
    torch_dtype="float32",
)
tokenizer = AutoTokenizer.from_pretrained(
    "qwe123wjb/BudgetDraft-checkpoints",
    subfolder="main",
)
```
Pair with the verifier:
```python
from transformers import AutoModelForCausalLM

# Load the 7B verifier in fp16.
verifier = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Yarn-Llama-2-7b-128k",
    torch_dtype="float16",
)
```
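For a quick smoke test of the drafter/verifier pair, one option is the generic assisted-generation path in `transformers`, which accepts a draft model through the `assistant_model` argument of `generate`. This is a sketch under stated assumptions, not the paper's method: it uses the stock dense KV cache rather than the sparse drafter-side cache, so it will not reproduce the reported numbers. It also assumes both models share a tokenizer vocabulary and sit on the same device. The helper name `assisted_generate` is hypothetical.

```python
def assisted_generate(verifier, drafter, tokenizer, prompt, max_new_tokens=64):
    """Greedy generation with `drafter` proposing tokens for `verifier`.

    Uses the stock assisted-generation path in transformers, NOT the
    sparse-KV drafting evaluated in the paper.
    """
    # Tokenize and move the inputs to the verifier's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(verifier.device)
    output_ids = verifier.generate(
        **inputs,
        assistant_model=drafter,  # draft model proposes candidate tokens
        do_sample=False,          # greedy decoding
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage, with the objects loaded in the blocks above:
# print(assisted_generate(verifier, drafter, tokenizer, "Once upon a time"))
```

For the sparse-cache behaviour and the paper's measurements, use the evaluation code in the GitHub repo instead.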
## Citation
Citation TBA after acceptance. See the GitHub repo for the latest.