---
license: apache-2.0
language:
- en
tags:
- speculative-decoding
- sparse-kv-cache
- long-context
base_model: JackFram/llama-68m
---

# BudgetDraft — Released Checkpoints

Drafter checkpoints for the paper **"BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding"**.

All three checkpoints fine-tune `JackFram/llama-68m` (68M parameters, fp32) for use as a drafter alongside the `NousResearch/Yarn-Llama-2-7b-128k` verifier under a sparse drafter-side KV cache.

## Layout

| Subfolder | Variant | Training loss | Use in paper |
|---|---|---|---|
| `main/` | **A + 0.5·C** (multi-view, λ=0.5) | full-cache + sparse-cache | main checkpoint reported in Table 1 / Fig. 3 |
| `aonly/` | **A only** | full-cache only (no sparse branch) | ablation: without the sparse-cache loss |
| `ac/` | **A + C** (λ=1.0) | full-cache + sparse-cache | λ-sensitivity ablation |
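
The Variant column reads as a weighted sum of two per-view objectives. Below is a minimal sketch, assuming **A** is the drafter's training loss computed with the full KV cache and **C** the same loss computed with the sparse drafter-side cache. This is an interpretation of the table, not code from the paper; `SPARSE_VIEW_WEIGHT` and `multi_view_loss` are hypothetical names.

```python
# Hypothetical mapping from released subfolder to the weight lambda on the
# sparse-cache term C, read off the "Variant" column above. `aonly/` trains
# without the sparse branch, which gives the same objective as lambda = 0.
SPARSE_VIEW_WEIGHT = {"main": 0.5, "aonly": 0.0, "ac": 1.0}


def multi_view_loss(loss_full, loss_sparse, lam):
    """Return A + lam * C.

    Assumption: `loss_full` (A) is the loss under the full KV cache and
    `loss_sparse` (C) is the loss under the sparse drafter-side cache.
    Only + and * are used, so scalar framework tensors work as well as floats.
    """
    if lam < 0:
        raise ValueError("lambda must be non-negative")
    return loss_full + lam * loss_sparse


# Example: the main/ objective for per-view losses of 2.0 (full) and 3.0 (sparse).
main_loss = multi_view_loss(2.0, 3.0, SPARSE_VIEW_WEIGHT["main"])
```

Under this reading, the three subfolders are points on a single λ sweep (0, 0.5, 1.0), with `main/` in the middle.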

Each subfolder is a standard Hugging Face checkpoint (`config.json`, `model.safetensors`, tokenizer files). Load it with `AutoModelForCausalLM.from_pretrained(...)`, passing the subfolder name via `subfolder=`.

## Download

```bash
# Whole repo (~786 MB):
hf download qwe123wjb/BudgetDraft-checkpoints --local-dir ./ckpts

# Just the main checkpoint:
hf download qwe123wjb/BudgetDraft-checkpoints --include "main/*" --local-dir ./ckpts
```
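
The same downloads can be scripted from Python with `huggingface_hub.snapshot_download`, whose `allow_patterns` argument plays the role of `--include`. A minimal sketch; `allow_patterns_for` and `download_checkpoints` are hypothetical helper names, and `huggingface_hub` is imported lazily so the pattern helper runs without it installed.

```python
REPO_ID = "qwe123wjb/BudgetDraft-checkpoints"
VARIANTS = ("main", "aonly", "ac")  # subfolders listed in the Layout table


def allow_patterns_for(variants=None):
    """Glob patterns selecting the given subfolders; None means the whole repo."""
    if variants is None:
        return None  # no filter: every file in the repo is downloaded
    unknown = set(variants) - set(VARIANTS)
    if unknown:
        raise ValueError(f"unknown variant(s): {sorted(unknown)}")
    return [f"{v}/*" for v in variants]


def download_checkpoints(variants=None, local_dir="./ckpts"):
    """Python counterpart of the `hf download` commands above; returns the local path."""
    from huggingface_hub import snapshot_download  # lazy import

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=allow_patterns_for(variants),
    )
```

For example, `download_checkpoints(["main"])` mirrors the second command, and `download_checkpoints()` mirrors the first.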

## Reproducing the paper

Pair these checkpoints with the evaluation code at <https://github.com/ANTI-Tony/BudgetDraft>:

```bash
git clone https://github.com/ANTI-Tony/BudgetDraft.git
cd BudgetDraft
pip install -r requirements.txt

hf download qwe123wjb/BudgetDraft-checkpoints --local-dir ./ckpts
make eval-from-release CHECKPOINTS=./ckpts
```

The eval script runs 96 configurations (main, ablation, and λ-sensitivity) across three datasets (PG-19, LongBench QMSum, NarrativeQA) and three context lengths (4K, 8K, 16K). Headline result: **6.55× speedup with 79.37% acceptance on NarrativeQA at 4K context**.
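
For intuition on how acceptance relates to speedup, the standard analysis of speculative decoding (Leviathan et al., 2023) gives the expected number of tokens produced per verifier forward pass as `(1 - alpha**(gamma + 1)) / (1 - alpha)`, assuming each of `gamma` drafted tokens is accepted independently with probability `alpha`. The sketch below evaluates that formula. The i.i.d. assumption is a simplification, and the paper's acceptance metric and speedup measurement may be defined differently, so the reported 79.37% should not be plugged in as `alpha` directly.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per verifier call when drafting gamma tokens.

    Assumes each drafted token is accepted independently with probability
    alpha. The count includes the one token the verifier always contributes
    (a correction after a rejection, or a bonus token if all are accepted).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        return float(gamma + 1)  # limit of the geometric sum as alpha -> 1
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


tokens = expected_tokens_per_step(0.8, 4)  # illustrative alpha and gamma
```

Under this model the count is an upper bound on wall-clock speedup, reached only if drafting were free; a cheaper drafter step (for example, from a smaller KV cache) moves the real speedup closer to it.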

## Quick load (Python)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "qwe123wjb/BudgetDraft-checkpoints"

# Drafter: the main multi-view checkpoint, kept in fp32 as released.
drafter = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    subfolder="main",
    torch_dtype=torch.float32,
)
# The tokenizer files ship in the same subfolder.
tokenizer = AutoTokenizer.from_pretrained(REPO_ID, subfolder="main")
```

Pair with the verifier:

```python
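import torch

# Some transformers versions may need trust_remote_code=True for this YaRN model; see its model card.
verifier = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Yarn-Llama-2-7b-128k",
    torch_dtype=torch.float16,  # fp16 verifier alongside the fp32 drafter
)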
```
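
For a quick end-to-end check that the two models work together, the built-in assisted generation in `transformers` (`generate(..., assistant_model=...)`) can drive the drafter and verifier. A minimal sketch: `assisted_generate` is a hypothetical helper and greedy decoding is an arbitrary choice. This path gives the drafter an ordinary full KV cache, so it does not reproduce BudgetDraft's sparse-cache setup or the paper's numbers; use `make eval-from-release` for those.

```python
def assisted_generate(verifier, drafter, tokenizer, prompt, max_new_tokens=64):
    """Greedy speculative (assisted) decoding with stock transformers.

    Assumption: drafter and verifier share a tokenizer/vocabulary, so the
    tokenizer loaded from main/ can encode prompts for the verifier too.
    """
    # Place the drafter on the verifier's device (moves the drafter in place).
    drafter.to(verifier.device)
    inputs = tokenizer(prompt, return_tensors="pt").to(verifier.device)
    output_ids = verifier.generate(
        **inputs,
        assistant_model=drafter,  # drafter proposes tokens, verifier checks them
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, after moving the verifier to a GPU with `verifier.to("cuda")`, call `assisted_generate(verifier, drafter, tokenizer, "Once upon a time")`. Reusing the `main/` tokenizer relies on the drafter, derived from `JackFram/llama-68m`, sharing the LLaMA vocabulary with the Llama-2 verifier.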

## Citation

Citation to be announced after acceptance; see the GitHub repo for updates.