license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
base_model:
  - llava-hf/llava-1.5-7b-hf

ViSpec-LLaVA-1.5-7b (Benchmark Release)

This model repo is part of a multimodal speculative decoding benchmark suite.

Why this repo exists

We maintain a unified benchmark codebase that covers multiple methods (Baseline, EAGLE, EAGLE2, Lookahead, MSD, ViSpec) so that users can train and evaluate all of them under one setup.

The methods are aggregated here for user convenience (shared dataset format, scripts, and metrics).
The original ideas and implementations belong to their respective authors.
This specific Hugging Face repo hosts the ViSpec-LLaVA-1.5-7b checkpoint used in our benchmark runs.

Upstream / Base Model

Base model: llava-hf/llava-1.5-7b-hf
Original ViSpec LLaVA release: JLKang/ViSpec-llava-1.5-7b-hf
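
As a rough illustration of how the base model listed above might be loaded, here is a minimal sketch using the transformers library. It covers only the base model (llava-hf/llava-1.5-7b-hf). Loading the ViSpec weights depends on the benchmark code and is not shown. The helper names `build_prompt` and `load_base_model` are hypothetical, and the prompt template is an assumption based on the usual LLaVA-1.5 chat format rather than something this card specifies.

```python
# Minimal sketch, not the benchmark's own loading code.
# Assumptions: helper names are illustrative; the prompt template follows
# the commonly used LLaVA-1.5 chat format.

BASE_MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # base model named in this card


def build_prompt(question: str) -> str:
    """Wrap a question in the assumed LLaVA-1.5 single-image chat template."""
    return f"USER: <image>\n{question} ASSISTANT:"


def load_base_model(model_id: str = BASE_MODEL_ID):
    """Load the processor and base model (downloads weights on first call)."""
    # Heavy imports are kept inside the function so that importing this
    # sketch does not require torch or trigger any download.
    import torch
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # half precision to reduce memory use
    )
    return processor, model
```

A typical call would be `processor, model = load_base_model()`, followed by passing an image and `build_prompt("What is shown in the image?")` to the processor before calling `model.generate`. Speculative decoding itself is handled by the benchmark code, not by this snippet.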

Citation

If you use this checkpoint or the benchmark, please cite ViSpec and the original methods you compare against.

ViSpec

@inproceedings{vispec,
  title={ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding},
  author={Kang, Jialiang and Shu, Han and Li, Wenshuo and Zhai, Yingjie and Chen, Xinghao},
  booktitle={Annual Conference on Neural Information Processing Systems},
  year={2025}
}

EAGLE / EAGLE2 / EAGLE3

@inproceedings{li2024eagle,
  author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
  title = {{EAGLE}: Speculative Sampling Requires Rethinking Feature Uncertainty},
  booktitle = {International Conference on Machine Learning},
  year = {2024}
}

@inproceedings{li2024eagle2,
  author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
  title = {{EAGLE-2}: Faster Inference of Language Models with Dynamic Draft Trees},
  booktitle = {Empirical Methods in Natural Language Processing},
  year = {2024}
}

@inproceedings{li2025eagle3,
  author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
  title = {{EAGLE-3}: Scaling up Inference Acceleration of Large Language Models via Training-Time Test},
  booktitle = {Annual Conference on Neural Information Processing Systems},
  year = {2025}
}

Other integrated baselines (links)

Lookahead Decoding: https://lmsys.org/blog/2023-11-21-lookahead-decoding/
MSD-LLaVA1.5-7B: https://huggingface.co/lucylyn/MSD-LLaVA1.5-7B
Medusa: https://github.com/FasterDecoding/Medusa

Notes

This model card focuses on benchmark usage and attribution.
For full benchmark code and scripts, please refer to the benchmark repository used in your experiment setup.