license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
tags:
- speculative-decoding
- multimodal
- qwen2-vl
- mmspec
MSD-Qwen2.5-VL-7B-Instruct (Benchmark Release)
This model repo is part of a multimodal speculative decoding benchmark suite.
Why this repo exists
We maintain a unified benchmark codebase that covers several decoding methods (Baseline, EAGLE, EAGLE2, Lookahead, MSD, ViSpec), so that training and evaluation for all of them can run under one setup.
- The methods are aggregated here for user convenience (shared dataset format, scripts, and metrics).
- The original ideas and implementations belong to their respective authors.
- This specific Hugging Face repo hosts the MSD-Qwen2.5-VL-7B-Instruct checkpoint used in our benchmark runs.
Upstream / Base Model
- Base model: Qwen/Qwen2.5-VL-7B-Instruct
- Original MSD Qwen checkpoint: lucylyn/MSD-Qwen2VL-7B-Instruct
What is in this repo
- config.json
- pytorch_model.bin
This checkpoint is intended to be loaded as the MSD speculative model together with the base model above. It is not a standalone replacement for the base model and its processor/tokenizer assets.
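As a quick sanity check before running the benchmark, the two checkpoint files can be fetched and verified locally. The following is a minimal sketch, assuming `huggingface_hub` is installed and that the repo ID is the one passed as `--msd-model-path` in the usage example (`Cloudriver/MSD-Qwen2.5-VL-7B-Instruct`). The helper names `fetch_checkpoint` and `missing_files` are hypothetical and are not part of the benchmark codebase.

```python
from pathlib import Path

# Assumption: repo ID taken from the --msd-model-path value in the usage example.
REPO_ID = "Cloudriver/MSD-Qwen2.5-VL-7B-Instruct"
# The two files this model card lists as the repo contents.
EXPECTED_FILES = ("config.json", "pytorch_model.bin")


def missing_files(local_dir):
    """Return the expected checkpoint files that are absent from local_dir."""
    root = Path(local_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def fetch_checkpoint(local_dir="msd-checkpoint"):
    """Download only the expected files and return the local directory path.

    Hypothetical helper: huggingface_hub is imported lazily so that
    missing_files() stays usable without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=list(EXPECTED_FILES),
    )
```

Calling `fetch_checkpoint()` and then `missing_files()` on the returned path should give an empty list if both files arrived. The processor and tokenizer still come from the base model, as described above.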
Example usage (benchmark codebase)
python -m evaluation.eval_msd_mmspec \
--base-model-path Qwen/Qwen2.5-VL-7B-Instruct \
--msd-model-path Cloudriver/MSD-Qwen2.5-VL-7B-Instruct \
--data-folder dataset/MMSpec/testmini \
--answer-file results/mmspec_testmini/msd-temperature-0.jsonl \
--model-id msd-qwen2.5-vl-7b \
--temperature 0 \
--use-msd \
--total-token -1 \
--depth 5 \
--top-k 10
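To repeat the command above for more than one temperature, it can be wrapped in a small Python driver. This is a sketch under stated assumptions: the flags and their values are copied from the command above, while the helper names, the second temperature value, and the `msd-temperature-{t}.jsonl` naming pattern are illustrative. It also assumes the script is run from the root of the benchmark codebase so that the `evaluation` package is importable.

```python
import subprocess
import sys


def build_command(temperature, answer_file):
    """Build the argument list for one evaluation run (flags from the example command)."""
    return [
        sys.executable, "-m", "evaluation.eval_msd_mmspec",
        "--base-model-path", "Qwen/Qwen2.5-VL-7B-Instruct",
        "--msd-model-path", "Cloudriver/MSD-Qwen2.5-VL-7B-Instruct",
        "--data-folder", "dataset/MMSpec/testmini",
        "--answer-file", answer_file,
        "--model-id", "msd-qwen2.5-vl-7b",
        "--temperature", str(temperature),
        "--use-msd",
        "--total-token", "-1",
        "--depth", "5",
        "--top-k", "10",
    ]


def run_sweep(temperatures=(0, 1)):
    """Run the evaluation once per temperature; the value 1 is an assumed example."""
    for t in temperatures:
        # Assumption: output naming generalizes the msd-temperature-0.jsonl pattern.
        answer_file = f"results/mmspec_testmini/msd-temperature-{t}.jsonl"
        # check=True stops the sweep if a run exits with a non-zero status.
        subprocess.run(build_command(t, answer_file), check=True)
```

Passing the arguments as a list avoids shell quoting issues, and `sys.executable` keeps the run in the same Python environment as the driver.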
Method references
- MSD-LLaVA checkpoint: https://huggingface.co/lucylyn/MSD-LLaVA1.5-7B
- MSD-Qwen checkpoint: https://huggingface.co/lucylyn/MSD-Qwen2VL-7B-Instruct
- ViSpec: https://arxiv.org/abs/2509.15235
- Lookahead Decoding: https://lmsys.org/blog/2023-11-21-lookahead-decoding/
- Medusa: https://github.com/FasterDecoding/Medusa
Citation
If you use this checkpoint or the benchmark, please cite the original MSD method and checkpoint, along with any baseline methods you compare against.
EAGLE / EAGLE2 / EAGLE3
@inproceedings{li2024eagle,
author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
title = {{EAGLE}: Speculative Sampling Requires Rethinking Feature Uncertainty},
booktitle = {International Conference on Machine Learning},
year = {2024}
}
@inproceedings{li2024eagle2,
author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
title = {{EAGLE-2}: Faster Inference of Language Models with Dynamic Draft Trees},
booktitle = {Empirical Methods in Natural Language Processing},
year = {2024}
}
@inproceedings{li2025eagle3,
author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
title = {{EAGLE-3}: Scaling up Inference Acceleration of Large Language Models via Training-Time Test},
booktitle = {Annual Conference on Neural Information Processing Systems},
year = {2025}
}
Notes
- This model card focuses on benchmark usage and attribution.
- For full benchmark code and scripts, please refer to the benchmark repository used in your experiment setup.