---
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
base_model:
  - Qwen/Qwen2.5-VL-7B-Instruct
tags:
  - speculative-decoding
  - multimodal
  - qwen2-vl
  - mmspec
---

# MSD-Qwen2.5-VL-7B-Instruct (Benchmark Release)

This model repo is part of a multimodal speculative decoding benchmark suite.

## Why this repo exists

We maintain a unified benchmark codebase that includes multiple methods (Baseline, EAGLE, EAGLE2, Lookahead, MSD, ViSpec) so users can run training/evaluation more easily under one setup.

- The methods are aggregated here for user convenience (shared dataset format, scripts, and metrics).
- The original ideas and implementations belong to their respective authors.
- This specific Hugging Face repo hosts the MSD-Qwen2.5-VL-7B-Instruct checkpoint used in our benchmark runs.

## Upstream / Base Model

- Base model: Qwen/Qwen2.5-VL-7B-Instruct
- Original MSD Qwen checkpoint: lucylyn/MSD-Qwen2VL-7B-Instruct

## What is in this repo

- `config.json`
- `pytorch_model.bin`

This checkpoint is intended to be loaded as the MSD speculative (draft) model alongside the base model above; it is not a standalone replacement for the base model or its processor/tokenizer assets.
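The loading step can be sketched as follows. The helper `load_msd_checkpoint` is a hypothetical name (the actual wiring into the base model lives in the benchmark code), and the demo below inspects a tiny dummy state dict rather than the real `pytorch_model.bin`:

```python
import torch

def load_msd_checkpoint(path: str) -> dict:
    """Load the MSD draft-model state dict (e.g. pytorch_model.bin) onto CPU
    and report its size. The benchmark code is responsible for attaching
    these weights to the base Qwen2.5-VL model."""
    state = torch.load(path, map_location="cpu")
    n_params = sum(t.numel() for t in state.values())
    print(f"loaded {len(state)} tensors, {n_params} parameters")
    return state

# Demo on a tiny dummy state dict; in practice, point this at the
# pytorch_model.bin downloaded from this repo.
torch.save({"draft.weight": torch.zeros(4, 4)}, "/tmp/demo_msd.bin")
state = load_msd_checkpoint("/tmp/demo_msd.bin")
```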

## Example usage (benchmark codebase)

```bash
python -m evaluation.eval_msd_mmspec \
  --base-model-path Qwen/Qwen2.5-VL-7B-Instruct \
  --msd-model-path Cloudriver/MSD-Qwen2.5-VL-7B-Instruct \
  --data-folder dataset/MMSpec/testmini \
  --answer-file results/mmspec_testmini/msd-temperature-0.jsonl \
  --model-id msd-qwen2.5-vl-7b \
  --temperature 0 \
  --use-msd \
  --total-token -1 \
  --depth 5 \
  --top-k 10
```
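The run writes one JSON record per sample to the answer file. A minimal sketch for summarizing it is shown below; the field names (`new_tokens`, `wall_time`) are illustrative assumptions, so check the benchmark code for the actual schema:

```python
import json

def summarize_answers(path: str) -> dict:
    """Summarize a benchmark answer file (JSON Lines, one record per sample).
    Field names "new_tokens" and "wall_time" are assumed, not confirmed."""
    with open(path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    total_tokens = sum(r.get("new_tokens", 0) for r in records)
    total_time = sum(r.get("wall_time", 0.0) for r in records)
    return {
        "samples": len(records),
        "tokens_per_second": total_tokens / total_time if total_time else 0.0,
    }

# Demo with two synthetic records in place of a real answer file.
with open("/tmp/demo_answers.jsonl", "w") as f:
    f.write(json.dumps({"new_tokens": 100, "wall_time": 2.0}) + "\n")
    f.write(json.dumps({"new_tokens": 50, "wall_time": 1.0}) + "\n")
print(summarize_answers("/tmp/demo_answers.jsonl"))  # {'samples': 2, 'tokens_per_second': 50.0}
```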

## Method references

## Citation

If you use this checkpoint and benchmark, please cite the original MSD method/checkpoint and the baseline methods you compare against.

### EAGLE / EAGLE2 / EAGLE3

```bibtex
@inproceedings{li2024eagle,
  author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
  title = {{EAGLE}: Speculative Sampling Requires Rethinking Feature Uncertainty},
  booktitle = {International Conference on Machine Learning},
  year = {2024}
}

@inproceedings{li2024eagle2,
  author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
  title = {{EAGLE-2}: Faster Inference of Language Models with Dynamic Draft Trees},
  booktitle = {Empirical Methods in Natural Language Processing},
  year = {2024}
}

@inproceedings{li2025eagle3,
  author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
  title = {{EAGLE-3}: Scaling up Inference Acceleration of Large Language Models via Training-Time Test},
  booktitle = {Annual Conference on Neural Information Processing Systems},
  year = {2025}
}
```

## Notes

- This model card focuses on benchmark usage and attribution.
- For the full benchmark code and scripts, please refer to the benchmark repository used in your experiment setup.