---
base_model:
  - Qwen/Qwen2-VL-2B-Instruct
datasets:
  - VLM2Vec/MMEB-V2
language:
  - en
library_name: transformers
pipeline_tag: feature-extraction
---

# PLUME-Qwen2-VL-2B

**PLUME: Latent Reasoning Based Universal Multimodal Embedding**

PLUME is a latent reasoning framework for universal multimodal embedding (UME). It replaces explicit chain-of-thought (CoT) generation with a short autoregressive rollout of continuous latent states, achieving stronger retrieval performance while delivering over 30x faster inference compared to explicit-CoT methods.
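The core idea can be sketched in a toy example. This is **not** the actual PLUME implementation — the state update and pooling below are hypothetical stand-ins — it only illustrates the mechanism: instead of decoding hundreds of explicit chain-of-thought tokens, the model autoregressively rolls out a small fixed number of continuous latent states and pools them into one embedding.

```python
NUM_LATENT_STEPS = 8  # PLUME replaces explicit CoT with 8 latent steps

def rollout_latents(h0, step_fn, num_steps=NUM_LATENT_STEPS):
    """Autoregressively apply step_fn to produce a short latent trajectory."""
    states = [h0]
    for _ in range(num_steps):
        states.append(step_fn(states[-1]))
    return states[1:]  # the rolled-out latent states

def pool(states):
    """Mean-pool the latent trajectory into a single embedding vector."""
    dim = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]

# Hypothetical 'model': a fixed affine update standing in for one decoder step.
step = lambda h: [0.5 * x + 0.1 for x in h]

h0 = [1.0, -1.0, 0.0]  # stand-in for the encoder's final hidden state
embedding = pool(rollout_latents(h0, step))
print(len(embedding))  # → 3: same dimensionality as the latent state
```

Because the rollout length is a small constant (8) rather than a variable-length generated text, inference cost is essentially fixed per sample.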

[Paper](https://arxiv.org/abs/2604.02073) | [Code](https://github.com/haoxiangzhao12138/PLUME)

## Highlights

- Replaces hundreds of explicit reasoning tokens with only 8 latent steps
- 30.3x faster inference than UME-R1 (298 ms vs. 9023 ms per sample)
- 61.6 overall on the 78-task MMEB-v2 benchmark, surpassing UME-R1 (60.1) and VLM2Vec-V2 (58.0)
- Particularly strong on Video (+1.9 vs. UME-R1) and Visual Document (+3.6 vs. UME-R1) retrieval
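The 30.3x speedup figure follows directly from the reported per-sample latencies, as a quick check shows:

```python
# Reported per-sample inference latencies (from the highlights above)
ume_r1_ms = 9023  # explicit-CoT baseline (UME-R1)
plume_ms = 298    # PLUME's latent-reasoning rollout

speedup = ume_r1_ms / plume_ms
print(round(speedup, 1))  # → 30.3
```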

## Results on MMEB-v2

| Model      | Image | Video | VisDoc | All  |
|------------|-------|-------|--------|------|
| VLM2Vec-V2 | 64.9  | 34.9  | 65.4   | 58.0 |
| UME-R1     | 66.6  | 42.2  | 63.9   | 60.1 |
| PLUME      | 66.3  | 44.1  | 67.5   | 61.6 |

## Usage

See the full training and evaluation pipeline at: https://github.com/haoxiangzhao12138/PLUME

## Download

```shell
# Option 1: huggingface-cli
huggingface-cli download CUDAOUTOFMEMORY/PLUME-Qwen2-VL-2B --local-dir /path/to/model

# Option 2: git clone (requires git-lfs)
git lfs install
git clone https://huggingface.co/CUDAOUTOFMEMORY/PLUME-Qwen2-VL-2B
```

## Citation

```bibtex
@misc{he2026plumelatentreasoningbased,
      title={PLUME: Latent Reasoning Based Universal Multimodal Embedding},
      author={Chenwei He and Xiangzhao Hao and Tianyu Yang and Yuxiang Ma and Yuheng Jia and Lingxiang Wu and Chaoyang Zhao and Haiyun Guo and Jinqiao Wang},
      year={2026},
      eprint={2604.02073},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.02073},
}
```