PLUME-Qwen2-VL-2B

PLUME: Latent Reasoning Based Universal Multimodal Embedding

PLUME is a latent reasoning framework for universal multimodal embedding (UME). Instead of generating an explicit chain of thought (CoT), it performs a short autoregressive rollout of continuous latent states. This gives higher MMEB-v2 scores than the explicit-CoT baseline UME-R1 while running over 30x faster at inference.
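The rollout idea can be shown with a toy sketch. It assumes a loop in which each continuous latent state is fed straight back in as the next input rather than being decoded into a token. Several pieces are hypothetical stand-ins, not PLUME's implementation: the `step` function (a single tanh layer), the random weights, and the use of the L2-normalized final state as the embedding. They replace the Qwen2-VL-2B forward pass and PLUME's actual readout, which the paper and repository describe.

```python
import math
import random

def step(h, W, b):
    """Hypothetical stand-in for one forward pass: map the current latent state to the next."""
    return [math.tanh(sum(w * x for w, x in zip(row, h)) + bi) for row, bi in zip(W, b)]

def l2_normalize(v):
    """Scale a vector to unit length (leave an all-zero vector unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def latent_rollout(h0, W, b, num_steps=8):
    """Autoregressively roll out `num_steps` continuous latent states.

    Each new state is computed from the previous one and fed straight back in,
    so no discrete reasoning tokens are sampled or decoded.
    """
    states = []
    h = h0
    for _ in range(num_steps):
        h = step(h, W, b)
        states.append(h)
    # Assumption: the embedding is the final latent state, L2-normalized.
    return l2_normalize(states[-1]), states

if __name__ == "__main__":
    random.seed(0)
    dim = 4
    W = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    b = [0.0] * dim
    h0 = [1.0, 0.0, -1.0, 0.5]  # toy "encoded input" state
    emb, states = latent_rollout(h0, W, b, num_steps=8)
    print(len(states), [round(x, 3) for x in emb])
```

In this toy setting the cost is a fixed `num_steps` updates per input, independent of content. Decoding a reasoning text, by contrast, costs one step per generated token, and the number of tokens varies.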

Project Page | Paper: https://arxiv.org/abs/2604.02073 | Code: https://github.com/haoxiangzhao12138/PLUME

Highlights

  • Replaces hundreds of explicit reasoning tokens with only 8 latent steps
  • 30.3x faster inference than UME-R1 (298 ms vs. 9,023 ms per sample)
  • 61.6 overall on the 78-task MMEB-v2 benchmark, surpassing UME-R1 (60.1) and VLM2Vec-V2 (58.0)
  • Particularly strong on the Video (+1.9 vs. UME-R1) and Visual Document (+3.6 vs. UME-R1) task groups
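The speedup figure follows directly from the reported per-sample latencies. The short check below recomputes it and also converts both latencies into throughput. The throughput conversion assumes strictly sequential, one-sample-at-a-time processing, which the card does not state.

```python
# Per-sample latencies reported above, in milliseconds.
PLUME_MS = 298
UME_R1_MS = 9023

speedup = UME_R1_MS / PLUME_MS       # ratio of the two latencies
plume_throughput = 1000 / PLUME_MS   # samples/s, assuming sequential processing
ume_r1_throughput = 1000 / UME_R1_MS

print(f"speedup: {speedup:.1f}x")  # speedup: 30.3x
print(f"PLUME: {plume_throughput:.2f} samples/s, UME-R1: {ume_r1_throughput:.2f} samples/s")
```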

Results on MMEB-v2

| Model | Image | Video | VisDoc | All |
|---|---|---|---|---|
| VLM2Vec-V2 | 64.9 | 34.9 | 65.4 | 58.0 |
| UME-R1 | 66.6 | 42.2 | 63.9 | 60.1 |
| PLUME | 66.3 | 44.1 | 67.5 | 61.6 |
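PLUME's margin over each baseline can be read off the table. The snippet below recomputes the per-column differences from the reported scores, rounding to one decimal so that floating-point noise does not show.

```python
# MMEB-v2 scores copied from the table above.
SCORES = {
    "VLM2Vec-V2": {"Image": 64.9, "Video": 34.9, "VisDoc": 65.4, "All": 58.0},
    "UME-R1": {"Image": 66.6, "Video": 42.2, "VisDoc": 63.9, "All": 60.1},
    "PLUME": {"Image": 66.3, "Video": 44.1, "VisDoc": 67.5, "All": 61.6},
}

def margins(model, baseline):
    """Return `model` minus `baseline` for every column, rounded to one decimal."""
    return {col: round(SCORES[model][col] - SCORES[baseline][col], 1) for col in SCORES[model]}

for base in ("UME-R1", "VLM2Vec-V2"):
    print(base, margins("PLUME", base))
```

Image is the only column where PLUME trails UME-R1, by 0.3 points.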

Usage

See the full training and evaluation pipeline at: https://github.com/haoxiangzhao12138/PLUME
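Once embeddings have been produced, retrieval with a universal embedding model typically reduces to ranking candidates by cosine similarity to the query embedding. The sketch below shows that step on toy vectors. The vectors and the `rank_candidates` helper are hypothetical. The sketch does not reproduce how PLUME encodes images, videos, or documents; that is handled by the pipeline in the repository.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_candidates(query, candidates):
    """Return candidate ids sorted from most to least similar to the query."""
    scored = [(cosine(query, vec), cid) for cid, vec in candidates.items()]
    return [cid for _, cid in sorted(scored, reverse=True)]

# Toy embeddings standing in for model outputs (hypothetical values).
query = [0.9, 0.1, 0.0]
candidates = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
print(rank_candidates(query, candidates))  # ['doc_a', 'doc_c', 'doc_b']
```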

Download

```bash
# Option 1: huggingface-cli
huggingface-cli download CUDAOUTOFMEMORY/PLUME-Qwen2-VL-2B --local-dir /path/to/model

# Option 2: git clone (requires git-lfs)
git lfs install
git clone https://huggingface.co/CUDAOUTOFMEMORY/PLUME-Qwen2-VL-2B
```
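The download can also be scripted from Python with `huggingface_hub.snapshot_download`. This sketch assumes `huggingface_hub` is installed (`pip install huggingface_hub`). `fetch_plume` is a hypothetical helper name, and `/path/to/model` is the same placeholder used above.

```python
REPO_ID = "CUDAOUTOFMEMORY/PLUME-Qwen2-VL-2B"

def fetch_plume(local_dir="/path/to/model"):
    """Download every file in the model repo into `local_dir` and return the local path."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `fetch_plume("/path/to/model")` starts the download. Files that are already present locally are generally not fetched again.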

Citation

```bibtex
@misc{he2026plumelatentreasoningbased,
      title={PLUME: Latent Reasoning Based Universal Multimodal Embedding},
      author={Chenwei He and Xiangzhao Hao and Tianyu Yang and Yuxiang Ma and Yuheng Jia and Lingxiang Wu and Chaoyang Zhao and Haiyun Guo and Jinqiao Wang},
      year={2026},
      eprint={2604.02073},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.02073},
}
```

Model details

  • Format: Safetensors
  • Model size: 2B params
  • Tensor type: BF16
  • Base model: Qwen/Qwen2-VL-2B (fine-tuned)