PLUME: Latent Reasoning Based Universal Multimodal Embedding
PLUME is a latent reasoning framework for universal multimodal embedding (UME). It replaces explicit chain-of-thought (CoT) generation with a short autoregressive rollout of continuous latent states, achieving stronger retrieval performance while delivering over 30x faster inference compared to explicit-CoT methods.
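To make the idea concrete, here is a minimal, self-contained sketch of a latent rollout: a hidden state is fed back into the model for a few steps without decoding any tokens, and the resulting states are pooled into one normalized embedding. All names, dimensions, and the single `tanh` "backbone step" are illustrative stand-ins, not PLUME's actual architecture or sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only; not PLUME's real sizes).
D = 16  # latent width
K = 4   # number of latent rollout steps

# Hypothetical frozen "backbone step": one linear map + tanh standing in
# for a full transformer forward pass that returns the next hidden state.
W = rng.standard_normal((D, D)) / np.sqrt(D)

def rollout(h0: np.ndarray, steps: int = K) -> np.ndarray:
    """Autoregressively roll out `steps` continuous latent states from h0,
    feeding each state back as the next input (no token decoding)."""
    states = []
    h = h0
    for _ in range(steps):
        h = np.tanh(W @ h)   # next continuous latent state
        states.append(h)
    return np.stack(states)  # shape (steps, D)

def embed(h0: np.ndarray) -> np.ndarray:
    """Pool the rollout into a single L2-normalized embedding."""
    z = rollout(h0).mean(axis=0)
    return z / np.linalg.norm(z)

query_state = rng.standard_normal(D)  # stands in for the encoder's last hidden state
emb = embed(query_state)
print(emb.shape)  # (16,)
```

Because the rollout stays in continuous space, only K cheap forward steps are needed per input instead of generating a full chain-of-thought token by token, which is where the inference speedup comes from.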
Project Page | Paper | Code
Embedding benchmark scores by modality (higher is better):

| Model | Image | Video | VisDoc | All |
|---|---|---|---|---|
| VLM2Vec-V2 | 64.9 | 34.9 | 65.4 | 58.0 |
| UME-R1 | 66.6 | 42.2 | 63.9 | 60.1 |
| PLUME | 66.3 | 44.1 | 67.5 | 61.6 |
See the full training and evaluation pipeline at: https://github.com/haoxiangzhao12138/PLUME
Download the model weights:

```bash
# Option 1: huggingface-cli
huggingface-cli download CUDAOUTOFMEMORY/PLUME-Qwen2-VL-2B --local-dir /path/to/model

# Option 2: git clone (requires git-lfs)
git lfs install
git clone https://huggingface.co/CUDAOUTOFMEMORY/PLUME-Qwen2-VL-2B
```
If you use PLUME, please cite:

```bibtex
@misc{he2026plumelatentreasoningbased,
  title={PLUME: Latent Reasoning Based Universal Multimodal Embedding},
  author={Chenwei He and Xiangzhao Hao and Tianyu Yang and Yuxiang Ma and Yuheng Jia and Lingxiang Wu and Chaoyang Zhao and Haiyun Guo and Jinqiao Wang},
  year={2026},
  eprint={2604.02073},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.02073},
}
```