V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval
The model was presented in the paper V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval.
Paper abstract
The abstract of the paper is the following:
Multimodal Large Language Models (MLLMs) have recently been applied to universal multimodal retrieval, where Chain-of-Thought (CoT) reasoning improves candidate reranking. However, existing approaches remain largely language-driven, relying on static visual encodings and lacking the ability to actively verify fine-grained visual evidence, which often leads to speculative reasoning in visually ambiguous cases. We propose V-Retrver, an evidence-driven retrieval framework that reformulates multimodal retrieval as an agentic reasoning process grounded in visual inspection. V-Retrver enables an MLLM to selectively acquire visual evidence during reasoning via external visual tools, performing a multimodal interleaved reasoning process that alternates between hypothesis generation and targeted visual verification. To train such an evidence-gathering retrieval agent, we adopt a curriculum-based learning strategy combining supervised reasoning activation, rejection-based refinement, and reinforcement learning with an evidence-aligned objective. Experiments across multiple multimodal retrieval benchmarks demonstrate consistent improvements in retrieval accuracy (with 23.0% improvements on average), perception-driven reasoning reliability, and generalization.
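The abstract's interleaved loop of hypothesis generation and targeted visual verification can be sketched as follows. This is an illustrative toy only, not the released implementation: the real agent is an MLLM calling external visual tools, which are replaced here by hypothetical stand-in functions (`generate_hypothesis`, `inspect_region`) and hard-coded toy evidence scores.

```python
# Toy sketch of evidence-driven reranking: alternate between proposing a
# hypothesis about a candidate and verifying it with a "visual tool",
# then rank candidates by accumulated evidence. All names and scores
# below are illustrative assumptions, not the paper's actual API.

def generate_hypothesis(query, candidate):
    # Stand-in for the MLLM proposing what to inspect and why.
    return {"region": "center", "claim": f"{candidate} matches '{query}'"}

def inspect_region(candidate, region):
    # Stand-in for an external visual tool (e.g. crop-and-zoom) that
    # returns fine-grained evidence; here a fixed toy lookup.
    evidence = {"img_cat": 0.9, "img_dog": 0.2}
    return evidence.get(candidate, 0.0)

def rerank(query, candidates, max_steps=2):
    # Interleaved reasoning: for each candidate, alternate hypothesis
    # generation and targeted verification, accumulating evidence.
    scores = {}
    for cand in candidates:
        score = 0.0
        for _ in range(max_steps):
            hyp = generate_hypothesis(query, cand)
            score += inspect_region(cand, hyp["region"])
        scores[cand] = score
    # Candidates with stronger verified evidence rank first.
    return sorted(candidates, key=scores.get, reverse=True)

print(rerank("a photo of a cat", ["img_dog", "img_cat"]))
# → ['img_cat', 'img_dog']
```

The point of the sketch is the control flow: reranking decisions are driven by actively gathered evidence rather than a single static visual encoding.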
Content
This is the repository for V-Retrver-SFT-7B (https://arxiv.org/pdf/2602.06034). For training and evaluation code, please refer to the V-Retrver code repository.