Improve model card: add pipeline tag, library name, and HF paper link
#2
by nielsr (HF Staff) - opened
README.md
CHANGED
@@ -1,14 +1,18 @@
 ---
-
+base_model:
+- Qwen/Qwen2-VL-7B-Instruct
 datasets:
 - TIGER-Lab/M-BEIR
 language:
 - en
-
-
+license: apache-2.0
+pipeline_tag: any-to-any
+library_name: transformers
 ---
 
-## U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding
+## U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
+
+This repository contains the official model checkpoints and inference code for **U-MARVEL**, presented in the paper [U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs](https://huggingface.co/papers/2507.14902).
 
 Universal multimodal retrieval (UMR) addresses complex retrieval tasks involving diverse modalities for both queries and candidates. Despite the success of state-of-the-art methods based on multimodal large language models (MLLMs) using contrastive learning principles, the mechanisms underlying their retrieval capabilities remain largely unexplored. This gap potentially leads to suboptimal performance and limited generalization ability.
 
@@ -91,4 +95,15 @@ single-model architectures and recall-then-rerank approaches on M-BEIR benchmark
 
 ## Acknowledgements
 
-Many thanks to the code bases from **[
+Many thanks to the code bases from **[LamRA](https://github.com/Code-kunkun/LamRA)**.
+
+## Citation
+If you use this code for your research or project, please cite:
+```latex
+@article{li2025umarvel,
+  title={U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs},
+  author={Li, Xiaojie and Li, Chu and Chen, Shi-Zhe and Chen, Xi},
+  journal={arXiv preprint arXiv:2507.14902},
+  year={2025}
+}
+```