Improve model card: add pipeline tag, library name, and HF paper link
This PR enhances the model card by:
- Adding `pipeline_tag: any-to-any` to accurately reflect the model's universal multimodal retrieval capabilities.
- Specifying `library_name: transformers` as the model is compatible with the Hugging Face Transformers library.
- Linking directly to the Hugging Face paper page ([U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs](https://huggingface.co/papers/2507.14902)) for easier access and discoverability within the Hugging Face Hub.
- Including the BibTeX citation from the project's GitHub repository for proper attribution.
These additions improve the model's discoverability on the Hub and provide more comprehensive information to users.
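The metadata this PR touches lives in the README's YAML front matter, which the Hub parses to populate tags and widgets. As a quick sanity check, the updated block can be validated with a few lines of stdlib-only Python (a minimal sketch of a `key: value` parser — real tooling such as `huggingface_hub.ModelCard` handles the full YAML spec):

```python
# Front-matter sanity check; the README content mirrors the diff in this PR.
readme = """---
base_model:
- Qwen/Qwen2-VL-7B-Instruct
datasets:
- TIGER-Lab/M-BEIR
language:
- en
license: apache-2.0
pipeline_tag: any-to-any
library_name: transformers
---

## U-MARVEL
"""

def front_matter(text: str) -> dict:
    """Extract simple `key: value` pairs from the YAML front matter."""
    assert text.startswith("---"), "model card must open with a front-matter block"
    block = text.split("---", 2)[1]          # text between the first two '---' fences
    meta = {}
    for line in block.strip().splitlines():
        if ":" in line and not line.startswith("-"):  # skip list items like '- en'
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()         # list keys map to ''
    return meta

meta = front_matter(readme)
print(meta["pipeline_tag"])   # any-to-any
print(meta["library_name"])   # transformers
```

This only checks that the keys added in this PR are present and well-formed; it does not validate them against the Hub's allowed tag values.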
README.md
CHANGED

````diff
@@ -1,14 +1,18 @@
 ---
+base_model:
+- Qwen/Qwen2-VL-7B-Instruct
 datasets:
 - TIGER-Lab/M-BEIR
 language:
 - en
+license: apache-2.0
+pipeline_tag: any-to-any
+library_name: transformers
 ---
 
-## U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding
+## U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
+
+This repository contains the official model checkpoints and inference code for **U-MARVEL**, presented in the paper [U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs](https://huggingface.co/papers/2507.14902).
 
 Universal multimodal retrieval (UMR) addresses complex retrieval tasks involving diverse modalities for both queries and candidates. Despite the success of state-of-the-art methods based on multimodal large language models (MLLMs) using contrastive learning principles, the mechanisms underlying their retrieval capabilities remain largely unexplored. This gap potentially leads to suboptimal performance and limited generalization ability.
 
@@ -91,4 +95,15 @@ single-model architectures and recall-then-rerank approaches on M-BEIR benchmark
 
 ## Acknowledgements
 
-Many thanks to the code bases from **[
+Many thanks to the code bases from **[LamRA](https://github.com/Code-kunkun/LamRA)**.
+
+## Citation
+If you use this code for your research or project, please cite:
+```latex
+@article{li2025umarvel,
+  title={U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs},
+  author={Li, Xiaojie and Li, Chu and Chen, Shi-Zhe and Chen, Xi},
+  journal={arXiv preprint arXiv:2507.14902},
+  year={2025}
+}
+```
````
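The card's abstract describes the UMR setting: queries and candidates of any modality are mapped into a shared embedding space and ranked by similarity. A toy, model-free sketch of that ranking step (plain Python; the 4-d vectors and candidate ids are hypothetical stand-ins for the high-dimensional embeddings the MLLM actually produces):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank(query_emb, candidate_embs):
    """Return (candidate_id, score) pairs sorted by descending similarity."""
    scored = [(cid, cosine(query_emb, emb)) for cid, emb in candidate_embs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical embeddings; ids only illustrate that candidates may mix modalities.
query = [0.9, 0.1, 0.0, 0.2]
candidates = {
    "image_01": [0.8, 0.2, 0.1, 0.1],   # nearly parallel to the query
    "text_07":  [0.0, 0.9, 0.4, 0.0],
    "pair_03":  [0.1, 0.0, 0.9, 0.3],
}
print(rank(query, candidates)[0][0])  # image_01
```

The retrieval pipeline reduces to nearest-neighbor search in the embedding space; contrastive training is what makes matching query-candidate pairs end up close under this similarity.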
|