---
library_name: transformers
license: apache-2.0
datasets:
- aimagelab/ReT-M2KR
base_model:
- laion/CLIP-ViT-H-14-laion2B-s32B-b79K
pipeline_tag: visual-document-retrieval
---

# Model Card: ReT-2

Official implementation of ReT-2: Recurrence Meets Transformers for Universal Multimodal Retrieval.

This model features visual and textual backbones based on [laion/CLIP-ViT-H-14-laion2B-s32B-b79K](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K).
<br>The backbones have been fine-tuned on the M2KR dataset.
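
This card does not document an inference API, so the following is only a minimal sketch of the ranking step in embedding-based retrieval. It assumes that each query and each document has already been encoded into a single dense vector and that candidates are compared by cosine similarity. Neither assumption is stated in this card. The helper names `cosine_similarity` and `rank_documents` and the toy vectors are hypothetical and are not part of the ReT-2 codebase; see the repository for the actual loading and encoding code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b)


def rank_documents(query_embedding, document_embeddings, top_k=3):
    """Return the top_k (document_id, score) pairs, best match first.

    Hypothetical helper: a real system would obtain the embeddings
    from the retrieval model instead of hard-coding them.
    """
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in document_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for model outputs (assumption, not real data).
query = [1.0, 0.0, 1.0]
documents = {
    "doc_a": [1.0, 0.0, 1.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [1.0, 1.0, 1.0],
}

results = rank_documents(query, documents, top_k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.4f}")
```

For a large document collection, the per-document loop would typically be replaced by a batched matrix product or an approximate nearest-neighbour index.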

### Model Sources

- **Repository:** https://github.com/aimagelab/ReT-2
- **Paper:** [Recurrence Meets Transformers for Universal Multimodal Retrieval](https://arxiv.org/abs/2509.08897)

### Training Data

[aimagelab/ReT-M2KR](https://huggingface.co/datasets/aimagelab/ReT-M2KR)

## Citation

```
@article{caffagni2025recurrencemeetstransformers,
  title={{Recurrence Meets Transformers for Universal Multimodal Retrieval}},
  author={Davide Caffagni and Sara Sarto and Marcella Cornia and Lorenzo Baraldi and Rita Cucchiara},
  journal={arXiv preprint arXiv:2509.08897},
  year={2025}
}
```