aimagelab/ReT2-M2KR-OpenCLIP-ViT-H


	---
	library_name: transformers
	license: apache-2.0
	datasets:
	- aimagelab/ReT-M2KR
	base_model:
	- laion/CLIP-ViT-H-14-laion2B-s32B-b79K
	pipeline_tag: visual-document-retrieval
	---

	# Model Card: ReT-2

	Official implementation of ReT-2: Recurrence Meets Transformers for Universal Multimodal Retrieval.

	This model features visual and textual backbones based on [laion/CLIP-ViT-H-14-laion2B-s32B-b79K](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K).
	<br>The backbones have been fine-tuned on the M2KR dataset.
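
The card does not document an inference API, so the snippet below is only a generic sketch of how embeddings from a retrieval model are commonly used to rank candidates. It assumes queries and documents have already been encoded into fixed-size vectors and that ranking uses cosine similarity; both are assumptions, not confirmed details of ReT-2. The function names and the toy 4-dimensional vectors are hypothetical. See the repository linked under Model Sources for the actual loading and scoring code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_documents(query_emb, doc_embs, top_k=3):
    """Return the top_k (doc_id, score) pairs, highest similarity first."""
    scored = [
        (doc_id, cosine_similarity(query_emb, emb))
        for doc_id, emb in doc_embs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for real encoder outputs.
query = [0.9, 0.1, 0.0, 0.2]
documents = {
    "doc_a": [1.0, 0.0, 0.0, 0.1],
    "doc_b": [0.0, 1.0, 0.0, 0.0],
    "doc_c": [0.5, 0.5, 0.1, 0.1],
}

ranking = rank_documents(query, documents, top_k=2)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In practice the embeddings would come from the fine-tuned visual and textual backbones, and a vector index would replace the exhaustive loop once the collection grows large.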


	### Model Sources

	- Repository: https://github.com/aimagelab/ReT-2
	- Paper: [Recurrence Meets Transformers for Universal Multimodal Retrieval](https://arxiv.org/abs/2509.08897)


	### Training Data
	[aimagelab/ReT-M2KR](https://huggingface.co/datasets/aimagelab/ReT-M2KR)


	## Citation
	```
	@article{caffagni2025recurrencemeetstransformers,
	title={{Recurrence Meets Transformers for Universal Multimodal Retrieval}},
	author={Davide Caffagni and Sara Sarto and Marcella Cornia and Lorenzo Baraldi and Rita Cucchiara},
	journal={arXiv preprint arXiv:2509.08897},
	year={2025}
	}
	```