--- pipeline_tag: visual-document-retrieval library_name: transformers license: apache-2.0 --- # ModernVBERT: Towards Smaller Visual Document Retrievers 👁️ [](https://huggingface.co/papers/2510.01149) [](https://huggingface.co/ModernVBERT) [](https://github.com/illuin-tech/modernvbert) [](https://huggingface.co/blog/paultltc/modernvbert) This repository contains the **ModernVBERT** model, a compact 250M-parameter vision-language encoder designed for efficient Visual Document Retrieval (VDR). As presented in the paper "[ModernVBERT: Towards Smaller Visual Document Retrievers](https://huggingface.co/papers/2510.01149)", this model establishes a principled recipe for improving VDR models by revisiting the entire training pipeline. It outperforms models up to 10 times larger while enabling efficient inference on CPU hardware, significantly reducing latency and costs. Key factors measured for improvement include attention masking, image resolution, modality alignment data regimes, and late interaction-centered contrastive objectives.