---
license: mit
library_name: colpali
language:
- en
tags:
- colpali
- vidore-experimental
- vidore
pipeline_tag: visual-document-retrieval
---
# ColModernVBERT



## Usage

> [!WARNING]
> This checkpoint is not intended for direct use: it is only the base version, provided for deterministic LoRA initialization.
## Table of Contents

1. [Overview](#overview)
2. [Usage](#usage)
3. [Evaluation](#evaluation)
4. [License](#license)
5. [Citation](#citation)
## Overview

[ModernVBERT](https://arxiv.org/abs/2510.01149) is a suite of compact 250M-parameter vision-language encoders that achieve state-of-the-art performance in their size class and match the performance of models up to 10x larger.

For more information about ModernVBERT, see the [arXiv](https://arxiv.org/abs/2510.01149) preprint.
### Models

- `ColModernVBERT` is the late-interaction version, fine-tuned for visual document retrieval. It is our best-performing model on this task.
- `BiModernVBERT` is the bi-encoder version, fine-tuned for visual document retrieval.
- `ModernVBERT-embed` is the bi-encoder version after modality alignment (using an MLM objective) and contrastive learning, without document specialization.
- `ModernVBERT` is the base model after modality alignment (using an MLM objective).
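### Late interaction vs. bi-encoder scoring

The difference between the late-interaction and bi-encoder variants listed above can be illustrated with a small sketch. It assumes the ColBERT-style MaxSim operator that is commonly used for late interaction, and mean pooling for the bi-encoder. Neither detail is stated in this card. All function names are hypothetical and the toy vectors are made up. The code is pure Python for readability and is not the `colpali` implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(tokens):
    """Average token embeddings into one vector (assumed pooling, for illustration)."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def bi_encoder_score(query_tokens, doc_tokens):
    """Bi-encoder scoring: one pooled vector per side, compared by cosine similarity."""
    q = l2_normalize(mean_pool(query_tokens))
    d = l2_normalize(mean_pool(doc_tokens))
    return dot(q, d)


def late_interaction_score(query_tokens, doc_tokens):
    """MaxSim (assumed): each query token keeps its best-matching document
    token, and the per-token maxima are summed."""
    q = [l2_normalize(t) for t in query_tokens]
    d = [l2_normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


# Toy 2-D embeddings (illustrative values only).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has a close match for both query tokens
page_b = [[1.0, 0.0], [1.0, 0.1]]              # matches only the first query token well

late_a = late_interaction_score(query_tokens, page_a)
late_b = late_interaction_score(query_tokens, page_b)
bi_a = bi_encoder_score(query_tokens, page_a)
bi_b = bi_encoder_score(query_tokens, page_b)

# Both scoring schemes rank page_a above page_b on this toy data.
print("late interaction:", late_a, late_b)
print("bi-encoder:", bi_a, bi_b)
```

In this sketch the bi-encoder stores a single vector per page, while late interaction keeps one vector per token or image patch and defers the comparison to query time, trading a larger index for finer-grained matching.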
## Evaluation



ColModernVBERT matches the performance of models nearly 10x larger on visual document benchmarks. It also offers favorable CPU inference speed compared with models of similar performance.
## License

We release the ModernVBERT model architectures, model weights, and training codebase under the MIT license.
## Citation

If you use ModernVBERT in your work, please cite:

```
@misc{teiletche2025modernvbertsmallervisualdocument,
  title={ModernVBERT: Towards Smaller Visual Document Retrievers},
  author={Paul Teiletche and Quentin Macé and Max Conti and Antonio Loison and Gautier Viaud and Pierre Colombo and Manuel Faysse},
  year={2025},
  eprint={2510.01149},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2510.01149},
}
```