---
license: mit
library_name: colpali
language:
- en
tags:
- colpali
- vidore-experimental
- vidore
pipeline_tag: visual-document-retrieval
---


# ColModernVBERT

![bg](https://cdn-uploads.huggingface.co/production/uploads/661e945eebe3616a1b09e279/QfGYAqoq_TGcXRHh6UMaq.png)

## Usage

> [!WARNING]
> This checkpoint is not intended for direct use: it is only the base version, provided for deterministic LoRA initialization.

## Table of Contents
1. [Overview](#overview)
2. [Usage](#usage)
3. [Evaluation](#evaluation)
4. [License](#license)
5. [Citation](#citation)

## Overview

The [ModernVBERT](https://arxiv.org/abs/2510.01149) suite consists of compact 250M-parameter vision-language encoders that achieve state-of-the-art performance in this size class, matching models up to 10x larger.

For more information about ModernVBERT, please check the [arXiv](https://arxiv.org/abs/2510.01149) preprint.

### Models
- `ColModernVBERT` is the late-interaction version fine-tuned for visual document retrieval; it is our best-performing model on this task.
- `BiModernVBERT` is the bi-encoder version fine-tuned for visual document retrieval.
- `ModernVBERT-embed` is the bi-encoder version after modality alignment (using an MLM objective) and contrastive learning, without document specialization.
- `ModernVBERT` is the base model after modality alignment (using an MLM objective).
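
The late-interaction and bi-encoder variants differ mainly in how a query is scored against a document. The sketch below contrasts the two, assuming ColBERT-style MaxSim scoring over L2-normalized token embeddings for late interaction and cosine similarity between single vectors for the bi-encoder. The helper names (`maxsim_score`, `bi_encoder_score`) and the toy embeddings are hypothetical illustrations, not the `colpali` library's API; real embeddings come from the model.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """L2-normalize a vector (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction (assumed ColBERT-style MaxSim): each query token
    embedding keeps its best match among the document token embeddings,
    and these per-token maxima are summed."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


def bi_encoder_score(query_vec: Vector, doc_vec: Vector) -> float:
    """Bi-encoder: one vector per query and per document, compared by cosine similarity."""
    return dot(normalize(query_vec), normalize(doc_vec))


# Toy 2-dimensional embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]          # two query token embeddings
doc = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # three document token embeddings
print(maxsim_score(query, doc))             # each query token finds an exact match
print(bi_encoder_score([1.0, 0.0], [1.0, 1.0]))
```

Keeping one embedding per token lets each query term match the most relevant region of a page independently, at the cost of storing more vectors per document than a bi-encoder does.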

## Evaluation

![table](https://cdn-uploads.huggingface.co/production/uploads/661e945eebe3616a1b09e279/NLB0bdE8tAAWXnCK6vjjS.png)

ColModernVBERT matches the performance of models nearly 10x larger on visual document retrieval benchmarks. It also offers favorable CPU inference speed compared with models of similar performance.

## License

We release the ModernVBERT model architectures, model weights, and training codebase under the MIT license.

## Citation

If you use ModernVBERT in your work, please cite:

```bibtex
@misc{teiletche2025modernvbertsmallervisualdocument,
      title={ModernVBERT: Towards Smaller Visual Document Retrievers}, 
      author={Paul Teiletche and Quentin Macé and Max Conti and Antonio Loison and Gautier Viaud and Pierre Colombo and Manuel Faysse},
      year={2025},
      eprint={2510.01149},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2510.01149}, 
}
```