---
license: mit
base_model: ModernVBERT/colmodernvbert
library_name: fastembed
tags:
- colpali
- vision-language
- multimodal
- onnx
- fastembed
- document-retrieval
pipeline_tag: feature-extraction
---
# ColModernVBERT ONNX
This repository contains an ONNX export of the [ModernVBERT/colmodernvbert](https://huggingface.co/ModernVBERT/colmodernvbert) model, converted by Qdrant for efficient inference with [FastEmbed](https://github.com/qdrant/fastembed).
**Original Model Authors:** Paul Teiletche, Quentin Macé, Max Conti, Antonio Loison, Gautier Viaud, Pierre Colombo, Manuel Faysse
**ONNX Conversion:** Kacper Łukawski (Qdrant)
## About This Repository
This is an ONNX-converted version of ColModernVBERT, optimized for deployment with FastEmbed. The original model was developed by the authors listed above and is described in detail in their [paper](https://arxiv.org/abs/2510.01149).
**What's Different:**
- Model format: PyTorch → ONNX
- Optimization: Configured for efficient CPU/GPU inference
- Integration: Ready for use with FastEmbed's multimodal embedding APIs
**What's Preserved:**
- Model architecture and weights
- Tokenizer and processor configurations
- Image preprocessing pipeline
- Embedding dimensions and behavior
## Model Description
ColModernVBERT is a vision-language model based on the ModernVBERT architecture, designed for visual document retrieval. The [ModernVBERT paper](https://arxiv.org/abs/2510.01149) reports that this 250M-parameter model achieves state-of-the-art performance in its size class and is competitive with models up to 10x larger on visual document retrieval benchmarks.
It combines:
- ModernBERT text encoder (jhu-clsp/ettin-encoder-150m)
- SigLIP 2 vision encoder (google/siglip2-base-patch16-512)
- Custom projection layer reducing to 128 dimensions
## Model Details
- **Architecture**: ColModernVBert
- **Projection Dimension**: 128
- **Max Sequence Length**: 8192 tokens
- **Image Size**: 512x512 (resized by processor)
- **Image Token ID**: 50407
- **Format**: ONNX (exported from PyTorch)
## Usage
This ONNX model is intended for use with FastEmbed, which handles both document (image) and query (text) encoding.
### With FastEmbed
```python
from fastembed import LateInteractionMultimodalEmbedding
from PIL import Image

# Initialize the model
model = LateInteractionMultimodalEmbedding(
    model_name="Qdrant/colmodernvbert"
)

# Encode documents (images); the file names are placeholders
images = [Image.open("document1.png"), Image.open("document2.png")]
# embed_image yields embeddings lazily, so list() materializes them
doc_embeddings = list(model.embed_image(images))

# Encode queries (text)
queries = ["What is the total amount?", "Find the invoice date"]
query_embeddings = list(model.embed_text(queries))
```
For more details on using this model with FastEmbed, see the [FastEmbed documentation](https://github.com/qdrant/fastembed).
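ColPali-style models such as this one produce one vector per token or image patch rather than a single vector per input, and query/document pairs are commonly compared with late-interaction (MaxSim) scoring: for each query vector, take the highest dot product against any document vector, then sum those maxima. The following is a minimal NumPy sketch of that scoring, not FastEmbed's own API. The helper names `maxsim_score` and `rank_documents` are hypothetical, and the random arrays are stand-ins for real embeddings that only share the 128-dimensional projection size.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between one query and one document.

    query_emb has shape (num_query_vectors, dim);
    doc_emb has shape (num_doc_vectors, dim).
    """
    # Pairwise dot products: (num_query_vectors, num_doc_vectors)
    similarities = query_emb @ doc_emb.T
    # Best-matching document vector per query vector, summed over the query
    return float(similarities.max(axis=1).sum())


def rank_documents(query_emb, doc_embs):
    """Return document indices sorted from best to worst MaxSim score."""
    scores = [maxsim_score(query_emb, d) for d in doc_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Stand-in data (assumption): random vectors with a 128-dim projection.
rng = np.random.default_rng(0)
query = rng.normal(size=(6, 128)).astype(np.float32)
docs = [rng.normal(size=(n, 128)).astype(np.float32) for n in (40, 55)]
# Copy the query vectors into the second document so it should rank first.
docs[1][:6] = query

ranking = rank_documents(query, docs)
```

With real data, `query` would be one element of `query_embeddings` and `docs` the list of `doc_embeddings` from the example above, assuming each element is a 2-D array of shape `(num_vectors, 128)`.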
### Original PyTorch Version
For the original PyTorch implementation with additional features, please refer to the [original model repository](https://huggingface.co/ModernVBERT/colmodernvbert) and the [colpali-engine library](https://github.com/illuin-tech/colpali).
## Performance
ColModernVBERT is reported to be competitive with models up to 10x larger on visual document retrieval tasks. For detailed benchmarks and evaluation results, please refer to the [original paper](https://arxiv.org/abs/2510.01149) and [model card](https://huggingface.co/ModernVBERT/colmodernvbert).
**Note:** Embeddings and inference speed from the ONNX model may differ slightly from the PyTorch version because of export-time optimizations and runtime differences. The model architecture and weights are unchanged, so embedding quality should be preserved.
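One way to check how closely two runtimes agree is to embed the same input with both and compare the resulting matrices. The sketch below is an illustration only: `compare_embeddings` is a hypothetical helper, and the arrays are synthetic stand-ins (a random matrix and a slightly perturbed copy), not actual PyTorch or ONNX outputs.

```python
import numpy as np


def compare_embeddings(reference: np.ndarray, candidate: np.ndarray) -> dict:
    """Summarize how closely two (num_vectors, dim) embedding matrices agree."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Largest element-wise deviation between the two outputs
    max_abs_diff = float(np.abs(reference - candidate).max())
    # Cosine similarity of each vector with its counterpart
    dots = (reference * candidate).sum(axis=1)
    norms = np.linalg.norm(reference, axis=1) * np.linalg.norm(candidate, axis=1)
    min_cosine = float((dots / norms).min())
    return {"max_abs_diff": max_abs_diff, "min_cosine": min_cosine}


# Stand-in data (assumption): a random "reference" output and a copy with
# small added noise playing the role of the other runtime's output.
rng = np.random.default_rng(0)
pytorch_like = rng.normal(size=(32, 128)).astype(np.float32)
onnx_like = pytorch_like + rng.normal(scale=1e-5, size=(32, 128)).astype(np.float32)

report = compare_embeddings(pytorch_like, onnx_like)
```

What counts as an acceptable deviation depends on the application; for retrieval, the more relevant question is usually whether document rankings stay the same.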
## License
This model is released under the MIT License, consistent with the original ColModernVBERT model.
**Original Model License:**
- Copyright (c) 2025 Paul Teiletche, Quentin Macé, Max Conti, Antonio Loison, Gautier Viaud, Pierre Colombo, Manuel Faysse
- Licensed under MIT License
**ONNX Conversion:**
- Copyright (c) 2025 Qdrant
- Licensed under MIT License
See the [LICENSE](LICENSE) file for the full license text.
## Citation
If you use this model in your work, please cite the original ModernVBERT paper:
```bibtex
@misc{teiletche2025modernvbertsmallervisualdocument,
title={ModernVBERT: Towards Smaller Visual Document Retrievers},
author={Paul Teiletche and Quentin Macé and Max Conti and Antonio Loison and Gautier Viaud and Pierre Colombo and Manuel Faysse},
year={2025},
eprint={2510.01149},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2510.01149},
}
```
If you use this ONNX conversion specifically, you may also acknowledge:
```bibtex
@software{qdrant_colmodernvbert_onnx,
title={ColModernVBERT ONNX - FastEmbed Integration},
author={Qdrant},
year={2025},
url={https://huggingface.co/Qdrant/colmodernvbert},
note={ONNX conversion of ModernVBERT/colmodernvbert}
}
```
## Acknowledgments
This work builds upon the excellent ColModernVBERT model developed by Paul Teiletche, Quentin Macé, Max Conti, Antonio Loison, Gautier Viaud, Pierre Colombo, and Manuel Faysse. We thank them for releasing their work under the MIT License, enabling derivative works like this ONNX conversion.
The ONNX conversion was performed by the Qdrant team to enable efficient deployment through the FastEmbed library.