---
license: mit
base_model: ModernVBERT/colmodernvbert
library_name: fastembed
tags:
- colpali
- vision-language
- multimodal
- onnx
- fastembed
- document-retrieval
pipeline_tag: feature-extraction
---

# ColModernVBERT ONNX

This repository contains an ONNX export of the [ModernVBERT/colmodernvbert](https://huggingface.co/ModernVBERT/colmodernvbert) model, converted by Qdrant for efficient inference with [FastEmbed](https://github.com/qdrant/fastembed).

**Original Model Authors:** Paul Teiletche, Quentin Macé, Max Conti, Antonio Loison, Gautier Viaud, Pierre Colombo, Manuel Faysse

**ONNX Conversion:** Kacper Łukawski (Qdrant)

## About This Repository

This is an ONNX-converted version of ColModernVBERT, optimized for deployment with FastEmbed. The original model was developed by the authors listed above and is described in detail in their [paper](https://arxiv.org/abs/2510.01149).

**What's Different:**
- Model format: PyTorch → ONNX
- Optimization: Configured for efficient CPU/GPU inference
- Integration: Ready for use with FastEmbed's multimodal embedding APIs

**What's Preserved:**
- Model architecture and weights
- Tokenizer and processor configurations
- Image preprocessing pipeline
- Embedding dimensions and behavior

## Model Description

ColModernVBERT is a vision-language model based on the ModernVBERT architecture, designed for document retrieval tasks. The [ModernVBERT paper](https://arxiv.org/abs/2510.01149) demonstrates that this 250M-parameter model achieves state-of-the-art performance in its size class, matching models up to 10x larger on visual document retrieval benchmarks.
It combines:
- ModernBERT text encoder (jhu-clsp/ettin-encoder-150m)
- SigLIP vision encoder (google/siglip2-base-patch16-512)
- Custom projection layer reducing to 128 dimensions

## Model Details

- **Architecture**: ColModernVBert
- **Projection Dimension**: 128
- **Max Sequence Length**: 8192 tokens
- **Image Size**: 512x512 (resized by processor)
- **Image Token ID**: 50407
- **Format**: ONNX (exported from PyTorch)

## Usage

This ONNX model is designed for use with FastEmbed for efficient document and query encoding.

### With FastEmbed

```python
from fastembed import LateInteractionMultimodalEmbedding
from PIL import Image

# Initialize the model
model = LateInteractionMultimodalEmbedding(
    model_name="Qdrant/colmodernvbert"
)

# Encode documents (images); list() materializes the results
# in case the embed call returns a lazy iterable
images = [Image.open("document1.png"), Image.open("document2.png")]
doc_embeddings = list(model.embed_image(images))

# Encode queries (text)
queries = ["What is the total amount?", "Find the invoice date"]
query_embeddings = list(model.embed_text(queries))
```

For more details on using this model with FastEmbed, see the [FastEmbed documentation](https://github.com/qdrant/fastembed).

### Original PyTorch Version

For the original PyTorch implementation with additional features, please refer to the [original model repository](https://huggingface.co/ModernVBERT/colmodernvbert) and the [colpali-engine library](https://github.com/illuin-tech/colpali).

## Performance

ColModernVBERT achieves competitive performance with models up to 10x larger on visual document retrieval tasks. For detailed benchmarks and evaluation results, please refer to the [original paper](https://arxiv.org/abs/2510.01149) and [model card](https://huggingface.co/ModernVBERT/colmodernvbert).

**Note:** ONNX inference performance may differ slightly from PyTorch due to optimizations and runtime differences. The model architecture and weights are preserved to maintain embedding quality.
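
Because this is a late-interaction model, each query and each document is represented by a matrix of 128-dimensional vectors rather than a single vector, so retrieval results depend on how those matrices are compared. Late-interaction retrievers in the ColBERT/ColPali family commonly score a query against a document with MaxSim: for every query vector, take its best dot product over all document vectors, then sum those maxima. The sketch below illustrates that scoring with NumPy. The helpers `maxsim_score` and `rank_documents` are hypothetical names, not part of FastEmbed, and the code assumes each embedding is a 2-D array of shape `(num_vectors, 128)`; random arrays stand in for real model output.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """MaxSim score between one query and one document.

    query_emb: array of shape (num_query_vectors, dim)
    doc_emb:   array of shape (num_doc_vectors, dim)
    """
    # Pairwise dot products, shape (num_query_vectors, num_doc_vectors)
    similarities = query_emb @ doc_emb.T
    # Best-matching document vector per query vector, summed over the query
    return float(similarities.max(axis=1).sum())


def rank_documents(query_emb: np.ndarray, doc_embs: list) -> tuple:
    """Return (document indices sorted best-first, raw scores)."""
    scores = [maxsim_score(query_emb, doc_emb) for doc_emb in doc_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Synthetic stand-ins for real embeddings (assumption: 128-dim vectors)
rng = np.random.default_rng(0)
query = rng.normal(size=(6, 128))
documents = [rng.normal(size=(40, 128)), rng.normal(size=(55, 128))]

order, scores = rank_documents(query, documents)
print("ranking:", order)
print("scores:", scores)
```

With real data, `query` would be one element of `query_embeddings` and `documents` the contents of `doc_embeddings` from the Usage section. For larger collections, a vector database with multi-vector support is usually a better fit than this brute-force loop.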
## License

This model is released under the MIT License, consistent with the original ColModernVBERT model.

**Original Model License:**
- Copyright (c) 2025 Paul Teiletche, Quentin Macé, Max Conti, Antonio Loison, Gautier Viaud, Pierre Colombo, Manuel Faysse
- Licensed under MIT License

**ONNX Conversion:**
- Copyright (c) 2025 Qdrant
- Licensed under MIT License

See the [LICENSE](LICENSE) file for the full license text.

## Citation

If you use this model in your work, please cite the original ModernVBERT paper:

```bibtex
@misc{teiletche2025modernvbertsmallervisualdocument,
      title={ModernVBERT: Towards Smaller Visual Document Retrievers},
      author={Paul Teiletche and Quentin Macé and Max Conti and Antonio Loison and Gautier Viaud and Pierre Colombo and Manuel Faysse},
      year={2025},
      eprint={2510.01149},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2510.01149},
}
```

If you use this ONNX conversion specifically, you may also acknowledge:

```bibtex
@software{qdrant_colmodernvbert_onnx,
      title={ColModernVBERT ONNX - FastEmbed Integration},
      author={Qdrant},
      year={2025},
      url={https://huggingface.co/Qdrant/colmodernvbert},
      note={ONNX conversion of ModernVBERT/colmodernvbert}
}
```

## Acknowledgments

This work builds upon the excellent ColModernVBERT model developed by Paul Teiletche, Quentin Macé, Max Conti, Antonio Loison, Gautier Viaud, Pierre Colombo, and Manuel Faysse. We thank them for releasing their work under the MIT License, enabling derivative works like this ONNX conversion.

The ONNX conversion was performed by the Qdrant team to enable efficient deployment through the FastEmbed library.