vidore/ColSmolVLM-base

manu committed on Nov 27, 2024

Commit f2d881f · verified · 1 Parent(s): eb3e4d0

Create README.md

Files changed (1)

README.md +43 -0

README.md ADDED

	@@ -0,0 +1,43 @@

+---
+base_model: HuggingFaceTB/SmolVLM-Instruct
+language:
+- en
+library_name: colpali
+license: apache-2.0
+---
+# ColSmolVLM: Visual Retriever based on SmolVLM with ColBERT strategy
+ColSmolVLM is a visual retriever built on a novel architecture and training strategy that uses Vision Language Models (VLMs) to index documents efficiently from their visual features.
+It extends SmolVLM to generate [ColBERT](https://arxiv.org/abs/2004.12832)-style multi-vector representations of text and images.
+It was introduced in the paper [ColPali: Efficient Document Retrieval with Vision Language Models](https://arxiv.org/abs/2407.01449) and first released in [this repository](https://github.com/ManuelFay/colpali).
+This is the untrained base version, published to guarantee deterministic initialization of the projection layer.
+## Usage
+> [!WARNING]
+> This version should not be used directly: it is only the untrained base version, provided for deterministic LoRA initialization.
+## Contact
+- Manuel Faysse: manuel.faysse@illuin.tech
+- Hugues Sibille: hugues.sibille@illuin.tech
+- Tony Wu: tony.wu@illuin.tech
+## Citation
+If you use any datasets or models from this organization in your research, please cite the original paper as follows:
+```bibtex
+@misc{faysse2024colpaliefficientdocumentretrieval,
+  title={ColPali: Efficient Document Retrieval with Vision Language Models},
+  author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
+  year={2024},
+  eprint={2407.01449},
+  archivePrefix={arXiv},
+  primaryClass={cs.IR},
+  url={https://arxiv.org/abs/2407.01449},
+}
+```
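
For context on the ColBERT-style multi-vector representations mentioned in the model card: such embeddings are usually compared with a late-interaction (MaxSim) score, where each query vector is matched to its most similar document vector and the maxima are summed. The following is a minimal pure-Python sketch of that scoring rule, assuming dot-product similarity; the function names and toy vectors are hypothetical and are not part of the `colpali` library API.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query vector, take the highest
    similarity to any document vector, then sum over query vectors."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-dimensional embeddings (illustrative values only).
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
page_b = [[0.5, 0.5]]

score_a = maxsim_score(query, page_a)
score_b = maxsim_score(query, page_b)
```

In this toy setup `page_a` scores higher than `page_b` because it contains a close match for every query vector. Real embeddings would come from a trained checkpoint rather than this untrained base version.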