Update README.md
README.md
CHANGED
```diff
@@ -10,7 +10,7 @@ tags:
 
 ColPali is a model based on a novel model architecture and training strategy based on Vision Language Models (VLMs) to efficiently index documents from their visual features.
 It is a [PaliGemma-3B](https://huggingface.co/google/paligemma-3b-mix-448) extension that generates [ColBERT](https://arxiv.org/abs/2004.12832)- style multi-vector representations of text and images.
-It was introduced in the paper [ColPali: Efficient Document Retrieval with Vision Language Models[
+It was introduced in the paper [ColPali: Efficient Document Retrieval with Vision Language Models[add link]]() and first released in [this repository](https://github.com/ManuelFay/colpali)
 
 ## Model Description
 
@@ -36,8 +36,12 @@ with `alpha=32` and `r=32` on the transformer layers from the language model,
 as well as the final randomly initialized projection layer, and use a `paged_adamw_8bit` optimizer.
 We train on an 8 GPU setup with data parallelism, a learning rate of 5e-5 with linear decay with 2.5% warmup steps, and a batch size of 32.
 
+## Intended uses
+
+#TODO
+
+## Limitations
 
-## Intended uses & limitations
 - **Focus**: The model primarily focuses on PDF-type documents and high-ressources languages, potentially limiting its generalization to other document types or less represented languages.
 - **Support**: The model relies on multi-vector retreiving derived from the ColBERT late interaction mechanism, which may require engineering efforts to adapt to widely used vector retrieval frameworks that lack native multi-vector support.
 
```
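The model card describes ColPali as producing ColBERT-style multi-vector representations. Below is a minimal pure-Python sketch of ColBERT's late-interaction (MaxSim) score. It assumes each query and each page is already encoded as a list of embedding vectors. `maxsim_score` and the toy 2-D vectors are hypothetical. Real embeddings would be higher-dimensional and are typically L2-normalized, so the inner product acts as cosine similarity.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query: List[Vector], doc: List[Vector]) -> float:
    """Late interaction: for each query token vector, keep its best match among
    the document's token/patch vectors, then sum those maxima over the query."""
    return sum(max(dot(q, d) for d in doc) for q in query)


# Hypothetical toy data: 2 query token vectors, 3 document patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
# Best match for the first query token is 0.9, for the second 0.8, so about 1.7.
score = maxsim_score(query, doc)
```

Because the score is computed token by token, a page is represented by many vectors rather than one pooled vector. This is what makes the approach "multi-vector".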
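The second hunk header says LoRA adapters were trained with `alpha=32` and `r=32`. To show roughly what those two numbers control, here is a pure-Python sketch of a LoRA-adapted linear layer. It follows the formulation of the original LoRA paper: a frozen weight `W` plus a low-rank product `B @ A` scaled by `alpha / r`, with `B` zero-initialized. `LoRALinear` and `matvec` are hypothetical names. This is not the ColPali training code, which the card does not show here.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable update B @ A of rank r."""

    def __init__(self, weight: Matrix, r: int = 32, alpha: float = 32.0, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen, never updated during fine-tuning
        # A (r x d_in): small random init; B (d_out x r): zeros, so training starts from W.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scaling = alpha / r  # alpha=32, r=32 gives a scaling factor of 1.0

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * u for b, u in zip(base, update)]


# Toy 2x2 layer (r=32 exceeds d_in here; fine for illustration only).
layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]])
out = layer.forward([1.0, 1.0])  # equals W @ x because B starts at zero
```

Only `A` and `B` would be trained. With `alpha` equal to `r`, the learned update is added to the frozen output without extra scaling.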
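The training line specifies a learning rate of 5e-5 with linear decay and 2.5% warmup steps. The sketch below is a minimal version of such a schedule. It assumes linear warmup from 0 to the peak over the first 2.5% of steps (rounded up), followed by linear decay to 0 at the final step. The total number of steps is not given in this diff, so `total_steps` is a free parameter. `linear_schedule_lr` is a hypothetical helper.

```python
import math


def linear_schedule_lr(step: int, total_steps: int, peak_lr: float = 5e-5,
                       warmup_ratio: float = 0.025) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = max(1, math.ceil(total_steps * warmup_ratio))  # assumption: round up
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 towards peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: peak_lr at the end of warmup, 0 at total_steps (clamped at 0 afterwards).
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return peak_lr * remaining


total = 1000  # hypothetical run length
schedule = [linear_schedule_lr(s, total) for s in range(total + 1)]
```

With `total = 1000`, the warmup phase covers 25 steps and the rate peaks at 5e-5 on step 25.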
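The **Support** bullet notes that late interaction may require engineering work in vector stores without native multi-vector support. One common workaround is similar in spirit to the candidate-generation step described in the ColBERT paper. Each token or patch vector is stored as its own row tagged with its document id. Single-vector nearest-neighbour lookups collect candidate documents, and those candidates are then reranked with the exact MaxSim score. The sketch below uses an in-memory brute-force search as a stand-in for a real ANN index. `FlatTokenIndex` and its methods are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim(query: List[Vector], doc: List[Vector]) -> float:
    """Exact late-interaction score (sum over query tokens of the best match)."""
    return sum(max(dot(q, d) for d in doc) for q in query)


class FlatTokenIndex:
    """Stand-in for a single-vector store: one row per document token vector."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, Vector]] = []
        self.docs: Dict[str, List[Vector]] = {}

    def add(self, doc_id: str, token_vectors: List[Vector]) -> None:
        self.docs[doc_id] = token_vectors
        self.rows.extend((doc_id, v) for v in token_vectors)

    def nearest(self, q: Vector, k: int) -> List[str]:
        """Brute-force top-k rows by inner product; an ANN library would do this step."""
        ranked = sorted(self.rows, key=lambda row: dot(q, row[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

    def search(self, query: List[Vector], k_tokens: int = 4,
               top_n: int = 2) -> List[Tuple[str, float]]:
        # Stage 1: candidate generation, one single-vector lookup per query token.
        candidates = {doc_id for q in query for doc_id in self.nearest(q, k_tokens)}
        # Stage 2: rerank only the candidates with the exact MaxSim score.
        scored = [(doc_id, maxsim(query, self.docs[doc_id])) for doc_id in candidates]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_n]


index = FlatTokenIndex()
index.add("a", [[1.0, 0.0], [0.0, 1.0]])
index.add("b", [[0.5, 0.5]])
index.add("c", [[-1.0, 0.0]])
results = index.search([[1.0, 0.0], [0.0, 1.0]], k_tokens=2, top_n=2)
```

This design avoids exact scoring of every document. The cost is possible recall loss in stage 1, which `k_tokens` controls.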