Update README.md
README.md
CHANGED
@@ -20,7 +20,7 @@ DSE-Phi3-Vidore-ft is a bi-encoder model designed to encode document screenshots
 The model, `Tevatron/dse-phi35-vidore-ft`, is trained using 1/10 of the `Tevatron/docmatix-ir` dataset, a variant of `HuggingFaceM4/Docmatix` specifically adapted for training PDF retrievers with Vision Language Models in open-domain question answering scenarios. For more information on dataset filtering and hard negative mining, refer to the [docmatix-ir](https://huggingface.co/datasets/Tevatron/docmatix-ir/blob/main/README.md) dataset page.
 Followed by finetuning on the (vidore)[https://huggingface.co/datasets/vidore/colpali_train_set] training set. The checkpoint is warmed up by text retrieval and webpage retrieval.
 
-For example, DSE-Phi3-
+For example, DSE-Phi3-Vidore-V2 achieves **82.9** nDCG@5 on [ViDoRE](https://huggingface.co/spaces/vidore/vidore-leaderboard) leaderboard.
 
 ## How to train the model from scratch
 