Spaces: vidore/README


HugSib committed on Jun 24, 2024

Commit 55c51c6 (verified) · 1 Parent(s): 9cee48c

Update README.md


Files changed (1): README.md +23 -9

@@ -1,20 +1,34 @@
----
-title: README
-emoji: 👀
-colorFrom: indigo
-colorTo: red
-sdk: static
-pinned: true
----
 # 👀ColPali: Efficient Document Retrieval with Vision Language Models
 Visualisation?
 ## Description
-[Add Abstract]
 ## Organisation
 - **Datasets**: [add description of each collection + link]
 - **Models**: [add description of released model]
 ## Autorship + Citation

+---
+title: README
+emoji: 👀
+colorFrom: indigo
+colorTo: red
+sdk: static
+pinned: true
+---
 # 👀ColPali: Efficient Document Retrieval with Vision Language Models
 Visualisation?
 ## Description
+This Organisation contains all artefacts released with the paper *ColPali: Efficient Document Retrieval with Vision Language Models* [[add link to arxiv]](), including datasets and models.
+### Abstract
+Documents are visually rich structures that convey information through text, as well as tables, figures, page layouts, or fonts.
+While modern document retrieval systems exhibit strong performance on query-to-text matching, they struggle to exploit visual cues efficiently, hindering their performance on practical document retrieval applications such as Retrieval Augmented Generation.
+To benchmark current systems on visually rich document retrieval, we introduce the Visual Document Retrieval Benchmark *ViDoRe*, composed of various page-level retrieving tasks spanning multiple domains, languages, and settings.
+The inherent shortcomings of modern systems motivate the introduction of a new retrieval model architecture, *ColPali*, which leverages the document understanding capabilities of recent Vision Language Models to produce high-quality contextualized embeddings solely from images of document pages.
+Combined with a late interaction matching mechanism, *ColPali* largely outperforms modern document retrieval pipelines while being drastically faster and end-to-end trainable.
 ## Organisation
 - **Datasets**: [add description of each collection + link]
+  - [*ViDoRe Benchmark*](https://huggingface.co/collections/vidore/vidore-benchmark-667173f98e70a1c0fa4db00d): collection grouping all datasets that make up the ViDoRe benchmark. It includes test sets from several academic
+    datasets (ArxivQA, DocVQA, InfoVQA, TAT-DQA, TabFQuAD) and synthetically generated datasets spanning various themes and industrial applications:
+    Artificial Intelligence, Government Reports, Healthcare Industry, Energy, and Shift Project.
+  - [*OCR Baseline*](https://huggingface.co/collections/vidore/vidore-chunk-ocr-baseline-666acce88c294ef415548a56)
+  - [*Captioning Baseline*](https://huggingface.co/collections/vidore/vidore-captioning-baseline-6658a2a62d857c7a345195fd)
 - **Models**: [add description of released model]
 ## Authorship + Citation
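The abstract attributes part of ColPali's retrieval quality to a *late interaction matching mechanism*. As a rough illustration only, the block below is a minimal pure-Python sketch of ColBERT-style MaxSim scoring. It assumes that a query is a list of token embeddings and a page is a list of patch embeddings. For each query token it takes the best-matching page patch by dot product, then sums those maxima. The function names (`dot`, `late_interaction_score`, `rank_pages`) and the toy 2-dimensional vectors are hypothetical and are not the ColPali codebase's API. Real systems compute this in batches over GPU tensors.

```python
from typing import List, Sequence

# Hypothetical type aliases for this sketch: one embedding per token or patch.
Vector = Sequence[float]
Embeddings = Sequence[Vector]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query_emb: Embeddings, page_emb: Embeddings) -> float:
    """ColBERT-style MaxSim: for each query token embedding, keep the highest
    similarity over all page patch embeddings, then sum over query tokens."""
    if not query_emb or not page_emb:
        raise ValueError("embeddings must be non-empty")
    return sum(max(dot(q, p) for p in page_emb) for q in query_emb)


def rank_pages(query_emb: Embeddings, pages: Sequence[Embeddings]) -> List[int]:
    """Return page indices ordered by descending late-interaction score."""
    scores = [late_interaction_score(query_emb, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy example: a 2-token query scored against three 2-patch "pages".
query = [[1.0, 0.0], [0.0, 1.0]]
pages = [
    [[1.0, 0.0], [0.5, 0.5]],  # MaxSim = 1.0 + 0.5 = 1.5
    [[0.0, 1.0], [0.0, 0.9]],  # MaxSim = 0.0 + 1.0 = 1.0
    [[0.9, 0.1], [0.1, 0.9]],  # MaxSim = 0.9 + 0.9 = 1.8
]
print(rank_pages(query, pages))  # [2, 0, 1]
```

Because each page's patch embeddings can be computed once at indexing time, only the query needs encoding at search time; the MaxSim step is then cheap arithmetic over stored vectors.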