Update README.md
README.md
CHANGED
@@ -41,7 +41,8 @@ In the paper, We use MiniCPM-V 2.0, MiniCPM-V 2.6 and GPT-4o as the generators.
 ## Training
 
 ### VisRAG-Ret
-Our training dataset of 362,110 Query-Document (Q-D) Pairs for **VisRAG-Ret** is comprised of train sets of openly available academic datasets (34%) and a synthetic dataset made up of pages from web-crawled PDF documents and augmented with VLM-generated (GPT-4o) pseudo-queries (66%).
+Our training dataset of 362,110 Query-Document (Q-D) Pairs for **VisRAG-Ret** is comprised of train sets of openly available academic datasets (34%) and a synthetic dataset made up of pages from web-crawled PDF documents and augmented with VLM-generated (GPT-4o) pseudo-queries (66%). It can be found in the `VisRAG` Collection on Hugging Face, which is referenced at the beginning of this page.
+
 
 ### VisRAG-Gen
 The generation part does not use any fine-tuning; we directly use off-the-shelf LLMs/VLMs for generation.