# datasets
## How to load the training datasets
```
from datasets import load_dataset
db = load_dataset(...)["train"]
for x in db:
    # x is a dict (one row), e.g.
    # {"corpus-id": "6519.png",
    #  "image": <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1263x700 at 0x7F0303CD6AD0>}
    ...
```
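The training sets are large (122,752 and 239,358 examples according to the dataset list below), so it can help to look at a few rows before committing to a full download. The sketch below assumes the Hugging Face `datasets` package is installed; `streaming=True` is a real `load_dataset` option that iterates rows without downloading the whole dataset first. The helper name `iter_train` and its `loader` parameter (only there so a fake loader can be swapped in for testing) are hypothetical.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterator, Optional


def iter_train(
    repo: str,
    cache_dir: Optional[str] = None,
    limit: Optional[int] = None,
    streaming: bool = False,
    loader: Optional[Callable[..., Any]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield rows of the "train" split, stopping after `limit` rows if given.

    Hypothetical helper; `loader` defaults to datasets.load_dataset.
    """
    if loader is None:
        from datasets import load_dataset  # imported lazily so the helper is cheap to import
        loader = load_dataset
    # With streaming=True, load_dataset returns an IterableDatasetDict,
    # which is indexed by split name just like a DatasetDict.
    db = loader(repo, cache_dir=cache_dir, streaming=streaming)["train"]
    # islice(db, None) yields every row when no limit is given.
    return islice(db, limit)
```

For example, `iter_train("openbmb/VisRAG-Ret-Train-In-domain-data", streaming=True, limit=3)` should yield only the first three rows.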
## How to load the test datasets
```
from datasets import load_dataset
dbcorpus = load_dataset(..., "corpus")["train"]
dbqrels = load_dataset(..., "qrels")["train"]
dbqueries = load_dataset(..., "queries")["train"]
```
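The three calls above differ only in the config name, and several test sets below pair a corpus from one repo with the qrels/queries of another. A small wrapper can cover both cases. This is a sketch, not part of the datasets: `load_test_set`, `corpus_repo` and the injectable `loader` are hypothetical names, and only `datasets.load_dataset` itself is a real API.

```python
from typing import Any, Callable, Dict, Optional

CONFIGS = ("corpus", "qrels", "queries")


def load_test_set(
    repo: str,
    corpus_repo: Optional[str] = None,
    cache_dir: Optional[str] = None,
    loader: Optional[Callable[..., Any]] = None,
) -> Dict[str, Any]:
    """Return {"corpus": ..., "qrels": ..., "queries": ...} for one test set.

    If corpus_repo is given, the corpus is loaded from that repo while qrels
    and queries still come from `repo` (e.g. a degraded or OCR corpus paired
    with the clean annotations).
    """
    if loader is None:
        from datasets import load_dataset  # imported lazily
        loader = load_dataset
    parts: Dict[str, Any] = {}
    for config in CONFIGS:
        source = corpus_repo if (config == "corpus" and corpus_repo) else repo
        # Each config is stored as a single "train" split.
        parts[config] = loader(source, config, cache_dir=cache_dir)["train"]
    return parts
```

For example, `load_test_set("openbmb/VisRAG-Ret-Test-PlotQA", corpus_repo="rweics5cs7/exo3-original-PlotQA-deg")` would load the degraded PlotQA corpus with the clean qrels and queries.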
## Image corpora
```
for x in dbcorpus:
    # x is a dict, e.g.
    # {"corpus-id": "<image id>", "image": <PIL image object>}
    # ex. {'corpus-id': '2010.05458_3.jpg', 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1122x551 at 0x7F57A3667790>}
    ...
for x in dbqrels:
    # x is a dict, e.g.
    # {"query-id": "<query id>", "corpus-id": "<image id>", "score": <relevance score>}
    # ex. {'query-id': '1508.06771_0.jpg-1', 'corpus-id': '1508.06771_0.jpg', 'score': 1}
    ...
for x in dbqueries:
    # x is a dict, e.g.
    # {"query-id": "<query id>", "query": "<question>", "answer": "<answer to the question>"}
    # ex. {'query-id': '1508.06771_0.jpg-1',
    #      'query': "Which statement best describes the relationship between the components labeled 'Myosin' and 'Actin filament'?",
    #      'answer': 'C',
    #      'options': ['A) Myosin binds directly to crosslinkers.', 'B) Actin filaments are independent of myosin.', 'C) Myosin heads are bound to actin filaments.', 'D) Crosslinkers prevent the interaction between myosin and actin filaments.'],
    #      'is_numerical': 0}
    ...
```
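The example query row carries an `options` list and a bare option letter as `answer`. A minimal sketch for turning such a row into a prompt and checking a predicted letter follows; `format_query` and `check_choice` are hypothetical helpers, and it is an assumption that rows without `options` (e.g. ones with `is_numerical` set) need a different, caller-defined comparison.

```python
from typing import Any, Dict, Optional


def format_query(row: Dict[str, Any]) -> str:
    """Join the question with its answer options, one per line (if the row has any)."""
    options = row.get("options") or []
    return "\n".join([row["query"], *options])


def check_choice(row: Dict[str, Any], predicted: str) -> Optional[bool]:
    """Compare the first letter of a prediction with row["answer"].

    Returns None for rows without options, which are left to the caller.
    """
    if not row.get("options"):
        return None
    # Accept both "C" and "C) Myosin heads ..." as a prediction of option C.
    return predicted.strip()[:1].upper() == str(row["answer"]).strip().upper()
```

Taking only the first character keeps the check tolerant of models that echo the whole option text.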
## OCR (text) corpora
```
for x in dbcorpus:
    # x is a dict, e.g.
    # {"corpus-id": "6519.png", "text": "<OCR text extracted from the image>"}
    ...
for x in dbqrels:
    # x is a dict, e.g.
    # {"query-id": "<query id>", "corpus-id": "<image id>", "score": <relevance score>}
    ...
for x in dbqueries:
    # x is a dict, e.g.
    # {"query-id": "<query id>", "query": "<question>", "answer": "<answer to the question>"}
    ...
```
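Since corpus, qrels and queries are linked only through `corpus-id` and `query-id`, retrieval code usually needs them indexed by id. A sketch under that assumption: `index_test_set` is a hypothetical helper, and `corpus_field` would be `"text"` for OCR corpora or `"image"` for image corpora (holding every decoded image in memory may be costly for large image corpora).

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Tuple


def index_test_set(
    corpus: Iterable[Dict[str, Any]],
    qrels: Iterable[Dict[str, Any]],
    queries: Iterable[Dict[str, Any]],
    corpus_field: str = "text",
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, int]], Dict[str, str]]:
    """Return (corpus-id -> content, query-id -> {corpus-id: score}, query-id -> query)."""
    docs = {row["corpus-id"]: row[corpus_field] for row in corpus}
    relevant: Dict[str, Dict[str, int]] = defaultdict(dict)
    for row in qrels:
        # Fall back to a score of 1 if a qrels row has no "score" column.
        relevant[row["query-id"]][row["corpus-id"]] = int(row.get("score", 1))
    questions = {row["query-id"]: row["query"] for row in queries}
    return docs, dict(relevant), questions
```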
# Dataset
Training:

- openbmb/VisRAG-Ret-Train-In-domain-data (122,752 examples)
- openbmb/VisRAG-Ret-Train-Synthetic-data (239,358 examples)

Testing:

- Clean:
  - openbmb/VisRAG-Ret-Test-PlotQA (corpus, qrels, queries)
  - openbmb/VisRAG-Ret-Test-SlideVQA (corpus, qrels, queries)
  - openbmb/VisRAG-Ret-Test-InfoVQA (corpus, qrels, queries)
  - openbmb/VisRAG-Ret-Test-ArxivQA (corpus, qrels, queries)
  - openbmb/VisRAG-Ret-Test-ChartQA (corpus, qrels, queries)
  - openbmb/VisRAG-Ret-Test-MP-DocVQA (corpus, qrels, queries)
- Degraded images (use the clean qrels & queries):
  - rweics5cs7/exo3-original-PlotQA-deg (corpus)
  - rweics5cs7/exo3-original-SlideVQA-deg (corpus)
  - rweics5cs7/exo3-original-InfoVQA-deg (corpus)
  - rweics5cs7/exo3-original-ArxivQA-deg (corpus)
  - rweics5cs7/exo3-original-ChartQA-deg (corpus)
  - rweics5cs7/exo3-original-MP-DocVQA-deg (corpus)
- Real-world:
  - rweics5cs7/exo7-realworld-db-combined (corpus, qrels, queries): RVL-CDIP (3k), clean
  - rweics5cs7/exo7-realworld-db-combined-deg (corpus, qrels, queries): RVL-CDIP (real-world, 3k), degraded
  - rweics5cs7/exo9-realworld-db-combined (corpus, qrels, queries): MP-DocVQA (real-world, 741), degraded
  - rweics5cs7/exo10-realworld-db-combined (corpus, qrels, queries): ArxivQA (real-world, 3,000), degraded
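The test-set list above follows a regular naming pattern for the image version: clean sets live under `openbmb/VisRAG-Ret-Test-<name>`, synthetic degradations replace only the corpus with `rweics5cs7/exo3-original-<name>-deg`, and real-world sets host all three configs in one repo. A sketch of that mapping, where `image_test_repos` and the short real-world keys (e.g. `"rvl-cdip-deg"`) are hypothetical names chosen for illustration:

```python
from typing import Dict

SYNTHETIC = ("PlotQA", "SlideVQA", "InfoVQA", "ArxivQA", "ChartQA", "MP-DocVQA")

# Real-world sets: corpus, qrels and queries are all in the same repo.
REAL_WORLD = {
    "rvl-cdip-clean": "rweics5cs7/exo7-realworld-db-combined",
    "rvl-cdip-deg": "rweics5cs7/exo7-realworld-db-combined-deg",
    "mp-docvqa-deg": "rweics5cs7/exo9-realworld-db-combined",
    "arxivqa-deg": "rweics5cs7/exo10-realworld-db-combined",
}


def image_test_repos(name: str, degraded: bool = False) -> Dict[str, str]:
    """Map a test-set name to the repo serving each config (image version).

    `degraded` only applies to the six synthetic sets; real-world keys
    already name a fixed variant.
    """
    if name in REAL_WORLD:
        repo = REAL_WORLD[name]
        return {"corpus": repo, "qrels": repo, "queries": repo}
    if name not in SYNTHETIC:
        raise KeyError(f"unknown test set: {name}")
    clean = f"openbmb/VisRAG-Ret-Test-{name}"
    # Synthetic degradation swaps only the corpus; annotations stay clean.
    corpus = f"rweics5cs7/exo3-original-{name}-deg" if degraded else clean
    return {"corpus": corpus, "qrels": clean, "queries": clean}
```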
# Train datasets:
## The 122k in-domain dataset (ArxivQA, PlotQA, ...)
```
load_dataset("openbmb/VisRAG-Ret-Train-In-domain-data", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## The 239k synthetic dataset
```
load_dataset("openbmb/VisRAG-Ret-Train-Synthetic-data", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
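# Test datasets (each test dataset has 3 configs: corpus, qrels and queries, each loaded from its "train" split; there is an image version and an OCR version)
# Image version
## PlotQA (clean)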
```
load_dataset("openbmb/VisRAG-Ret-Test-PlotQA", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-PlotQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-PlotQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## SlideVQA (clean)
```
load_dataset("openbmb/VisRAG-Ret-Test-SlideVQA", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-SlideVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-SlideVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## InfoVQA (clean)
```
load_dataset("openbmb/VisRAG-Ret-Test-InfoVQA", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-InfoVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-InfoVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ArxivQA (clean)
```
load_dataset("openbmb/VisRAG-Ret-Test-ArxivQA", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ArxivQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ArxivQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ChartQA (clean)
```
load_dataset("openbmb/VisRAG-Ret-Test-ChartQA", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ChartQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ChartQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## MP-DocVQA (clean)
```
load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## PlotQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-PlotQA-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-PlotQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-PlotQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## SlideVQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-SlideVQA-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-SlideVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-SlideVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## InfoVQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-InfoVQA-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-InfoVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-InfoVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ArxivQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-ArxivQA-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ArxivQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ArxivQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ChartQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-ChartQA-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ChartQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ChartQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## MP-DocVQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-MP-DocVQA-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
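Because each synthetically degraded corpus is evaluated against the clean qrels, it is worth confirming that every `corpus-id` referenced in qrels also exists in the degraded corpus. A minimal sketch; `missing_corpus_ids` is a hypothetical helper. Only the `corpus-id` column is read, so passing something like `corpus.select_columns(["corpus-id"])` (a real `datasets.Dataset` method in recent versions) should avoid decoding images.

```python
from typing import Any, Dict, Iterable, List


def missing_corpus_ids(
    corpus: Iterable[Dict[str, Any]], qrels: Iterable[Dict[str, Any]]
) -> List[str]:
    """Return qrels corpus-ids (sorted, deduplicated) that are absent from the corpus."""
    present = {row["corpus-id"] for row in corpus}
    return sorted({row["corpus-id"] for row in qrels} - present)
```

An empty result means the degraded corpus covers every relevant document in the clean qrels.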
## RVL-CDIP (3k), clean
```
load_dataset("rweics5cs7/exo7-realworld-db-combined", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo7-realworld-db-combined", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo7-realworld-db-combined", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## RVL-CDIP (real-world, 3k), degraded
```
load_dataset("rweics5cs7/exo7-realworld-db-combined-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo7-realworld-db-combined-deg", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo7-realworld-db-combined-deg", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## MP-DocVQA (real-world, 741), degraded
```
load_dataset("rweics5cs7/exo9-realworld-db-combined", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo9-realworld-db-combined", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo9-realworld-db-combined", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ArxivQA (real-world, 3,000), degraded
```
load_dataset("rweics5cs7/exo10-realworld-db-combined", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo10-realworld-db-combined", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo10-realworld-db-combined", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
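With any of the test sets above loaded, a retriever's ranked results can be scored against qrels. This README does not say which metric is used, so Recall@k and MRR@k below are assumed, commonly used choices, not the authors' evaluation. The sketch expects `run` as `{query-id: [corpus-id, ...]}` in ranked order and `qrels` as `{query-id: {corpus-id: score}}`, treating any score above 0 as relevant; both function names are hypothetical.

```python
from typing import Dict, List


def recall_at_k(run: Dict[str, List[str]], qrels: Dict[str, Dict[str, int]], k: int) -> float:
    """Mean fraction of each query's relevant corpus-ids found in its top-k results."""
    scores = []
    for qid, rel in qrels.items():
        relevant = {cid for cid, s in rel.items() if s > 0}
        if not relevant:
            continue  # skip queries with no positive judgments
        top_k = run.get(qid, [])[:k]
        scores.append(len(relevant.intersection(top_k)) / len(relevant))
    return sum(scores) / len(scores) if scores else 0.0


def mrr_at_k(run: Dict[str, List[str]], qrels: Dict[str, Dict[str, int]], k: int) -> float:
    """Mean reciprocal rank of the first relevant hit within the top-k (0 if none)."""
    scores = []
    for qid, rel in qrels.items():
        relevant = {cid for cid, s in rel.items() if s > 0}
        if not relevant:
            continue
        rr = 0.0
        for rank, cid in enumerate(run.get(qid, [])[:k], start=1):
            if cid in relevant:
                rr = 1.0 / rank
                break
        scores.append(rr)
    return sum(scores) / len(scores) if scores else 0.0
```

Queries missing from `run` count as zero, so a retriever that silently drops queries is penalized rather than ignored.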
# OCR version (PPOCR-v5)
## PlotQA (clean)
```
load_dataset("rweics5cs7/exo3-original-PlotQA-text", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-PlotQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-PlotQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## SlideVQA (clean)
```
load_dataset("rweics5cs7/exo3-original-SlideVQA-text", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-SlideVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-SlideVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## InfoVQA (clean)
```
load_dataset("rweics5cs7/exo3-original-InfoVQA-text", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-InfoVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-InfoVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ArxivQA (clean)
```
load_dataset("rweics5cs7/exo3-original-ArxivQA-text", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ArxivQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ArxivQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ChartQA (clean)
```
load_dataset("rweics5cs7/exo3-original-ChartQA-text", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ChartQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ChartQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## MP-DocVQA (clean)
```
load_dataset("rweics5cs7/exo3-original-MP-DocVQA-text", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## PlotQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-PlotQA-text-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-PlotQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-PlotQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## SlideVQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-SlideVQA-text-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-SlideVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-SlideVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## InfoVQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-InfoVQA-text-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-InfoVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-InfoVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ArxivQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-ArxivQA-text-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ArxivQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ArxivQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ChartQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-ChartQA-text-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ChartQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ChartQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## MP-DocVQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-MP-DocVQA-text-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## RVL-CDIP (3k), clean
```
load_dataset("rweics5cs7/exo8-realworld-db-combined-text", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo8-realworld-db-combined-text", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo8-realworld-db-combined-text", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## RVL-CDIP (real-world, 3k), degraded
```
load_dataset("rweics5cs7/exo8-realworld-db-combined-text-deg", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo8-realworld-db-combined-text-deg", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo8-realworld-db-combined-text-deg", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## MP-DocVQA (real-world, 741), degraded
```
load_dataset("rweics5cs7/exo9-realworld-db-combined-text", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo9-realworld-db-combined-text", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo9-realworld-db-combined-text", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ArxivQA (real-world, 3,000), degraded
```
load_dataset("rweics5cs7/exo10-realworld-db-combined-text", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo10-realworld-db-combined-text", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo10-realworld-db-combined-text", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
# OCR version (PPOCR-v3)
## PlotQA (clean)
```
load_dataset("rweics5cs7/exo3-original-PlotQA-text-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-PlotQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-PlotQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## SlideVQA (clean)
```
load_dataset("rweics5cs7/exo3-original-SlideVQA-text-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-SlideVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-SlideVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## InfoVQA (clean)
```
load_dataset("rweics5cs7/exo3-original-InfoVQA-text-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-InfoVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-InfoVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ArxivQA (clean)
```
load_dataset("rweics5cs7/exo3-original-ArxivQA-text-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ArxivQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ArxivQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ChartQA (clean)
```
load_dataset("rweics5cs7/exo3-original-ChartQA-text-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ChartQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ChartQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## MP-DocVQA (clean)
```
load_dataset("rweics5cs7/exo3-original-MP-DocVQA-text-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## PlotQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-PlotQA-text-deg-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-PlotQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-PlotQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## SlideVQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-SlideVQA-text-deg-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-SlideVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-SlideVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## InfoVQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-InfoVQA-text-deg-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-InfoVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-InfoVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ArxivQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-ArxivQA-text-deg-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ArxivQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ArxivQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ChartQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-ChartQA-text-deg-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ChartQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-ChartQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## MP-DocVQA (degraded, synthetic); uses the clean "qrels" and "queries"
```
load_dataset("rweics5cs7/exo3-original-MP-DocVQA-text-deg-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## RVL-CDIP (3k), clean
```
load_dataset("rweics5cs7/exo8-realworld-db-combined-text-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo8-realworld-db-combined-text-v3", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo8-realworld-db-combined-text-v3", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## RVL-CDIP (real-world, 3k), degraded
```
load_dataset("rweics5cs7/exo8-realworld-db-combined-text-deg-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo8-realworld-db-combined-text-deg-v3", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo8-realworld-db-combined-text-deg-v3", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## MP-DocVQA (real-world, 741), degraded
```
load_dataset("rweics5cs7/exo9-realworld-db-combined-text-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo9-realworld-db-combined-text-v3", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo9-realworld-db-combined-text-v3", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
## ArxivQA (real-world, 3,000), degraded
```
load_dataset("rweics5cs7/exo10-realworld-db-combined-text-v3", "corpus", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo10-realworld-db-combined-text-v3", "qrels", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
load_dataset("rweics5cs7/exo10-realworld-db-combined-text-v3", "queries", cache_dir="/mnt/191/a/lyw/VisRAG/Alldatasets")["train"]
```
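The OCR repo names in the two OCR sections follow a regular pattern: synthetic-set corpora are `rweics5cs7/exo3-original-<name>-text`, with `-deg` appended for degraded images and `-v3` appended for the PPOCR-v3 variant, while qrels and queries always come from the clean `openbmb` repo; the real-world OCR sets (`exo8`/`exo9`/`exo10`) host all three configs and take the same `-v3` suffix. A sketch of that mapping, where `ocr_test_repos` and the short real-world keys are hypothetical names:

```python
from typing import Dict

# Real-world OCR sets: corpus, qrels and queries are all in the same repo.
REAL_WORLD_TEXT = {
    "rvl-cdip-clean": "rweics5cs7/exo8-realworld-db-combined-text",
    "rvl-cdip-deg": "rweics5cs7/exo8-realworld-db-combined-text-deg",
    "mp-docvqa-deg": "rweics5cs7/exo9-realworld-db-combined-text",
    "arxivqa-deg": "rweics5cs7/exo10-realworld-db-combined-text",
}


def ocr_test_repos(name: str, degraded: bool = False, ocr: str = "v5") -> Dict[str, str]:
    """Map a test-set name to the repo serving each config (OCR version)."""
    if ocr not in ("v5", "v3"):
        raise ValueError("ocr must be 'v5' or 'v3'")
    suffix = "-v3" if ocr == "v3" else ""
    if name in REAL_WORLD_TEXT:
        repo = REAL_WORLD_TEXT[name] + suffix
        return {"corpus": repo, "qrels": repo, "queries": repo}
    corpus = f"rweics5cs7/exo3-original-{name}-text"
    if degraded:
        corpus += "-deg"
    corpus += suffix  # "-v3" always comes last, e.g. ...-text-deg-v3
    clean = f"openbmb/VisRAG-Ret-Test-{name}"
    return {"corpus": corpus, "qrels": clean, "queries": clean}
```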