# conda activate VisRAG

# To-Do List
========================================================================
1. Baseline: evaluate the clear model on the clear, degra, and real datasets (Ret -> Gen)
2. Training Code: dataloader, synthetic function

# Environments
=========================================================================
git clone https://github.com/OpenBMB/VisRAG.git
conda create --name VisRAG python==3.10.8
conda activate VisRAG
conda install nvidia/label/cuda-11.8.0::cuda-toolkit
cd VisRAG
pip install -r requirements.txt
pip install -e .
cd timm_modified
pip install -e .
pip install scikit-image
pip install peft==0.7.1
cd ..
% Download & Change cache_dir
/home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/src/openmatch/arguments.py L136.
% soft link pre-trained models
ln -s /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/* /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints

# Dataset format
========================================================================= 
qrels: ['query_id', 'corpus_id', 'score']
* clear: (queries, qrels, corpus)
openbmb/VisRAG-Ret-Test-PlotQA 
openbmb/VisRAG-Ret-Test-SlideVQA 
openbmb/VisRAG-Ret-Test-InfoVQA 
openbmb/VisRAG-Ret-Test-ArxivQA 
openbmb/VisRAG-Ret-Test-ChartQA 
openbmb/VisRAG-Ret-Test-MP-DocVQA

* synthetic-degradation: (corpus) -> !!! please use clear queries and qrels
openbmb/VisRAG-Ret-Test-PlotQA 
openbmb/VisRAG-Ret-Test-SlideVQA 
openbmb/VisRAG-Ret-Test-InfoVQA 
openbmb/VisRAG-Ret-Test-ArxivQA 
openbmb/VisRAG-Ret-Test-ChartQA 
openbmb/VisRAG-Ret-Test-MP-DocVQA

* real-world: (queries, qrels, corpus)
rweics5cs7/exo7-realworld-db-combined (RVL-CDIP-clear) 
rweics5cs7/exo7-realworld-db-combined-deg (RVL-CDIP-degra) 
rweics5cs7/exo7-realworld-db-combined-deg-fixed (RVL-CDIP-filter) 
rweics5cs7/exo9-realworld-db-combined (MP-DocVQA) 
rweics5cs7/exo10-realworld-db-combined (ArxivQA) 
========================================================================= 
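The qrels layout above can be turned into a lookup table before evaluation. The following is a minimal sketch, not the repository's own loader: the row dictionaries and the IDs `q1`, `d3`, `d7` are made up for illustration, and `build_qrels` / `missing_corpus_ids` are hypothetical helper names. The second helper is one way to check that a synthetic-degradation corpus still contains every document the clear qrels refer to.

```python
from collections import defaultdict


def build_qrels(rows):
    """Group rows with keys query_id, corpus_id, score into {query_id: {corpus_id: score}}."""
    qrels = defaultdict(dict)
    for row in rows:
        qrels[row["query_id"]][row["corpus_id"]] = int(row["score"])
    return dict(qrels)


def missing_corpus_ids(qrels, corpus_ids):
    """Return the sorted corpus IDs that qrels reference but the corpus does not contain."""
    known = set(corpus_ids)
    return sorted({cid for docs in qrels.values() for cid in docs if cid not in known})


# Illustrative rows only; real rows come from the qrels split of each dataset.
rows = [
    {"query_id": "q1", "corpus_id": "d3", "score": 1},
    {"query_id": "q1", "corpus_id": "d7", "score": 1},
    {"query_id": "q2", "corpus_id": "d3", "score": 1},
]
qrels = build_qrels(rows)
missing = missing_corpus_ids(qrels, corpus_ids=["d3"])
print(qrels)
print(missing)
```

An empty `missing` list suggests the clear queries and qrels can be paired with the degraded corpus as intended.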

# Training
========================================================================= 
1. Template:
    # MAX_SEQ_LEN, PER_DEV_BATCH_SIZE, GPUS_PER_NODE, SOFTMAX_TEMPERATURE, EPOCH, QUERY_INSTRUCTION, CORPUS_INSTRUCTION, DEEPSPEED, LR, MAPPING, POOLING, ATTENTION, NPASSAGE, GRADCACHE, GRADCACHE_MICRO, PASSAGE_STOP_GRAD, MODEL_PATH, DATASET_PATH, SYNTHETIC_DISTORTION, TAG, LoRA
    bash scripts/train_retriever/train.sh 2048 32 8 0.02 5 true false config/deepspeed.json 1e-5 true wmean causal 1 true 4 false ./checkpoints/openbmb/VisRAG-Ret all true Robust false

    bash scripts/train_retriever/train.sh 2048 32 8 0.02 2 true false config/deepspeed.json 1e-5 true wmean causal 1 true 4 false /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/Robust-train-2025-11-09-181056-lr-1e-5-temp-0.02-bsz32-ngpus8-nnodes1-inbatch--nepoch-3-pooling-wmean-attention-causal-qinstruct-true-cinstruct-false-gradcache-true-passage-stopgrad-false-npassage-1/checkpoint-2500 all true RobustRet false
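Because `train.sh` takes 21 positional arguments, it is easy to shift a value by one slot. A small sketch like the one below could assemble the command from named values instead. The argument order is taken from the template comment above; `build_train_command` is a hypothetical helper, not part of the repository.

```python
import shlex

# Positional argument names, in the order listed in the template comment.
TRAIN_ARG_NAMES = [
    "MAX_SEQ_LEN", "PER_DEV_BATCH_SIZE", "GPUS_PER_NODE", "SOFTMAX_TEMPERATURE",
    "EPOCH", "QUERY_INSTRUCTION", "CORPUS_INSTRUCTION", "DEEPSPEED", "LR",
    "MAPPING", "POOLING", "ATTENTION", "NPASSAGE", "GRADCACHE",
    "GRADCACHE_MICRO", "PASSAGE_STOP_GRAD", "MODEL_PATH", "DATASET_PATH",
    "SYNTHETIC_DISTORTION", "TAG", "LoRA",
]


def build_train_command(args, script="scripts/train_retriever/train.sh"):
    """Return the train.sh command line for a dict keyed by TRAIN_ARG_NAMES."""
    missing = [name for name in TRAIN_ARG_NAMES if name not in args]
    extra = [name for name in args if name not in TRAIN_ARG_NAMES]
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    values = [shlex.quote(str(args[name])) for name in TRAIN_ARG_NAMES]
    return " ".join(["bash", script] + values)


# Values copied from the first training command above.
example_args = {
    "MAX_SEQ_LEN": 2048, "PER_DEV_BATCH_SIZE": 32, "GPUS_PER_NODE": 8,
    "SOFTMAX_TEMPERATURE": "0.02", "EPOCH": 5, "QUERY_INSTRUCTION": "true",
    "CORPUS_INSTRUCTION": "false", "DEEPSPEED": "config/deepspeed.json",
    "LR": "1e-5", "MAPPING": "true", "POOLING": "wmean", "ATTENTION": "causal",
    "NPASSAGE": 1, "GRADCACHE": "true", "GRADCACHE_MICRO": 4,
    "PASSAGE_STOP_GRAD": "false", "MODEL_PATH": "./checkpoints/openbmb/VisRAG-Ret",
    "DATASET_PATH": "all", "SYNTHETIC_DISTORTION": "true", "TAG": "Robust",
    "LoRA": "false",
}
train_command = build_train_command(example_args)
print(train_command)
```

Float-like values such as the temperature and learning rate are kept as strings so they are passed through exactly as typed.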


    ## Configuration
    ### Model: 
        source: /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/
        "SigLIP": "google/SigLIP", "CPM-2B", "VisRAG": "openbmb/VisRAG-Ret"
    ### Data: (try real)
        1. Datasets: "out_domain", "in_domain", "real", "all", "openbmb/VisRAG-Ret-Train-In-domain-data", "openbmb/VisRAG-Ret-Train-Synthetic-data" or any huggingface data.
        2. Loader: MAPPING=True sets use_mapping_dataset -> MappingMMDRTrainDataset (do not use StreamMMDRTrainDataset; it only supports a single dataset)
            * columns = [image, source, query]
        #### -> If you wish to train using your own datasets, remove the `--from_hf_repo` line from the `train.sh` script. Additionally, ensure that your dataset directory contains a `metadata.json` file, which must include a `length` field specifying the total number of samples in the dataset.

    * src/openmatch/driver/train.py
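For the custom-dataset case noted above, the required `metadata.json` could be produced with a few lines of Python. This is a sketch under assumptions: only the `length` field is documented in these notes, so nothing else is written, and `write_metadata` is a hypothetical helper name. The temporary directory stands in for a real dataset directory.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(dataset_dir, num_samples):
    """Write metadata.json with a `length` field into dataset_dir and return its path."""
    if not isinstance(num_samples, int) or num_samples < 0:
        raise ValueError("num_samples must be a non-negative integer")
    path = Path(dataset_dir) / "metadata.json"
    path.write_text(json.dumps({"length": num_samples}, indent=2), encoding="utf-8")
    return path


# Demonstration in a throwaway directory; point dataset_dir at the real dataset instead.
with tempfile.TemporaryDirectory() as tmp:
    meta_path = write_metadata(tmp, num_samples=3)
    loaded = json.loads(meta_path.read_text(encoding="utf-8"))
print(loaded)
```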


# Evaluation
=========================================================================
## Ret: 
====================
* source /home/work/shared-fi-datasets-01/users/hsiang.chen/.bashrc && conda activate VisRAG && cd /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG
### Testing for baseline
1. (Clear)
    bash scripts/eval_retriever/eval.sh 512 2048 16 1 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/VisRAG-Ret 0 ./results/retrieval_clear false
2. (Degra)
    bash scripts/eval_retriever/eval_degra.sh 512 2048 16 1 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/VisRAG-Ret 0 ./results/retrieval_clear false
3. (Real-World)
    * exo7-realworld-db-combined-deg (rvl-cdip), exo9-realworld-db-combined (MP-DocVQA), exo10-realworld-db-combined (ArxivQA)
    bash scripts/eval_retriever/eval_real.sh 512 2048 16 1 wmean causal exo7-realworld-db-combined-deg,exo9-realworld-db-combined,exo10-realworld-db-combined /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/VisRAG-Ret 0 ./results/retrieval_clear false

rweics5cs7/exo7-realworld-db-combined-deg-fixed

bash scripts/eval_retriever/eval_real.sh 512 2048 16 4 wmean causal exo7-realworld-db-combined-deg /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/VisRAG-Ret 0 ./results/retrieval_clear false 

bash scripts/eval_retriever/eval_real.sh 512 2048 16 4 wmean causal exo7-realworld-db-combined-deg /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/fft_syn 0 ./results/retrieval_fft false 

bash scripts/eval_retriever/eval_real.sh 512 2048 16 4 wmean causal exo7-realworld-db-combined-deg /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/lora 0 ./results/retrieval_lora true

### Testing for SFT
1. (clear)
    bash scripts/eval_retriever/eval.sh 512 2048 16 1 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/fft_syn 0 ./results/retrieval_fft false
2. (degra)
    bash scripts/eval_retriever/eval_degra.sh 512 2048 16 1 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/fft_syn 0 ./results/retrieval_fft false
3. (real-world)
    bash scripts/eval_retriever/eval_real.sh 512 2048 16 1 wmean causal exo7-realworld-db-combined-deg,exo9-realworld-db-combined,exo10-realworld-db-combined /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/fft_syn 0 ./results/retrieval_fft false



### Testing for LoRA
1. (clear)
    bash scripts/eval_retriever/eval.sh 512 2048 16 1 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/lora 0 ./results/retrieval_lora true
2. (degra)
    bash scripts/eval_retriever/eval_degra.sh 512 2048 16 1 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/lora 0 ./results/retrieval_lora true
3. (real-world)
    bash scripts/eval_retriever/eval_real.sh 512 2048 16 1 wmean causal exo7-realworld-db-combined-deg,exo9-realworld-db-combined,exo10-realworld-db-combined /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/lora 0 ./results/retrieval_lora true


### Testing for Ours
1. (clear)
    bash scripts/eval_retriever/eval.sh 512 2048 16 1 wmean causal ChartQA,ArxivQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/Robust-train-2025-11-09-181056-lr-1e-5-temp-0.02-bsz32-ngpus8-nnodes1-inbatch--nepoch-3-pooling-wmean-attention-causal-qinstruct-true-cinstruct-false-gradcache-true-passage-stopgrad-false-npassage-1/checkpoint-2500 0 ./results/retrieval_robust false
2. (degra)
    bash scripts/eval_retriever/eval_degra.sh 512 2048 16 1 wmean causal ChartQA,ArxivQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/Robust-train-2025-11-09-181056-lr-1e-5-temp-0.02-bsz32-ngpus8-nnodes1-inbatch--nepoch-3-pooling-wmean-attention-causal-qinstruct-true-cinstruct-false-gradcache-true-passage-stopgrad-false-npassage-1/checkpoint-2500 0 ./results/retrieval_robust false
3. (real-world)
    bash scripts/eval_retriever/eval_real.sh 512 2048 16 1 wmean causal exo7-realworld-db-combined-deg,exo9-realworld-db-combined,exo10-realworld-db-combined /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/Robust-train-2025-11-09-181056-lr-1e-5-temp-0.02-bsz32-ngpus8-nnodes1-inbatch--nepoch-3-pooling-wmean-attention-causal-qinstruct-true-cinstruct-false-gradcache-true-passage-stopgrad-false-npassage-1/checkpoint-2500 0 ./results/retrieval_robust false


-> Use /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/stastic_ret.py to compute statistics over the results.
======================
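The retrieval commands above follow one pattern: three scripts (clear, degra, real-world) crossed with several models. As a rough sketch, the command strings could be generated rather than copied by hand. The values below are copied from the commands in this section; the dictionary layout and `eval_command` are hypothetical, and the literal `0` argument is passed through unchanged because its meaning is not stated in these notes. The "Ours" checkpoint is omitted for brevity and could be added as another `MODELS` entry.

```python
ROOT = "/home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust"
COMMON = "512 2048 16 1 wmean causal"
CLEAR_SETS = "ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA"
REAL_SETS = ("exo7-realworld-db-combined-deg,exo9-realworld-db-combined,"
             "exo10-realworld-db-combined")

# setting -> (script name, comma-separated dataset list)
SCRIPTS = {
    "clear": ("eval.sh", CLEAR_SETS),
    "degra": ("eval_degra.sh", CLEAR_SETS),
    "real": ("eval_real.sh", REAL_SETS),
}

# model -> (checkpoint path, results directory, LoRA flag)
MODELS = {
    "baseline": (f"{ROOT}/ModelZoo/openbmb/VisRAG-Ret", "./results/retrieval_clear", "false"),
    "fft": (f"{ROOT}/VisRAG/checkpoints/fft_syn", "./results/retrieval_fft", "false"),
    "lora": (f"{ROOT}/VisRAG/checkpoints/lora", "./results/retrieval_lora", "true"),
}


def eval_command(model, setting):
    """Return the retrieval evaluation command for one model and one setting."""
    script, datasets = SCRIPTS[setting]
    checkpoint, results_dir, lora = MODELS[model]
    return (f"bash scripts/eval_retriever/{script} {COMMON} {datasets} "
            f"{checkpoint} 0 {results_dir} {lora}")


commands = [eval_command(m, s) for m in MODELS for s in SCRIPTS]
for command in commands:
    print(command)
```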


## Gen: (run Ret first to obtain the embeddings and retrieval results)
======================
1. (Clear)
python scripts/generate/generate.py \
    --model_name MiniCPMV2.6 \
    --model_name_or_path /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/MiniCPM-V-2_6 \
    --dataset_name ChartQA \
    --dataset_name_or_path openbmb/VisRAG-Ret-Test-ChartQA \
    --rank 0 \
    --world_size 1 \
    --topk 3 \
    --results_root_dir /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/data/checkpoints/clear \
    --task_type multi_image \
    --concatenate_type horizontal \
    --output_dir /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/data/checkpoints/generate 

    * model_name: ['MiniCPM', 'MiniCPMV2.0', 'MiniCPMV2.6', 'gpt4o']
    * model_name_or_path: /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/MiniCPM-V-2_6
    * dataset_name: ['ArxivQA', 'ChartQA', 'PlotQA', 'MP-DocVQA', 'SlideVQA', 'InfoVQA']
    * dataset_name_or_path: e.g. openbmb/VisRAG-Ret-Test-ChartQA
    * task_type: ['text', 'page_concatenation', 'weighted_selection', 'multi_image']
    * concatenate_type: ['horizontal', 'vertical']
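To run generation over every dataset rather than ChartQA alone, the argument list could be built per dataset. The sketch below only constructs the `argv` lists using the flags shown in the command above; `generate_argv` is a hypothetical helper, and the `<...>` path strings are placeholders to be replaced with real paths. Each list could then be passed to `subprocess.run(argv, check=True)` from the repository root.

```python
DATASETS = ["ArxivQA", "ChartQA", "PlotQA", "MP-DocVQA", "SlideVQA", "InfoVQA"]


def generate_argv(dataset_name, model_path, results_root_dir, output_dir,
                  topk=3, task_type="multi_image", concatenate_type="horizontal"):
    """Return the argv list for scripts/generate/generate.py on one dataset."""
    if dataset_name not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset_name}")
    return [
        "python", "scripts/generate/generate.py",
        "--model_name", "MiniCPMV2.6",
        "--model_name_or_path", model_path,
        "--dataset_name", dataset_name,
        "--dataset_name_or_path", f"openbmb/VisRAG-Ret-Test-{dataset_name}",
        "--rank", "0",
        "--world_size", "1",
        "--topk", str(topk),
        "--results_root_dir", results_root_dir,
        "--task_type", task_type,
        "--concatenate_type", concatenate_type,
        "--output_dir", output_dir,
    ]


# Placeholder paths; substitute the model, retrieval results, and output directories.
all_argv = [
    generate_argv(name, "<model_path>", "<results_root_dir>", "<output_dir>")
    for name in DATASETS
]
for argv in all_argv:
    print(" ".join(argv))
```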

2. (Ret)
python scripts/generate/generate.py \
    --model_name MiniCPMV2.6 \
    --model_name_or_path /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/MiniCPM-V-2_6 \
    --dataset_name ChartQA \
    --dataset_name_or_path openbmb/VisRAG-Ret-Test-ChartQA \
    --rank 0 \
    --world_size 1 \
    --topk 3 \
    --results_root_dir /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/data/checkpoints/clear \
    --task_type multi_image \
    --concatenate_type horizontal \
    --output_dir /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/data/checkpoints/generate 



# config
====================================================================
Evaluating: ArxivQA
This dataset result dir: /data/checkpoints/eval-2025-09-28-182703-maxq-512-maxp-2048-bsz-16-pooling-wmean-attention-causal-gpus-per-node-1/ArxivQA
CORPUS_PATH: openbmb/VisRAG-Ret-Test-ArxivQA
QUERY_PATH: openbmb/VisRAG-Ret-Test-ArxivQA
QRELS_PATH: openbmb/VisRAG-Ret-Test-ArxivQA
====================================================================================================
model_args: ModelArguments(model_name_or_path='/home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/VisRAG-Ret', target_model_path=None, config_name=None, tokenizer_name=None, processor_name=None, cache_dir='/home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo', untie_encoder=False, feature='last_hidden_state', pooling='wmean', attention='causal', add_linear_head=False, projection_in_dim=768, projection_out_dim=768, dtype='float16', encoder_only=False, pos_token=None, neg_token=None, normalize=True, lora=False, lora_r=32, attn_implementation='sdpa')

data_args: DataArguments(train_dir=None, train_path=None, eval_path=None, qrels_path='openbmb/VisRAG-Ret-Test-MP-DocVQA', query_path='openbmb/VisRAG-Ret-Test-MP-DocVQA', corpus_path='openbmb/VisRAG-Ret-Test-MP-DocVQA', data_dir=None, train_n_passages=8, positive_passage_no_shuffle=False, negative_passage_no_shuffle=False, q_max_len=512, p_max_len=2048, query_instruction=True, corpus_instruction=False, data_cache_dir='/home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/Dataset', query_template='Represent this query for retrieving relevant documents: <query>', query_column_names=None, doc_template='<text>', doc_column_names=None, all_markers=None, encode_as_text_pair=False, from_hf_repo=True)