| # conda activate VisRAG | |
| # To-Do List | |
| ======================================================================== | |
| 1. Baseline: evaluate the clear-trained model on the clear, degraded (degra), and real datasets (Ret -> Gen) | |
| 2. Training Code: dataloader, synthetic function | |
| # Environments | |
| ========================================================================= | |
| git clone https://github.com/OpenBMB/VisRAG.git | |
| conda create --name VisRAG python==3.10.8 | |
| conda activate VisRAG | |
| conda install nvidia/label/cuda-11.8.0::cuda-toolkit | |
| cd VisRAG | |
| pip install -r requirements.txt | |
| pip install -e . | |
| cd timm_modified | |
| pip install -e . | |
| pip install scikit-image | |
| pip install peft==0.7.1 | |
| cd .. | |
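After the install steps, it can help to confirm that the pinned package actually landed in the active environment. The following is a minimal sketch using only the standard library; the `EXPECTED` table is an assumption that simply mirrors the `peft` pin from the commands above, and the helper names are hypothetical.

```python
from importlib import metadata

# Pin copied from the install commands above; extend as needed.
EXPECTED = {"peft": "0.7.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(expected):
    """Map each package name to (installed, wanted, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        report[name] = (found, wanted, found == wanted)
    return report


if __name__ == "__main__":
    for name, (found, wanted, ok) in check_pins(EXPECTED).items():
        print(f"{name}: installed={found} wanted={wanted} ok={ok}")
```

Run it inside the `VisRAG` conda environment; a `None` in the installed column means the package is not visible to that interpreter.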
| % Download & Change cache_dir | |
| /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/src/openmatch/arguments.py L136. | |
| % soft link pre-trained models | |
| ln -s /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/* /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints | |
| # Dataset format | |
| ========================================================================= | |
| qrels: ['query_id', 'corpus_id', 'score'] | |
| * clear: (queries, qrels, corpus) | |
| openbmb/VisRAG-Ret-Test-PlotQA | |
| openbmb/VisRAG-Ret-Test-SlideVQA | |
| openbmb/VisRAG-Ret-Test-InfoVQA | |
| openbmb/VisRAG-Ret-Test-ArxivQA | |
| openbmb/VisRAG-Ret-Test-ChartQA | |
| openbmb/VisRAG-Ret-Test-MP-DocVQA | |
| * synthetic-degradation: (corpus only) -> !!! use the queries and qrels from the clear datasets | |
| openbmb/VisRAG-Ret-Test-PlotQA | |
| openbmb/VisRAG-Ret-Test-SlideVQA | |
| openbmb/VisRAG-Ret-Test-InfoVQA | |
| openbmb/VisRAG-Ret-Test-ArxivQA | |
| openbmb/VisRAG-Ret-Test-ChartQA | |
| openbmb/VisRAG-Ret-Test-MP-DocVQA | |
| * real-world: (queries, qrels, corpus) | |
| rweics5cs7/exo7-realworld-db-combined (RVL-CDIP-clear) | |
| rweics5cs7/exo7-realworld-db-combined-deg (RVL-CDIP-degra) | |
| rweics5cs7/exo7-realworld-db-combined-deg-fixed (RVL-CDIP-filter) | |
| rweics5cs7/exo9-realworld-db-combined (MP-DocVQA) | |
| rweics5cs7/exo10-realworld-db-combined (ArxivQA) | |
| ========================================================================= | |
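To make the qrels layout concrete, here is a small sketch that groups rows with the three columns listed above (`query_id`, `corpus_id`, `score`) and scores a ranking against them. The sample rows, the `ranked` dictionary, and both helper names are hypothetical; the repository's own evaluation code may compute different metrics.

```python
from collections import defaultdict


def build_qrels(rows):
    """Group rows with 'query_id', 'corpus_id', 'score' into {query: {doc: score}}."""
    qrels = defaultdict(dict)
    for row in rows:
        qrels[row["query_id"]][row["corpus_id"]] = int(row["score"])
    return dict(qrels)


def recall_at_k(qrels, ranked, k):
    """Mean fraction of relevant corpus items found in the top-k list per query."""
    scores = []
    for qid, relevant in qrels.items():
        positives = {cid for cid, score in relevant.items() if score > 0}
        if not positives:
            continue  # skip queries without any relevant item
        top = set(ranked.get(qid, [])[:k])
        scores.append(len(positives & top) / len(positives))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical rows and rankings, for illustration only.
rows = [
    {"query_id": "q1", "corpus_id": "d3", "score": 1},
    {"query_id": "q1", "corpus_id": "d7", "score": 1},
    {"query_id": "q2", "corpus_id": "d3", "score": 1},
]
ranked = {"q1": ["d3", "d1", "d7"], "q2": ["d5", "d3"]}

qrels = build_qrels(rows)
print(recall_at_k(qrels, ranked, k=2))
```

Because the degraded corpora reuse the clear queries and qrels, the same `qrels` mapping can be scored against rankings from either corpus.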
| # Training | |
| ========================================================================= | |
| 1. Template: | |
| # Positional arguments, in order: MAX_SEQ_LEN, PER_DEV_BATCH_SIZE, GPUS_PER_NODE, SOFTMAX_TEMPERATURE, EPOCH, QUERY_INSTRUCTION, CORPUS_INSTRUCTION, DEEPSPEED, LR, MAPPING, POOLING, ATTENTION, NPASSAGE, GRADCACHE, GRADCACHE_MICRO, PASSAGE_STOP_GRAD, MODEL_PATH, DATASET_PATH, SYNTHETIC_DISTORTION, TAG, LoRA | |
| bash scripts/train_retriever/train.sh 2048 32 8 0.02 5 true false config/deepspeed.json 1e-5 true wmean causal 1 true 4 false ./checkpoints/openbmb/VisRAG-Ret all true Robust false | |
| bash scripts/train_retriever/train.sh 2048 32 8 0.02 2 true false config/deepspeed.json 1e-5 true wmean causal 1 true 4 false /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/Robust-train-2025-11-09-181056-lr-1e-5-temp-0.02-bsz32-ngpus8-nnodes1-inbatch--nepoch-3-pooling-wmean-attention-causal-qinstruct-true-cinstruct-false-gradcache-true-passage-stopgrad-false-npassage-1/checkpoint-2500 all true RobustRet false | |
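With 21 positional values, it is easy to shift one by mistake. The sketch below pairs each value in a `train.sh` command with its name from the template comment. It assumes that the comment lists the arguments in the order `train.sh` reads them and that the trailing `SYNTHETIC_DISTORTION TAG LoRA` are three separate arguments, which matches the 21 values in the first command above; `label_train_args` is a hypothetical helper, not part of the repository.

```python
import shlex

# Names copied from the template comment above, in order.
TRAIN_ARG_NAMES = [
    "MAX_SEQ_LEN", "PER_DEV_BATCH_SIZE", "GPUS_PER_NODE", "SOFTMAX_TEMPERATURE",
    "EPOCH", "QUERY_INSTRUCTION", "CORPUS_INSTRUCTION", "DEEPSPEED", "LR",
    "MAPPING", "POOLING", "ATTENTION", "NPASSAGE", "GRADCACHE",
    "GRADCACHE_MICRO", "PASSAGE_STOP_GRAD", "MODEL_PATH", "DATASET_PATH",
    "SYNTHETIC_DISTORTION", "TAG", "LoRA",
]


def label_train_args(command):
    """Pair each positional value after train.sh with its template name."""
    tokens = shlex.split(command)
    script_index = next(i for i, tok in enumerate(tokens) if tok.endswith("train.sh"))
    values = tokens[script_index + 1:]
    if len(values) != len(TRAIN_ARG_NAMES):
        raise ValueError(
            f"expected {len(TRAIN_ARG_NAMES)} values, got {len(values)}"
        )
    return dict(zip(TRAIN_ARG_NAMES, values))


cmd = (
    "bash scripts/train_retriever/train.sh 2048 32 8 0.02 5 true false "
    "config/deepspeed.json 1e-5 true wmean causal 1 true 4 false "
    "./checkpoints/openbmb/VisRAG-Ret all true Robust false"
)
for name, value in label_train_args(cmd).items():
    print(f"{name:22s} {value}")
```

A command with a missing or extra value raises `ValueError` instead of silently shifting every later argument.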
| ## Configuration | |
| ### Model: | |
| source: /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/ | |
| "SigLIP": "google/SigLIP", "CPM-2B", "VisRAG": "openbmb/VisRAG-Ret" | |
| ### Data: (try real) | |
| 1. Datasets: "out_domain", "in_domain", "real", "all", "openbmb/VisRAG-Ret-Train-In-domain-data", "openbmb/VisRAG-Ret-Train-Synthetic-data" or any huggingface data. | |
| 2. Loader: set MAPPING=True (use_mapping_dataset) to load data with MappingMMDRTrainDataset; do not use StreamMMDRTrainDataset, which only supports a single dataset. | |
| * columns = [image, source, query] | |
| #### -> If you wish to train using your own datasets, remove the `--from_hf_repo` line from the `train.sh` script. Additionally, ensure that your dataset directory contains a `metadata.json` file, which must include a `length` field specifying the total number of samples in the dataset. | |
| * src/openmatch/driver/train.py | |
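For a custom dataset directory, the `metadata.json` requirement above can be met with a few lines. This is a minimal sketch that writes only the `length` field named in the note; whether the loader reads any other fields is not stated here, and `write_metadata` / `read_length` are hypothetical helper names.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(dataset_dir, num_samples):
    """Write metadata.json with the `length` field (total number of samples)."""
    path = Path(dataset_dir) / "metadata.json"
    path.write_text(json.dumps({"length": int(num_samples)}, indent=2))
    return path


def read_length(dataset_dir):
    """Read the `length` field back from metadata.json."""
    with open(Path(dataset_dir) / "metadata.json") as handle:
        return json.load(handle)["length"]


# Demonstration in a throwaway directory; point this at the real dataset folder.
with tempfile.TemporaryDirectory() as tmp:
    write_metadata(tmp, 1200)
    print(read_length(tmp))
```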
| # Evaluation | |
| ========================================================================= | |
| ## Ret: | |
| ==================== | |
| * source /home/work/shared-fi-datasets-01/users/hsiang.chen/.bashrc && conda activate VisRAG && cd /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG | |
| ### Testing for baseline | |
| 1. (Clear) | |
| bash scripts/eval_retriever/eval.sh 512 2048 16 1 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/VisRAG-Ret 0 ./results/retrieval_clear false | |
| 2. (Degra) | |
| bash scripts/eval_retriever/eval_degra.sh 512 2048 16 1 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/VisRAG-Ret 0 ./results/retrieval_clear false | |
| 3. (Real-World) | |
| * exo7-realworld-db-combined-deg (rvl-cdip), exo9-realworld-db-combined (MP-DocVQA), exo10-realworld-db-combined (ArxivQA) | |
| bash scripts/eval_retriever/eval_real.sh 512 2048 16 1 wmean causal exo7-realworld-db-combined-deg,exo9-realworld-db-combined,exo10-realworld-db-combined /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/VisRAG-Ret 0 ./results/retrieval_clear false | |
| rweics5cs7/exo7-realworld-db-combined-deg-fixed | |
| bash scripts/eval_retriever/eval_real.sh 512 2048 16 4 wmean causal exo7-realworld-db-combined-deg /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/VisRAG-Ret 0 ./results/retrieval_clear false | |
| bash scripts/eval_retriever/eval_real.sh 512 2048 16 4 wmean causal exo7-realworld-db-combined-deg /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/fft_syn 0 ./results/retrieval_fft false | |
| bash scripts/eval_retriever/eval_real.sh 512 2048 16 4 wmean causal exo7-realworld-db-combined-deg /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/lora 0 ./results/retrieval_lora true | |
| ### Testing for SFT | |
| 1. (clear) | |
| bash scripts/eval_retriever/eval.sh 512 2048 16 1 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/fft_syn 0 ./results/retrieval_fft false | |
| 2. (degra) | |
| bash scripts/eval_retriever/eval_degra.sh 512 2048 16 1 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/fft_syn 0 ./results/retrieval_fft false | |
| 3. (real-world) | |
| bash scripts/eval_retriever/eval_real.sh 512 2048 16 1 wmean causal exo7-realworld-db-combined-deg,exo9-realworld-db-combined,exo10-realworld-db-combined /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/fft_syn 0 ./results/retrieval_fft false | |
| ### Testing for LoRA | |
| 1. (clear) | |
| bash scripts/eval_retriever/eval.sh 512 2048 16 1 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/lora 0 ./results/retrieval_lora true | |
| 2. (degra) | |
| bash scripts/eval_retriever/eval_degra.sh 512 2048 16 1 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/lora 0 ./results/retrieval_lora true | |
| 3. (real-world) | |
| bash scripts/eval_retriever/eval_real.sh 512 2048 16 1 wmean causal exo7-realworld-db-combined-deg,exo9-realworld-db-combined,exo10-realworld-db-combined /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/lora 0 ./results/retrieval_lora true | |
| ### Testing for Ours | |
| 1. (clear) | |
| bash scripts/eval_retriever/eval.sh 512 2048 16 1 wmean causal ChartQA,ArxivQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/Robust-train-2025-11-09-181056-lr-1e-5-temp-0.02-bsz32-ngpus8-nnodes1-inbatch--nepoch-3-pooling-wmean-attention-causal-qinstruct-true-cinstruct-false-gradcache-true-passage-stopgrad-false-npassage-1/checkpoint-2500 0 ./results/retrieval_robust false | |
| 2. (degra) | |
| bash scripts/eval_retriever/eval_degra.sh 512 2048 16 1 wmean causal ChartQA,ArxivQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/Robust-train-2025-11-09-181056-lr-1e-5-temp-0.02-bsz32-ngpus8-nnodes1-inbatch--nepoch-3-pooling-wmean-attention-causal-qinstruct-true-cinstruct-false-gradcache-true-passage-stopgrad-false-npassage-1/checkpoint-2500 0 ./results/retrieval_robust false | |
| 3. (real-world) | |
| bash scripts/eval_retriever/eval_real.sh 512 2048 16 1 wmean causal exo7-realworld-db-combined-deg,exo9-realworld-db-combined,exo10-realworld-db-combined /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/checkpoints/Robust-train-2025-11-09-181056-lr-1e-5-temp-0.02-bsz32-ngpus8-nnodes1-inbatch--nepoch-3-pooling-wmean-attention-causal-qinstruct-true-cinstruct-false-gradcache-true-passage-stopgrad-false-npassage-1/checkpoint-2500 0 ./results/retrieval_robust false | |
| -> Use /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/stastic_ret.py to compute statistics over the results. | |
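The retrieval commands above differ only in the script, the dataset list, the checkpoint path, the result directory, and the final flag. The sketch below builds them from those parts. The six leading values are kept exactly as they appear in the commands; the ninth value (`0`) is passed through unchanged because its meaning is not documented in these notes, and treating the last value as a LoRA switch is an assumption based on it being `true` only for the LoRA checkpoints. `eval_command` is a hypothetical helper.

```python
import shlex

EVAL_SCRIPTS = {
    "clear": "scripts/eval_retriever/eval.sh",
    "degra": "scripts/eval_retriever/eval_degra.sh",
    "real": "scripts/eval_retriever/eval_real.sh",
}
SYNTHETIC_SETS = "ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA"
REAL_SETS = (
    "exo7-realworld-db-combined-deg,"
    "exo9-realworld-db-combined,"
    "exo10-realworld-db-combined"
)


def eval_command(split, model_path, result_dir, lora=False):
    """Return one evaluation command line for the given split and checkpoint."""
    datasets = REAL_SETS if split == "real" else SYNTHETIC_SETS
    parts = [
        "bash", EVAL_SCRIPTS[split],
        "512", "2048", "16", "1", "wmean", "causal",  # copied from the notes
        datasets, model_path,
        "0",  # undocumented ninth value, kept as in the notes
        result_dir,
        "true" if lora else "false",  # assumed LoRA switch
    ]
    return shlex.join(parts)


LORA_CKPT = (
    "/home/work/shared-fi-datasets-01/users/hsiang.chen/"
    "Project/Robust/VisRAG/checkpoints/lora"
)
for split in ("clear", "degra", "real"):
    print(eval_command(split, LORA_CKPT, "./results/retrieval_lora", lora=True))
```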
| ====================== | |
| ## Gen: (run Ret first to obtain the embeddings and retrieval results) | |
| ====================== | |
| 1. (Clear) | |
| python scripts/generate/generate.py \ | |
| --model_name MiniCPMV2.6 \ | |
| --model_name_or_path /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/MiniCPM-V-2_6 \ | |
| --dataset_name ChartQA \ | |
| --dataset_name_or_path openbmb/VisRAG-Ret-Test-ChartQA \ | |
| --rank 0 \ | |
| --world_size 1 \ | |
| --topk 3 \ | |
| --results_root_dir /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/data/checkpoints/clear \ | |
| --task_type multi_image \ | |
| --concatenate_type horizontal \ | |
| --output_dir /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/data/checkpoints/generate | |
| * model_name: ['MiniCPM', 'MiniCPMV2.0', 'MiniCPMV2.6', 'gpt4o'] | |
| * model_name_or_path: /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/MiniCPM-V-2_6 | |
| * dataset_name: ['ArxivQA', 'ChartQA', 'PlotQA', 'MP-DocVQA', 'SlideVQA', 'InfoVQA'] | |
| * dataset_name_or_path: | |
| * task_type: ['text', 'page_concatenation', 'weighted_selection', 'multi_image'] | |
| * concatenate_type: ['horizontal', 'vertical'] | |
| 2. (Ret) | |
| python scripts/generate/generate.py \ | |
| --model_name MiniCPMV2.6 \ | |
| --model_name_or_path /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/openbmb/MiniCPM-V-2_6 \ | |
| --dataset_name ChartQA \ | |
| --dataset_name_or_path openbmb/VisRAG-Ret-Test-ChartQA \ | |
| --rank 0 \ | |
| --world_size 1 \ | |
| --topk 3 \ | |
| --results_root_dir /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/data/checkpoints/clear \ | |
| --task_type multi_image \ | |
| --concatenate_type horizontal \ | |
| --output_dir /home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/VisRAG/data/checkpoints/generate | |
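Since several `generate.py` options accept only a fixed set of values, a thin wrapper can catch typos before a long run starts. The following sketch validates options against the value lists given above and assembles the argument vector; `generate_argv` is a hypothetical helper, and it assumes every option is passed as `--name value` exactly as in the commands above.

```python
# Allowed values copied from the option lists above.
CHOICES = {
    "model_name": ["MiniCPM", "MiniCPMV2.0", "MiniCPMV2.6", "gpt4o"],
    "dataset_name": ["ArxivQA", "ChartQA", "PlotQA", "MP-DocVQA", "SlideVQA", "InfoVQA"],
    "task_type": ["text", "page_concatenation", "weighted_selection", "multi_image"],
    "concatenate_type": ["horizontal", "vertical"],
}


def generate_argv(**options):
    """Validate restricted options and build the generate.py argument vector."""
    for key, allowed in CHOICES.items():
        if key in options and options[key] not in allowed:
            raise ValueError(f"{key}={options[key]!r} is not one of {allowed}")
    argv = ["python", "scripts/generate/generate.py"]
    for key, value in options.items():
        argv += [f"--{key}", str(value)]
    return argv


argv = generate_argv(
    model_name="MiniCPMV2.6",
    dataset_name="ChartQA",
    dataset_name_or_path="openbmb/VisRAG-Ret-Test-ChartQA",
    rank=0,
    world_size=1,
    topk=3,
    task_type="multi_image",
    concatenate_type="horizontal",
)
print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` from the repository root once the path options (`model_name_or_path`, `results_root_dir`, `output_dir`) are added.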
| # config | |
| ==================================================================== | |
| Evaluating: ArxivQA | |
| This dataset result dir: /data/checkpoints/eval-2025-09-28-182703-maxq-512-maxp-2048-bsz-16-pooling-wmean-attention-causal-gpus-per-node-1/ArxivQA | |
| CORPUS_PATH: openbmb/VisRAG-Ret-Test-ArxivQA | |
| QUERY_PATH: openbmb/VisRAG-Ret-Test-ArxivQA | |
| QRELS_PATH: openbmb/VisRAG-Ret-Test-ArxivQA | |
| ==================================================================================================== | |
| model_args: ModelArguments(model_name_or_path='/home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo/VisRAG-Ret', target_model_path=None, config_name=None, tokenizer_name=None, processor_name=None, cache_dir='/home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/ModelZoo', untie_encoder=False, feature='last_hidden_state', pooling='wmean', attention='causal', add_linear_head=False, projection_in_dim=768, projection_out_dim=768, dtype='float16', encoder_only=False, pos_token=None, neg_token=None, normalize=True, lora=False, lora_r=32, attn_implementation='sdpa') | |
| data_args: DataArguments(train_dir=None, train_path=None, eval_path=None, qrels_path='openbmb/VisRAG-Ret-Test-MP-DocVQA', query_path='openbmb/VisRAG-Ret-Test-MP-DocVQA', corpus_path='openbmb/VisRAG-Ret-Test-MP-DocVQA', data_dir=None, train_n_passages=8, positive_passage_no_shuffle=False, negative_passage_no_shuffle=False, q_max_len=512, p_max_len=2048, query_instruction=True, corpus_instruction=False, data_cache_dir='/home/work/shared-fi-datasets-01/users/hsiang.chen/Project/Robust/Dataset', query_template='Represent this query for retrieving relevant documents: <query>', query_column_names=None, doc_template='<text>', doc_column_names=None, all_markers=None, encode_as_text_pair=False, from_hf_repo=True) | |
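The `data_args` dump shows `query_instruction=True` with a `query_template` containing a `<query>` placeholder, and `corpus_instruction=False`. As a rough illustration of what that implies for the text the retriever sees, the sketch below fills the placeholder with plain string replacement. This is an assumption about the effect, not a copy of how OpenMatch performs the substitution, and `apply_query_template` is a hypothetical helper.

```python
# Template string copied from the data_args dump above.
QUERY_TEMPLATE = "Represent this query for retrieving relevant documents: <query>"


def apply_query_template(query, template=QUERY_TEMPLATE, use_instruction=True):
    """Return the query wrapped in the instruction template, or unchanged if disabled."""
    if not use_instruction:
        return query
    return template.replace("<query>", query)


# Hypothetical query text, for illustration only.
print(apply_query_template("What is the peak value in 2019?"))
print(apply_query_template("What is the peak value in 2019?", use_instruction=False))
```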