| Benchmark
|
|
|
|
|
| We provide scripts for evaluating and training models on task datasets. The following benchmark results are included for reference.
|
|
|
|
|
| ALBEF
|
| *******
|
| .. list-table::
|
| :widths: 30 80 20
|
|
|
| * - **Pretraining**
|
| - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/pretrain.sh>`__
|
| * -
|
| - Visual Genome (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_vg.py>`__)
|
| -
|
| * -
|
| - SBU (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_sbu.py>`__)
|
| -
|
| * -
|
| - CC3M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc3m.py>`__)
|
| -
|
| * -
|
| - CC12M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc12m.py>`__)
|
| -
|
|
|
| .. list-table::
|
| :widths: 30 40 20 20 20 30 30
|
| :header-rows: 1
|
|
|
| * -
|
| - **Retrieval**
|
| - **R1**
|
| - **R5**
|
| - **R10**
|
| - **Training**
|
| - **Evaluation**
|
| * - TR
|
| - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 77.6
|
| - 94.1
|
| - 97.2
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_coco_retrieval_albef.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_coco_retrieval.sh>`__
|
| * - IR
|
| - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 61.0
|
| - 84.5
|
| - 90.7
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_coco_retrieval_albef.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_coco_retrieval.sh>`__
|
| * - TR
|
| - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
|
| - 77.6
|
| - 94.1
|
| - 97.2
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_flickr30k_retrieval_albef.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_flickr30k_retrieval.sh>`__
|
| * - IR
|
| - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
|
| - 61.0
|
| - 84.5
|
| - 90.7
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_flickr30k_retrieval_albef.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_flickr30k_retrieval.sh>`__
|
|
|
|
|
| .. list-table::
|
| :widths: 20 20 20 20 20
|
| :header-rows: 1
|
|
|
| * - **VQA**
|
| - **test-dev**
|
| - **test-std/test**
|
| - **Training**
|
| - **Evaluation**
|
| * - VQAv2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 76.35
|
| - 76.54
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_vqa_albef.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/test_albef_vqa.sh>`__
|
| * - OKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - NA
|
| - 54.7
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_okvqa_albef.sh>`__
|
| - NA
|
| * - AOKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 54.5
|
| - NA
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_aokvqa_albef.sh>`__
|
| - NA
|
|
|
|
|
| .. list-table::
|
| :widths: 20 20 20 20 20
|
| :header-rows: 1
|
|
|
| * - **Multimodal Classification**
|
| - **val**
|
| - **test**
|
| - **Training**
|
| - **Evaluation**
|
| * - SNLI-VE (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 80.60
|
| - 81.04
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_ve_albef.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_albef_ve.sh>`__
|
| * - NLVR2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 82.47
|
| - 82.91
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_nlvr_albef.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_albef_nlvr.sh>`__
|
|
|
| BLIP
|
| *******
|
| .. list-table::
|
| :widths: 30 80 20
|
|
|
| * - **Pretraining (14M)**
|
| - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/pretrain.sh>`__
|
| * -
|
| - Visual Genome (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_vg.py>`__)
|
| -
|
| * -
|
| - SBU (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_sbu.py>`__)
|
| -
|
| * -
|
| - CC3M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc3m.py>`__)
|
| -
|
| * -
|
| - CC12M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc12m.py>`__)
|
| -
|
|
|
| .. list-table::
|
| :widths: 30 40 20 20 20 30 30
|
| :header-rows: 1
|
|
|
| * - **Tasks**
|
| - **Retrieval**
|
| - **R1**
|
| - **R5**
|
| - **R10**
|
| - **Training**
|
| - **Evaluation**
|
| * - TR
|
| - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 82.0
|
| - 95.8
|
| - 98.1
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_coco.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_coco.sh>`__
|
| * - IR
|
| - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 64.5
|
| - 86.0
|
| - 91.7
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_coco.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_coco.sh>`__
|
| * - TR
|
| - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
|
| - 96.9
|
| - 99.9
|
| - 100.0
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_flickr.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_flickr.sh>`__
|
| * - IR
|
| - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
|
| - 87.5
|
| - 97.6
|
| - 98.9
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_flickr.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_flickr.sh>`__
|
|
|
|
|
| .. list-table::
|
| :widths: 20 20 20 20 20
|
| :header-rows: 1
|
|
|
| * - **VQA**
|
| - **test-dev**
|
| - **test-std/test**
|
| - **Training**
|
| - **Evaluation**
|
| * - VQAv2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 78.23
|
| - 78.29
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_vqa_albef.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/test_albef_vqa.sh>`__
|
| * - OKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - NA
|
| - 55.4
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_okvqa.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_okvqa.sh>`__
|
| * - AOKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 56.2
|
| - 50.1
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_aokvqa.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_aokvqa.sh>`__
|
|
|
|
|
| .. list-table::
|
| :widths: 20 20 20 20 20 20
|
| :header-rows: 1
|
|
|
| * - **Image Captioning**
|
| - **BLEU@4**
|
| - **CIDEr**
|
| - **SPICE**
|
| - **Training**
|
| - **Evaluation**
|
| * - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 39.9
|
| - 133.5
|
| - 23.7
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_caption_coco.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_coco_cap.sh>`__
|
| * - NoCaps (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_nocaps.py>`__)
|
| - 31.9
|
| - 109.1
|
| - 14.7
|
| - NA
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_nocaps.sh>`__
|
|
|
|
|
| .. list-table::
|
| :widths: 20 20 20 20 20
|
| :header-rows: 1
|
|
|
| * - **Multimodal Classification**
|
| - **val**
|
| - **test**
|
| - **Training**
|
| - **Evaluation**
|
| * - NLVR2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 82.48
|
| - 83.25
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_nlvr.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_nlvr.sh>`__
|
|
|
| CLIP
|
| *******
|
| .. list-table::
|
| :widths: 30 40 20 20 20 30
|
| :header-rows: 1
|
|
|
| * - **Tasks**
|
| - **Retrieval (Zero-shot)**
|
| - **R1**
|
| - **R5**
|
| - **R10**
|
| - **Evaluation**
|
| * - TR
|
| - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 57.2
|
| - 80.5
|
| - 87.8
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_coco.sh>`__
|
| * - IR
|
| - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
|
| - 36.5
|
| - 60.8
|
| - 71.0
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_coco.sh>`__
|
| * - TR
|
| - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
|
| - 86.5
|
| - 98.0
|
| - 99.1
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_flickr.sh>`__
|
| * - IR
|
| - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
|
| - 67.0
|
| - 88.9
|
| - 93.3
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_flickr.sh>`__
|
|
|
| .. list-table::
|
| :widths: 20 20 20
|
| :header-rows: 1
|
|
|
| * - **Multimodal Classification**
|
| - **val**
|
| - **Evaluation**
|
| * - ImageNet
|
| - 76.5
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_zs_imnet.sh>`__
|
|
|
|
|
| ALPRO
|
| *******
|
| .. list-table::
|
| :widths: 30 40 20 20 20 20 30
|
| :header-rows: 1
|
|
|
| * - **Tasks**
|
| - **Retrieval**
|
| - **R1**
|
| - **R5**
|
| - **R10**
|
| - **Training**
|
| - **Evaluation**
|
| * - TR
|
| - MSRVTT (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_msrvtt.py>`__)
|
| - 33.2
|
| - 60.5
|
| - 71.7
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msrvtt_ret.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msrvtt_ret.sh>`__
|
| * - VR
|
| - MSRVTT (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_msrvtt.py>`__)
|
| - 33.8
|
| - 61.4
|
| - 72.7
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msrvtt_ret.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msrvtt_ret.sh>`__
|
| * - TR
|
| - DiDeMo (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_didemo.py>`__)
|
| - 38.8
|
| - 66.4
|
| - 76.8
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_didemo_ret.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_didemo_ret.sh>`__
|
| * - VR
|
| - DiDeMo (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_didemo.py>`__)
|
| - 36.6
|
| - 67.5
|
| - 77.9
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_didemo_ret.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_didemo_ret.sh>`__
|
|
|
| .. list-table::
|
| :widths: 20 20 20 20
|
| :header-rows: 1
|
|
|
| * - **Video QA**
|
| - **test**
|
| - **Training**
|
| - **Evaluation**
|
| * - MSRVTT
|
| - 42.1
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msrvtt_qa.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msrvtt_qa.sh>`__
|
| * - MSVD
|
| - 46.0
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msvd_qa.sh>`__
|
| - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msvd_qa.sh>`__ |