Benchmark
############

We provide scripts for evaluating and training models on task datasets. The following benchmark results are included for reference.


ALBEF
*******
.. list-table::
   :widths: 30 80 20

   * - **Pretraining**
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/pretrain.sh>`__
   * -
     - Visual Genome (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_vg.py>`__)
     -
   * -
     - SBU (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_sbu.py>`__)
     -
   * -
     - CC3M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc3m.py>`__)
     -
   * -
     - CC12M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc12m.py>`__)
     -

.. list-table::
   :widths: 30 40 20 20 20 30 30
   :header-rows: 1

   * - **Tasks**
     - **Retrieval**
     - **R1**
     - **R5**
     - **R10**
     - **Training**
     - **Evaluation**
   * - TR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 77.6
     - 94.1
     - 97.2
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_coco_retrieval_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_coco_retrieval.sh>`__
   * - IR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 61.0
     - 84.5
     - 90.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_coco_retrieval_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_coco_retrieval.sh>`__
   * - TR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 77.6
     - 94.1
     - 97.2
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_flickr30k_retrieval_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_flickr30k_retrieval.sh>`__
   * - IR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 61.0
     - 84.5
     - 90.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_flickr30k_retrieval_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_flickr30k_retrieval.sh>`__

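
The **R1**, **R5**, and **R10** columns in the retrieval tables report recall at rank K, in percent: the share of queries whose correct match appears among the top K ranked candidates. The sketch below only illustrates that metric and is not the evaluation code used by the linked scripts. It assumes each query has exactly one correct candidate (real retrieval benchmarks can have several, for example multiple captions per image), and ``recall_at_k``, ``similarity``, and ``gt_index`` are hypothetical names chosen for this example.

```python
def recall_at_k(similarity, gt_index, ks=(1, 5, 10)):
    """Return recall@K in percent for each K in ``ks``.

    similarity: list of rows, one per query, holding a score per candidate.
    gt_index:   gt_index[i] is the position of the single correct candidate
                for query i (a simplifying assumption for this sketch).
    """
    ranks = []
    for row, gt in zip(similarity, gt_index):
        gt_score = row[gt]
        # Zero-based rank: number of candidates scored strictly higher
        # than the correct one.
        ranks.append(sum(1 for score in row if score > gt_score))
    n = len(ranks)
    return {f"R{k}": 100.0 * sum(r < k for r in ranks) / n for k in ks}


# Toy example: 3 queries, 4 candidates each.
scores = [
    [0.9, 0.1, 0.3, 0.2],  # correct candidate (index 0) is ranked first
    [0.2, 0.4, 0.8, 0.1],  # correct candidate (index 1) is ranked second
    [0.5, 0.6, 0.7, 0.1],  # correct candidate (index 3) is ranked last
]
result = recall_at_k(scores, gt_index=[0, 1, 3], ks=(1, 2))
print(result)
```

With real model outputs, ``similarity`` would hold one row of matching scores per query. The toy call uses K = 1 and K = 2 only because the example has just four candidates.
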

.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - **VQA**
     - **test-dev**
     - **test-std/test**
     - **Training**
     - **Evaluation**
   * - VQAv2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 76.35
     - 76.54
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_vqa_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/test_albef_vqa.sh>`__
   * - OKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - NA
     - 54.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_okvqa_albef.sh>`__
     - NA
   * - AOKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 54.5
     - NA
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_aokvqa_albef.sh>`__
     - NA

.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - **Multimodal Classification**
     - **val**
     - **test**
     - **Training**
     - **Evaluation**
   * - SNLI-VE (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 80.60
     - 81.04
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_ve_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_albef_ve.sh>`__
   * - NLVR2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 82.47
     - 82.91
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_nlvr_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_albef_nlvr.sh>`__

BLIP
*******
.. list-table::
   :widths: 30 80 20

   * - **Pretraining (14M)**
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/pretrain.sh>`__
   * -
     - Visual Genome (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_vg.py>`__)
     -
   * -
     - SBU (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_sbu.py>`__)
     -
   * -
     - CC3M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc3m.py>`__)
     -
   * -
     - CC12M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc12m.py>`__)
     -

.. list-table::
   :widths: 30 40 20 20 20 30 30
   :header-rows: 1

   * - **Tasks**
     - **Retrieval**
     - **R1**
     - **R5**
     - **R10**
     - **Training**
     - **Evaluation**
   * - TR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 82.0
     - 95.8
     - 98.1
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_coco.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_coco.sh>`__
   * - IR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 64.5
     - 86.0
     - 91.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_coco.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_coco.sh>`__
   * - TR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 96.9
     - 99.9
     - 100.0
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_flickr.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_flickr.sh>`__
   * - IR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 87.5
     - 97.6
     - 98.9
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_flickr.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_flickr.sh>`__

.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - **VQA**
     - **test-dev**
     - **test-std/test**
     - **Training**
     - **Evaluation**
   * - VQAv2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 78.23
     - 78.29
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_vqa.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_vqa.sh>`__
   * - OKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - NA
     - 55.4
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_okvqa.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_okvqa.sh>`__
   * - AOKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 56.2
     - 50.1
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_aokvqa.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_aokvqa.sh>`__

.. list-table::
   :widths: 20 20 20 20 20 20
   :header-rows: 1

   * - **Image Captioning**
     - **BLEU@4**
     - **CIDEr**
     - **SPICE**
     - **Training**
     - **Evaluation**
   * - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 39.9
     - 133.5
     - 23.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_caption_coco.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_coco_cap.sh>`__
   * - NoCaps (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_nocaps.py>`__)
     - 31.9
     - 109.1
     - 14.7
     - NA
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_nocaps.sh>`__

.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - **Multimodal Classification**
     - **val**
     - **test**
     - **Training**
     - **Evaluation**
   * - NLVR2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 82.48
     - 83.25
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_nlvr.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_nlvr.sh>`__

CLIP
*******
.. list-table::
   :widths: 30 40 20 20 20 30
   :header-rows: 1

   * - **Tasks**
     - **Retrieval (Zero-shot)**
     - **R1**
     - **R5**
     - **R10**
     - **Evaluation**
   * - TR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 57.2
     - 80.5
     - 87.8
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_coco.sh>`__
   * - IR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 36.5
     - 60.8
     - 71.0
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_coco.sh>`__
   * - TR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 86.5
     - 98.0
     - 99.1
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_flickr.sh>`__
   * - IR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 67.0
     - 88.9
     - 93.3
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_flickr.sh>`__

.. list-table::
   :widths: 20 20 20
   :header-rows: 1

   * - **Multimodal Classification**
     - **val**
     - **Evaluation**
   * - ImageNet
     - 76.5
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_zs_imnet.sh>`__

ALPRO
*******
.. list-table::
   :widths: 30 40 20 20 20 20 30
   :header-rows: 1

   * - **Tasks**
     - **Retrieval**
     - **R1**
     - **R5**
     - **R10**
     - **Training**
     - **Evaluation**
   * - TR
     - MSRVTT (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_msrvtt.py>`__)
     - 33.2
     - 60.5
     - 71.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msrvtt_ret.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msrvtt_ret.sh>`__
   * - VR
     - MSRVTT (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_msrvtt.py>`__)
     - 33.8
     - 61.4
     - 72.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msrvtt_ret.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msrvtt_ret.sh>`__
   * - TR
     - DiDeMo (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_didemo.py>`__)
     - 38.8
     - 66.4
     - 76.8
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_didemo_ret.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_didemo_ret.sh>`__
   * - VR
     - DiDeMo (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_didemo.py>`__)
     - 36.6
     - 67.5
     - 77.9
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_didemo_ret.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_didemo_ret.sh>`__

.. list-table::
   :widths: 20 20 20 20
   :header-rows: 1

   * - **Video QA**
     - **test**
     - **Training**
     - **Evaluation**
   * - MSRVTT
     - 42.1
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msrvtt_qa.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msrvtt_qa.sh>`__
   * - MSVD
     - 46.0
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msvd_qa.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msvd_qa.sh>`__