Upload 3 files

- .gitattributes +1 -0
- MastersThesis_475703.pdf +3 -0
- README.md +405 -10
- requirements.txt +164 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+MastersThesis_475703.pdf filter=lfs diff=lfs merge=lfs -text
MastersThesis_475703.pdf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fcecb1b0a417a603e5848ccabbd8f5b65a27a0a3f2a0ff6c9969e1a99ba3c394
size 7791434
README.md CHANGED
@@ -1,13 +1,408 @@
# Robustness of Multi-Modal Foundational Models

Research code for evaluating the robustness of multi-modal foundational models (MMFMs) against adversarial attacks. This repository contains implementations for testing vision-language models like OpenFlamingo against sparse and non-sparse adversarial perturbations, as well as fine-tuning CLIP models on adversarial examples and COCO counterfactuals.

**Code adapted from:** [RobustVLM](https://github.com/chs20/RobustVLM)

## Table of Contents

- [Robustness of Multi-Modal Foundational Models](#robustness-of-multi-modal-foundational-models)
  - [Table of Contents](#table-of-contents)
  - [Prerequisites](#prerequisites)
  - [Installation](#installation)
  - [Dataset Setup](#dataset-setup)
    - [VLM Evaluation Datasets](#vlm-evaluation-datasets)
      - [1. VizWiz Dataset](#1-vizwiz-dataset)
      - [2. OK-VQA Dataset](#2-ok-vqa-dataset)
      - [3. Flickr30k Dataset](#3-flickr30k-dataset)
      - [4. COCO Dataset (2014)](#4-coco-dataset-2014)
    - [CLIP Fine-tuning Datasets](#clip-fine-tuning-datasets)
      - [1. COCO Counterfactuals (COCO-CFs)](#1-coco-counterfactuals-coco-cfs)
      - [2. APGD Adversarial Images](#2-apgd-adversarial-images)
      - [3. COCO 2017 Validation Set](#3-coco-2017-validation-set)
      - [4. COCO Captions and Classification Datasets](#4-coco-captions-and-classification-datasets)
  - [Usage](#usage)
    - [Sparse vs Non-Sparse Attacks Evaluation](#sparse-vs-non-sparse-attacks-evaluation)
      - [Configuration Options](#configuration-options)
      - [Running the Scripts](#running-the-scripts)
    - [Fine-tuning CLIP Models](#fine-tuning-clip-models)
      - [Parameters](#parameters)
      - [Running the Scripts](#running-the-scripts-1)
    - [Zero-Shot Image Classification](#zero-shot-image-classification)
      - [Parameters](#parameters-1)
      - [Running the Scripts](#running-the-scripts-2)
    - [Image-Text Retrieval](#image-text-retrieval)
      - [Parameters](#parameters-2)
  - [License](#license)
  - [Acknowledgments](#acknowledgments)

## Prerequisites

- **Python version:** 3.11.x
- **Java:** JDK 1.8.0_202 (required for CIDEr score computation)
- **CUDA-compatible GPU** (for model training and inference)
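
A quick way to sanity-check these prerequisites before installing anything else (a minimal sketch):

```bash
python --version   # expect Python 3.11.x
java -version      # expect 1.8.0_202
nvidia-smi         # confirm a CUDA-capable GPU is visible
```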

## Installation

1. Clone the repository and navigate to the project directory:
   ```bash
   cd Robust_mmfm
   ```

2. Install required Python packages:
   ```bash
   pip install -r requirements.txt
   ```

3. Download the OpenFlamingo 9B model from [HuggingFace](https://huggingface.co/openflamingo/OpenFlamingo-9B-vitl-mpt7b). After downloading, it should be located in `$HOME/.cache/huggingface/hub/` with the name `models--openflamingo--OpenFlamingo-9B-vitl-mpt7b`.
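
   One way to fetch it into that cache location (a sketch using `snapshot_download` from `huggingface_hub`, which `requirements.txt` already installs):
   ```bash
   # Downloads the full repo into $HOME/.cache/huggingface/hub/ by default
   python -c "from huggingface_hub import snapshot_download; snapshot_download('openflamingo/OpenFlamingo-9B-vitl-mpt7b')"
   ```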

4. Install [JDK 1.8.0_202](https://tubcloud.tu-berlin.de/s/YdRcyp888N5qwkx) and add it to your PATH:
   ```bash
   # Add to ~/.bashrc or ~/.zshrc
   export PATH=$PATH:/path/to/jdk1.8.0_202/bin
   export LANG=en_US.UTF-8
   ```

## Dataset Setup

### VLM Evaluation Datasets

#### 1. VizWiz Dataset
- Download the [VizWiz VQA dataset](https://vizwiz.org/tasks-and-datasets/vqa/) (train and validation sets)
- Annotation files are included in the repository, but can be re-downloaded if corrupted
- Place images in:
  - `./open_flamingo_datasets/VizWiz/train`
  - `./open_flamingo_datasets/VizWiz/val`

#### 2. OK-VQA Dataset
- Download the [OK-VQA dataset](https://okvqa.allenai.org/download.html) (training and testing images)
- Annotation files are included in the repository
- Place all images in: `./open_flamingo_datasets/OKVQA`

#### 3. Flickr30k Dataset
- Download using the instructions from [awsaf49/flickr-dataset](https://github.com/awsaf49/flickr-dataset)
- Annotation files (`karpathy_flickr30k.json`, `dataset_flickr30k_coco_style.json`) are included
- Alternative annotation download: [Tübingen ML Cloud](https://nc.mlcloud.uni-tuebingen.de/index.php/s/mtRnQFaZJkR9zaX)
- Place images in: `./open_flamingo_datasets/Flickr30k/Images`

#### 4. COCO Dataset (2014)
- Download the [COCO 2014](https://cocodataset.org/#download) train and validation sets, e.g. as sketched below
- Annotation files are included in the repository
- Alternative annotation downloads:
  - [karpathy_coco.json](https://nc.mlcloud.uni-tuebingen.de/index.php/s/mtRnQFaZJkR9zaX)
  - [captions_val2014.json](https://github.com/tylin/coco-caption/blob/master/annotations/captions_val2014.json)
- Place images in:
  - `./open_flamingo_datasets/COCO/train2014`
  - `./open_flamingo_datasets/COCO/val2014`
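
One way to fetch and unpack the images into those folders (a sketch; the zip URLs are the standard ones listed on cocodataset.org):

```bash
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
unzip train2014.zip -d ./open_flamingo_datasets/COCO   # creates .../COCO/train2014
unzip val2014.zip -d ./open_flamingo_datasets/COCO     # creates .../COCO/val2014
```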

### CLIP Fine-tuning Datasets

#### 1. COCO Counterfactuals (COCO-CFs)
- Download `images.zip` from [HuggingFace COCO-Counterfactuals](https://huggingface.co/datasets/Intel/COCO-Counterfactuals/tree/main/data)
- Unzip and place images in:
  - `./open_flamingo_datasets/COCO_CF/images`
  - `./clip_train_datasets/MS_COCO_COCO_CF/images`
- Copy the original images (ending with `_0.jpg`) to the APGD folders:
  ```bash
  cp ./open_flamingo_datasets/COCO_CF/images/*_0.jpg ./clip_train_datasets/MS_COCO_APGD_4/images
  cp ./open_flamingo_datasets/COCO_CF/images/*_0.jpg ./clip_train_datasets/MS_COCO_APGD_1/images
  ```

#### 2. APGD Adversarial Images
- Download from [TU Berlin Cloud](https://tubcloud.tu-berlin.de/s/YdRcyp888N5qwkx) and unzip as sketched below:
  - `apgd_1_images.zip` → `./clip_train_datasets/MS_COCO_APGD_1/images`
  - `apgd_4_images.zip` → `./clip_train_datasets/MS_COCO_APGD_4/images`
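
A matching unzip sketch (it is an assumption here that the archives hold the images at their top level):

```bash
unzip apgd_1_images.zip -d ./clip_train_datasets/MS_COCO_APGD_1/images
unzip apgd_4_images.zip -d ./clip_train_datasets/MS_COCO_APGD_4/images
```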

#### 3. COCO 2017 Validation Set
- Download from the [COCO website](https://cocodataset.org/#download)
- Copy the images to all CLIP training dataset folders (see the loop sketched below):
  - `./clip_train_datasets/MS_COCO/images`
  - `./clip_train_datasets/MS_COCO_APGD_4/images`
  - `./clip_train_datasets/MS_COCO_APGD_1/images`
  - `./clip_train_datasets/MS_COCO_COCO_CF/images`
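
A copy-loop sketch, assuming the validation set was unzipped to `./val2017`:

```bash
# Replicate the COCO 2017 val images into every CLIP training folder
for d in MS_COCO MS_COCO_APGD_4 MS_COCO_APGD_1 MS_COCO_COCO_CF; do
  cp ./val2017/*.jpg "./clip_train_datasets/$d/images/"
done
```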

#### 4. COCO Captions and Classification Datasets
- Download `ms_coco_captions.json` from [TU Berlin Cloud](https://tubcloud.tu-berlin.de/s/YdRcyp888N5qwkx)
- Place it in: `./clip_train_datasets/MS_COCO`
- Download the classification datasets from [TU Berlin Cloud](https://tubcloud.tu-berlin.de/s/YdRcyp888N5qwkx) and unzip as sketched below:
  - `Caltech101.zip` → unzip in `./image_classification_datasets`
  - `Caltech256.zip` → unzip in `./image_classification_datasets`
- For ImageNet: download it externally and set the path in `vlm_eval/clip_classification.py`, line 52
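
For example (a sketch; extraction targets follow the bullets above):

```bash
unzip Caltech101.zip -d ./image_classification_datasets
unzip Caltech256.zip -d ./image_classification_datasets
```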

---

## Usage

### Sparse vs Non-Sparse Attacks Evaluation

Evaluate vision-language models against adversarial attacks. The following command demonstrates the evaluation setup (available in `bash/run_script.sh` and `bash/run_script_slurm.sh`):

```bash
python -m vlm_eval.run_evaluation \
    --eval_flickr30 \
    --dont_save_adv \
    --verbose \
    --attack saif --eps 255 --steps 100 --mask_out none --mu 1.5 --search_steps 2 --lam 0.005 --k 1000 \
    --pert_factor_graph 0 \
    --itr 0 \
    --itr_clip 0 \
    --itr_dataset base \
    --itr_method APGD_1 \
    --vision_encoder_pretrained openai \
    --num_samples 8 \
    --trial_seeds 42 \
    --num_trials 1 \
    --shots 0 \
    --batch_size 1 \
    --results_file res9B \
    --model open_flamingo \
    --out_base_path /PATH/TO/Robust_mmfm/Results/open_flamingo \
    --vision_encoder_path ViT-L-14 \
    --checkpoint_path /PATH/TO/HUGGINGFACE/hub/models--openflamingo--OpenFlamingo-9B-vitl-mpt7b/snapshots/7e36809c73d038829ad5fba9d0cc949b4e180562/checkpoint.pt \
    --lm_path anas-awadalla/mpt-7b \
    --lm_tokenizer_path anas-awadalla/mpt-7b \
    --precision float16 \
    --cross_attn_every_n_layers 4 \
    --coco_train_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO/train2014 \
    --coco_val_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO/val2014 \
    --coco_karpathy_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO/karpathy_coco.json \
    --coco_annotations_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO/captions_val2014.json \
    --coco_cf_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO_CF \
    --flickr_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/Flickr30k/Images \
    --flickr_karpathy_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/Flickr30k/karpathy_flickr30k.json \
    --flickr_annotations_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/Flickr30k/dataset_flickr30k_coco_style.json \
    --vizwiz_train_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/VizWiz/train \
    --vizwiz_test_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/VizWiz/val \
    --vizwiz_train_questions_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/VizWiz/train_questions_vqa_format.json \
    --vizwiz_train_annotations_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/VizWiz/train_annotations_vqa_format.json \
    --vizwiz_test_questions_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/VizWiz/val_questions_vqa_format.json \
    --vizwiz_test_annotations_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/VizWiz/val_annotations_vqa_format.json \
    --vqav2_train_image_dir_path /home/htc/kchitranshi/SCRATCH/COCO/train2014 \
    --vqav2_train_questions_json_path /home/htc/kchitranshi/SCRATCH/vqav2/v2_OpenEnded_mscoco_train2014_questions.json \
    --vqav2_train_annotations_json_path /home/htc/kchitranshi/SCRATCH/vqav2/v2_mscoco_train2014_annotations.json \
    --vqav2_test_image_dir_path /home/htc/kchitranshi/SCRATCH/COCO/val2014 \
    --vqav2_test_questions_json_path /home/htc/kchitranshi/SCRATCH/vqav2/v2_OpenEnded_mscoco_val2014_questions.json \
    --vqav2_test_annotations_json_path /home/htc/kchitranshi/SCRATCH/vqav2/v2_mscoco_val2014_annotations.json \
    --textvqa_image_dir_path /mnt/datasets/textvqa/train_images \
    --textvqa_train_questions_json_path /home/htc/kchitranshi/SCRATCH/RobustVLM/textvqa/train_questions_vqa_format.json \
    --textvqa_train_annotations_json_path /home/htc/kchitranshi/SCRATCH/RobustVLM/textvqa/train_annotations_vqa_format.json \
    --textvqa_test_questions_json_path /home/htc/kchitranshi/SCRATCH/RobustVLM/textvqa/val_questions_vqa_format.json \
    --textvqa_test_annotations_json_path /home/htc/kchitranshi/RobustVLM/textvqa/val_annotations_vqa_format.json \
    --ok_vqa_train_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO/train2014 \
    --ok_vqa_train_questions_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/OKVQA/OpenEnded_mscoco_train2014_questions.json \
    --ok_vqa_train_annotations_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/OKVQA/mscoco_train2014_annotations.json \
    --ok_vqa_test_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO/val2014 \
    --ok_vqa_test_questions_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/OKVQA/OpenEnded_mscoco_val2014_questions.json \
    --ok_vqa_test_annotations_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/OKVQA/mscoco_val2014_annotations.json
```

#### Configuration Options

**Attack Types:**
- APGD attack: `--attack apgd --eps <epsilon>`
- SAIF attack: `--attack saif --eps <epsilon> --k <k_value>`
- No attack (clean): `--attack none`
- Targeted attack (COCO only): `--targeted --target_str "TARGET_STRING"`

**Shot Settings:**
- 0-shot: `--shots 0`
- 4-shot: `--shots 4`
- Query mode: `--mask_out context`
- All mode: `--mask_out none`

**Evaluation Tasks:**
- Image Captioning:
  - COCO: `--eval_coco`
  - Flickr30k: `--eval_flickr30`
- Visual Question Answering:
  - VizWiz: `--eval_vizwiz`
  - OK-VQA: `--eval_ok_vqa`

**Other Options:**
- Save adversarial samples as `.pt` files: remove `--dont_save_adv`
- Generate perturbation factor graph (0-shot only): `--pert_factor_graph 1`

#### Running the Scripts

```bash
# Make scripts executable
chmod +x ./bash/run_script.sh
chmod +x ./bash/run_script_slurm.sh

# Run locally or remotely
./bash/run_script.sh

# Run on SLURM cluster
sbatch ./bash/run_script_slurm.sh
```

### Fine-tuning CLIP Models

Fine-tune CLIP models on adversarial examples (APGD) and COCO counterfactuals. Example command (available in `bash/clip_train.sh` and `bash/clip_train_slurm.sh`):

```bash
python vlm_eval/clip_train.py \
    --num_epochs 20 \
    --data_seeds 112 113 114 115 \
    --data_name base \
    --method APGD_4 \
    --batch_size 128 \
    --learning_rate 5e-7 \
    --save_model \
    --save_model_path ./fine_tuned_clip_models/APGD_4/
```

This fine-tunes CLIP for 20 epochs on the `base` dataset with the APGD attack (ε=4/255).

#### Parameters

- `--data_name`: Dataset size variant
  - `MS_COCO`: Standard MS COCO (see thesis appendix)
  - `base`: Base subset
  - `medium`: Medium subset
  - `all`: Complete dataset

- `--method`: Training method
  - `APGD_4`: APGD with ε=4/255
  - `APGD_1`: APGD with ε=1/255
  - `COCO_CF`: COCO Counterfactuals
  - `NONE`: Clean MS COCO (no perturbations)

- `--data_seeds`: Random seeds for dataset sampling (e.g., `112 113 114 115`)
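
For instance, switching to COCO counterfactuals changes only `--method` and the save path (a sketch under the same settings as the command above):

```bash
python vlm_eval/clip_train.py \
    --num_epochs 20 \
    --data_seeds 112 113 114 115 \
    --data_name base \
    --method COCO_CF \
    --batch_size 128 \
    --learning_rate 5e-7 \
    --save_model \
    --save_model_path ./fine_tuned_clip_models/COCO_CF/
```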

#### Running the Scripts

```bash
# Make scripts executable
chmod +x ./bash/clip_train.sh
chmod +x ./bash/clip_train_slurm.sh

# Run locally or remotely
./bash/clip_train.sh

# Run on SLURM cluster
sbatch ./bash/clip_train_slurm.sh
```

### Zero-Shot Image Classification

Evaluate fine-tuned CLIP models on image classification tasks. Example command (available in `bash/clip_classification.sh` and `bash/clip_classification_slurm.sh`):

```bash
python vlm_eval/clip_classification.py \
    --data base \
    --method COCO_CF \
    --dataset Caltech101
```

This performs zero-shot classification on Caltech101 using a CLIP model fine-tuned on the `base` COCO counterfactuals dataset.

#### Parameters

- `--data`: Dataset variant
  - `MS_COCO`, `base`, `medium`, `all`: Fine-tuned models
  - `non_fine_tuned`: Pre-trained CLIP only (no fine-tuning)

- `--method`: `APGD_4`, `APGD_1`, `COCO_CF`, `NONE`

- `--dataset`: Classification dataset
  - `Food101`, `CIFAR10`, `CIFAR100`, `ImageNet`, `Caltech101`, `Caltech256`

**Note:** Evaluation is hardcoded to 20 epochs.
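
To score the pre-trained baseline instead, something like the following should work (a sketch; whether `--method` is still consulted when `--data non_fine_tuned` is set is an assumption here):

```bash
# Zero-shot CIFAR100 with the non-fine-tuned CLIP (--method assumed ignored)
python vlm_eval/clip_classification.py \
    --data non_fine_tuned \
    --method NONE \
    --dataset CIFAR100
```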

#### Running the Scripts

```bash
chmod +x ./bash/clip_classification.sh
chmod +x ./bash/clip_classification_slurm.sh

# Run locally or remotely
./bash/clip_classification.sh

# Run on SLURM cluster
sbatch ./bash/clip_classification_slurm.sh
```

---

### Image-Text Retrieval

Perform image-to-text (i2t) and text-to-image (t2i) retrieval tasks:

```bash
python -m vlm_eval.run_evaluation \
    --eval_flickr30 \
    --dont_save_adv \
    --verbose \
    --attack none --eps 255 --steps 100 --mask_out none --mu 1.5 --search_steps 2 --lam 0.005 --k 1000 \
    --pert_factor_graph 0 \
    --itr 1 \
    --itr_clip 0 \
    --itr_dataset base \
    --itr_method APGD_1 \
    --vision_encoder_pretrained openai \
    --num_samples 1000 \
    --trial_seeds 42 \
    --num_trials 1 \
    --shots 0 \
    --batch_size 1 \
    --results_file res9B \
    --model open_flamingo \
    --out_base_path /PATH/TO/Robust_mmfm/Results/open_flamingo \
    --vision_encoder_path ViT-L-14 \
    --checkpoint_path /PATH/TO/HUGGINGFACE/hub/models--openflamingo--OpenFlamingo-9B-vitl-mpt7b/snapshots/7e36809c73d038829ad5fba9d0cc949b4e180562/checkpoint.pt \
    --lm_path anas-awadalla/mpt-7b \
    --lm_tokenizer_path anas-awadalla/mpt-7b \
    --precision float16 \
    --cross_attn_every_n_layers 4 \
    --coco_train_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO/train2014 \
    --coco_val_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO/val2014 \
    --coco_karpathy_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO/karpathy_coco.json \
    --coco_annotations_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO/captions_val2014.json \
    --coco_cf_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO_CF \
    --flickr_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/Flickr30k/Images \
    --flickr_karpathy_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/Flickr30k/karpathy_flickr30k.json \
    --flickr_annotations_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/Flickr30k/dataset_flickr30k_coco_style.json \
    --vizwiz_train_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/VizWiz/train \
    --vizwiz_test_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/VizWiz/val \
    --vizwiz_train_questions_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/VizWiz/train_questions_vqa_format.json \
    --vizwiz_train_annotations_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/VizWiz/train_annotations_vqa_format.json \
    --vizwiz_test_questions_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/VizWiz/val_questions_vqa_format.json \
    --vizwiz_test_annotations_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/VizWiz/val_annotations_vqa_format.json \
    --vqav2_train_image_dir_path /home/htc/kchitranshi/SCRATCH/COCO/train2014 \
    --vqav2_train_questions_json_path /home/htc/kchitranshi/SCRATCH/vqav2/v2_OpenEnded_mscoco_train2014_questions.json \
    --vqav2_train_annotations_json_path /home/htc/kchitranshi/SCRATCH/vqav2/v2_mscoco_train2014_annotations.json \
    --vqav2_test_image_dir_path /home/htc/kchitranshi/SCRATCH/COCO/val2014 \
    --vqav2_test_questions_json_path /home/htc/kchitranshi/SCRATCH/vqav2/v2_OpenEnded_mscoco_val2014_questions.json \
    --vqav2_test_annotations_json_path /home/htc/kchitranshi/SCRATCH/vqav2/v2_mscoco_val2014_annotations.json \
    --textvqa_image_dir_path /mnt/datasets/textvqa/train_images \
    --textvqa_train_questions_json_path /home/htc/kchitranshi/SCRATCH/RobustVLM/textvqa/train_questions_vqa_format.json \
    --textvqa_train_annotations_json_path /home/htc/kchitranshi/SCRATCH/RobustVLM/textvqa/train_annotations_vqa_format.json \
    --textvqa_test_questions_json_path /home/htc/kchitranshi/SCRATCH/RobustVLM/textvqa/val_questions_vqa_format.json \
    --textvqa_test_annotations_json_path /home/htc/kchitranshi/RobustVLM/textvqa/val_annotations_vqa_format.json \
    --ok_vqa_train_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO/train2014 \
    --ok_vqa_train_questions_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/OKVQA/OpenEnded_mscoco_train2014_questions.json \
    --ok_vqa_train_annotations_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/OKVQA/mscoco_train2014_annotations.json \
    --ok_vqa_test_image_dir_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/COCO/val2014 \
    --ok_vqa_test_questions_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/OKVQA/OpenEnded_mscoco_val2014_questions.json \
    --ok_vqa_test_annotations_json_path /PATH/TO/Robust_mmfm/open_flamingo_datasets/OKVQA/mscoco_val2014_annotations.json
```

This evaluates i2t and t2i retrieval on the Flickr30k 1K test set (1000 samples) using a CLIP model fine-tuned on the `base` APGD dataset (ε=1/255).

#### Parameters

- `--itr_dataset`: Dataset for the fine-tuned CLIP model
  - `MS_COCO`, `base`, `medium`, `all`: Fine-tuned variants
  - `non_fine_tuned`: Pre-trained CLIP only

**Note:** Image-text retrieval does not support targeted attacks or 4-shot settings.

---

## License

Please refer to the original [RobustVLM repository](https://github.com/chs20/RobustVLM) for licensing information.

## Acknowledgments

This code is adapted from the [RobustVLM](https://github.com/chs20/RobustVLM) repository. We thank the original authors for their foundational work.
requirements.txt ADDED
@@ -0,0 +1,164 @@
accelerate==0.24.0
aiofiles==22.1.0
aiohttp==3.8.4
aiosignal==1.3.1
aiosqlite==0.19.0
anyio==3.6.2
appdirs==1.4.4
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
async-timeout==4.0.2
attrs==23.1.0
Babel==2.12.1
backcall==0.2.0
beautifulsoup4==4.12.2
bleach==6.0.0
braceexpand==0.1.7
certifi==2023.5.7
cffi==1.15.1
chardet==4.0.0
charset-normalizer==3.1.0
click==8.1.3
cmake==3.26.3
comm==0.1.3
contourpy==1.0.7
cycler==0.11.0
datasets==2.12.0
debugpy==1.6.7
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.6
docker-pycreds==0.4.0
einops==0.6.1
einops-exts==0.0.4
executing==1.2.0
fastjsonschema==2.16.3
filelock==3.12.0
fonttools==4.39.3
fqdn==1.5.1
frozenlist==1.3.3
fsspec==2023.5.0
ftfy==6.1.1
geotorch==0.3.0
gitdb==4.0.10
GitPython==3.1.31
huggingface-hub==0.14.1
idna==2.10
inflection==0.5.1
ipykernel==6.23.0
ipython==8.13.2
ipython-genutils==0.2.0
isoduration==20.11.0
jedi==0.18.2
Jinja2==3.1.2
joblib==1.2.0
json5==0.9.11
jsonpointer==2.3
jsonschema==4.17.3
kiwisolver==1.4.4
lit==16.0.3
MarkupSafe==2.1.2
matplotlib==3.7.1
matplotlib-inline==0.1.6
mistune==2.0.5
more-itertools==9.1.0
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.14
nbclassic==1.0.0
nbclient==0.7.4
nbconvert==7.4.0
nbformat==5.8.0
nest-asyncio==1.5.6
networkx==3.1
nltk==3.8.1
notebook==6.5.4
notebook_shim==0.2.3
numpy==1.24.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
open-clip-torch==2.19.0
overrides==7.4.0
packaging==23.1
pandas==1.3.5
pandocfilters==1.5.0
parso==0.8.3
pathtools==0.1.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.5.0
platformdirs==3.5.0
prometheus-client==0.16.0
prompt-toolkit==3.0.38
protobuf==3.20.3
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==12.0.0
pycocoevalcap==1.2
pycocotools==2.0.6
pycparser==2.21
Pygments==2.15.1
pyparsing==3.0.9
pyrsistent==0.19.3
python-dateutil==2.8.2
python-json-logger==2.0.7
pytz==2023.3
PyYAML==6.0
pyzmq==25.0.2
regex==2023.5.5
requests==2.25.1
responses==0.18.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
robustbench @ git+https://github.com/RobustBench/robustbench.git@e67e4225facde47be6a41ed78b576076e8b90cc5
scikit-learn==1.3.2
scipy==1.10.1
Send2Trash==1.8.2
sentencepiece==0.1.98
sentry-sdk==1.22.2
setproctitle==1.3.2
shortuuid==1.0.11
six==1.16.0
smmap==5.0.0
sniffio==1.3.0
soupsieve==2.4.1
stack-data==0.6.2
sympy==1.11.1
terminado==0.17.1
timm==0.6.13
tinycss2==1.2.1
tokenizers==0.13.3
torch==2.0.1
torchdiffeq==0.2.3
torchvision==0.15.2
tornado==6.3.1
tqdm==4.65.0
traitlets==5.9.0
transformers @ git+https://github.com/huggingface/transformers@d3cbc997a231098cca81ac27fd3028a5536abe67
triton==2.0.0
typing_extensions==4.5.0
tzdata==2023.3
uri-template==1.2.0
urllib3==1.26.15
wandb==0.15.2
wcwidth==0.2.6
webcolors==1.13
webdataset==0.2.48
webencodings==0.5.1
websocket-client==1.5.1
xxhash==3.2.0
y-py==0.5.9
yarl==1.9.2
ypy-websocket==0.8.2