## Experiments

Please check `amlt_configs/` for the experiment configs.

## Performance

The major results can be found in [docs/MODEL_ZOO.md](./MODEL_ZOO.md) and our [Project Page](https://xk-huang.github.io/segment-caption-anything).

We also provide evaluation code for our baseline ([Promptable-GRiT](https://github.com/xk-huang/Promptable-GRiT)) and for [benchmark referring VLLMs](https://github.com/xk-huang/benchmark-referring-vllm).

## Evaluate with `vdtk`

### Install `vdtk`

This version of `vdtk` supports CLIP computation with base64-encoded images:

https://github.com/xk-huang/vdtk/tree/dev

- External data (e.g., jar files): https://huggingface.co/xk-huang/vdtk-data

Install `vdtk` together with the external data, using either Docker or Conda:

#### Docker

```shell
alias=`whoami | cut -d'.' -f2`
docker run -itd --runtime=nvidia --ipc=host --privileged -v /home/${alias}:/home/${alias} -w `pwd` --name sca nvcr.io/nvidia/pytorch:22.10-py3 bash
docker exec -it sca bash

# In the docker container
# cd to the code dir
. amlt_configs/setup.sh
source ~/.bashrc
pip install pydantic==1.10.8  # https://github.com/pydantic/pydantic/issues/545#issuecomment-1573776471
. amlt_configs/setup_eval_suite.sh
```

#### Conda

```shell
# Install env first
# conda create -n sca -y python=3.9 
# conda activate sca
# conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia

ORIGINAL_DIR="$(pwd)"
REPO_DIR=/tmp/vdtk
git clone --recursive https://github.com/xk-huang/vdtk.git $REPO_DIR -b dev
cd $REPO_DIR
git submodule update --init --recursive

# Install git-lfs. Use the plain commands if you are root, or the `sudo` ones otherwise.
apt-get update
sudo apt-get update
apt-get install git-lfs
sudo apt-get install git-lfs

git lfs install
git clone https://huggingface.co/xk-huang/vdtk-data
# git submodule init && git submodule update

rsync -avP ./vdtk-data/vdtk .
rm -rf vdtk-data

pip install --upgrade pip
pip install -e . POT==0.9.0  # POT==0.9.1 takes up all the memory with the TF backend
pip install tensorflow==2.12.1  # Pin a single TF version
pip install levenshtein==0.21.1
pip install openpyxl==3.1.2

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
cd "$ORIGINAL_DIR"
```

Potential problems:

- TensorFlow: as of 08/15/23, TF does not support CUDA 12, so we use `nvcr.io/nvidia/pytorch:22.12-py3`, which contains CUDA 11.8.
- Encoding in the Docker image: `import locale; locale.getpreferredencoding()` returns `ANSI_X3.4-1968` rather than `UTF-8`, which causes errors when writing files.
  - Fix: change `vdtk/metrics/tokenizer/ptbtokenizer.py:73` to `tmp_file = tempfile.NamedTemporaryFile(mode="w", delete=False, encoding="utf-8")`.
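
The encoding problem can be reproduced outside of `vdtk`. Below is a minimal sketch, assuming only the Python standard library; `write_utf8_temp` is a hypothetical helper name used for illustration and is not part of `vdtk`. It shows why passing `encoding="utf-8"` explicitly makes the write independent of the locale default.

```python
import locale
import tempfile
from pathlib import Path

# The locale default is what text-mode files fall back to when no encoding is given.
print(locale.getpreferredencoding())


def write_utf8_temp(text: str) -> Path:
    """Write text to a temporary file with an explicit UTF-8 encoding (hypothetical helper)."""
    # Same call as in the suggested fix: encoding="utf-8" overrides the locale default.
    tmp_file = tempfile.NamedTemporaryFile(mode="w", delete=False, encoding="utf-8")
    with tmp_file:
        tmp_file.write(text)
    return Path(tmp_file.name)


path = write_utf8_temp("a café on the corner\n")
print(path.read_text(encoding="utf-8"))
path.unlink()  # delete=False means we clean up ourselves
```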


### The format of the input prediction JSON file

```json
[
    {
        "_id": 0,
        "split": "inference",
        "references": [
            "a man wearing a red and white shirt"
        ],
        "candidates": [
            "red and yellow",
            "red shirt guy",
            "red and yellow uniform"
        ],
        "metadata": {
            "metadata_input_boxes": [
                0,
                95,
                113,
                419
            ],
            "metadata_image_id": 266240,
            "metadata_region_id": 27287
        },
        "logits": {
            "iou_scores": [
                0.89990234375,
                0.994140625,
                0.99365234375
            ]
        }
    }
]
```
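
Before running the evaluation, it can help to sanity-check a prediction file against the format above. The following is a rough sketch, not part of the repository: `check_record` and `load_predictions` are hypothetical names, and it assumes that every key shown in the example is required and that `iou_scores` has one entry per candidate (as in the example).

```python
import json

# Assumption: all top-level keys from the example record are required.
REQUIRED_KEYS = {"_id", "split", "references", "candidates", "metadata", "logits"}


def check_record(record: dict) -> None:
    """Raise ValueError if a record does not look like the example above."""
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"record {record.get('_id')} is missing keys: {sorted(missing)}")
    n_candidates = len(record["candidates"])
    n_scores = len(record["logits"]["iou_scores"])
    # Assumption: one IoU score per candidate caption.
    if n_candidates != n_scores:
        raise ValueError(
            f"record {record['_id']}: {n_candidates} candidates but {n_scores} iou_scores"
        )


def load_predictions(path: str) -> list:
    """Load a prediction JSON file (a list of records) and check each record."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for record in records:
        check_record(record)
    return records
```

For example, `load_predictions("infer.json")` returns the list of records, or raises on the first malformed one.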

### The structure of files

```
$OUTPUT_DIR/infer/infer-visual_genome-densecap-local-densecap-test.json
# infer-{data_script_identifier}-{name}-{split}.json
```
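
To list the inference files that follow this layout, a glob over `$OUTPUT_DIR/infer/` is enough. This is a small sketch under that assumption; `find_inference_jsons` is a hypothetical helper, and the pattern `infer/infer-*.json` is derived from the file name template above rather than taken from the repository scripts.

```python
from pathlib import Path


def find_inference_jsons(output_dir: str) -> list:
    """Return the sorted inference JSON files under <output_dir>/infer/."""
    # Matches infer-{data_script_identifier}-{name}-{split}.json
    return sorted(Path(output_dir).glob("infer/infer-*.json"))
```

For example, `find_inference_jsons("exp/")` returns every matching file directly under `exp/infer/`.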

### All-in-one script

Usage:

```shell
bash scripts/tools/eval_suite.sh
# Env args:
#        DRY_RUN:
#        ONLY_GATHER:
#        ONLY_EVAL:
#        SKIP_CLIP_RECALL:
#        DEBUG:
#        NO_POST_PROCESS:
# Usage: [DRY_RUN=1] [ONLY_GATHER=1] [ONLY_EVAL=1] ./eval_suite.sh <INFERENCE_JSON_DIR> <JSON_FILE_NAME> <SPLIT> [<IMAGE_B64_TSV_PATH>] [<MERGE_TSV_INTO_JSON_FOR_VDTK_SCRIPT>] [<POST_PROCESS_MULTI_CANDIDATES_SCRIPT>]
# JSON_FILE_NAME is not used, use any string like 'xxx' for it.
```

e.g.,

```bash
DRY_RUN=1 NO_POST_PROCESS=1 ONLY_EVAL=1 SKIP_CLIP_RECALL=1 bash scripts/tools/eval_suite.sh exp/ xxx inference
```

<details>
<summary>The details about the script.</summary>

1. Replace the GT captions (the ones processed by the tokenizer) with the real GT captions (`scripts/tools/replace_references_in_json_for_vdtk.py`). Please prepare the folder structure as described in [The structure of files](#the-structure-of-files). This step requires the `.hydra` config.
2. When there are multiple predictions, keep only one based on the IoU score (`scripts/tools/post_process_multi_candidates_for_vdtk.py`).

If there are multiple candidate predictions, we choose only the **one candidate** with the highest IoU score for Meteor, CIDEr-D, ROUGE, etc.:

```shell
python scripts/tools/post_process_multi_candidates_for_vdtk.py -i $INFERENCE_JSON_PATH
```

Process multiple inference JSON files under a given directory:

```shell
INFERENCE_JSON_DIR=  # set this to the directory that contains the inference JSON files
find "$INFERENCE_JSON_DIR" -name 'infer.json' -exec python scripts/tools/post_process_multi_candidates_for_vdtk.py -i {} \;
```
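
The selection rule in step 2 can be sketched as follows. This is an illustration only, not the code of `post_process_multi_candidates_for_vdtk.py`: `keep_best_candidate` is a hypothetical name, and trimming `iou_scores` to the kept entry is an assumption.

```python
def keep_best_candidate(record: dict) -> dict:
    """Return a copy of the record that keeps only the candidate with the highest IoU score."""
    candidates = record["candidates"]
    scores = record["logits"]["iou_scores"]
    # Index of the candidate whose IoU score is the largest.
    best = max(range(len(candidates)), key=lambda i: scores[i])
    out = dict(record)
    out["candidates"] = [candidates[best]]
    # Assumption: the scores are trimmed to match the single kept candidate.
    out["logits"] = {**record["logits"], "iou_scores": [scores[best]]}
    return out


record = {
    "candidates": ["red and yellow", "red shirt guy", "red and yellow uniform"],
    "logits": {"iou_scores": [0.89990234375, 0.994140625, 0.99365234375]},
}
print(keep_best_candidate(record)["candidates"])  # ['red shirt guy']
```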

3. Evaluate with `vdtk` and save the results in a `.log` file.

You need to change `PRED_JSONS_BASE_DIR`, `JSON_FILE_NAME`, `SPLIT`, and `IMAGE_B64_TSV_PATH`.

If the `infer.json` file is too large to open in VS Code, you can open it with vim instead and change the above variables accordingly.

Currently, `JSON_FILE_NAME` is deprecated because we `find` the `*.json` files in `PRED_JSONS_BASE_DIR`.

4. Parse the results in each `*.log` file and gather them into a single xlsx file, organized by sheets.

Parse the log. Change `PRED_JSONS_BASE_DIR` accordingly.

5. Merge each metric into one table with `scripts/tools/merge_sheets_xlsx.py`.

</details>