## Experiments
Please check `amlt_configs/` for the experiment configs.
## Performance
The major results can be found in [docs/MODEL_ZOO.md](./MODEL_ZOO.md) and our [Project Page](https://xk-huang.github.io/segment-caption-anything).
We also provide evaluation code for our baseline ([Promptable-GRiT](https://github.com/xk-huang/Promptable-GRiT)) and for [benchmarking referring VLLMs](https://github.com/xk-huang/benchmark-referring-vllm).
## Evaluate with `vdtk`
### Install `vdtk`
We use a fork of `vdtk` (https://github.com/xk-huang/vdtk/tree/dev) that supports CLIP score computation with base64-encoded images.
- External data (e.g., jar files): https://huggingface.co/xk-huang/vdtk-data

Install it together with the external data:
#### Docker
```shell
alias=`whoami | cut -d'.' -f2`
docker run -itd --runtime=nvidia --ipc=host --privileged -v /home/${alias}:/home/${alias} -w `pwd` --name sca nvcr.io/nvidia/pytorch:22.10-py3 bash
docker exec -it sca bash
# In the docker container
# cd to the code dir
. amlt_configs/setup.sh
source ~/.bashrc
pip install pydantic==1.10.8 # https://github.com/pydantic/pydantic/issues/545#issuecomment-1573776471
. amlt_configs/setup_eval_suite.sh
```
#### Conda
```shell
# Install env first
# conda create -n sca -y python=3.9
# conda activate sca
# conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
ORIGINAL_DIR="$(pwd)"
REPO_DIR=/tmp/vdtk
git clone --recursive https://github.com/xk-huang/vdtk.git $REPO_DIR -b dev
cd $REPO_DIR
git submodule update --init --recursive
apt-get update  # prepend `sudo` when running outside the container
apt-get install git-lfs  # prepend `sudo` when running outside the container
git lfs install
git clone https://huggingface.co/xk-huang/vdtk-data
# git submodule init && git submodule update
rsync -avP ./vdtk-data/vdtk .
rm -rf vdtk-data
pip install --upgrade pip
pip install -e . POT==0.9.0 # POT=0.9.1 will take up all the memory with tf backend
pip install tensorflow==2.12.1 # Just fix one version of tf
pip install levenshtein==0.21.1
pip install openpyxl==3.1.2
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
cd "$ORIGINAL_DIR"
```
Potential problems:
- Tensorflow: TF does not support CUDA 12 yet (as of 08/15/23), so we use `nvcr.io/nvidia/pytorch:22.12-py3`, which ships CUDA 11.8.
- Encoding in the Docker image: `import locale; locale.getpreferredencoding()` returns `ANSI_X3.4-1968` instead of `UTF-8`, which causes errors when writing files.
  - Fix: change `vdtk/metrics/tokenizer/ptbtokenizer.py:73` to `tmp_file = tempfile.NamedTemporaryFile(mode="w", delete=False, encoding="utf-8")`.
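The locale issue above can be reproduced and worked around directly in Python; a minimal sketch of the same fix applied in the `ptbtokenizer.py` patch (pinning the encoding instead of relying on the locale):

```python
import locale
import tempfile

# Inside some containers the preferred encoding is ANSI_X3.4-1968 (ASCII),
# which makes writes of non-ASCII captions raise UnicodeEncodeError.
print(locale.getpreferredencoding())

# Workaround: pass encoding="utf-8" explicitly so the locale is irrelevant.
with tempfile.NamedTemporaryFile(mode="w", delete=False, encoding="utf-8") as f:
    f.write("café ☕")  # non-ASCII content now writes fine
    path = f.name
print(path)
```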
### The format of the input prediction JSON file
```json
[
{
"_id": 0,
"split": "inference",
"references": [
"a man wearing a red and white shirt"
],
"candidates": [
"red and yellow",
"red shirt guy",
"red and yellow uniform"
],
"metadata": {
"metadata_input_boxes": [
0,
95,
113,
419
],
"metadata_image_id": 266240,
"metadata_region_id": 27287
},
"logits": {
"iou_scores": [
0.89990234375,
0.994140625,
0.99365234375
]
}
}
]
```
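A quick way to sanity-check a prediction file against this schema before running the evaluation. This is a sketch, not part of the repo: `check_prediction_entry` is a hypothetical helper, and the `x1, y1, x2, y2` box convention is an assumption.

```python
def check_prediction_entry(entry: dict) -> None:
    """Assert that one prediction record matches the expected schema (sketch)."""
    assert isinstance(entry["_id"], int)
    assert isinstance(entry["split"], str)
    assert all(isinstance(r, str) for r in entry["references"])
    assert all(isinstance(c, str) for c in entry["candidates"])
    box = entry["metadata"]["metadata_input_boxes"]
    assert len(box) == 4  # x1, y1, x2, y2 (assumed convention)
    # One IoU score per candidate caption.
    assert len(entry["logits"]["iou_scores"]) == len(entry["candidates"])

# Example usage on the record shown above:
record = {
    "_id": 0,
    "split": "inference",
    "references": ["a man wearing a red and white shirt"],
    "candidates": ["red and yellow", "red shirt guy", "red and yellow uniform"],
    "metadata": {
        "metadata_input_boxes": [0, 95, 113, 419],
        "metadata_image_id": 266240,
        "metadata_region_id": 27287,
    },
    "logits": {"iou_scores": [0.89990234375, 0.994140625, 0.99365234375]},
}
check_prediction_entry(record)
```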
### The structure of files
```
$OUTPUT_DIR/infer/infer-visual_genome-densecap-local-densecap-test.json
# infer-{data_script_identifier}-{name}-{split}.json
```
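Since `{data_script_identifier}` may itself contain hyphens, the last two fields can be recovered by splitting from the right. A hedged sketch (`parse_infer_filename` is hypothetical, not a repo utility):

```python
def parse_infer_filename(filename: str) -> dict:
    """Parse infer-{data_script_identifier}-{name}-{split}.json by splitting
    from the right, since the identifier itself may contain hyphens."""
    stem = filename.removeprefix("infer-").removesuffix(".json")
    identifier, name, split = stem.rsplit("-", 2)
    return {"data_script_identifier": identifier, "name": name, "split": split}

parsed = parse_infer_filename("infer-visual_genome-densecap-local-densecap-test.json")
```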
### All-in-one script
Usage:
```shell
bash scripts/tools/eval_suite.sh
# Env args:
# DRY_RUN:
# ONLY_GATHER:
# ONLY_EVAL:
# SKIP_CLIP_RECALL:
# DEBUG:
# NO_POST_PROCESS:
# Usage: [DRY_RUN=1] [ONLY_GATHER=1] [ONLY_EVAL=1] ./eval_suite.sh <INFERENCE_JSON_DIR> <JSON_FILE_NAME> <SPLIT> [<IMAGE_B64_TSV_PATH>] [<MERGE_TSV_INTO_JSON_FOR_VDTK_SCRIPT>] [<POST_PROCESS_MULTI_CANDIDATES_SCRIPT>]
# JSON_FILE_NAME is not used; pass any placeholder string (e.g., 'xxx').
```
e.g.,
```bash
DRY_RUN=1 NO_POST_PROCESS=1 ONLY_EVAL=1 SKIP_CLIP_RECALL=1 bash scripts/tools/eval_suite.sh exp/ xxx inference
```
<details>
<summary>The details about the script.</summary>

1. Replace the GT captions (the tokenizer-processed ones) with the real GT (`scripts/tools/replace_references_in_json_for_vdtk.py`). Please prepare the folder structure as described in [The structure of files](#the-structure-of-files). This step requires the `.hydra` config.
2. Remove multiple predictions, keeping only one based on the IoU score (`scripts/tools/post_process_multi_candidates_for_vdtk.py`).
If there are multiple candidate predictions, we keep only the **one candidate** with the highest IoU for METEOR, CIDEr-D, ROUGE, etc.:
```shell
python scripts/tools/post_process_multi_candidates_for_vdtk.py -i $INFERENCE_JSON_PATH
```
Process multiple inference JSON files under a directory:
```shell
INFERENCE_JSON_DIR=
find $INFERENCE_JSON_DIR -name 'infer.json' -exec python scripts/tools/post_process_multi_candidates_for_vdtk.py -i {} \;
```
3. Evaluate with `vdtk` and save the results to a `.log` file.
You need to change `PRED_JSONS_BASE_DIR`, `JSON_FILE_NAME`, `SPLIT`, and `IMAGE_B64_TSV_PATH`.
If the `infer.json` file is too large to open in VS Code, you can open it with Vim and change the above variables accordingly.
Currently, `JSON_FILE_NAME` is deprecated, as we `find` the `*.json` files in `PRED_JSONS_BASE_DIR`.
4. Parse the results of each `*.log` file and gather them into one xlsx file as separate sheets.
Parse the log, changing `PRED_JSONS_BASE_DIR` accordingly.
5. Merge each metric into one table with `scripts/tools/merge_sheets_xlsx.py`.
</details> |
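The candidate selection in step 2 above can be sketched as follows. This is a simplified stand-in for `post_process_multi_candidates_for_vdtk.py`, not the actual script; `keep_best_candidate` and `post_process` are hypothetical names.

```python
import json

def keep_best_candidate(entry: dict) -> dict:
    """Keep only the candidate caption with the highest IoU score."""
    scores = entry["logits"]["iou_scores"]
    best = max(range(len(scores)), key=scores.__getitem__)
    entry["candidates"] = [entry["candidates"][best]]
    entry["logits"]["iou_scores"] = [scores[best]]
    return entry

def post_process(in_path: str, out_path: str) -> None:
    """Load a prediction file, keep one candidate per region, save it back."""
    with open(in_path, encoding="utf-8") as f:
        entries = json.load(f)
    entries = [keep_best_candidate(e) for e in entries]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(entries, f)
```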