	## Experiments

	Please check `amlt_configs/` for the experiment configs.

	## Performance

	The major results can be found in [docs/MODEL_ZOO.md](./MODEL_ZOO.md) and our [Project Page](https://xk-huang.github.io/segment-caption-anything).

	We also provide the evaluation code for our baseline ([Promptable-GRiT](https://github.com/xk-huang/Promptable-GRiT)) and for [benchmark referring VLLMs](https://github.com/xk-huang/benchmark-referring-vllm).

	## Evaluate with `vdtk`

	### Install `vdtk`

	We use the `dev` branch of `vdtk`, which supports CLIP computation with base64-encoded images:

	- code: https://github.com/xk-huang/vdtk/tree/dev
	- data (e.g., jar files): https://huggingface.co/xk-huang/vdtk-data

	Install with external data:

	#### Docker

	```shell
	alias=`whoami | cut -d'.' -f2`
	docker run -itd --runtime=nvidia --ipc=host --privileged -v /home/${alias}:/home/${alias} -w `pwd` --name sca nvcr.io/nvidia/pytorch:22.10-py3 bash
	docker exec -it sca bash

	# In the docker container
	# cd to the code dir
	. amlt_configs/setup.sh
	source ~/.bashrc
	pip install pydantic==1.10.8 # https://github.com/pydantic/pydantic/issues/545#issuecomment-1573776471
	. amlt_configs/setup_eval_suite.sh
	```

	#### Conda

	```shell
	# Install env first
	# conda create -n sca -y python=3.9
	# conda activate sca
	# conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia

	ORIGINAL_DIR="$(pwd)"
	REPO_DIR=/tmp/vdtk
	git clone --recursive https://github.com/xk-huang/vdtk.git $REPO_DIR -b dev
	cd $REPO_DIR
	git submodule update --init --recursive

	# Install git-lfs. As root (e.g., inside the Docker container):
	apt-get update
	apt-get install git-lfs
	# As a non-root user:
	sudo apt-get update
	sudo apt-get install git-lfs

	git lfs install
	git clone https://huggingface.co/xk-huang/vdtk-data
	# git submodule init && git submodule update

	rsync -avP ./vdtk-data/vdtk .
	rm -rf vdtk-data

	pip install --upgrade pip
	pip install -e . POT==0.9.0 # POT==0.9.1 takes up all the memory with the TF backend
	pip install tensorflow==2.12.1 # Pin a single TF version
	pip install levenshtein==0.21.1
	pip install openpyxl==3.1.2

	python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
	cd "$ORIGINAL_DIR"
	```

	Potential problems:

	- TensorFlow: as of 08/15/23, TF does not support CUDA 12, so we use `nvcr.io/nvidia/pytorch:22.12-py3`, which contains CUDA 11.8.
	- Encoding in the Docker image: `import locale; locale.getpreferredencoding()` returns `ANSI_X3.4-1968` rather than `UTF-8`, which causes errors when writing files.
	  - Fix: change `vdtk/metrics/tokenizer/ptbtokenizer.py:73` to `tmp_file = tempfile.NamedTemporaryFile(mode="w", delete=False, encoding="utf-8")`.
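
The encoding problem can be checked and sidestepped as in the minimal sketch below. It uses only the Python standard library; the helper name `write_utf8_tempfile` is hypothetical and is not part of `vdtk`.

```python
import locale
import os
import tempfile


def write_utf8_tempfile(text: str) -> str:
    """Write `text` to a temporary file as UTF-8 and return the file path.

    Hypothetical helper for illustration only; not part of vdtk.
    """
    # Passing `encoding` explicitly makes the write independent of the
    # locale's preferred encoding (e.g., ANSI_X3.4-1968 in some containers).
    with tempfile.NamedTemporaryFile(
        mode="w", delete=False, encoding="utf-8", suffix=".txt"
    ) as tmp_file:
        tmp_file.write(text)
        return tmp_file.name


# The encoding used by default when `encoding` is omitted.
print(locale.getpreferredencoding())

# Non-ASCII text round-trips because the encoding is fixed on both sides.
path = write_utf8_tempfile("caf\u00e9")
with open(path, encoding="utf-8") as f:
    assert f.read() == "caf\u00e9"
os.remove(path)
```

If the printed encoding is not `UTF-8`, any file opened without an explicit `encoding` argument may fail on non-ASCII captions.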


	### The format of input prediction json file

	```json
	[
	  {
	    "_id": 0,
	    "split": "inference",
	    "references": [
	      "a man wearing a red and white shirt"
	    ],
	    "candidates": [
	      "red and yellow",
	      "red shirt guy",
	      "red and yellow uniform"
	    ],
	    "metadata": {
	      "metadata_input_boxes": [
	        0,
	        95,
	        113,
	        419
	      ],
	      "metadata_image_id": 266240,
	      "metadata_region_id": 27287
	    },
	    "logits": {
	      "iou_scores": [
	        0.89990234375,
	        0.994140625,
	        0.99365234375
	      ]
	    }
	  }
	]
	```
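
Before running the evaluation, it can help to sanity-check a prediction file against the format above. The sketch below is not part of the repository: the function name `check_prediction_records` is hypothetical, and the rules it enforces (all six top-level keys present, one IoU score per candidate, four box coordinates) are assumptions inferred from the example record.

```python
import json

# Keys that appear in the example record; treating all of them as required
# is an assumption made for this sketch.
REQUIRED_KEYS = {"_id", "split", "references", "candidates", "metadata", "logits"}


def check_prediction_records(records):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for i, rec in enumerate(records):
        missing = REQUIRED_KEYS - set(rec)
        if missing:
            problems.append(f"record {i}: missing keys {sorted(missing)}")
            continue
        # Assumption: there is exactly one IoU score per candidate caption.
        n_cand = len(rec["candidates"])
        n_iou = len(rec["logits"].get("iou_scores", []))
        if n_cand != n_iou:
            problems.append(f"record {i}: {n_cand} candidates but {n_iou} iou_scores")
        # Assumption: a box is given as four numbers, as in the example.
        if len(rec["metadata"].get("metadata_input_boxes", [])) != 4:
            problems.append(f"record {i}: metadata_input_boxes should have 4 numbers")
    return problems


example = json.loads("""
[{"_id": 0, "split": "inference",
  "references": ["a man wearing a red and white shirt"],
  "candidates": ["red and yellow", "red shirt guy", "red and yellow uniform"],
  "metadata": {"metadata_input_boxes": [0, 95, 113, 419],
               "metadata_image_id": 266240, "metadata_region_id": 27287},
  "logits": {"iou_scores": [0.89990234375, 0.994140625, 0.99365234375]}}]
""")
print(check_prediction_records(example))  # prints []
```

For a real file, load it with `json.load` and pass the resulting list to the function.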

	### The structure of files

	```
	$OUTPUT_DIR/infer/infer-visual_genome-densecap-local-densecap-test.json
	# infer-{data_script_identifier}-{name}-{split}.json
	```
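
The naming pattern can be split back into its parts as sketched below. The function `parse_infer_filename` is hypothetical, and it assumes that `name` and `split` contain no hyphens while `data_script_identifier` may (as `visual_genome-densecap-local` appears to in the example); if your names break that assumption, the split will be wrong.

```python
from pathlib import Path


def parse_infer_filename(path):
    """Split `infer-{data_script_identifier}-{name}-{split}.json` into parts.

    Hypothetical helper; assumes `name` and `split` contain no hyphens.
    """
    stem = Path(path).stem  # drop the directory and the ".json" suffix
    prefix, sep, rest = stem.partition("-")
    if prefix != "infer" or not sep:
        raise ValueError(f"not an inference file name: {path}")
    # Split from the right so hyphens stay in the data script identifier.
    parts = rest.rsplit("-", 2)
    if len(parts) != 3:
        raise ValueError(f"expected three fields after 'infer-': {path}")
    data_script_identifier, name, split = parts
    return {
        "data_script_identifier": data_script_identifier,
        "name": name,
        "split": split,
    }


info = parse_infer_filename(
    "exp/infer/infer-visual_genome-densecap-local-densecap-test.json"
)
print(info["data_script_identifier"])  # prints visual_genome-densecap-local
print(info["name"], info["split"])     # prints densecap test
```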

	### All-in-one script

	Usage:

	```shell
	>>> bash scripts/tools/eval_suite.sh
	# Env args:
	# DRY_RUN:
	# ONLY_GATHER:
	# ONLY_EVAL:
	# SKIP_CLIP_RECALL:
	# DEBUG:
	# NO_POST_PROCESS:
	# Usage: [DRY_RUN=1] [ONLY_GATHER=1] [ONLY_EVAL=1] ./eval_suite.sh <INFERENCE_JSON_DIR> <JSON_FILE_NAME> <SPLIT> [<IMAGE_B64_TSV_PATH>] [<MERGE_TSV_INTO_JSON_FOR_VDTK_SCRIPT>] [<POST_PROCESS_MULTI_CANDIDATES_SCRIPT>]
	# JSON_FILE_NAME is not used, use any string like 'xxx' for it.
	```

	e.g.,

	```bash
	DRY_RUN=1 NO_POST_PROCESS=1 ONLY_EVAL=1 SKIP_CLIP_RECALL=1 bash scripts/tools/eval_suite.sh exp/ xxx inference
	```

	<details>
	<summary>The details about the script.</summary>

	1. Replace the GT captions (the tokenizer-processed ones) with the real GT (`scripts/tools/replace_references_in_json_for_vdtk.py`). Please prepare the folder structure as described in [The structure of files](#the-structure-of-files). This step requires the `.hydra` config.
	2. Remove all but one prediction, selected by IoU score (`scripts/tools/post_process_multi_candidates_for_vdtk.py`).

	If there are multiple candidate predictions, we keep only the candidate with the highest IoU score for METEOR, CIDEr-D, ROUGE, etc.:

	```shell
	python scripts/tools/post_process_multi_candidates_for_vdtk.py -i $INFERENCE_JSON_PATH
	```

	Process multiple inference JSON files under a directory:

	```shell
	INFERENCE_JSON_DIR=
	find $INFERENCE_JSON_DIR -name 'infer.json' -exec python scripts/tools/post_process_multi_candidates_for_vdtk.py -i {} \;
	```

	3. Evaluate with `vdtk` and save the results to a `.log` file.

	You need to change `PRED_JSONS_BASE_DIR`, `JSON_FILE_NAME`, `SPLIT`, and `IMAGE_B64_TSV_PATH`.

	If the `infer.json` file is too large to open in VS Code, you can open it in vim instead and change the above variables accordingly.

	Currently, `JSON_FILE_NAME` is deprecated as we `find` the `*.json` in `PRED_JSONS_BASE_DIR`.

	4. Parse the results in each `*.log` and gather them into one xlsx file, organized by sheets.

	Parse the log. Change the `PRED_JSONS_BASE_DIR` accordingly.

	5. Merge each metric into one table with `scripts/tools/merge_sheets_xlsx.py`.
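
The selection rule in step 2 can be illustrated with the sketch below. It is not the code of `post_process_multi_candidates_for_vdtk.py`; the function `keep_best_candidate` is hypothetical and assumes the record layout shown in the input format section, with one entry in `logits.iou_scores` per candidate.

```python
def keep_best_candidate(record):
    """Return a copy of `record` keeping only the highest-IoU candidate.

    Hypothetical helper; assumes one IoU score per candidate.
    """
    candidates = record["candidates"]
    scores = record["logits"]["iou_scores"]
    if not candidates or len(candidates) != len(scores):
        raise ValueError("need a non-empty candidate list with one score each")
    # Index of the largest IoU score (the first one wins on ties).
    best = max(range(len(scores)), key=scores.__getitem__)
    new_record = dict(record)  # shallow copy; the input is left untouched
    new_record["candidates"] = [candidates[best]]
    new_record["logits"] = {**record["logits"], "iou_scores": [scores[best]]}
    return new_record


record = {
    "candidates": ["red and yellow", "red shirt guy", "red and yellow uniform"],
    "logits": {"iou_scores": [0.89990234375, 0.994140625, 0.99365234375]},
}
print(keep_best_candidate(record)["candidates"])  # prints ['red shirt guy']
```

After this step each record holds a single candidate, which is what reference-based metrics such as METEOR, CIDEr-D, and ROUGE are then computed on.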

	</details>