# Data Preparation
Create a new directory `data` to store all the datasets.
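For example, from the repository root (`ReferFormer/`):

```shell
# Run from the ReferFormer repository root.
mkdir -p data
ls -d data
```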
## Ref-COCO
Download the dataset from the official website [COCO](https://cocodataset.org/#download). RefCOCO/+/g all use the COCO 2014 train split.
Download the annotation files from the [refer](https://github.com/lichengunc/refer) repository.
Convert the annotation files:
```
python3 tools/data/convert_refexp_to_coco.py
```
Finally, we expect the directory structure to be the following:
```
ReferFormer
├── data
│   ├── coco
│   │   ├── train2014
│   │   ├── refcoco
│   │   │   ├── instances_refcoco_train.json
│   │   │   ├── instances_refcoco_val.json
│   │   ├── refcoco+
│   │   │   ├── instances_refcoco+_train.json
│   │   │   ├── instances_refcoco+_val.json
│   │   ├── refcocog
│   │   │   ├── instances_refcocog_train.json
│   │   │   ├── instances_refcocog_val.json
```
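As a sanity check, a small script such as the following (a hypothetical helper, not part of the repository) can confirm the annotation files landed in the right places. It is demonstrated here against a throwaway temporary directory; point `missing_paths` at `data/coco` for a real check.

```python
from pathlib import Path
import tempfile

# Relative paths we expect under data/coco, taken from the tree above.
EXPECTED = [
    "train2014",
    "refcoco/instances_refcoco_train.json",
    "refcoco/instances_refcoco_val.json",
    "refcoco+/instances_refcoco+_train.json",
    "refcoco+/instances_refcoco+_val.json",
    "refcocog/instances_refcocog_train.json",
    "refcocog/instances_refcocog_val.json",
]

def missing_paths(root):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

# Demo on a temporary directory; use missing_paths("data/coco") for real data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in EXPECTED:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    print(missing_paths(root))  # -> []
```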
## Ref-Youtube-VOS
Download the dataset from the competition website [here](https://competitions.codalab.org/competitions/29139#participate-get_data).
Then, extract and organize the files. We expect the directory structure to be the following:
```
ReferFormer
├── data
│   ├── ref-youtube-vos
│   │   ├── meta_expressions
│   │   ├── train
│   │   │   ├── JPEGImages
│   │   │   ├── Annotations
│   │   │   ├── meta.json
│   │   ├── valid
│   │   │   ├── JPEGImages
```
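For orientation, the files under `meta_expressions` map video ids to their referring expressions and frame lists. The snippet below is a sketch under the assumption that they follow the usual Ref-Youtube-VOS meta format (`{"videos": {id: {"expressions": ..., "frames": ...}}}`); the video id and expression shown are synthetic, so verify against your downloaded `meta_expressions.json`.

```python
import json

# Synthetic snippet in the assumed Ref-Youtube-VOS meta_expressions layout.
meta = json.loads("""
{
  "videos": {
    "003234408d": {
      "expressions": {"0": {"exp": "a penguin on the left"}},
      "frames": ["00000", "00005"]
    }
  }
}
""")

# List each video id with its frame count and expressions.
for vid, info in meta["videos"].items():
    exps = [e["exp"] for e in info["expressions"].values()]
    print(vid, len(info["frames"]), exps)
```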
## Ref-DAVIS17
Download the DAVIS 2017 dataset from the [website](https://davischallenge.org/davis2017/code.html). Note that you only need to download the two zip files `DAVIS-2017-Unsupervised-trainval-480p.zip` and `DAVIS-2017_semantics-480p.zip`.
Download the text annotations from the [website](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/video-segmentation/video-object-segmentation-with-language-referring-expressions).
Then, put the zip files in the directory as follows.
```
ReferFormer
├── data
│   ├── ref-davis
│   │   ├── DAVIS-2017_semantics-480p.zip
│   │   ├── DAVIS-2017-Unsupervised-trainval-480p.zip
│   │   ├── davis_text_annotations.zip
```
Unzip these zip files (inside `data/ref-davis`).
```
unzip -o davis_text_annotations.zip
unzip -o DAVIS-2017_semantics-480p.zip
unzip -o DAVIS-2017-Unsupervised-trainval-480p.zip
```
Preprocess the dataset into the Ref-Youtube-VOS format. (Make sure you run this from the main directory.)
```
python tools/data/convert_davis_to_ytvos.py
```
Finally, unzip the file `DAVIS-2017-Unsupervised-trainval-480p.zip` again, since the preprocessing script moves (`mv`) the extracted files for efficiency.
```
unzip -o DAVIS-2017-Unsupervised-trainval-480p.zip
```
## A2D-Sentences
Follow the instructions and download the dataset from the website [here](https://kgavrilyuk.github.io/publication/actor_action/).
Then, extract the files. Additionally, we use the same json annotation files generated by [MTTR](https://github.com/mttr2021/MTTR). Please download these files from [onedrive](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/wjn922_connect_hku_hk/EnvcpWsMsY5NrMF5If3F6DwBseMrqmzQwpTtL8HXoLAChw?e=Vlv1et).
We expect the directory structure to be the following:
```
ReferFormer
├── data
│   ├── a2d_sentences
│   │   ├── Release
│   │   ├── text_annotations
│   │   │   ├── a2d_annotation_with_instances
│   │   │   ├── a2d_annotation.txt
│   │   │   ├── a2d_missed_videos.txt
│   │   ├── a2d_sentences_single_frame_test_annotations.json
│   │   ├── a2d_sentences_single_frame_train_annotations.json
│   │   ├── a2d_sentences_test_annotations_in_coco_format.json
```
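The `*_in_coco_format.json` file follows the standard COCO annotation layout (top-level `images`, `annotations`, and `categories` lists), so it can be inspected with the plain `json` module. Below is a sketch demonstrated on a minimal synthetic file; replace the path with `data/a2d_sentences/a2d_sentences_test_annotations_in_coco_format.json` once downloaded.

```python
import json
import os
import tempfile

def summarize(path):
    """Return basic counts for a COCO-format annotation file."""
    with open(path) as f:
        coco = json.load(f)
    return {k: len(coco.get(k, [])) for k in ("images", "annotations", "categories")}

# Minimal synthetic file for demonstration only.
demo = {
    "images": [{"id": 1}],
    "annotations": [{"id": 1, "image_id": 1}],
    "categories": [{"id": 1}],
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(demo, f)
    path = f.name
print(summarize(path))  # {'images': 1, 'annotations': 1, 'categories': 1}
os.remove(path)
```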
## JHMDB-Sentences
Follow the instructions and download the dataset from the website [here](https://kgavrilyuk.github.io/publication/actor_action/).
Then, extract the files. Additionally, we use the same json annotation files generated by [MTTR](https://github.com/mttr2021/MTTR). Please download these files from [onedrive](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/wjn922_connect_hku_hk/EjPyzXq93s5Jm4GU07JrWIMBb6nObY8fEmLyuiGg-0uBtg?e=GsZ6jP).
We expect the directory structure to be the following:
```
ReferFormer
├── data
│   ├── jhmdb_sentences
│   │   ├── Rename_Images
│   │   ├── puppet_mask
│   │   ├── jhmdb_annotation.txt
│   │   ├── jhmdb_sentences_samples_metadata.json
│   │   ├── jhmdb_sentences_gt_annotations_in_coco_format.json
```