# RIS-DMMI
This repository provides the PyTorch implementation of DMMI from the following paper:<br />
__Beyond One-to-One: Rethinking the Referring Image Segmentation (ICCV2023)__ <br />
# News
* 2023.10.03: The final version of our dataset has been released. Please remember to download the latest version.
* 2023.10.03: Our code has been released.
# Dataset
We collect a new comprehensive dataset, Ref-ZOM (**Z**ero/**O**ne/**M**any), which contains image-text pairs under one-to-zero, one-to-one and one-to-many conditions. As with RefCOCO, RefCOCO+ and G-Ref, all images in Ref-ZOM are selected from the COCO dataset. Here, we provide the text, image and annotation information of Ref-ZOM, which should be used together with COCO_trainval2014. <br />
Our dataset can be downloaded from:<br />
[[Baidu Cloud](https://pan.baidu.com/s/1CxPYGWEadHhcViTH2iI7jw?pwd=g7uu)] [[Google Drive](https://drive.google.com/drive/folders/1FaH6U5pywSf0Ufnn4lYIVaykYxqU2vrA?usp=sharing)] <br />
Remember to download the original COCO dataset from:<br />
[[COCO Download](https://cocodataset.org/#download)]<br />
# Code
**Prepare**<br />
* Download COCO_train2014 and COCO_val2014, and merge the two datasets into a new folder "trainval2014". Then set `self.Image_DIR` on line 52 of `/refer/refer.py` to the path of this folder.<br />
* Download "Ref-ZOM(final).p" and rename it to "refs(final).p". Then put refs(final).p and instances.json into `/refer/data/ref-zom/*`. <br />
* Prepare BERT as described in [LAVT](https://github.com/yz93/LAVT-RIS).
* Prepare RefCOCO, RefCOCO+ and RefCOCOg as described in [LAVT](https://github.com/yz93/LAVT-RIS).
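
The merge step in the first item can be scripted. The following is a minimal sketch, not part of this repository: the helper name `merge_coco_folders` and its arguments are hypothetical, and it assumes the file names in the two COCO folders do not collide. By default it creates symlinks rather than copies to save disk space.

```python
import shutil
from pathlib import Path


def merge_coco_folders(train_dir, val_dir, merged_dir, copy=False):
    """Collect the files of two image folders into one merged folder.

    Hypothetical helper (not from the repository). Files are symlinked
    by default; pass copy=True to copy them instead.
    Returns the number of files newly placed in merged_dir.
    """
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)
    placed = 0
    for source in (Path(train_dir), Path(val_dir)):
        for image in sorted(source.iterdir()):
            if not image.is_file():
                continue  # ignore sub-directories
            target = merged / image.name
            if target.exists() or target.is_symlink():
                continue  # already merged on a previous run
            if copy:
                shutil.copy2(image, target)
            else:
                target.symlink_to(image.resolve())
            placed += 1
    return placed
```

For example, `merge_coco_folders("train2014", "val2014", "trainval2014")` would fill `trainval2014`, whose path then goes into `self.Image_DIR` as described above. The folder names here are placeholders for your own paths.
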
**Train**<br />
* Remember to set `--output_dir` and `--pretrained_backbone` to your own paths.<br />
* Use `--model` to select the backbone: 'dmmi-swin' for Swin-Base and 'dmmi_res' for ResNet-50.<br />
* Use `--dataset`, `--splitBy` and `--split` to select the dataset as follows:<br />
```
# Refcoco
--dataset refcoco --splitBy unc --split val
# Refcoco+
--dataset refcoco+ --splitBy unc --split val
# Refcocog(umd)
--dataset refcocog --splitBy umd --split val
# Refcocog(google)
--dataset refcocog --splitBy google --split val
# Ref-zom
--dataset ref-zom --splitBy final --split test
```
* Start training:<br />
```
sh train.sh
```
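
When switching between datasets often, it can help to keep the flag combinations listed above in one place. The following is a minimal sketch: the flag values are copied from that list, while the lookup table `DATASET_FLAGS`, its keys and the helper `dataset_args` are hypothetical and not part of this repository.

```python
# Flag values copied from the dataset list in this README.
# The keys and the helper below are illustrative assumptions.
DATASET_FLAGS = {
    "refcoco": ("refcoco", "unc", "val"),
    "refcoco+": ("refcoco+", "unc", "val"),
    "refcocog-umd": ("refcocog", "umd", "val"),
    "refcocog-google": ("refcocog", "google", "val"),
    "ref-zom": ("ref-zom", "final", "test"),
}


def dataset_args(name):
    """Return the command-line arguments that select the named dataset."""
    if name not in DATASET_FLAGS:
        known = ", ".join(sorted(DATASET_FLAGS))
        raise ValueError(f"unknown dataset {name!r}; expected one of: {known}")
    dataset, split_by, split = DATASET_FLAGS[name]
    return ["--dataset", dataset, "--splitBy", split_by, "--split", split]
```

`" ".join(dataset_args("ref-zom"))` yields the string `--dataset ref-zom --splitBy final --split test`, which can be pasted into the command inside `train.sh` or `test.sh`.
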
**Test**
* Remember to set `--test_parameter` to the path of your checkpoint. Also set `--model`, `--dataset`, `--splitBy` and `--split` to match the model and dataset you are testing. <br />
* Start testing:<br />
```
sh test.sh
```
# Parameter
**Refcocog(umd)**<br />

| Backbone | oIoU | mIoU | Google Drive | Baidu Cloud |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| ResNet-101 | 59.02 | 62.59 | [Link](https://drive.google.com/file/d/1ziDIeioglD08QQyL-_yGFFlao3PtcJJS/view?usp=drive_link) | [Link](https://pan.baidu.com/s/1uKJ-Wu5TtJhphXNOXo3mIA?pwd=6cgb) |
| Swin-Base | 63.46 | 66.48 | [Link](https://drive.google.com/file/d/1uuGWSYLGYa_qMxTlnZxH6p9FMxQLOQfZ/view?usp=drive_link) | [Link](https://pan.baidu.com/s/1eAT0NgkID4qXpoXMf2bjEg?pwd=bq7w) |

**Ref-ZOM**<br />

| Backbone | oIoU | mIoU | Google Drive | Baidu Cloud |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| Swin-Base | 68.77 | 68.25 | [Link](https://drive.google.com/file/d/1Ut_E-Fru0bCmjtaC2YhgOLZ7eJorOOpi/view?usp=drive_link) | [Link](https://pan.baidu.com/s/1T-u55rpbc4_CNEXmsA-OJg?pwd=hc6e) |
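
For readers unfamiliar with the two metrics in the tables, the following minimal sketch shows how oIoU (overall IoU) and mIoU (mean IoU) are commonly defined in referring image segmentation. It is not the evaluation code of this repository. Masks are represented as flat lists of 0/1 for simplicity, and skipping samples whose union is empty is an assumption of this sketch, not a statement about how this repository treats one-to-zero samples.

```python
def iou_counts(pred, target):
    """Return (intersection, union) pixel counts for two flat binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter, union


def overall_and_mean_iou(preds, targets):
    """Compute (oIoU, mIoU) over a list of predicted and target masks.

    oIoU: total intersection over total union, accumulated across samples.
    mIoU: average of the per-sample IoU values.
    Assumption of this sketch: samples with an empty union are skipped.
    """
    total_inter = 0
    total_union = 0
    per_sample = []
    for pred, target in zip(preds, targets):
        inter, union = iou_counts(pred, target)
        if union == 0:
            continue  # assumption: ignore samples with nothing to compare
        total_inter += inter
        total_union += union
        per_sample.append(inter / union)
    oiou = total_inter / total_union if total_union else 0.0
    miou = sum(per_sample) / len(per_sample) if per_sample else 0.0
    return oiou, miou
```

Because oIoU pools pixels over the whole split, large objects weigh more heavily in it, whereas mIoU gives every sample the same weight. This is why the two columns can rank models differently.
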
# Acknowledgements
We strongly appreciate the wonderful work of [LAVT](https://github.com/yz93/LAVT-RIS). Our code is partially based on its code base. If you find our work helpful, we suggest you also refer to [LAVT](https://github.com/yz93/LAVT-RIS) and cite it as well.<br />
# Citation
If you find our work helpful and want to cite it, please use the following citation info.<br />
```
@InProceedings{Hu_2023_ICCV,
  author = {Hu, Yutao and Wang, Qixiong and Shao, Wenqi and Xie, Enze and Li, Zhenguo and Han, Jungong and Luo, Ping},
  title = {Beyond One-to-One: Rethinking the Referring Image Segmentation},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month = {October},
  year = {2023},
  pages = {4067-4077}
}
```