YannQi/COMBO-AVS-checkpoints

YannQi committed on Mar 19, 2024 · Commit dd86488 · 1 parent: 1cd65d4 · Upload README.md

# [Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation](https://yannqi.github.io/AVS-COMBO/)

[Qi Yang](https://yannqi.github.io/), Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan and [Shiming Xiang](https://people.ucas.ac.cn/~xiangshiming)

This repository provides the pretrained checkpoints for the paper "Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation", accepted to CVPR 2024.

## 🔥 What's New

- (2024.03.14) Our checkpoints are now publicly available!
- (2024.03.12) Our code is now publicly available 🌲!
- (2024.02.27) Our paper (COMBO) has been accepted to CVPR 2024!
- (2023.11.17) We completed the implementation of COMBO and pushed the code.

<!-- ## 🪵 TODO List -->

## 🛠️ Getting Started

### 1. Environments

- Linux or macOS with Python ≥ 3.6

```shell
# recommended
pip install -r requirements.txt
pip install soundfile
# build MSDeformAttention (the subshell keeps you in the repository root afterwards)
(cd model/modeling/pixel_decoder/ops && sh make.sh)
```
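
After installing the requirements and building the deformable-attention op, a quick import check can confirm the environment before moving on. This is a minimal sketch, not part of the repository: the module name `MultiScaleDeformableAttention` is an assumption borrowed from Mask2Former, whose `pixel_decoder/ops` build this step appears to mirror, so adjust `REQUIRED` if your build installs a different name.

```python
import importlib
import sys
from typing import Dict, Iterable

# Modules the steps above should make importable. torch comes first because the
# compiled op links against its libraries. "MultiScaleDeformableAttention" is an
# ASSUMED name (the extension built by Mask2Former's ops/make.sh).
REQUIRED = ("torch", "detectron2", "soundfile", "MultiScaleDeformableAttention")


def check_imports(names: Iterable[str]) -> Dict[str, bool]:
    """Try to import each module; map its name to True on success, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        # soundfile raises OSError (not ImportError) when libsndfile is missing.
        except (ImportError, OSError):
            status[name] = False
    return status


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, ok in check_imports(REQUIRED).items():
        print("ok     " if ok else "MISSING", name)
```

Importing (rather than only locating) each module is deliberate: a compiled extension can be installed yet still fail to load, for example after a CUDA or PyTorch version mismatch.
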
- Preprocessing for detectron2

  To use the Siam-Encoder Module (SEM), one line of detectron2 must be modified. The file to edit is:

  `conda_envs/xxx/lib/python3.xx/site-packages/detectron2/checkpoint/c2_model_loading.py`

  (replace `xxx` with your environment name and `python3.xx` with your Python version)

  Commenting out the following line at [L287](https://github.com/facebookresearch/detectron2/blob/cc9266c2396d5545315e3601027ba4bc28e8c95b/detectron2/checkpoint/c2_model_loading.py#L287) lets the code run without this error:

```python
# raise ValueError("Cannot match one checkpoint key to multiple keys in the model.")
```
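
Instead of editing `c2_model_loading.py` by hand, the change can be scripted. The sketch below is a hypothetical helper, not part of COMBO or detectron2: `locate_c2_model_loading` finds the file through `importlib` in whichever environment is active (so there is no `xxx` to fill in), and `comment_out` prefixes the exact `raise` statement quoted above with `# ` while keeping its indentation.

```python
import importlib.util
from pathlib import Path

# The statement quoted above from detectron2/checkpoint/c2_model_loading.py.
TARGET = 'raise ValueError("Cannot match one checkpoint key to multiple keys in the model.")'


def locate_c2_model_loading() -> Path:
    """Return the path of c2_model_loading.py in the active environment's detectron2."""
    spec = importlib.util.find_spec("detectron2")
    if spec is None or spec.origin is None:
        raise RuntimeError("detectron2 is not installed in this environment")
    # spec.origin points at detectron2/__init__.py; the file lives in checkpoint/.
    return Path(spec.origin).parent / "checkpoint" / "c2_model_loading.py"


def comment_out(path: Path, target: str = TARGET) -> int:
    """Prefix every line whose stripped text equals `target` with '# '.

    Indentation is preserved. Returns the number of lines changed,
    which is 0 if the file was already patched.
    """
    lines = path.read_text().splitlines(keepends=True)
    changed = 0
    for i, line in enumerate(lines):
        if line.strip() == target:
            body = line.lstrip()
            indent = line[: len(line) - len(body)]
            lines[i] = indent + "# " + body
            changed += 1
    if changed:
        path.write_text("".join(lines))
    return changed
```

Run `comment_out(locate_c2_model_loading())` once from the environment you use for COMBO; a second run returns 0. Note that this edits the installed detectron2 package, so the change affects every project that shares that environment.
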
- Install Semantic-SAM (optional; only needed if you generate the class-agnostic masks in step 4 yourself)

```shell
# Semantic-SAM
pip install git+https://github.com/cocodataset/panopticapi.git
git clone https://github.com/UX-Decoder/Semantic-SAM
cd Semantic-SAM
python -m pip install -r requirements.txt
```

See [Semantic-SAM](https://github.com/UX-Decoder/Semantic-SAM) for more details.

### 2. Datasets

Download the datasets from [AVSBenchmark](https://github.com/OpenNLPLab/AVSBench). You can place the data in a `data` folder or in a folder of your choice; in either case, update the dataset paths in the config files accordingly. The expected directory layout is:

```
|--AVS_dataset
   |--AVSBench_semantic/
   |--AVSBench_object/Multi-sources/
   |--AVSBench_object/Single-source/
```

### 3. Download Pre-Trained Models

- The pretrained backbones are available from the AVSBench benchmark's pretrained backbones [TODO]. Place them as follows:

```
|--pretrained
   |--detectron2/R-50.pkl
   |--detectron2/d2_pvt_v2_b5.pkl
   |--vggish-10086976.pth
   |--vggish_pca_params-970ea276.pth
```
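
Before training, it can help to confirm that the dataset and checkpoint files are where the config files expect them. The following is a minimal sketch that assumes the roots are named as in the two trees above (`AVS_dataset` and `pretrained`); the helper `missing_paths` is hypothetical and only checks that each path exists, not that the files are intact.

```python
from pathlib import Path
from typing import Iterable, List

# Relative paths copied from the directory trees above.
DATASET_DIRS = (
    "AVSBench_semantic",
    "AVSBench_object/Multi-sources",
    "AVSBench_object/Single-source",
)
PRETRAINED_FILES = (
    "detectron2/R-50.pkl",
    "detectron2/d2_pvt_v2_b5.pkl",
    "vggish-10086976.pth",
    "vggish_pca_params-970ea276.pth",
)


def missing_paths(root: Path, relative: Iterable[str]) -> List[str]:
    """Return the entries of `relative` that do not exist under `root`."""
    return [rel for rel in relative if not (root / rel).exists()]
```

For example, `missing_paths(Path("data/AVS_dataset"), DATASET_DIRS)` and `missing_paths(Path("pretrained"), PRETRAINED_FILES)` should both return empty lists once everything is in place (the `data/` prefix is only an example location).
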

### 4. Maskige Pregeneration

- Generate class-agnostic masks (optional)

```shell
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh train # or ms3, avss
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh val
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh test
```

- Generate Maskiges (optional)

```shell
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split train # or ms3, avss
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split val
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split test
```
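
The README does not spell out how a Maskige is built; judging by the `pre_mask2rgb` folder name, it is an RGB rendering of the class-agnostic masks. The sketch below illustrates that idea only and is not the logic of `mask_precess_s4.py`: it assumes the masks arrive as an `(N, H, W)` NumPy array, paints each mask with its own random color, and draws larger masks first so that smaller ones stay visible.

```python
import numpy as np


def masks_to_maskige(masks: np.ndarray, seed: int = 0) -> np.ndarray:
    """Render N class-agnostic masks (N, H, W) as one RGB image (H, W, 3), uint8.

    ILLUSTRATIVE ONLY: the color scheme and draw order are assumptions,
    not taken from COMBO's own script.
    """
    masks = masks.astype(bool)
    n, h, w = masks.shape
    rng = np.random.default_rng(seed)
    # One random color per mask, each channel in 1..255 so no mask is pure black.
    colors = rng.integers(1, 256, size=(n, 3)).astype(np.uint8)
    maskige = np.zeros((h, w, 3), dtype=np.uint8)  # uncovered pixels stay black
    # Paint in order of descending area; later (smaller) masks overwrite earlier ones.
    order = np.argsort(-masks.reshape(n, -1).sum(axis=1), kind="stable")
    for idx in order:
        maskige[masks[idx]] = colors[idx]
    return maskige
```

The result can be saved with any image library, for example Pillow's `Image.fromarray(maskige).save(...)`; the exact colors and file naming COMBO expects are defined by its own script.
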

- Move the Maskiges to the following folders:

  Note: for convenience, we provide pre-generated Maskiges for the S4/MS3/AVSS subsets on Hugging Face (link: TODO).

```
|--AVS_dataset
    |--AVSBench_semantic/pre_SAM_mask/
    |--AVSBench_object/Multi-sources/ms3_data/pre_SAM_mask/
    |--AVSBench_object/Single-source/s4_data/pre_SAM_mask/
```

### 5. Train

```shell
# ResNet-50
sh scripts/res_train_avs4.sh # or ms3, avss
```

```shell
# PVTv2
sh scripts/pvt_train_avs4.sh # or ms3, avss
```

### 6. Test

```shell
# ResNet-50
sh scripts/res_test_avs4.sh # or ms3, avss
```

```shell
# PVTv2
sh scripts/pvt_test_avs4.sh # or ms3, avss
```

## 🤝 Citing COMBO

```bibtex
@misc{yang2023cooperation,
      title={Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation},
      author={Qi Yang and Xing Nie and Tong Li and Pengfei Gao and Ying Guo and Cheng Zhen and Pengfei Yan and Shiming Xiang},
      year={2023},
      eprint={2312.06462},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```