Spaces:
Sleeping
Sleeping
| title: S MultiMAE | |
| emoji: π | |
| colorFrom: gray | |
| colorTo: blue | |
| sdk: streamlit | |
| sdk_version: 1.33.0 | |
| app_file: streamlit_apps/app.py | |
| pinned: false | |
| # S-MultiMAE | |
| This repository provides the official implementation of `S-MultiMAE A Multi-Ground Truth approach for RGB-D Saliency Detection` | |
| _Nguyen Truong Thinh Huynh, Van Linh Pham, Xuan Toan Mai and Tuan Anh Tran_ | |
|  | |
| ## Model weights | |
| | Backbone | #params | Training paradigm | Weights | Input size | | |
| | -------- | ----------- | ----------------- | ---------------------------------------------------------------------------------------------- | ---------- | | |
| | ViT-L | 328,318,529 | Multi-GT | [Download](https://drive.google.com/file/d/1YhAuu3DI2adPLQgbgoSt74ilZbpuKihh/view?usp=sharing) | 224x224 | | |
| | ViT-B | 107,654,977 | Multi-GT | [Download](https://drive.google.com/file/d/13Omafif3pvPKgg3Isp_srkHf8CSPx33d/view?usp=sharing) | 224x224 | | |
| ## Demo on HuggingFace | |
| - https://huggingface.co/spaces/RGBD-SOD/S-MultiMAE | |
|  | |
|  | |
| ## How to run locally | |
| ### Create a virtual environment | |
| We recommend using python 3.10 or higher. | |
| ```bash | |
| python3.10 -m venv env | |
| source env/bin/activate | |
| pip install -r requirements.txt | |
| ``` | |
| ### Download trained weights | |
| - Download model weights and put it in the folder `weights`. You may also need to download the weights of [DPT model](https://drive.google.com/file/d/1vU4G31_T2PJv1DkA8j-MLXfMjGa7kD3L/view?usp=sharing) (a rgb2depth model). The `weights` folder will look like this: | |
| ```bash | |
| βββ weights | |
| β βββ omnidata_rgb2depth_dpt_hybrid.pth | |
| β βββ s-multimae-cfgv4_0_2006-top1.pth | |
| β βββ s-multimae-cfgv4_0_2007-top1.pth | |
| ``` | |
| ### Run | |
| - Run streamlit app | |
| ``` | |
| streamlit run streamlit_apps/app.py --server.port 9113 --browser.gatherUsageStats False --server.fileWatcherType none | |
| ``` | |
| ## Datasets | |
| ### COME15K dataset | |
| | | 1 GT | 2 GTs | 3 GTs | 4 GTs | 5 GTs | | |
| | --------------------- | ------ | ----- | ------ | ----- | ----- | | |
| | COME8K (8025 samples) | 77.61% | 1.71% | 18.28% | 2.24% | 0.16% | | |
| | COME-E (4600 samples) | 70.5% | 1.87% | 21.15% | 5.70% | 0.78% | | |
| | COME8K (3000 samples) | 62.3% | 2.00% | 25.63% | 8.37% | 1.70% | | |
| ``` | |
| @inproceedings{cascaded_rgbd_sod, | |
| title={RGB-D Saliency Detection via Cascaded Mutual Information Minimization}, | |
| author={Zhang, Jing and Fan, Deng-Ping and Dai, Yuchao and Yu, Xin and Zhong, Yiran and Barnes, Nick and Shao, Ling}, | |
| booktitle={International Conference on Computer Vision (ICCV)}, | |
| year={2021} | |
| } | |
| ``` | |
| ## Acknowledgements | |
| S-MultiMAE is build on top of [MultiMAE](https://github.com/EPFL-VILAB/MultiMAE). We kindly thank the authors for releasing their code. | |
| ```bib | |
| @article{bachmann2022multimae, | |
| author = {Roman Bachmann and David Mizrahi and Andrei Atanov and Amir Zamir}, | |
| title = {{MultiMAE}: Multi-modal Multi-task Masked Autoencoders}, | |
| booktitle = {European Conference on Computer Vision}, | |
| year = {2022}, | |
| } | |
| ``` | |
| ## References | |
| All references are cited in these files: | |
| - [Datasets](./docs/references/Dataset.bib) | |
| - [SOTAs](./docs/references/SOTAs.bib) | |
| - [Others](./docs/references/References.bib) | |