# Panoptic-DeepLab

Panoptic-DeepLab is a state-of-the-art **box-free** system for panoptic
segmentation [1], where the goal is to assign a unique value, encoding both
semantic label (e.g., person, car) and instance ID (e.g., instance_1,
instance_2), to every pixel in an image.

Panoptic-DeepLab improves over DeeperLab [6], one of the first box-free
systems for panoptic segmentation (combining DeepLabv3+ [7] and
PersonLab [8]), by simplifying the class-agnostic instance detection to use
only a center keypoint. As a result, Panoptic-DeepLab predicts three outputs:
(1) semantic segmentation, (2) instance center heatmap, and (3) instance
center regression.

The class-agnostic instance segmentation is first obtained by grouping the
predicted foreground pixels (inferred from the semantic segmentation) to their
closest predicted instance centers [2]. To generate the final panoptic
segmentation, we then fuse the class-agnostic instance segmentation with the
semantic segmentation by the efficient majority-vote scheme [6].
<p align="center">
   <img src="../img/panoptic_deeplab.png" width=800>
</p>
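The grouping and fusion steps described above can be sketched as follows. This is an illustrative NumPy sketch, not the repo's implementation: each foreground pixel votes for a center location via its regressed offset, is assigned to the nearest detected center, and each resulting instance takes its semantic class by majority vote.

```python
import numpy as np

def group_instances(foreground, centers, offsets):
    """Assign each foreground pixel to its closest predicted instance center.

    foreground: (H, W) bool mask of "thing" pixels.
    centers:    (K, 2) array of detected (y, x) instance centers.
    offsets:    (H, W, 2) predicted offset from each pixel to its center.
    Returns an (H, W) instance-ID map; 0 marks background/"stuff" pixels.
    """
    h, w = foreground.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Each pixel votes for the location (pixel coordinate + offset) ...
    voted = np.stack([ys + offsets[..., 0], xs + offsets[..., 1]], axis=-1)
    # ... and is matched to the closest of the K detected centers.
    dists = np.linalg.norm(voted[..., None, :] - centers, axis=-1)  # (H, W, K)
    instance_id = np.argmin(dists, axis=-1) + 1  # instance IDs start at 1
    instance_id[~foreground] = 0
    return instance_id

def majority_vote(semantic, instance_id, k):
    """Fuse: instance k takes the most frequent semantic label inside it."""
    mask = instance_id == k
    return np.bincount(semantic[mask]).argmax()

# Toy example: two isolated foreground pixels, two centers, zero offsets.
fg = np.zeros((4, 4), bool)
fg[0, 0] = fg[3, 3] = True
centers = np.array([[0.0, 0.0], [3.0, 3.0]])
offsets = np.zeros((4, 4, 2))
ids = group_instances(fg, centers, offsets)
```

In the real system, the K centers are themselves obtained by keypoint-style non-maximum suppression on the predicted center heatmap before this grouping step.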
## Prerequisite

1. Make sure the software is properly [installed](../setup/installation.md).
2. Make sure the target dataset is correctly prepared (e.g.,
   [Cityscapes](../setup/cityscapes.md), [COCO](../setup/coco.md)).
3. Download the ImageNet pretrained
   [checkpoints](./imagenet_pretrained_checkpoints.md), and update the
   `initial_checkpoint` path in the config files.
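Step 3 can be scripted. The sketch below is hypothetical: the config filename and checkpoint path are placeholders (a stand-in config line is written first so the snippet is self-contained), and GNU `sed` syntax is assumed.

```shell
# Placeholder names; substitute your actual config file and checkpoint path.
CONFIG=resnet50_os32.textproto
CKPT=/tmp/checkpoints/resnet50_imagenet1k/ckpt-100

# Stand-in for the real config, which already contains this field.
printf 'initial_checkpoint: "CHANGE_ME"\n' > "$CONFIG"

# Point initial_checkpoint at the downloaded pretrained checkpoint.
sed -i "s#initial_checkpoint: \".*\"#initial_checkpoint: \"$CKPT\"#" "$CONFIG"
cat "$CONFIG"
```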
## Model Zoo

In the Model Zoo, we explore building Panoptic-DeepLab on top of several
backbones (e.g., ResNet model variants [3]).
Herein, we highlight some of the employed backbones:

1. **ResNet-50-Beta**: We replace the original stem in ResNet-50 [3] with the
   Inception stem [9], i.e., the first original 7x7 convolution is replaced
   by three 3x3 convolutions.
2. **Wide-ResNet-41**: We modify the Wide-ResNet-38 [5] by (1) removing the
   last residual block, and (2) repeating the second last residual block two
   more times.
3. **SWideRNet-SAC-(1, 1, x)**, where $x \in \{1, 3, 4.5\}$, scaling the
   backbone layers (excluding the stem) of Wide-ResNet-41 by a factor of x.
   This backbone only employs the Switchable Atrous Convolution (SAC) without
   the Squeeze-and-Excitation modules [10].
### Cityscapes Panoptic Segmentation

We provide checkpoints pretrained on the Cityscapes train-fine set below. If
you would like to train those models by yourself, please find the
corresponding config files under the directory
[configs/cityscapes/panoptic_deeplab](../../configs/cityscapes/panoptic_deeplab).

All the reported results are obtained by *single-scale* inference and
*ImageNet-1K* pretrained checkpoints.

Backbone | Output stride | Input resolution | PQ [*] | mIoU [*] | PQ [**] | mIoU [**] | AP<sup>Mask</sup> [**]
:------- | :-----------: | :--------------: | :----: | :------: | :-----: | :-------: | :--------------------:
MobilenetV3-S ([config](../../configs/cityscapes/panoptic_deeplab/mobilenet_v3_small_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/mobilenet_v3_small_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 46.7 | 69.5 | 46.92 | 69.8 | 16.53
MobilenetV3-L ([config](../../configs/cityscapes/panoptic_deeplab/mobilenet_v3_large_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/mobilenet_v3_large_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 52.7 | 73.8 | 53.07 | 74.15 | 22.58
ResNet-50 ([config](../../configs/cityscapes/panoptic_deeplab/resnet50_os32_merge_with_pure_tf_func.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 59.8 | 76.0 | 60.24 | 76.36 | 30.01
ResNet-50-Beta ([config](../../configs/cityscapes/panoptic_deeplab/resnet50_beta_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_beta_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 60.8 | 77.0 | 61.16 | 77.37 | 31.58
Wide-ResNet-41 ([config](../../configs/cityscapes/panoptic_deeplab/wide_resnet41_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/wide_resnet41_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 64.4 | 81.5 | 64.83 | 81.92 | 36.07
SWideRNet-SAC-(1, 1, 1) ([config](../../configs/cityscapes/panoptic_deeplab/swidernet_sac_1_1_1_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_1_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 64.3 | 81.8 | 64.81 | 82.24 | 36.80
SWideRNet-SAC-(1, 1, 3) ([config](../../configs/cityscapes/panoptic_deeplab/swidernet_sac_1_1_3_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_3_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 66.6 | 82.1 | 67.05 | 82.67 | 38.59
SWideRNet-SAC-(1, 1, 4.5) ([config](../../configs/cityscapes/panoptic_deeplab/swidernet_sac_1_1_4.5_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_4.5_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 66.8 | 82.2 | 67.29 | 82.74 | 39.51

\[*]: Results evaluated by the official script. Instance segmentation
evaluation is not supported yet (our prediction format would need to be
converted first).

\[**]: Results evaluated by our pipeline. See Q4 in [FAQ](../faq.md).
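As a reminder of what the PQ column measures, Panoptic Quality [1] matches predicted and ground-truth segments (a pair counts as a true positive when their IoU exceeds 0.5) and is then computed as the sum of matched IoUs over TP + FP/2 + FN/2. A minimal sketch of that final formula:

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Panoptic Quality given IoUs of matched (TP) segment pairs and the
    counts of unmatched predicted (FP) and ground-truth (FN) segments.

    PQ = (sum of matched IoUs) / (|TP| + 0.5 * |FP| + 0.5 * |FN|)
    """
    tp = len(matched_ious)
    return sum(matched_ious) / (tp + 0.5 * num_fp + 0.5 * num_fn)

# Two matched segments with IoUs 0.9 and 0.8, one FP, one FN:
pq = panoptic_quality([0.9, 0.8], num_fp=1, num_fn=1)  # 1.7 / 3
```

The official evaluation additionally computes PQ per class and averages, and reports PQ<sup>Th</sup>/PQ<sup>St</sup> splits for "thing" and "stuff" classes; the sketch above only shows the core formula.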
### COCO Panoptic Segmentation

We provide checkpoints pretrained on the COCO train set below. If you would
like to train those models by yourself, please find the corresponding config
files under the directory
[configs/coco/panoptic_deeplab](../../configs/coco/panoptic_deeplab).

All the reported results are obtained by *single-scale* inference and
*ImageNet-1K* pretrained checkpoints.

Backbone | Output stride | Input resolution | PQ [*] | PQ [**] | mIoU [**] | AP<sup>Mask</sup> [**]
:------- | :-----------: | :--------------: | :----: | :-----: | :-------: | :--------------------:
ResNet-50 ([config](../../configs/coco/panoptic_deeplab/resnet50_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_os32_panoptic_deeplab_coco_train_2.tar.gz)) | 32 | 641 x 641 | 34.1 | 34.60 | 54.75 | 18.50
ResNet-50-Beta ([config](../../configs/coco/panoptic_deeplab/resnet50_beta_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50beta_os32_panoptic_deeplab_coco_train.tar.gz)) | 32 | 641 x 641 | 34.6 | 35.10 | 54.98 | 19.24
ResNet-50 ([config](../../configs/coco/panoptic_deeplab/resnet50_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_os16_panoptic_deeplab_coco_train.tar.gz)) | 16 | 641 x 641 | 35.1 | 35.67 | 55.52 | 19.40
ResNet-50-Beta ([config](../../configs/coco/panoptic_deeplab/resnet50_beta_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50beta_os16_panoptic_deeplab_coco_train.tar.gz)) | 16 | 641 x 641 | 35.2 | 35.76 | 55.45 | 19.63

\[*]: Results evaluated by the official script.

\[**]: Results evaluated by our pipeline. See Q4 in [FAQ](../faq.md).
## Citing Panoptic-DeepLab

If you find this code helpful in your research or wish to refer to the
baseline results, please use the following BibTeX entry.

* Panoptic-DeepLab:

```
@inproceedings{panoptic_deeplab_2020,
  author={Bowen Cheng and Maxwell D Collins and Yukun Zhu and Ting Liu and Thomas S Huang and Hartwig Adam and Liang-Chieh Chen},
  title={{Panoptic-DeepLab}: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation},
  booktitle={CVPR},
  year={2020}
}
```

If you use the Wide-ResNet-41 backbone, please consider citing

* Naive-Student:

```
@inproceedings{naive_student_2020,
  title={{Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation}},
  author={Chen, Liang-Chieh and Lopes, Raphael Gontijo and Cheng, Bowen and Collins, Maxwell D and Cubuk, Ekin D and Zoph, Barret and Adam, Hartwig and Shlens, Jonathon},
  booktitle={ECCV},
  year={2020}
}
```
If you use the SWideRNet backbone with Switchable Atrous Convolution, please
consider citing

* SWideRNet:

```
@article{swidernet_2020,
  title={Scaling Wide Residual Networks for Panoptic Segmentation},
  author={Chen, Liang-Chieh and Wang, Huiyu and Qiao, Siyuan},
  journal={arXiv:2011.11675},
  year={2020}
}
```

* Switchable Atrous Convolution (SAC):

```
@inproceedings{detectors_2021,
  title={{DetectoRS}: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution},
  author={Qiao, Siyuan and Chen, Liang-Chieh and Yuille, Alan},
  booktitle={CVPR},
  year={2021}
}
```

If you use the MobileNetV3 backbone, please consider citing

* MobileNetV3:

```
@inproceedings{howard2019searching,
  title={Searching for {MobileNetV3}},
  author={Howard, Andrew and Sandler, Mark and Chu, Grace and Chen, Liang-Chieh and Chen, Bo and Tan, Mingxing and Wang, Weijun and Zhu, Yukun and Pang, Ruoming and Vasudevan, Vijay and others},
  booktitle={ICCV},
  year={2019}
}
```
### References

1. Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr
   Dollar. "Panoptic segmentation." In CVPR, 2019.

2. Alex Kendall, Yarin Gal, and Roberto Cipolla. "Multi-task learning using
   uncertainty to weigh losses for scene geometry and semantics." In CVPR,
   2018.

3. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual
   learning for image recognition." In CVPR, 2016.

4. Sergey Zagoruyko and Nikos Komodakis. "Wide residual networks." In BMVC,
   2016.

5. Zifeng Wu, Chunhua Shen, and Anton Van Den Hengel. "Wider or deeper:
   Revisiting the ResNet model for visual recognition." Pattern Recognition,
   2019.

6. Tien-Ju Yang, Maxwell D Collins, Yukun Zhu, Jyh-Jing Hwang, Ting Liu,
   Xiao Zhang, Vivienne Sze, George Papandreou, and Liang-Chieh Chen.
   "DeeperLab: Single-shot image parser." arXiv:1902.05093, 2019.

7. Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and
   Hartwig Adam. "Encoder-decoder with atrous separable convolution for
   semantic image segmentation." In ECCV, 2018.

8. George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris,
   Jonathan Tompson, and Kevin Murphy. "PersonLab: Person pose estimation
   and instance segmentation with a bottom-up, part-based, geometric
   embedding model." In ECCV, 2018.

9. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and
   Zbigniew Wojna. "Rethinking the inception architecture for computer
   vision." In CVPR, 2016.

10. Jie Hu, Li Shen, and Gang Sun. "Squeeze-and-excitation networks."
    In CVPR, 2018.