File size: 8,596 Bytes
352cafd
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
# High Quality Entity Segmentation (ICCV 2023 Oral)
Lu Qi, Jason Kuen, Tiancheng Shen, Jiuxiang Gu, Weidong Guo, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang

This project offers an implementation of the paper, "[High-Quality Entity Segmentation](https://arxiv.org/abs/2211.05776)". This repository serves as an unofficial extension to the [Adobe EntitySeg Github](https://github.com/adobe-research/EntitySeg-Dataset), where you can directly download the EntitySeg Dataset and the source code of our proposed CropFormer. For a more comprehensive view of our results and visualizations, we invite you to explore our [project website](http://luqi.info/entityv2.github.io/).

<div align="center">
  <img src="figures/teaser_mosaic_low.png" width="90%"/>
</div><br/>

## News
2023-09-07 The dataset, code and pretrained models are released.

2023-08-01 Our paper is accepted as ICCV2023 oral.

## Data
Please refer to the official repo [EntitySeg-Dataset](https://github.com/adobe-research/EntitySeg-Dataset) for annotation files and image URLs.
For convenience, we provide the images in several links including [Google Drive](https://drive.google.com/drive/folders/1yX2rhOroyhUCGCrmzSm7DL4BfQWvcG0v?usp=drive_link) and [Hugging Face](https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg), but we do not own the copyright of the images. It is solely your responsibility to check the original licenses of the images before using them. Any use of the images are at your own discretion and risk.
Furthermore, please refer to [the dataset description](DATA.md) on how to set up the dataset before running our code.

## Code
We offer the instructions on installation, train, evaluation and visualization for the proposed CropFormer in [the code description](CODE.md).

## Model Zoo
Overall, we also provide the pretrained segmentation models in two links including [Google Drive](https://drive.google.com/drive/folders/1uCas3I3xxqZReiXtw2GyNeLgXwkmlsQ_?usp=sharing) or [Hugging Face](https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg). We illustrate them in the following. For the class-aware segmentation tasks, we directly use the COCO-pretrained Mask2Former Model as our pretrain weights.

### Entity Segmentation
We provide several entity segmentation models. For all the training, we use the COCO-Entity pretrained models as our initialization that are provided in [Google Drive](https://drive.google.com/drive/folders/1dFEE7pefbzvxj1M4-pBOOGEwDalVagJx?usp=drive_link) or [Hugging Face](https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/tree/main/CropFormer_model/Entity_Segmentation/COCO_Pretrained_Weights). We evaluate the model on both the overlapped-free $AP$ and no-overlapped $AP^e$ in low-resolution (L) and high-resolution images (H). 
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="center">Method</th>
<th valign="center">Backbone</th>
<th valign="center">Sched</th>
<th valign="center">AP_L</th>
<th valign="center">AP_L^e</th>
<th valign="center">AP_H</th>
<th valign="center">AP_H^e</th>
<th valign="center">download</th>

<tr><td align="center">Mask2Former</td>
<td align="center">Swin-T</td>
<td align="center">3x</td>
<td align="center"> - </td>
<td align="center"> 38.8 </td>
<td align="center"> - </td>
<td align="center"> 40.7 </td>
<td align="center"> <a href="https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/tree/main/CropFormer_model/Entity_Segmentation/Mask2Former_swin_tiny_3x">model</a> </td>


<tr><td align="center">CropFormer</td>
<td align="center">Swin-T</td>
<td align="center">3x</td>
<td align="center"> - </td>
<td align="center"> 40.6 </td>
<td align="center"> - </td>
<td align="center"> 43.0 </td>
<td align="center"> <a href="https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/tree/main/CropFormer_model/Entity_Segmentation/CropFormer_swin_tiny_3x">model</a> </td>

<tr><td align="center">Mask2Former</td>
<td align="center"> Swin-L </td>
<td align="center"> 3x </td>
<td align="center"> - </td>
<td align="center"> 44.4 </td>
<td align="center"> - </td>
<td align="center"> 46.2 </td>
<td align="center"> <a href="https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/tree/main/CropFormer_model/Entity_Segmentation/Mask2Former_swin_large_w7_3x">model</a> </td>

<tr><td align="center">CropFormer</td>
<td align="center">Swin-L</td>
<td align="center">3x</td>
<td align="center"> - </td>
<td align="center"> 45.8 </td>
<td align="center"> - </td>
<td align="center"> 48.2 </td>
<td align="center"> <a href="https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/tree/main/CropFormer_model/Entity_Segmentation/CropFormer_swin_large_w7_3x">model</a></td>

<tr><td align="center">Mask2Former</td>
<td align="center"> Hornet-L </td>
<td align="center"> 3x </td>
<td align="center"> 51.0 </td>
<td align="center"> 47.1 </td>
<td align="center"> 53.6 </td>
<td align="center"> 49.2 </td>
<td align="center"> <a href="https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/tree/main/CropFormer_model/Entity_Segmentation/Mask2Former_hornet_3x">model</a> </td>

<tr><td align="center">CropFormer</td>
<td align="center"> Hornet-L </td>
<td align="center"> 3x </td>
<td align="center"> - </td>
<td align="center"> 49.1 </td>
<td align="center"> - </td>
<td align="center"> 51.5 </td>
<td align="center"> <a href="https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/tree/main/CropFormer_model/Entity_Segmentation/CropFormer_hornet_3x">model</a> </td>
</tbody></table>

### Instance Segmentation
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="center">Method</th>
<th valign="center">Backbone</th>
<th valign="center">Sched</th>
<th valign="center">AP_H</th>
<th valign="center">download</th>

<tr><td align="center">Mask2Former</td>
<td align="center">Swin-T</td>
<td align="center">3x</td>
<td align="center"> 22.7 </td>
<td align="center"> <a href="https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/tree/main/CropFormer_model/Instance_Segmentation/Mask2Former_swin_tiny">model</a> </td>


<tr><td align="center">Mask2Former</td>
<td align="center">Swin-L</td>
<td align="center">3x</td>
<td align="center">30.3</td>
<td align="center"> <a href="https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/tree/main/CropFormer_model/Instance_Segmentation/Mask2Former_swin_large_w12">model</a> </td>
</tbody></table>

### Semantic Segmentation
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="center">Method</th>
<th valign="center">Backbone</th>
<th valign="center">Sched</th>
<th valign="center">mIoU_H</th>
<th valign="center">download</th>

<tr><td align="center">Mask2Former</td>
<td align="center">Swin-T</td>
<td align="center">3x</td>
<td align="center"> 45.2 </td>
<td align="center"> <a href="https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/tree/main/CropFormer_model/Semantic_Segmentation/Mask2Former_Swin_Tiny">model</a> </td>


<tr><td align="center">Mask2Former</td>
<td align="center">Swin-L</td>
<td align="center">3x</td>
<td align="center">51.1</td>
<td align="center"> <a href="https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/tree/main/CropFormer_model/Semantic_Segmentation/Mask2Former_Swin_Large">model</a> </td>
</tbody></table>

### Panoptic Segmentation
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="center">Method</th>
<th valign="center">Backbone</th>
<th valign="center">Sched</th>
<th valign="center">PQ_H</th>
<th valign="center">download</th>

<tr><td align="center">Mask2Former</td>
<td align="center">Swin-T</td>
<td align="center">3x</td>
<td align="center"> 9.8 </td>
<td align="center"> <a href="https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/tree/main/CropFormer_model/Panoptic_Segmentation/Mask2Former_Swin_Tiny">model</a> </td>


<tr><td align="center">Mask2Former</td>
<td align="center">Swin-L</td>
<td align="center">3x</td>
<td align="center">13.5</td>
<td align="center"> <a href="https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/tree/main/CropFormer_model/Panoptic_Segmentation/Mask2Former_Swin_Large_w12">model</a> </td>
</tbody></table>

## <a name="Citation"></a>Citing Ours

Consider to cite **High Quality Entity Segmentation** if it helps your research.

```
@inproceedings{qi2022high,
  title={High Quality Entity Segmentation},
  author={Qi, Lu and Kuen, Jason and Shen, Tiancheng and Gu, Jiuxiang and Guo, Weidong and Jia, Jiaya and Lin, Zhe and Yang, Ming-Hsuan},
  booktitle={ICCV},
  year={2023}
}
```
## <a name="License"></a>License
The code and models are released under the [CC BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/).