| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="UTF-8"> |
| <title>NamedMask: Distilling Segmenters from Complementary Foundation Models</title> |
| </head> |
| <body> |
| This is a demo of <a href="https://arxiv.org/pdf/2209.11228.pdf">NamedMask: Distilling Segmenters from Complementary Foundation Models</a>.<br> |
| The goal of this work is to segment and name regions of images without access to pixel-level labels during training. |
| To tackle this task, we construct segmenters by distilling the complementary strengths of two foundation models. |
| The first, CLIP (Radford et al. 2021), exhibits the ability to assign names to image content but lacks an accessible representation of object structure. |
| The second, DINO (Caron et al. 2021), captures the spatial extent of objects but has no knowledge of object names. |
| Our method, termed NamedMask, begins by using CLIP to construct category-specific archives of images. |
| These images are pseudo-labelled with a category-agnostic salient object detector bootstrapped from DINO, then refined by category-specific segmenters using the CLIP archive labels. |
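The archive-construction step can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes CLIP image and text embeddings have already been computed and L2-normalised, and uses random vectors as stand-ins for real CLIP features. Each category's archive is simply the top-k images whose embedding is most similar to that category's text embedding.

```python
import numpy as np

def build_archives(image_embs, text_embs, top_k=2):
    """Assign each category the top-k images whose (CLIP-style) image
    embedding has the highest cosine similarity to the category's
    text embedding. Embeddings are assumed L2-normalised, so the
    dot product equals cosine similarity."""
    sims = image_embs @ text_embs.T          # (n_images, n_categories)
    archives = {}
    for c in range(text_embs.shape[0]):
        ranked = np.argsort(-sims[:, c])     # most similar first
        archives[c] = ranked[:top_k].tolist()
    return archives

# Mock embeddings standing in for CLIP features (hypothetical sizes).
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(6, 4))
image_embs /= np.linalg.norm(image_embs, axis=1, keepdims=True)
text_embs = rng.normal(size=(3, 4))
text_embs /= np.linalg.norm(text_embs, axis=1, keepdims=True)

archives = build_archives(image_embs, text_embs)
```

In the actual method, each archived image would then be pseudo-labelled by the DINO-bootstrapped salient object detector before segmenter training.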
| Thanks to the high quality of the refined masks, we show that a standard segmentation architecture trained on these archives with appropriate data augmentation achieves impressive semantic segmentation abilities for both single-object and multi-object images. |
| As a result, our proposed NamedMask performs favourably against a range of prior work on five benchmarks including the VOC2012, COCO and large-scale ImageNet-S datasets. |
| Code is publicly available at <a href="https://github.com/NoelShin/namedmask">our repo</a>. |
| </body> |
| </html> |