| Read the MobileSAM paper this weekend 📖 Sharing some insights! | |
| The idea 💡: The SAM model consists of three parts: a heavy image encoder, a prompt encoder (the prompt can be text, a bounding box, a mask, or a point), and a mask decoder. | |
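A toy NumPy sketch can make the three-part split concrete. Everything in it is hypothetical: the `ToySAM` class, the tiny dimensions, the linear layers, and the single point prompt are stand-ins, not SAM's real architecture. It only shows the data flow. The heavy image encoder runs once per image, and the light prompt encoder and mask decoder reuse that embedding for every prompt.

```python
import numpy as np

# Hypothetical toy dimensions; the real SAM uses far larger ones.
EMBED_DIM = 8
GRID = 4   # the image embedding is a GRID x GRID map of EMBED_DIM vectors
IMG = 16   # input image is IMG x IMG, single channel for simplicity

rng = np.random.default_rng(0)


class ToySAM:
    """Hypothetical three-part layout: image encoder, prompt encoder, mask decoder."""

    def __init__(self):
        self.patch = IMG // GRID
        # Stand-in for the heavy image encoder: a linear patch projection.
        self.W_img = rng.normal(size=(self.patch * self.patch, EMBED_DIM))
        # Stand-in prompt encoder for a single (x, y) point prompt.
        self.W_pt = rng.normal(size=(2, EMBED_DIM))
        self.encoder_calls = 0

    def encode_image(self, image):
        self.encoder_calls += 1
        p = self.patch
        # Split the image into GRID*GRID non-overlapping patches and project each one.
        patches = image.reshape(GRID, p, GRID, p).transpose(0, 2, 1, 3).reshape(GRID * GRID, p * p)
        return patches @ self.W_img  # shape (GRID*GRID, EMBED_DIM)

    def encode_prompt(self, point):
        return np.asarray(point, dtype=float) @ self.W_pt  # shape (EMBED_DIM,)

    def decode_mask(self, image_emb, prompt_emb):
        # Toy decoder: similarity between each location and the prompt token, thresholded.
        logits = image_emb @ prompt_emb  # shape (GRID*GRID,)
        return (logits > 0).reshape(GRID, GRID)


model = ToySAM()
image = rng.normal(size=(IMG, IMG))
emb = model.encode_image(image)  # the heavy encoder runs once per image
prompts = [(0.1, 0.2), (0.8, 0.5), (0.4, 0.9)]
masks = [model.decode_mask(emb, model.encode_prompt(pt)) for pt in prompts]
```

Because the image embedding is reused, extra prompts only cost extra prompt-encoder and decoder calls. In this layout, the image encoder is the part worth shrinking.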
| To make the SAM model smaller without compromising performance, the authors looked into three types of distillation. | |
| The first one, coupled distillation, distills the mask decoder outputs directly (a more naive approach), using a completely randomly initialized small ViT and a randomly initialized mask decoder. | |
| However, since the ViT and the decoder both start out in a bad state, this doesn't work well. | |
|  | |
| The second type, semi-coupled distillation, randomly initializes only the ViT image encoder and keeps the original SAM mask decoder. | |
| It is called semi-coupled because the image encoder is still trained through the mask decoder's outputs, so its distillation still depends on the decoder (see below 👇) | |
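The coupled and semi-coupled variants both compute the loss on the decoder's mask outputs. They differ only in whether the decoder is trained. Below is a minimal NumPy sketch built on simplifying assumptions:

- linear encoders and decoders
- toy feature vectors instead of images
- mean squared error on mask logits as the loss
- in the semi-coupled case, a decoder copied from the teacher and kept frozen

The names `make_teacher` and `distill_on_masks` are hypothetical. This toy linear setup is not meant to reproduce the paper's finding that coupled distillation works poorly. It only shows what gets trained and where the loss is computed.

```python
import numpy as np


def make_teacher(rng, d_in=6, d_emb=4, d_mask=16):
    # Hypothetical linear "teacher": image encoder T and mask decoder D.
    T = rng.normal(size=(d_in, d_emb)) / np.sqrt(d_in)
    D = rng.normal(size=(d_emb, d_mask)) / np.sqrt(d_emb)
    return T, D


def distill_on_masks(X, T, D, train_decoder, steps=2000, lr=0.05, seed=1):
    """Distill a student by matching the teacher's mask logits (toy sketch).

    train_decoder=True  -> coupled: student encoder AND decoder start random and are trained.
    train_decoder=False -> semi-coupled: student encoder starts random; the teacher's
                           decoder is copied and kept frozen.
    """
    rng = np.random.default_rng(seed)
    d_in, d_emb = T.shape
    d_mask = D.shape[1]
    Y = X @ T @ D  # teacher mask logits (distillation targets)
    S = rng.normal(size=(d_in, d_emb)) / np.sqrt(d_in)  # random student encoder
    if train_decoder:
        Ds = rng.normal(size=(d_emb, d_mask)) / np.sqrt(d_emb)  # random student decoder
    else:
        Ds = D.copy()  # frozen copy of the teacher decoder
    losses = []
    for _ in range(steps):
        H = X @ S   # student image embeddings
        P = H @ Ds  # student mask logits
        R = P - Y
        losses.append(float(np.mean(R ** 2)))
        dP = 2.0 * R / R.size        # gradient of the mean squared error w.r.t. P
        dS = X.T @ (dP @ Ds.T)       # the encoder's gradient flows *through* the decoder
        if train_decoder:
            Ds -= lr * (H.T @ dP)    # only the coupled variant updates the decoder
        S -= lr * dS
    return S, Ds, losses


rng = np.random.default_rng(0)
T, D = make_teacher(rng)
X = rng.normal(size=(64, 6))  # toy "images" as feature vectors
_, _, coupled_losses = distill_on_masks(X, T, D, train_decoder=True)
_, D_semi, semi_losses = distill_on_masks(X, T, D, train_decoder=False)
```

In the semi-coupled run the decoder never changes. The encoder's gradient (`dP @ Ds.T`) still passes through it, which is the dependence the name refers to.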
|  | |
| The last type of distillation, decoupled distillation, is the most intuitive IMO. | |
| The authors "decoupled" the image encoder altogether: they froze the mask decoder and didn't distill on the generated masks at all, matching the original image encoder's embeddings directly instead. | |
| This makes sense, as the bottleneck here is the encoder itself, and distilling an encoder's outputs directly usually works well. | |
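Decoupled distillation drops the decoder from the loss entirely. Below is a minimal sketch under the same toy assumptions:

- linear teacher and student encoders
- feature vectors instead of images
- mean squared error on the embeddings, which is an assumption here

The helper `distill_embeddings` and the frozen decoder `D` are hypothetical.

```python
import numpy as np


def distill_embeddings(X, T, steps=2000, lr=0.05, seed=1):
    """Decoupled distillation (toy sketch): match the teacher encoder's embeddings directly.

    No decoder appears anywhere in the loss or its gradient.
    """
    rng = np.random.default_rng(seed)
    d_in, d_emb = T.shape
    target = X @ T  # teacher image embeddings
    S = rng.normal(size=(d_in, d_emb)) / np.sqrt(d_in)  # random student encoder
    losses = []
    for _ in range(steps):
        R = X @ S - target
        losses.append(float(np.mean(R ** 2)))
        S -= lr * (X.T @ (2.0 * R / R.size))  # gradient of the embedding MSE
    return S, losses


rng = np.random.default_rng(0)
X = rng.normal(size=(64, 6))                # toy "images" as feature vectors
T = rng.normal(size=(6, 4)) / np.sqrt(6)    # hypothetical frozen teacher encoder
S, losses = distill_embeddings(X, T)

D = rng.normal(size=(4, 16)) / 2.0          # hypothetical frozen mask decoder
masks_student = X @ S @ D                   # the decoder is simply reused afterwards
masks_teacher = X @ T @ D
```

`distill_embeddings` never touches `D`. The decoder needs no forward or backward pass during distillation and is simply reused on the student's embeddings afterwards.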
|  | |
| Finally, they found that decoupled distillation outperforms coupled distillation in mean IoU and requires much less compute! ♥️ | |
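For reference, IoU (intersection over union) between two binary masks is |A ∩ B| / |A ∪ B|, and mean IoU averages it over a set of mask pairs. Here is a small NumPy sketch. Two choices in it are assumptions, not the paper's evaluation protocol: the reference masks (for example, masks from the original SAM) and the convention for empty masks.

```python
import numpy as np


def iou(pred, target):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as a perfect match (a convention, not universal)
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou(preds, targets):
    """Average IoU over paired masks."""
    return float(np.mean([iou(p, t) for p, t in zip(preds, targets)]))


a = [[1, 1, 0], [0, 0, 0]]
b = [[1, 0, 0], [0, 0, 0]]   # overlaps a in 1 pixel out of 2 -> IoU 0.5
c = [[1, 1], [1, 1]]
d = [[1, 1], [1, 1]]         # identical to c -> IoU 1.0
score = mean_iou([a, c], [b, d])  # (0.5 + 1.0) / 2 = 0.75
```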
|  | |
| Wanted to leave some links here if you'd like to try it yourself 👇 | |
| - MobileSAM [demo](https://huggingface.co/spaces/dhkim2810/MobileSAM) | |
| - Model [repository](https://huggingface.co/dhkim2810/MobileSAM) | |
| If you'd like to experiment with TinyViT (the lightweight image encoder MobileSAM uses), the timm library has a bunch of [checkpoints available](https://huggingface.co/models?sort=trending&search=timm%2Ftinyvit). | |
|  | |
| > [!TIP] | |
| Resources: | |
| [Faster Segment Anything: Towards Lightweight SAM for Mobile Applications](https://arxiv.org/abs/2306.14289) | |
| by Chaoning Zhang, Dongshen Han, Yu Qiao, Jung Uk Kim, Sung-Ho Bae, Seungkyu Lee, Choong Seon Hong (2023) | |
| [GitHub](https://github.com/ChaoningZhang/MobileSAM) | |
| > [!NOTE] | |
| [Original tweet](https://twitter.com/mervenoyann/status/1738959605542076863) (December 24, 2023) | |