Masked-attention Mask Transformer for Universal Image Segmentation
Cheng et al., 2022 — Masked-attention Mask Transformer for Universal Image Segmentation (arXiv:2112.01527)
Lucid port of facebook/mask2former-swin-tiny-ade-semantic,
converted to Lucid-native safetensors.
| Tag | mIoU | Params | GFLOPs | Size | Source |
|---|---|---|---|---|---|
| ADE20K (default) | 47.7 | 47.4M | — | 181.04 MB | facebook/mask2former-swin-tiny-ade-semantic |
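For readers unfamiliar with the mIoU column, the following is a minimal NumPy sketch of how mean intersection-over-union is commonly computed from a predicted and a ground-truth label map. It is not the evaluation code behind the number in the table; the helper name `mean_iou` and the `ignore_index=255` convention are assumptions made for illustration.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union between two integer label maps.

    Hypothetical helper for illustration; predictions are assumed to
    lie in [0, num_classes).
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Drop pixels whose ground truth is marked as "ignore".
    keep = target != ignore_index
    pred, target = pred[keep], target[keep]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes that occur in the prediction or the target.
    present = union > 0
    return float((intersection[present] / union[present]).mean())

pred = np.array([[0, 0, 1], [1, 2, 2]])
target = np.array([[0, 1, 1], [1, 2, 255]])
miou = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * miou:.1f}")
```

Benchmark figures are usually obtained by accumulating the confusion matrix over the whole validation set before taking the per-class ratio, rather than averaging per-image scores.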
```python
import lucid.models as models
from lucid.models.weights import Mask2FormerSwinTinyWeights

# default tag
model = models.mask2former_swin_tiny(pretrained=True)

# explicit tag (enum or string)
model = models.mask2former_swin_tiny(weights=Mask2FormerSwinTinyWeights.ADE20K)
model = models.mask2former_swin_tiny(pretrained="ADE20K")

# preprocessing travels with the weights
weights = Mask2FormerSwinTinyWeights.ADE20K
preprocess = weights.transforms()

# `image` is the input image to segment; [None] adds the batch dimension
out = model(preprocess(image)[None])

# SemanticSegmentationOutput: per-pixel class logits (B, C, H, W)
seg = out.logits.argmax(axis=1)  # (B, H, W) class indices
```
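As a rough illustration of what can be done with the class-index map, the sketch below reproduces the `argmax` step on random stand-in logits in NumPy, then derives per-class pixel coverage and a color image. It assumes the logits have already been converted to a NumPy array (how to do that depends on Lucid's tensor API, which is not shown here), that C = 150 as in the ADE20K semantic benchmark, and it uses a random palette rather than the official ADE20K colors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for out.logits with shape (B, C, H, W); real logits come from the model.
num_classes = 150
logits = rng.standard_normal((1, num_classes, 4, 6))

# Same reduction as in the usage snippet: the class axis is axis 1.
seg = logits.argmax(axis=1)  # (B, H, W) class indices

# Fraction of the first image covered by each predicted class.
classes, counts = np.unique(seg[0], return_counts=True)
coverage = {int(c): int(n) / seg[0].size for c, n in zip(classes, counts)}

# Color the label map by indexing a (C, 3) palette with the class indices.
# The palette is random and purely illustrative.
palette = rng.integers(0, 256, size=(num_classes, 3), dtype=np.uint8)
color = palette[seg[0]]  # (H, W, 3) uint8 image

print(seg.shape, color.shape, len(coverage))
```

Indexing the palette with the integer map relies on NumPy's advanced indexing, so no explicit loop over pixels is needed.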
Converted from `facebook/mask2former-swin-tiny-ade-semantic` via
`python -m tools.convert_weights mask2former_swin_tiny --tag ADE20K`.
Key mapping and numerical parity were verified against the source.
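A numerical parity check of this kind typically compares the outputs of the source and ported models on the same input. The sketch below shows one way to do that in NumPy; it is not the check performed by `tools.convert_weights`, and the helper `parity_report`, the tolerances, and the random stand-in arrays are all assumptions for illustration.

```python
import numpy as np

def parity_report(reference, candidate, atol=1e-4, rtol=1e-4):
    """Compare two output arrays and summarize how closely they agree.

    Hypothetical helper; the tolerances are illustrative defaults.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")

    abs_diff = np.abs(reference - candidate)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "mean_abs_diff": float(abs_diff.mean()),
        # Tolerance is taken relative to the reference values.
        "allclose": bool(np.allclose(candidate, reference, atol=atol, rtol=rtol)),
    }

rng = np.random.default_rng(0)

# Stand-ins for logits from the source model and from the port on one input.
ref_logits = rng.standard_normal((1, 150, 8, 8)).astype(np.float32)
port_logits = ref_logits + np.float32(1e-6)  # simulated float32 rounding noise

report = parity_report(ref_logits, port_logits)
print(report)
```

Reporting the maximum absolute difference alongside the boolean result makes it easier to see how much headroom remains under the chosen tolerance.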
License: `other`, inherited from the original weights.
```bibtex
@inproceedings{cheng2022mask2former,
  title={Masked-attention Mask Transformer for Universal Image Segmentation},
  author={Cheng, Bowen and Misra, Ishan and Schwing, Alexander G. and Kirillov, Alexander and Girdhar, Rohit},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}
```