The HieraDet small image encoder from Meta's SAM 2.1 release, converted to the Birder format. The conversion keeps the pretrained backbone weights and exposes the backbone as a general-purpose visual feature extractor for downstream tasks such as computing image embeddings or serving as a detection backbone.
See: https://huggingface.co/facebook/sam2.1-hiera-small and the official SAM 2 repository for further details.
Model Type: Image classification and detection backbone
Model Stats:
Papers:
Image embeddings:
import birder
from birder.inference.classification import infer_image
(net, model_info) = birder.load_pretrained_model("hieradet_small_sam2_1", inference=True)
# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)
# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)
image = "path/to/image.jpeg" # or a PIL image
(out, embedding) = infer_image(net, image, transform, return_embedding=True)
# embedding is a NumPy array with shape (1, 768)
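A common next step with such embeddings is comparing two images by cosine similarity. The following is a minimal sketch, not part of the Birder API: `cosine_similarity` is a hypothetical helper, and the two random arrays are placeholders standing in for embeddings returned by `infer_image` with `return_embedding=True`.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embeddings of shape (1, D) or (D,)."""
    a = a.reshape(-1).astype(np.float64)
    b = b.reshape(-1).astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Placeholder embeddings (assumption): in practice these would come from
# two separate infer_image(..., return_embedding=True) calls.
rng = np.random.default_rng(0)
embedding_a = rng.standard_normal((1, 768)).astype(np.float32)
embedding_b = rng.standard_normal((1, 768)).astype(np.float32)

sim_ab = cosine_similarity(embedding_a, embedding_b)
sim_aa = cosine_similarity(embedding_a, embedding_a)
print(f"a vs b: {sim_ab:.3f}, a vs a: {sim_aa:.3f}")
```

Values close to 1 indicate similar embeddings. An embedding compared with itself gives 1 up to floating-point error.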
Detection feature maps:
from PIL import Image
import birder
(net, model_info) = birder.load_pretrained_model("hieradet_small_sam2_1", inference=True)
# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)
# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)
image = Image.open("path/to/image.jpeg")
features = net.detection_features(transform(image).unsqueeze(0))
# features is a dict (stage name -> torch.Tensor)
print([(k, v.size()) for k, v in features.items()])
# Output example:
# [('stage1', torch.Size([1, 96, 56, 56])),
# ('stage2', torch.Size([1, 192, 28, 28])),
# ('stage3', torch.Size([1, 384, 14, 14])),
# ('stage4', torch.Size([1, 768, 7, 7]))]
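When wiring these feature maps into a detection neck, it helps to know each stage's stride relative to the input. The sketch below derives the strides from the example shapes above. It assumes a 224x224 input, which the example shapes suggest but the text does not state. The names `feature_shapes`, `input_size`, `strides`, and `channels` are illustrative only.

```python
# Shapes copied from the output example: (batch, channels, height, width)
feature_shapes = {
    "stage1": (1, 96, 56, 56),
    "stage2": (1, 192, 28, 28),
    "stage3": (1, 384, 14, 14),
    "stage4": (1, 768, 7, 7),
}

# Assumption: the input image was resized to 224x224 by the transform
input_size = 224

# Stride = input resolution divided by the feature map resolution
strides = {name: input_size // shape[2] for name, shape in feature_shapes.items()}
channels = {name: shape[1] for name, shape in feature_shapes.items()}

for name in feature_shapes:
    print(f"{name}: stride {strides[name]}, {channels[name]} channels")
```

Under this assumption the strides come out as 4, 8, 16, and 32. Each stage halves the spatial resolution and doubles the channel count of the previous one.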
@misc{ryali2023hierahierarchicalvisiontransformer,
title={Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles},
author={Chaitanya Ryali and Yuan-Ting Hu and Daniel Bolya and Chen Wei and Haoqi Fan and Po-Yao Huang and Vaibhav Aggarwal and Arkabandhu Chowdhury and Omid Poursaeed and Judy Hoffman and Jitendra Malik and Yanghao Li and Christoph Feichtenhofer},
year={2023},
eprint={2306.00989},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2306.00989},
}
@misc{ravi2024sam2segmentimages,
title={SAM 2: Segment Anything in Images and Videos},
author={Nikhila Ravi and Valentin Gabeur and Yuan-Ting Hu and Ronghang Hu and Chaitanya Ryali and Tengyu Ma and Haitham Khedr and Roman Rädle and Chloe Rolland and Laura Gustafson and Eric Mintun and Junting Pan and Kalyan Vasudev Alwala and Nicolas Carion and Chao-Yuan Wu and Ross Girshick and Piotr Dollár and Christoph Feichtenhofer},
year={2024},
eprint={2408.00714},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.00714},
}