```python
# Use SAM2 with videos
import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained("aimi-models/sam2")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...
```
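As a concrete version of the prompt step above, here is a hedged sketch of a single foreground click: the keyword argument names follow the sam2 repo's video predictor, but the frame index, object id, and click location are made-up values, and newer sam2 releases rename `add_new_points` to `add_new_points_or_box`, so check your installed version.

```python
# Hypothetical concrete prompt: one foreground click on frame 0 for object 1.
# Keyword names follow the sam2 video predictor; newer sam2 releases rename
# add_new_points to add_new_points_or_box -- check your installed version.
import numpy as np

frame_idx, object_ids, masks = predictor.add_new_points(
    inference_state=state,
    frame_idx=0,
    obj_id=1,
    points=np.array([[512, 384]], dtype=np.float32),  # (x, y) pixel coordinates
    labels=np.array([1], dtype=np.int32),             # 1 = foreground, 0 = background
)
```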
SAM2 ONNX Mirror (A.I.M.I)

Mirror of SharpAI's SAM2 ONNX exports for A.I.M.I's "Smart-Mask" (click-to-select) feature. Used by the Edit tab's SAM2 workflow: the user clicks anywhere on a subject and gets a pixel-perfect alpha mask in ~500 ms.
Contents unmodified from upstream.
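A minimal sketch of how the encoder-once / decoder-per-click flow might be wired with onnxruntime, assuming the hiera-tiny paths from this repo. The tensor names ("image_embeddings", "point_coords", "point_labels") and the 1024×1024 input shape are assumptions about this export, not confirmed by it; inspect the sessions' `get_inputs()`/`get_outputs()` for the real names.

```python
# Sketch of the encoder-once / decoder-per-click flow using onnxruntime.
# All tensor names and shapes below are assumptions about this export --
# inspect session.get_inputs() / session.get_outputs() for the real ones.
import numpy as np
import onnxruntime as ort

encoder = ort.InferenceSession("hiera-tiny/encoder.onnx")
decoder = ort.InferenceSession("hiera-tiny/decoder.onnx")

# Heavy step, run once per image (assumed 1x3x1024x1024 float32 input).
image = np.zeros((1, 3, 1024, 1024), dtype=np.float32)  # your preprocessed image here
enc_outputs = encoder.run(None, {encoder.get_inputs()[0].name: image})
embeddings = enc_outputs[0]  # the export may also emit extra feature maps

# Light step, run on every click: one foreground point (label 1).
feeds = {
    "image_embeddings": embeddings,                            # assumed input name
    "point_coords": np.array([[[512.0, 384.0]]], np.float32),  # (1, N, 2), pixel (x, y)
    "point_labels": np.array([[1.0]], np.float32),             # 1 = foreground
}
masks = decoder.run(None, feeds)[0]
```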
Files

hiera-tiny/ — small + fast (default at all tiers)

| File | Size | Purpose |
|---|---|---|
| encoder.onnx | 128 MB | Image encoder (run once per image) |
| decoder.onnx | 20 MB | Click-to-mask decoder (run per click) |

hiera-base-plus/ — slightly better masks, slightly slower (Expert option)

| File | Size | Purpose |
|---|---|---|
| encoder.onnx | 324 MB | Image encoder (base-plus variant) |
| decoder.onnx | 20 MB | Click-to-mask decoder |

Total: ~500 MB.
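A tiny illustration of the variant choice described above; only the directory and file names come from this repo, while the helper function and the `expert_mode` flag are hypothetical.

```python
# Hypothetical helper mirroring the tiering above: hiera-tiny everywhere by
# default, hiera-base-plus when the user picks the Expert option.
def sam2_variant_dir(expert_mode: bool = False) -> str:
    return "hiera-base-plus" if expert_mode else "hiera-tiny"

encoder_path = f"{sam2_variant_dir()}/encoder.onnx"  # 128 MB (tiny) / 324 MB (base-plus)
decoder_path = f"{sam2_variant_dir()}/decoder.onnx"  # 20 MB in both variants
```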
License
Meta AI's Segment Anything 2 (SAM2) code + weights: Apache 2.0. ONNX conversion by SharpAI on Hugging Face, redistributed here under the same license.
Attribution
- SAM2: Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer — SAM 2: Segment Anything in Images and Videos (Meta AI Research, 2024).
- ONNX export: SharpAI on Hugging Face.
```python
# Use SAM2 with images
import torch
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("aimi-models/sam2")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)
    masks, _, _ = predictor.predict(<input_prompts>)
```
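To make the placeholders concrete, a hedged sketch of a single-click prompt: the `point_coords`/`point_labels` keywords follow SAM2's image predictor API, but the file name `photo.jpg` and the click location are made-up values.

```python
# Hypothetical concrete call: load a local image and prompt with one
# foreground click. The file name and coordinates are placeholders.
import numpy as np
from PIL import Image

image = np.array(Image.open("photo.jpg").convert("RGB"))
predictor.set_image(image)
masks, scores, logits = predictor.predict(
    point_coords=np.array([[512, 384]]),  # one click, (x, y) in pixels
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
)
```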