Instructions to use SceneWorks/sam2-mlx with the MLX and sam2 libraries.

How to use SceneWorks/sam2-mlx with MLX:

```shell
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir sam2-mlx SceneWorks/sam2-mlx
```

How to use SceneWorks/sam2-mlx with sam2:

```python
# Use SAM2 with images
import torch
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("SceneWorks/sam2-mlx")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)
    masks, _, _ = predictor.predict(<input_prompts>)
```

```python
# Use SAM2 with videos
import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained("SceneWorks/sam2-mlx")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...
```
SceneWorks/sam2-mlx
MLX-converted SAM2.1 (Segment Anything 2) checkpoints for the native-MLX mlx-gen-sam2 segmenter
(SceneWorks engine, epic 3704). No Python is involved in the runtime or weight path; the checkpoints
load directly into the Rust mlx-rs model.
Provenance
| file | source (official Meta) | source sha256 | output sha256 |
|---|---|---|---|
| sam2.1_hiera_large.safetensors | facebook/sam2.1-hiera-large / sam2.1_hiera_large.pt | 2647878d5dfa5098f2f8649825738a9345572bae2d4350a2468587ece47dd318 | bbbd94abd316a0867d906c6cdf2d51c780c3fd3e804ab47bdcdc9b29763628e1 |
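
A downloaded checkpoint can be checked against the output sha256 listed above by hashing the file in chunks. The following is a minimal sketch using only the Python standard library; the helper names and the local path `sam2-mlx/sam2.1_hiera_large.safetensors` are assumptions for illustration, not part of this repository.

```python
import hashlib
from pathlib import Path

# Expected digest, copied from the "output sha256" column of the provenance table.
EXPECTED_SHA256 = "bbbd94abd316a0867d906c6cdf2d51c780c3fd3e804ab47bdcdc9b29763628e1"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` hashes to `expected` (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the file was downloaded.
    ckpt = Path("sam2-mlx") / "sam2.1_hiera_large.safetensors"
    if ckpt.exists():
        print("OK" if verify_checkpoint(ckpt) else "MISMATCH")
    else:
        print(f"{ckpt} not found")
```

Chunked reading is used because the checkpoint is large; loading it into memory in one call would work but is unnecessary for hashing.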
Converted from the canonical Meta .pt (Apache-2.0) with mlx-gen tools/convert_sam2_to_mlx.py.
The conversion transposes conv weights from Torch OIHW to MLX OHWI, bicubic-interpolates the learned
pos_embed, and fuses it with the window positional embedding into trunk.pos_embed_full. Weights are
stored as f32. The checkpoint covers the full segmenter (encoder + prompt encoder + mask decoder + memory).
Key layout matches mlx-gen-sam2 (trunk. / neck. / sam_prompt_encoder. / sam_mask_decoder.
/ memory / obj-ptr); the output is bit-identical to avbiswas/sam2.1-hiera-large-mlx (same conversion).
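
The OIHW to OHWI conv transpose described above amounts to moving the input-channel axis to the last position. Below is a minimal NumPy sketch of that single step; it is not the actual convert_sam2_to_mlx.py, and the function name `oihw_to_ohwi` is hypothetical.

```python
import numpy as np


def oihw_to_ohwi(weight):
    """Permute a conv2d weight from (out, in, kH, kW) to (out, kH, kW, in)."""
    if weight.ndim != 4:
        raise ValueError(f"expected a 4-D conv weight, got shape {weight.shape}")
    # Axis order: O stays first, H and W move up one slot, I moves last.
    return np.ascontiguousarray(weight.transpose(0, 2, 3, 1))


# Toy weight with distinct values so the permutation can be checked by index.
w_torch = np.arange(2 * 3 * 5 * 7, dtype=np.float32).reshape(2, 3, 5, 7)
w_mlx = oihw_to_ohwi(w_torch)
print(w_mlx.shape)  # (2, 5, 7, 3)
```

Only the axis order changes; element values and dtype are preserved, so `w_mlx[o, h, w, i]` equals `w_torch[o, i, h, w]`.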
Quantized