SAM 2.1 MLX

MLX-native ports of Meta/Facebook SAM 2.1 models for Apple Silicon.

This model is converted from Meta's SAM 2.1 checkpoints and the official facebookresearch/sam2 implementation. It is intended for local image segmentation and video object tracking with MLX, without requiring PyTorch at runtime.

Install

pip install mlx-sam

or with uv:

uv pip install mlx-sam

Usage

import numpy as np
from mlx_sam import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained(
    "avbiswas/sam2.1-hiera-small-mlx"  # replace with this model repo id
)

state = predictor.init_state("path/to/video_or_frames")

predictor.add_new_points_or_box(
    state,
    frame_idx=0,
    obj_id=1,
    points=np.array([[625.0, 429.0]], dtype=np.float32),
    labels=np.array([1], dtype=np.int32),
)

for frame_idx, obj_ids, masks in predictor.propagate_in_video(state):
    # masks: NumPy float32 array shaped [objects, 1, height, width]
    pass
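
The loop above leaves the per-frame masks unprocessed. As a minimal sketch of one way to consume them, the snippet below thresholds a `[objects, 1, height, width]` float array into boolean masks and derives a pixel area and bounding box. It assumes the masks are logits and that `0.0` is the cutoff, as in the upstream facebookresearch/sam2 examples; this README does not state that, so check the value range first. `masks_to_binary` and `mask_bbox` are hypothetical helpers, not part of `mlx_sam`, and the input is synthetic.

```python
import numpy as np

def masks_to_binary(masks: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Convert [objects, 1, H, W] float masks to [objects, H, W] boolean masks.

    The 0.0 threshold assumes logit outputs (an assumption, not confirmed here).
    """
    if masks.ndim != 4 or masks.shape[1] != 1:
        raise ValueError(f"expected [objects, 1, H, W], got {masks.shape}")
    return masks[:, 0] > threshold

def mask_bbox(mask: np.ndarray):
    """Return (x_min, y_min, x_max, y_max) for a boolean mask, or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Synthetic stand-in for one frame's output: one object on a 4x6 frame.
demo = np.full((1, 1, 4, 6), -5.0, dtype=np.float32)
demo[0, 0, 1:3, 2:5] = 5.0

binary = masks_to_binary(demo)
area = int(binary[0].sum())
bbox = mask_bbox(binary[0])
```

Inside the propagation loop, `masks_to_binary(masks)` would take the place of `pass`, with `obj_ids` giving the object id for each slice along the first axis.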

Benchmarks

Benchmarks were run on an Apple M2 Max with 32 GB unified memory. Video tests use the SAM2 dog demo clip: 1280x720, 289 frames, 29.97 FPS, 9.64 s.

FP32 MLX vs Torch/MPS

Measured on a prompted first-frame fixture at 1024x1024 internal resolution.

| Model | Size | Torch/MPS | MLX | Speedup | Parity vs Torch |
|---|---|---|---|---|---|
| sam2.1-hiera-tiny-mlx | 172.6 MiB | 96.6 ms | 71.3 ms | 1.36x | mask mean abs 1.17e-05 |
| sam2.1-hiera-small-mlx | 199.7 MiB | 112.5 ms | 84.5 ms | 1.33x | mask mean abs 8.14e-06 |
| sam2.1-hiera-base-plus-mlx | 336.4 MiB | 203.5 ms | 144.7 ms | 1.41x | mask mean abs 5.04e-06 |
| sam2.1-hiera-large-mlx | 892.2 MiB | 433.0 ms | 341.1 ms | 1.27x | mask mean abs 7.84e-06 |
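
For reference, the Speedup column is the Torch/MPS latency divided by the MLX latency. The sketch below recomputes it from the rounded latencies listed above; the dictionary keys and the `speedup` helper are illustrative names only.

```python
# Latencies in ms copied from the FP32 table: (Torch/MPS, MLX).
latencies_ms = {
    "tiny": (96.6, 71.3),
    "small": (112.5, 84.5),
    "base-plus": (203.5, 144.7),
    "large": (433.0, 341.1),
}

def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms

speedups = {
    name: round(speedup(torch_ms, mlx_ms), 2)
    for name, (torch_ms, mlx_ms) in latencies_ms.items()
}
```

Recomputing from the rounded latencies gives 1.35 for the tiny model rather than the listed 1.36x, presumably because the published speedups were derived from unrounded timings. The other three rows match.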

Video Tracking

For sam2.1-hiera-small-mlx on the 9.64 second dog clip:

| Workload | Torch/MPS | MLX | Result |
|---|---|---|---|
| Full video, post-prompt propagation | 331 ms/frame | 189 ms/frame | MLX 1.75x faster |
| Full video, total run | 100.5 s | 94.8 s | MLX 1.06x faster end to end |
| Raw propagation, no save/overlay/final resize | 407 ms/frame | 287 ms/frame | MLX 1.42x faster |
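
To put the per-frame figures in context, the sketch below converts the MLX post-prompt propagation latency into frames per second and compares it with the clip's native 29.97 FPS. It assumes all 289 frames are propagated at the quoted average latency; the function and variable names are illustrative.

```python
def fps_from_ms_per_frame(ms_per_frame: float) -> float:
    """Convert an average per-frame latency in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame

CLIP_FPS = 29.97      # native frame rate of the dog clip
CLIP_FRAMES = 289     # frame count of the dog clip
MLX_MS_PER_FRAME = 189.0

mlx_fps = fps_from_ms_per_frame(MLX_MS_PER_FRAME)
realtime_factor = mlx_fps / CLIP_FPS                 # 1.0 would be real time
propagation_seconds = CLIP_FRAMES * MLX_MS_PER_FRAME / 1000.0
```

Under these assumptions, propagation runs at roughly 5.3 FPS, about 0.18x real time, and accounts for roughly 54.6 s of the 94.8 s total run.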

Experimental preview mode at 768x768 internal resolution:

| Setting | Propagation | Quality vs 1024 |
|---|---|---|
| 1024x1024 baseline | 268.5 ms/frame | reference |
| 768x768, fp16 memory attention | 52.9 ms/frame | mean IoU 0.949, presence 80 / 80 on 80-frame dog clip |
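
The README does not define how mean IoU and presence are computed. One plausible reading, sketched below, is a per-frame intersection-over-union between the preview masks and the 1024x1024 reference masks, averaged over frames, with presence counting the frames in which the preview mask is non-empty. Both definitions are assumptions, as is the premise that the two mask sequences have already been resized to the same resolution; `mask_iou` and `compare_tracks` are hypothetical helpers.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks with the same shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return float(np.logical_and(a, b).sum() / union)

def compare_tracks(reference, preview):
    """Return (mean IoU, frames with a non-empty preview mask, total frames)."""
    ious = [mask_iou(r, p) for r, p in zip(reference, preview)]
    present = sum(1 for p in preview if np.any(p))
    return float(np.mean(ious)), present, len(ious)

# Toy data: two 4x4 frames, identical except for one pixel in the second frame.
ref = np.zeros((2, 4, 4), dtype=bool)
ref[:, 1:3, 1:3] = True
prev = ref.copy()
prev[1, 1, 1] = False

mean_iou, present, total = compare_tracks(ref, prev)
```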

Quantized Variants

Quantized models reduce download size and memory footprint. On current MLX kernels, quantization should not be assumed to speed up video tracking; it primarily helps memory and distribution size.

| Variant | Typical Size Reduction | Notes |
|---|---|---|
| *-mlx-16bit | about 2x smaller | fp16 weights, closest quantized parity |
| *-mlx-8bit | about 2.5x-3x smaller | int8 linear quantization |
| *-mlx-4bit | about 3.5x smaller | mixed recipe: int8 trunk/mask decoder, int4 memory/object-pointer layers |

Example small model parity vs fp32 MLX:

| Model | Size | Parity vs fp32 MLX |
|---|---|---|
| sam2.1-hiera-small-mlx-16bit | 99.9 MiB | mask mean abs 8.24e-03 |
| sam2.1-hiera-small-mlx-8bit | 76.7 MiB | mask mean abs 2.99e-02 |
| sam2.1-hiera-small-mlx-4bit | 56.4 MiB | mask mean abs 2.87e-02 |
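
The "mask mean abs" figure is not defined in this README. A natural interpretation, sketched below, is the mean absolute difference between the mask outputs of two models on the same input. Whether the published numbers compare logits or probabilities is not stated, so treat this as an illustration of the metric's general shape rather than the exact procedure; `mask_mean_abs` is a hypothetical helper and the arrays are synthetic.

```python
import numpy as np

def mask_mean_abs(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Mean absolute elementwise difference between two mask arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    return float(np.mean(np.abs(reference - candidate)))

# Synthetic example: a small reference output and a slightly perturbed copy.
ref = np.array([[0.0, 1.0], [2.0, 3.0]], dtype=np.float32)
delta = np.array([[0.01, -0.01], [0.02, 0.0]], dtype=np.float32)
err = mask_mean_abs(ref, ref + delta)
```

Here `err` comes out at about 0.01, the same order of magnitude as the quantized rows above.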

License

This MLX port is released under the Apache 2.0 license.

The original SAM 2 repository and source models are from Meta/Facebook and are also Apache 2.0 licensed.
