---
library_name: mlx
base_model: facebook/sam3
tags:
- mlx
- sam3
- segmentation
- detection
- tracking
---

# sam3-6bit

[facebook/sam3](https://huggingface.co/facebook/sam3) converted to MLX (6-bit quantized, 0.83 GB).

Open-vocabulary **object detection**, **instance segmentation**, and **video tracking** on Apple Silicon (~860M parameters).

## Quick Start

```bash
pip install mlx-vlm
```

```python
from PIL import Image
from mlx_vlm.utils import load_model, get_model_path
from mlx_vlm.models.sam3.generate import Sam3Predictor
from mlx_vlm.models.sam3.processing_sam3 import Sam3Processor

# Resolve the model repo to a local path, then load weights and processor
model_path = get_model_path("mlx-community/sam3-6bit")
model = load_model(model_path)
processor = Sam3Processor.from_pretrained(str(model_path))

# Detections scoring below score_threshold are discarded
predictor = Sam3Predictor(model, processor, score_threshold=0.3)
```

## Object Detection

```python
image = Image.open("photo.jpg")
result = predictor.predict(image, text_prompt="a dog")

# Print each detection's confidence score and xyxy bounding box
for i in range(len(result.scores)):
    x1, y1, x2, y2 = result.boxes[i]
    print(f"[{result.scores[i]:.2f}] box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```

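Detections often need a second pass before display, for example keeping only the most confident few. The sketch below shows one way to do that on plain Python sequences. `top_detections` is a hypothetical helper, not part of `mlx_vlm`, and the sample lists stand in for `result.scores` and `result.boxes`.

```python
def top_detections(scores, boxes, min_score=0.5, max_results=5):
    """Return (score, box) pairs with score >= min_score, highest score first."""
    pairs = [
        (float(s), tuple(float(v) for v in b))
        for s, b in zip(scores, boxes)
        if float(s) >= min_score
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs[:max_results]


# Stand-in data shaped like result.scores and result.boxes (xyxy pixel coords)
scores = [0.35, 0.92, 0.61]
boxes = [(10, 20, 50, 80), (100, 40, 300, 260), (5, 5, 25, 30)]

for score, (x1, y1, x2, y2) in top_detections(scores, boxes):
    print(f"[{score:.2f}] box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```

With real predictions, the same call would be `top_detections(result.scores, result.boxes)`, assuming both can be iterated row by row.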
## Instance Segmentation

```python
import numpy as np

result = predictor.predict(image, text_prompt="a person")

# result.boxes  -> (N, 4) xyxy bounding boxes
# result.masks  -> (N, H, W) binary segmentation masks
# result.scores -> (N,) confidence scores

# Blend each mask into a copy of the image as a 50% red overlay
overlay = np.array(image.convert("RGB"))
W, H = image.size
for i in range(len(result.scores)):
    mask = result.masks[i]
    if mask.shape != (H, W):
        # Resize masks that are not at image resolution (PIL takes (width, height))
        mask = np.array(Image.fromarray(mask.astype(np.float32)).resize((W, H)))
    binary = mask > 0
    overlay[binary] = (overlay[binary] * 0.5 + np.array([255, 0, 0]) * 0.5).astype(np.uint8)
```

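Beyond drawing overlays, a mask can be summarized numerically. The following sketch computes the pixel area and tight xyxy bounding box of a single mask. `mask_stats` is a hypothetical helper, and the synthetic array stands in for one entry of `result.masks`, assuming it converts to a NumPy array where positive values mark the object.

```python
import numpy as np


def mask_stats(mask):
    """Return (area_in_pixels, xyxy_box) for one mask; box is None if the mask is empty."""
    binary = np.asarray(mask) > 0
    area = int(binary.sum())
    if area == 0:
        return area, None
    ys, xs = np.nonzero(binary)
    # x2 and y2 are exclusive, so add 1 to the last covered pixel index
    box = (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
    return area, box


# Synthetic 6x8 mask with a 2x4 rectangle set
mask = np.zeros((6, 8), dtype=np.float32)
mask[2:4, 3:7] = 1.0
area, box = mask_stats(mask)
print(area, box)
```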
## Box-Guided Detection

```python
import numpy as np

boxes = np.array([[100, 50, 400, 350]])  # xyxy pixel coords
result = predictor.predict(image, text_prompt="a cat", boxes=boxes)
```

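The box prompt above uses xyxy pixel coordinates, while many annotation tools export boxes as (x, y, width, height). A minimal conversion sketch follows. `xywh_to_xyxy` is a hypothetical helper, and clamping to the image bounds is an assumption about what makes a sensible prompt, not a documented requirement of the predictor.

```python
def xywh_to_xyxy(box, image_size):
    """Convert (x, y, w, h) to (x1, y1, x2, y2), clamped to the image bounds."""
    x, y, w, h = box
    width, height = image_size  # same order as PIL's image.size
    x1 = min(max(x, 0), width)
    y1 = min(max(y, 0), height)
    x2 = min(max(x + w, 0), width)
    y2 = min(max(y + h, 0), height)
    return (x1, y1, x2, y2)


print(xywh_to_xyxy((100, 50, 300, 300), (640, 480)))
```

The returned tuple can be wrapped as `np.array([xyxy])` to match the `(N, 4)` layout used for `boxes`.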
## Semantic Segmentation

```python
import mlx.core as mx

# Call the model directly to get raw outputs instead of post-processed results
inputs = processor.preprocess_image(image)
text_inputs = processor.preprocess_text("a cat")
outputs = model.detect(
    mx.array(inputs["pixel_values"]),
    mx.array(text_inputs["input_ids"]),
    mx.array(text_inputs["attention_mask"]),
)
mx.eval(outputs)  # force evaluation of MLX's lazy computation graph

pred_masks = outputs["pred_masks"]      # (B, 200, 288, 288) instance masks
semantic_seg = outputs["semantic_seg"]  # (B, 1, 288, 288) semantic segmentation
```

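The raw maps are 288×288, so they usually need resizing before they can be compared with the input image. The sketch below upsamples a 2D map with nearest-neighbor indexing and thresholds it. `to_image_mask` is a hypothetical helper, and treating the values as logits with a cutoff at 0 is an assumption that should be checked against the model's actual output range.

```python
import numpy as np


def to_image_mask(logits, out_h, out_w, threshold=0.0):
    """Nearest-neighbor resize of a 2D map to (out_h, out_w), then threshold to bool."""
    logits = np.asarray(logits)
    in_h, in_w = logits.shape
    # Map every output row/column back to its source row/column
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    resized = logits[rows[:, None], cols[None, :]]
    return resized > threshold


# Tiny stand-in for one (288, 288) map
demo = np.array([[-1.0, 2.0], [3.0, -4.0]])
mask = to_image_mask(demo, 4, 4)
print(mask.astype(int))
```

With real outputs, something like `to_image_mask(np.array(semantic_seg)[0, 0], H, W)` would apply, assuming the `(B, 1, 288, 288)` layout noted in the comments above.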
## Video Tracking (CLI)

```bash
python -m mlx_vlm.models.sam3.track_video --video input.mp4 --prompt "a car" --model mlx-community/sam3-6bit
```

| Flag | Default | Description |
|------|---------|-------------|
| `--video` | *(required)* | Input video path |
| `--prompt` | *(required)* | Text prompt |
| `--output` | `<input>_tracked.mp4` | Output video path |
| `--model` | `facebook/sam3` | Model path or HF repo |
| `--threshold` | `0.15` | Score threshold |
| `--every` | `2` | Detect every N frames |

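When tracking several videos from a script, it can help to assemble the command programmatically. The sketch below builds an argument list from the flags in the table. `build_track_command` is a hypothetical helper, and it only constructs the command; actually running it requires `mlx-vlm` and a real video file.

```python
import shlex
import sys


def build_track_command(video, prompt, model="mlx-community/sam3-6bit",
                        output=None, threshold=None, every=None):
    """Build the argv list for the track_video CLI, adding only the flags that are set."""
    cmd = [
        sys.executable, "-m", "mlx_vlm.models.sam3.track_video",
        "--video", video,
        "--prompt", prompt,
        "--model", model,
    ]
    if output is not None:
        cmd += ["--output", output]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    if every is not None:
        cmd += ["--every", str(every)]
    return cmd


cmd = build_track_command("input.mp4", "a car", threshold=0.2, every=4)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the tracker. Using a list rather than a shell string avoids quoting problems with prompts that contain spaces.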
## Original Model

[facebook/sam3](https://huggingface.co/facebook/sam3) · [Paper](https://ai.meta.com/blog/segment-anything-model-3/) · [Code](https://github.com/facebookresearch/sam3)

## License

The original SAM3 model weights are released by Meta under the [**SAM License**](https://huggingface.co/facebook/sam3/blob/main/LICENSE), a custom permissive license that grants a non-exclusive, worldwide, royalty-free license to use, reproduce, distribute, and modify the SAM Materials. Key points:

- Commercial and research use is permitted
- Derivative works must include a copy of the SAM License and attribution to Meta
- Provided "AS IS" without warranty
- Subject to applicable trade controls

This MLX conversion is a derivative work. By using it, you agree to the terms of Meta's SAM License. See the [full license text](https://huggingface.co/facebook/sam3/blob/main/LICENSE) for details.