---
library_name: mlx
base_model: facebook/sam3
tags:
- mlx
- sam3
- segmentation
- detection
- tracking
---
# sam3-6bit
[facebook/sam3](https://huggingface.co/facebook/sam3) converted to MLX (6-bit quantized, 0.83 GB).
Open-vocabulary **object detection**, **instance segmentation**, and **video tracking** on Apple Silicon (~860M parameters).
## Quick Start
```bash
pip install mlx-vlm
```
```python
from PIL import Image
from mlx_vlm.utils import load_model, get_model_path
from mlx_vlm.models.sam3.generate import Sam3Predictor
from mlx_vlm.models.sam3.processing_sam3 import Sam3Processor
model_path = get_model_path("mlx-community/sam3-6bit")
model = load_model(model_path)
processor = Sam3Processor.from_pretrained(str(model_path))
predictor = Sam3Predictor(model, processor, score_threshold=0.3)
```
## Object Detection
```python
image = Image.open("photo.jpg")
result = predictor.predict(image, text_prompt="a dog")
for i in range(len(result.scores)):
    x1, y1, x2, y2 = result.boxes[i]
    print(f"[{result.scores[i]:.2f}] box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```
## Instance Segmentation
```python
result = predictor.predict(image, text_prompt="a person")
# result.boxes -> (N, 4) xyxy bounding boxes
# result.masks -> (N, H, W) binary segmentation masks
# result.scores -> (N,) confidence scores
import numpy as np
overlay = np.array(image).copy()
W, H = image.size
for i in range(len(result.scores)):
    mask = result.masks[i]
    if mask.shape != (H, W):
        mask = np.array(Image.fromarray(mask.astype(np.float32)).resize((W, H)))
    binary = mask > 0
    overlay[binary] = (overlay[binary] * 0.5 + np.array([255, 0, 0]) * 0.5).astype(np.uint8)
```
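To inspect a single mask on its own, a binary `(H, W)` array can be converted to a grayscale PIL image and saved. A minimal sketch using a dummy mask in place of `result.masks[i]` (same shape and dtype assumptions as above):

```python
import numpy as np
from PIL import Image

# Dummy binary mask standing in for result.masks[i]
mask = np.zeros((288, 288), dtype=bool)
mask[100:200, 80:220] = True

# Scale {0, 1} to {0, 255} so the mask is visible as a grayscale image
mask_img = Image.fromarray(mask.astype(np.uint8) * 255, mode="L")
mask_img.save("mask_0.png")
```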
## Box-Guided Detection
```python
import numpy as np
boxes = np.array([[100, 50, 400, 350]]) # xyxy pixel coords
result = predictor.predict(image, text_prompt="a cat", boxes=boxes)
```
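Because boxes are plain xyxy pixel coordinates, cropping a detection out of the image only needs `Image.crop`. A sketch with a dummy image and a hard-coded box standing in for one row of `result.boxes`:

```python
from PIL import Image

# Dummy 640x480 image and one xyxy box standing in for a detection
image = Image.new("RGB", (640, 480))
x1, y1, x2, y2 = 100, 50, 400, 350

# PIL's crop takes an (left, upper, right, lower) box in pixel coordinates
crop = image.crop((int(x1), int(y1), int(x2), int(y2)))
print(crop.size)  # (300, 300)
```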
## Semantic Segmentation
```python
import mlx.core as mx
inputs = processor.preprocess_image(image)
text_inputs = processor.preprocess_text("a cat")
outputs = model.detect(
    mx.array(inputs["pixel_values"]),
    mx.array(text_inputs["input_ids"]),
    mx.array(text_inputs["attention_mask"]),
)
mx.eval(outputs)
pred_masks = outputs["pred_masks"] # (B, 200, 288, 288) instance masks
semantic_seg = outputs["semantic_seg"] # (B, 1, 288, 288) semantic segmentation
```
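The semantic map comes out at the model's 288×288 mask resolution, so it typically needs to be upsampled back to the input image size and thresholded. A NumPy/PIL sketch using a random dummy map in place of `outputs["semantic_seg"]`; the 0.5 cutoff assumes the values are probabilities (if they are raw logits, threshold at 0 instead):

```python
import numpy as np
from PIL import Image

# Dummy (1, 1, 288, 288) map standing in for outputs["semantic_seg"]
semantic_seg = np.random.rand(1, 1, 288, 288).astype(np.float32)
W, H = 640, 480  # original image size

# Upsample to the image resolution with bilinear interpolation, then threshold
prob = Image.fromarray(semantic_seg[0, 0]).resize((W, H), Image.BILINEAR)
binary = np.array(prob) > 0.5  # (H, W) boolean mask

print(binary.shape)  # (480, 640)
```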
## Video Tracking (CLI)
```bash
python -m mlx_vlm.models.sam3.track_video --video input.mp4 --prompt "a car" --model mlx-community/sam3-6bit
```
| Flag | Default | Description |
|------|---------|-------------|
| `--video` | *(required)* | Input video path |
| `--prompt` | *(required)* | Text prompt |
| `--output` | `<input>_tracked.mp4` | Output video path |
| `--model` | `facebook/sam3` | Model path or HF repo |
| `--threshold` | `0.15` | Score threshold |
| `--every` | `2` | Detect every N frames |
## Original Model
[facebook/sam3](https://huggingface.co/facebook/sam3) · [Paper](https://ai.meta.com/blog/segment-anything-model-3/) · [Code](https://github.com/facebookresearch/sam3)
## License
The original SAM3 model weights are released by Meta under the [**SAM License**](https://huggingface.co/facebook/sam3/blob/main/LICENSE), a custom permissive license that grants a non-exclusive, worldwide, royalty-free license to use, reproduce, distribute, and modify the SAM Materials. Key points:
- Commercial and research use is permitted
- Derivative works must include a copy of the SAM License and attribution to Meta
- Provided "AS IS" without warranty
- Subject to applicable trade controls
This MLX conversion is a derivative work. By using it, you agree to the terms of Meta's SAM License. See the [full license text](https://huggingface.co/facebook/sam3/blob/main/LICENSE) for details.