---
license: apache-2.0
tags:
- image-segmentation
- segment-anything
- segment-anything-3
- open-vocabulary
- text-to-segmentation
- onnx
- onnxruntime
library_name: onnxruntime
base_model:
- facebook/sam3
---
# Segment Anything 3 (SAM 3) – ONNX Models
ONNX-exported version of Meta's **Segment Anything Model 3 (SAM 3)**, an open-vocabulary segmentation model that accepts **text prompts** in addition to points and rectangles.
SAM 3 uses a CLIP-based language encoder to let you describe objects in natural language (e.g., `"truck"`, `"person with hat"`) and segment them without task-specific training.
These models are used by **[AnyLabeling](https://github.com/vietanhdev/anylabeling)** for AI-assisted image annotation, and exported by **[samexporter](https://github.com/vietanhdev/samexporter)**.
## Available Models
| File | Contents | Description |
|------|----------|-------------|
| `sam3_vit_h.zip` | 3 ONNX files | SAM 3 ViT-H (all components) |
The zip contains three ONNX components that work together:
| ONNX File | Role | Runs |
|-----------|------|------|
| `sam3_image_encoder.onnx` | Extracts visual features from the input image | Once per image |
| `sam3_language_encoder.onnx` | Encodes text prompt tokens into feature vectors | Once per text query |
| `sam3_decoder.onnx` | Produces segmentation masks given image + language features | Per prompt |
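A minimal loading sketch with ONNX Runtime is shown below. The wrapper class, lazy session creation, and provider choice are illustrative assumptions; the file names match the zip contents, but input/output tensor names depend on the export and are not shown here:

```python
import os


class Sam3Sessions:
    """Lazily creates one ONNX Runtime session per SAM 3 component (a sketch)."""

    COMPONENTS = ("sam3_image_encoder", "sam3_language_encoder", "sam3_decoder")

    def __init__(self, model_dir):
        # Map each component name to its .onnx file inside the extracted zip.
        self.paths = {c: os.path.join(model_dir, c + ".onnx") for c in self.COMPONENTS}
        self._sessions = {}

    def get(self, component):
        # Create each InferenceSession on first use and cache it for reuse.
        # Import here so paths can be inspected without onnxruntime installed.
        if component not in self._sessions:
            import onnxruntime as ort
            self._sessions[component] = ort.InferenceSession(
                self.paths[component], providers=["CPUExecutionProvider"]
            )
        return self._sessions[component]
```

Caching the sessions matters because the image encoder is heavy and should be created (and run) once, while the decoder session stays loaded for interactive prompting.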
## Prompt Types
SAM 3 supports **three prompt modalities**:
| Prompt | Description |
|--------|-------------|
| **Text** | Natural-language description, e.g. `"truck"` (unique to SAM 3) |
| **Point** | Click `+point` / `-point` to include/exclude regions |
| **Rectangle** | Draw a bounding box around the target object |
Text prompts are the recommended workflow: they enable open-vocabulary detection, so you can label **any object class** without retraining.
## Use with AnyLabeling (Recommended)
[AnyLabeling](https://github.com/vietanhdev/anylabeling) is a desktop annotation tool with a built-in model manager that downloads, caches, and runs these models automatically, with no coding required.
1. Install: `pip install anylabeling`
2. Launch: `anylabeling`
3. Click the **Brain** button → select **Segment Anything 3 (ViT-H)** from the dropdown
4. Type a text description (e.g., `truck`) in the text prompt field
5. Optionally refine with point/rectangle prompts
## Use Programmatically with ONNX Runtime
```python
import urllib.request, zipfile
url = "https://huggingface.co/vietanhdev/segment-anything-3-onnx-models/resolve/main/sam3_vit_h.zip"
urllib.request.urlretrieve(url, "sam3_vit_h.zip")
with zipfile.ZipFile("sam3_vit_h.zip") as z:
    z.extractall("sam3")
```
Then use [samexporter](https://github.com/vietanhdev/samexporter)'s inference module:
```bash
pip install samexporter
# Text prompt
python -m samexporter.inference \
--sam_variant sam3 \
--encoder_model sam3/sam3_image_encoder.onnx \
--decoder_model sam3/sam3_decoder.onnx \
--language_encoder_model sam3/sam3_language_encoder.onnx \
--image photo.jpg \
--prompt prompt.json \
--text_prompt "truck" \
--output result.png
```
Example `prompt.json` for a text-only query:
```json
[{"type": "text", "data": "truck"}]
```
## Model Architecture
SAM 3 follows the same encoder/decoder pattern as SAM and SAM 2, with an added CLIP-based language branch:
```
Input image ──► Image Encoder ──────────┐
                                        ▼
Text prompt ──► Language Encoder ──► Decoder ──► Masks + Scores + Boxes
                                        ▲
Optional: point / box prompts ──────────┘
```
The **image encoder** runs once per image and caches features. The **language encoder** runs once per text query. The **decoder** is lightweight and runs interactively for each prompt combination.
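This run pattern can be sketched with stub functions standing in for the three ONNX sessions. The stubs are placeholders for illustration, not the real model interfaces; only the call-count structure mirrors the actual pipeline:

```python
# Stub pipeline: heavy encoders run once, the light decoder runs per prompt.
calls = {"image": 0, "text": 0, "decode": 0}

def encode_image(image):
    # Stands in for sam3_image_encoder.onnx (run once per image, then cached).
    calls["image"] += 1
    return ("img_feat", image)

def encode_text(text):
    # Stands in for sam3_language_encoder.onnx (run once per text query).
    calls["text"] += 1
    return ("txt_emb", text)

def decode(img_feat, txt_emb, points=None):
    # Stands in for sam3_decoder.onnx (cheap, run per prompt combination).
    calls["decode"] += 1
    return ("mask", img_feat, txt_emb, points)

img_feat = encode_image("photo.jpg")   # cache per image
txt_emb = encode_text("truck")         # cache per text query
for points in (None, [(120, 80)], [(120, 80), (300, 200)]):
    decode(img_feat, txt_emb, points)  # interactive refinement loop

print(calls)  # {'image': 1, 'text': 1, 'decode': 3}
```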
## Re-export from Source
To re-export or customize the models using [samexporter](https://github.com/vietanhdev/samexporter):
```bash
pip install samexporter
# Export all three SAM 3 ONNX components
python -m samexporter.export_sam3 --output_dir output_models/sam3
# Or use the convenience script:
bash convert_sam3.sh
```
## Custom Model Config for AnyLabeling
To use a locally re-exported SAM 3 as a custom model in AnyLabeling, create a `config.yaml`:
```yaml
type: segment_anything
name: sam3_vit_h_custom
display_name: Segment Anything 3 (ViT-H)
encoder_model_path: sam3_image_encoder.onnx
decoder_model_path: sam3_decoder.onnx
language_encoder_path: sam3_language_encoder.onnx
input_size: 1008
max_height: 1008
max_width: 1008
```
Then load it via **Brain button → Load Custom Model** in AnyLabeling.
## Related Repositories
| Repo | Description |
|------|-------------|
| [vietanhdev/samexporter](https://github.com/vietanhdev/samexporter) | Export scripts, inference code, conversion tools |
| [vietanhdev/anylabeling](https://github.com/vietanhdev/anylabeling) | Desktop annotation app powered by these models |
| [facebook/sam3](https://huggingface.co/facebook/sam3) | Original SAM 3 PyTorch checkpoint by Meta |
## License
The ONNX models are derived from Meta's SAM 3, released under the **[SAM License](https://github.com/facebookresearch/sam3/blob/main/LICENSE)**.
The export code is part of [samexporter](https://github.com/vietanhdev/samexporter), released under the **MIT** license. |