---
license: apache-2.0
tags:
- image-segmentation
- segment-anything
- segment-anything-3
- open-vocabulary
- text-to-segmentation
- onnx
- onnxruntime
library_name: onnxruntime
base_model:
- facebook/sam3
---

# Segment Anything 3 (SAM 3) – ONNX Models

ONNX-exported version of Meta's **Segment Anything Model 3 (SAM 3)**, an open-vocabulary segmentation model that accepts **text prompts** in addition to points and rectangles.

SAM 3 uses a CLIP-based language encoder to let you describe objects in natural language (e.g., `"truck"`, `"person with hat"`) and segment them without task-specific training.

These models are used by **[AnyLabeling](https://github.com/vietanhdev/anylabeling)** for AI-assisted image annotation, and exported by **[samexporter](https://github.com/vietanhdev/samexporter)**.

## Available Models

| File | Contents | Description |
|------|----------|-------------|
| `sam3_vit_h.zip` | 3 ONNX files | SAM 3 ViT-H (all components) |

The zip contains three ONNX components that work together:

| ONNX File | Role | Runs |
|-----------|------|------|
| `sam3_image_encoder.onnx` | Extracts visual features from the input image | Once per image |
| `sam3_language_encoder.onnx` | Encodes text prompt tokens into feature vectors | Once per text query |
| `sam3_decoder.onnx` | Produces segmentation masks given image + language features | Per prompt |

## Prompt Types

SAM 3 supports **three prompt modalities**:

| Prompt | Description |
|--------|-------------|
| **Text** | Natural-language description, e.g. `"truck"` (unique to SAM 3) |
| **Point** | Click `+point` / `-point` to include/exclude regions |
| **Rectangle** | Draw a bounding box around the target object |

Text prompts are the recommended workflow: they enable open-vocabulary detection, so you can label **any object class** without retraining.

## Use with AnyLabeling (Recommended)

[AnyLabeling](https://github.com/vietanhdev/anylabeling) is a desktop annotation tool with a built-in model manager that downloads, caches, and runs these models automatically; no coding required.

1. Install: `pip install anylabeling`
2. Launch: `anylabeling`
3. Click the **Brain** button → select **Segment Anything 3 (ViT-H)** from the dropdown
4. Type a text description (e.g., `truck`) in the text prompt field
5. Optionally refine with point/rectangle prompts

[![AnyLabeling demo](https://user-images.githubusercontent.com/18329471/236625792-07f01838-3f69-48b0-a12e-30bad27bd921.gif)](https://github.com/vietanhdev/anylabeling)

## Use Programmatically with ONNX Runtime

```python
import urllib.request, zipfile

# Download the model zip and extract the three ONNX components into ./sam3
url = "https://huggingface.co/vietanhdev/segment-anything-3-onnx-models/resolve/main/sam3_vit_h.zip"
urllib.request.urlretrieve(url, "sam3_vit_h.zip")
with zipfile.ZipFile("sam3_vit_h.zip") as z:
    z.extractall("sam3")
```
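After extracting, a quick sanity check that all three components landed where expected can save a confusing runtime error later. A minimal sketch, using the filenames from the components table above:

```python
from pathlib import Path

# Filenames of the three SAM 3 ONNX components
EXPECTED = [
    "sam3_image_encoder.onnx",
    "sam3_language_encoder.onnx",
    "sam3_decoder.onnx",
]

def missing_components(model_dir: str) -> list[str]:
    """Return the expected SAM 3 ONNX files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]

# missing_components("sam3") == []  means the extraction is complete
```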

Then use [samexporter](https://github.com/vietanhdev/samexporter)'s inference module:

```bash
pip install samexporter

# Text prompt
python -m samexporter.inference \
    --sam_variant sam3 \
    --encoder_model sam3/sam3_image_encoder.onnx \
    --decoder_model sam3/sam3_decoder.onnx \
    --language_encoder_model sam3/sam3_language_encoder.onnx \
    --image photo.jpg \
    --prompt prompt.json \
    --text_prompt "truck" \
    --output result.png
```

Example `prompt.json` for a text-only query:
```json
[{"type": "text", "data": "truck"}]
```
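The prompt file can also be generated programmatically. A minimal helper that emits the text-prompt schema shown above (text prompts only, since that is the format documented here; other prompt types may use a different schema):

```python
import json

def write_text_prompt(text: str, path: str = "prompt.json") -> list:
    """Build a text prompt in the JSON format above and write it to disk."""
    prompt = [{"type": "text", "data": text}]
    with open(path, "w") as f:
        json.dump(prompt, f)
    return prompt

# write_text_prompt("truck") produces [{"type": "text", "data": "truck"}]
```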

## Model Architecture

SAM 3 follows the same encoder/decoder pattern as SAM and SAM 2, with an added CLIP-based language branch:

```
Input image  ──► Image Encoder  ────────┐
                                        ▼
Text prompt  ──► Language Encoder ──► Decoder ──► Masks + Scores + Boxes
                                        ▲
Optional: point / box prompts ──────────┘
```

The **image encoder** runs once per image and caches features. The **language encoder** runs once per text query. The **decoder** is lightweight and runs interactively for each prompt combination.
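This run-once, decode-many split is what makes interactive annotation fast. The caching idea can be sketched independently of ONNX (illustrative only; `encode_fn` stands in for a real image-encoder call and is not samexporter's API):

```python
from typing import Any, Callable, Dict

class CachedEncoder:
    """Cache expensive per-image encoder outputs so the decoder can
    re-run cheaply for each new prompt on the same image."""

    def __init__(self, encode_fn: Callable[[str], Any]):
        self._encode_fn = encode_fn
        self._cache: Dict[str, Any] = {}
        self.calls = 0  # how many times the heavy encoder actually ran

    def features(self, image_path: str) -> Any:
        if image_path not in self._cache:
            self.calls += 1
            self._cache[image_path] = self._encode_fn(image_path)
        return self._cache[image_path]

# enc = CachedEncoder(run_image_encoder)   # hypothetical encoder function
# feats = enc.features("photo.jpg")        # heavy: runs the encoder once
# feats = enc.features("photo.jpg")        # cached: returns instantly
```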

## Re-export from Source

To re-export or customize the models using [samexporter](https://github.com/vietanhdev/samexporter):

```bash
pip install samexporter

# Export all three SAM 3 ONNX components
python -m samexporter.export_sam3 --output_dir output_models/sam3

# Or use the convenience script:
bash convert_sam3.sh
```

## Custom Model Config for AnyLabeling

To use a locally re-exported SAM 3 as a custom model in AnyLabeling, create a `config.yaml`:

```yaml
type: segment_anything
name: sam3_vit_h_custom
display_name: Segment Anything 3 (ViT-H)
encoder_model_path: sam3_image_encoder.onnx
decoder_model_path: sam3_decoder.onnx
language_encoder_path: sam3_language_encoder.onnx
input_size: 1008
max_height: 1008
max_width: 1008
```
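A typo in the config silently breaks model loading, so a quick check that the required keys are present can help. A stdlib-only sketch assuming the flat `key: value` layout shown above (AnyLabeling itself may validate differently):

```python
# Keys the config above defines; assumed required for a SAM 3 custom model
REQUIRED_KEYS = {
    "type", "name", "display_name",
    "encoder_model_path", "decoder_model_path", "language_encoder_path",
}

def missing_config_keys(text: str) -> set:
    """Parse a flat key: value config and return any required keys missing."""
    keys = set()
    for line in text.splitlines():
        if ":" in line and not line.lstrip().startswith("#"):
            keys.add(line.split(":", 1)[0].strip())
    return REQUIRED_KEYS - keys
```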

Then load it via **Brain button → Load Custom Model** in AnyLabeling.

## Related Repositories

| Repo | Description |
|------|-------------|
| [vietanhdev/samexporter](https://github.com/vietanhdev/samexporter) | Export scripts, inference code, conversion tools |
| [vietanhdev/anylabeling](https://github.com/vietanhdev/anylabeling) | Desktop annotation app powered by these models |
| [facebook/sam3](https://huggingface.co/facebook/sam3) | Original SAM 3 PyTorch checkpoint by Meta |

## License

The ONNX models are derived from Meta's SAM 3, released under the **[SAM License](https://github.com/facebookresearch/sam3/blob/main/LICENSE)**.
The export code is part of [samexporter](https://github.com/vietanhdev/samexporter), released under the **MIT** license.