vietanhdev committed
Commit 1b9f061 · verified · 1 Parent(s): 057c155

Update README.md

Files changed (1):
1. README.md +151 -149
 
---
license: apache-2.0
tags:
- image-segmentation
- segment-anything
- segment-anything-3
- open-vocabulary
- text-to-segmentation
- onnx
- onnxruntime
library_name: onnxruntime
base_model:
- facebook/sam3
---

# Segment Anything 3 (SAM 3) — ONNX Models

ONNX-exported version of Meta's **Segment Anything Model 3 (SAM 3)**, an open-vocabulary segmentation model that accepts **text prompts** in addition to points and rectangles.

SAM 3 uses a CLIP-based language encoder to let you describe objects in natural language (e.g., `"truck"`, `"person with hat"`) and segment them without task-specific training.

These models are used by **[AnyLabeling](https://github.com/vietanhdev/anylabeling)** for AI-assisted image annotation, and exported by **[samexporter](https://github.com/vietanhdev/samexporter)**.

## Available Models

| File | Contents | Description |
|------|----------|-------------|
| `sam3_vit_h.zip` | 3 ONNX files | SAM 3 ViT-H (all components) |

The zip contains three ONNX components that work together:

| ONNX File | Role | Runs |
|-----------|------|------|
| `sam3_image_encoder.onnx` | Extracts visual features from the input image | Once per image |
| `sam3_language_encoder.onnx` | Encodes text prompt tokens into feature vectors | Once per text query |
| `sam3_decoder.onnx` | Produces segmentation masks given image + language features | Per prompt |

## Prompt Types

SAM 3 supports **three prompt modalities**:

| Prompt | Description |
|--------|-------------|
| **Text** | Natural-language description, e.g. `"truck"` — unique to SAM 3 |
| **Point** | Click `+point` / `-point` to include/exclude regions |
| **Rectangle** | Draw a bounding box around the target object |

Text prompts are the recommended workflow: they drive open-vocabulary detection, so you can label **any object class** without retraining.
49
+
50
+ ## Use with AnyLabeling (Recommended)
51
+
52
+ [AnyLabeling](https://github.com/vietanhdev/anylabeling) is a desktop annotation tool with a built-in model manager that downloads, caches, and runs these models automatically β€” no coding required.
53
+
54
+ 1. Install: `pip install anylabeling`
55
+ 2. Launch: `anylabeling`
56
+ 3. Click the **Brain** button β†’ select **Segment Anything 3 (ViT-H)** from the dropdown
57
+ 4. Type a text description (e.g., `truck`) in the text prompt field
58
+ 5. Optionally refine with point/rectangle prompts
59
+
60
+ [![AnyLabeling demo](https://user-images.githubusercontent.com/18329471/236625792-07f01838-3f69-48b0-a12e-30bad27bd921.gif)](https://github.com/vietanhdev/anylabeling)
61
+
62
+ ## Use Programmatically with ONNX Runtime
63
+
64
+ ```python
65
+ import urllib.request, zipfile
66
+ url = "https://huggingface.co/vietanhdev/segment-anything-3-onnx-models/resolve/main/sam3_vit_h.zip"
67
+ urllib.request.urlretrieve(url, "sam3_vit_h.zip")
68
+ with zipfile.ZipFile("sam3_vit_h.zip") as z:
69
+ z.extractall("sam3")
70
+ ```
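After extraction, it can help to sanity-check that all three components are present before wiring up inference. A minimal sketch; the helper name is illustrative and not part of samexporter:

```python
from pathlib import Path

# The three components expected inside sam3_vit_h.zip
EXPECTED = [
    "sam3_image_encoder.onnx",
    "sam3_language_encoder.onnx",
    "sam3_decoder.onnx",
]

def find_components(model_dir):
    """Return {component name: path} for every expected ONNX file found."""
    root = Path(model_dir)
    found = {name: root / name for name in EXPECTED if (root / name).is_file()}
    missing = [name for name in EXPECTED if name not in found]
    if missing:
        raise FileNotFoundError(f"missing SAM 3 components: {missing}")
    return found
```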
71
+
72
+ Then use [samexporter](https://github.com/vietanhdev/samexporter)'s inference module:
73
+
74
+ ```bash
75
+ pip install samexporter
76
+
77
+ # Text prompt
78
+ python -m samexporter.inference \
79
+ --sam_variant sam3 \
80
+ --encoder_model sam3/sam3_image_encoder.onnx \
81
+ --decoder_model sam3/sam3_decoder.onnx \
82
+ --language_encoder_model sam3/sam3_language_encoder.onnx \
83
+ --image photo.jpg \
84
+ --prompt prompt.json \
85
+ --text_prompt "truck" \
86
+ --output result.png
87
+ ```
88
+
89
+ Example `prompt.json` for a text-only query:
90
+ ```json
91
+ [{"type": "text", "data": "truck"}]
92
+ ```
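Prompt files can also be assembled programmatically, which keeps the JSON schema straight when mixing modalities. A sketch; only the text entry matches the documented format above, and the point/rectangle entries are assumptions for illustration, not a verified samexporter schema:

```python
import json

def make_prompts(text=None, points=None, box=None):
    """Build a prompt list in the style of samexporter's prompt.json.

    points: list of ((x, y), label) with label 1 = include, 0 = exclude.
    box:    (x1, y1, x2, y2) bounding rectangle.
    The point/rectangle entry shapes are assumptions, not a verified schema.
    """
    prompts = []
    if text is not None:
        prompts.append({"type": "text", "data": text})
    for (x, y), label in points or []:
        prompts.append({"type": "point", "data": [x, y], "label": label})
    if box is not None:
        prompts.append({"type": "rectangle", "data": list(box)})
    return prompts

# Text query refined with one include-point
prompt = make_prompts(text="truck", points=[((320, 240), 1)])
with open("prompt.json", "w") as f:
    json.dump(prompt, f)
```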

## Model Architecture

SAM 3 follows the same encoder/decoder pattern as SAM and SAM 2, with an added CLIP-based language branch:

```
Input image ──► Image Encoder ──────────┐
                                        ▼
Text prompt ──► Language Encoder ──► Decoder ──► Masks + Scores + Boxes
                                        ▲
Optional: point / box prompts ──────────┘
```

The **image encoder** runs once per image and caches features. The **language encoder** runs once per text query. The **decoder** is lightweight and runs interactively for each prompt combination.
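That run-frequency contract is easy to encode in a small driver. A sketch with plain callables standing in for the three ONNX sessions; the class and method names are illustrative, not part of samexporter:

```python
class Sam3Pipeline:
    """Caches encoder outputs so only the decoder runs per prompt (sketch)."""

    def __init__(self, image_encoder, language_encoder, decoder):
        self.image_encoder = image_encoder        # heavy: once per image
        self.language_encoder = language_encoder  # once per unique text query
        self.decoder = decoder                    # light: once per prompt
        self._image_feats = None
        self._text_feats = {}

    def set_image(self, image):
        # Re-encode only when the image changes.
        self._image_feats = self.image_encoder(image)

    def segment(self, text):
        if self._image_feats is None:
            raise RuntimeError("call set_image() first")
        # Language features are cached per unique query string.
        if text not in self._text_feats:
            self._text_feats[text] = self.language_encoder(text)
        return self.decoder(self._image_feats, self._text_feats[text])
```

With this shape, querying the same image for several classes pays the image-encoder cost once and the language-encoder cost once per distinct query.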

## Re-export from Source

To re-export or customize the models using [samexporter](https://github.com/vietanhdev/samexporter):

```bash
pip install samexporter

# Export all three SAM 3 ONNX components
python -m samexporter.export_sam3 --output_dir output_models/sam3

# Or use the convenience script:
bash convert_sam3.sh
```

## Custom Model Config for AnyLabeling

To use a locally re-exported SAM 3 as a custom model in AnyLabeling, create a `config.yaml`:

```yaml
type: segment_anything
name: sam3_vit_h_custom
display_name: Segment Anything 3 (ViT-H)
encoder_model_path: sam3_image_encoder.onnx
decoder_model_path: sam3_decoder.onnx
language_encoder_path: sam3_language_encoder.onnx
input_size: 1008
max_height: 1008
max_width: 1008
```

Then load it via **Brain button → Load Custom Model** in AnyLabeling.
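Missing model files are a common failure mode when loading a custom config, so checking the referenced paths up front can save a round trip. A sketch using a naive line-based parse of the flat `key: value` layout shown above (avoids a YAML dependency; the helper is illustrative, not an AnyLabeling API):

```python
from pathlib import Path

def check_model_paths(config_path):
    """Return the *_path entries in a flat config.yaml that point at missing files."""
    cfg_dir = Path(config_path).parent
    missing = []
    for line in Path(config_path).read_text().splitlines():
        if ":" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition(":")
        # Model paths in the config are resolved relative to the config file.
        if key.strip().endswith("_path"):
            path = cfg_dir / value.strip()
            if not path.is_file():
                missing.append(str(path))
    return missing  # empty list means all referenced models were found
```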

## Related Repositories

| Repo | Description |
|------|-------------|
| [vietanhdev/samexporter](https://github.com/vietanhdev/samexporter) | Export scripts, inference code, conversion tools |
| [vietanhdev/anylabeling](https://github.com/vietanhdev/anylabeling) | Desktop annotation app powered by these models |
| [facebook/sam3](https://huggingface.co/facebook/sam3) | Original SAM 3 PyTorch checkpoint by Meta |

## License

The ONNX models are derived from Meta's SAM 3, released under the **[SAM License](https://github.com/facebookresearch/sam3/blob/main/LICENSE)**.
The export code is part of [samexporter](https://github.com/vietanhdev/samexporter), released under the **MIT** license.