vietanhdev committed
Commit 1b9f061 · verified · 1 Parent(s): 057c155

Update README.md

Files changed (1):
1. README.md +151 -149
 
---
license: apache-2.0
tags:
- image-segmentation
- segment-anything
- segment-anything-3
- open-vocabulary
- text-to-segmentation
- onnx
- onnxruntime
library_name: onnxruntime
base_model:
- facebook/sam3
---

# Segment Anything 3 (SAM 3) — ONNX Models

ONNX-exported version of Meta's **Segment Anything Model 3 (SAM 3)**, an open-vocabulary segmentation model that accepts **text prompts** in addition to points and rectangles.

SAM 3 uses a CLIP-based language encoder to let you describe objects in natural language (e.g., `"truck"`, `"person with hat"`) and segment them without task-specific training.

These models are used by **[AnyLabeling](https://github.com/vietanhdev/anylabeling)** for AI-assisted image annotation, and exported by **[samexporter](https://github.com/vietanhdev/samexporter)**.

## Available Models

| File | Contents | Description |
|------|----------|-------------|
| `sam3_vit_h.zip` | 3 ONNX files | SAM 3 ViT-H (all components) |

The zip contains three ONNX components that work together:

| ONNX File | Role | Runs |
|-----------|------|------|
| `sam3_image_encoder.onnx` | Extracts visual features from the input image | Once per image |
| `sam3_language_encoder.onnx` | Encodes text prompt tokens into feature vectors | Once per text query |
| `sam3_decoder.onnx` | Produces segmentation masks given image + language features | Per prompt |

## Prompt Types

SAM 3 supports **three prompt modalities**:

| Prompt | Description |
|--------|-------------|
| **Text** | Natural-language description, e.g. `"truck"` — unique to SAM 3 |
| **Point** | Click `+point` / `-point` to include/exclude regions |
| **Rectangle** | Draw a bounding box around the target object |

Text prompts are the recommended workflow: they drive open-vocabulary detection, so you can label **any object class** without retraining.
49
+
50
+ ## Use with AnyLabeling (Recommended)
51
+
52
+ [AnyLabeling](https://github.com/vietanhdev/anylabeling) is a desktop annotation tool with a built-in model manager that downloads, caches, and runs these models automatically β€” no coding required.
53
+
54
+ 1. Install: `pip install anylabeling`
55
+ 2. Launch: `anylabeling`
56
+ 3. Click the **Brain** button β†’ select **Segment Anything 3 (ViT-H)** from the dropdown
57
+ 4. Type a text description (e.g., `truck`) in the text prompt field
58
+ 5. Optionally refine with point/rectangle prompts
59
+
60
+ [![AnyLabeling demo](https://user-images.githubusercontent.com/18329471/236625792-07f01838-3f69-48b0-a12e-30bad27bd921.gif)](https://github.com/vietanhdev/anylabeling)
61
+
62
+ ## Use Programmatically with ONNX Runtime
63
+
64
+ ```python
65
+ import urllib.request, zipfile
66
+ url = "https://huggingface.co/vietanhdev/segment-anything-3-onnx-models/resolve/main/sam3_vit_h.zip"
67
+ urllib.request.urlretrieve(url, "sam3_vit_h.zip")
68
+ with zipfile.ZipFile("sam3_vit_h.zip") as z:
69
+ z.extractall("sam3")
70
+ ```
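After extraction, it can help to sanity-check that all three components are present before wiring up inference. A minimal sketch; the helper name is illustrative and not part of samexporter:

```python
from pathlib import Path

# The three components expected inside sam3_vit_h.zip
EXPECTED = [
    "sam3_image_encoder.onnx",
    "sam3_language_encoder.onnx",
    "sam3_decoder.onnx",
]

def find_components(model_dir):
    """Return {component name: path} for every expected ONNX file found."""
    root = Path(model_dir)
    found = {name: root / name for name in EXPECTED if (root / name).is_file()}
    missing = [name for name in EXPECTED if name not in found]
    if missing:
        raise FileNotFoundError(f"missing SAM 3 components: {missing}")
    return found
```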
71
+
72
+ Then use [samexporter](https://github.com/vietanhdev/samexporter)'s inference module:
73
+
74
+ ```bash
75
+ pip install samexporter
76
+
77
+ # Text prompt
78
+ python -m samexporter.inference \
79
+ --sam_variant sam3 \
80
+ --encoder_model sam3/sam3_image_encoder.onnx \
81
+ --decoder_model sam3/sam3_decoder.onnx \
82
+ --language_encoder_model sam3/sam3_language_encoder.onnx \
83
+ --image photo.jpg \
84
+ --prompt prompt.json \
85
+ --text_prompt "truck" \
86
+ --output result.png
87
+ ```
88
+
89
+ Example `prompt.json` for a text-only query:
90
+ ```json
91
+ [{"type": "text", "data": "truck"}]
92
+ ```
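Prompt files can also be assembled programmatically, which keeps the JSON schema straight when mixing modalities. A sketch; only the text entry matches the documented format above, and the point/rectangle entries are assumptions for illustration, not a verified samexporter schema:

```python
import json

def make_prompts(text=None, points=None, box=None):
    """Build a prompt list in the style of samexporter's prompt.json.

    points: list of ((x, y), label) with label 1 = include, 0 = exclude.
    box:    (x1, y1, x2, y2) bounding rectangle.
    The point/rectangle entry shapes are assumptions, not a verified schema.
    """
    prompts = []
    if text is not None:
        prompts.append({"type": "text", "data": text})
    for (x, y), label in points or []:
        prompts.append({"type": "point", "data": [x, y], "label": label})
    if box is not None:
        prompts.append({"type": "rectangle", "data": list(box)})
    return prompts

# Text query refined with one include-point
prompt = make_prompts(text="truck", points=[((320, 240), 1)])
with open("prompt.json", "w") as f:
    json.dump(prompt, f)
```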

## Model Architecture

SAM 3 follows the same encoder/decoder pattern as SAM and SAM 2, with an added CLIP-based language branch:

```
Input image ──► Image Encoder ──────────┐
                                        ▼
Text prompt ──► Language Encoder ──► Decoder ──► Masks + Scores + Boxes
                                        ▲
Optional: point / box prompts ──────────┘
```

The **image encoder** runs once per image and caches features. The **language encoder** runs once per text query. The **decoder** is lightweight and runs interactively for each prompt combination.
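That run-frequency contract is easy to encode in a small driver. A sketch with plain callables standing in for the three ONNX sessions; the class and method names are illustrative, not part of samexporter:

```python
class Sam3Pipeline:
    """Caches encoder outputs so only the decoder runs per prompt (sketch)."""

    def __init__(self, image_encoder, language_encoder, decoder):
        self.image_encoder = image_encoder        # heavy: once per image
        self.language_encoder = language_encoder  # once per unique text query
        self.decoder = decoder                    # light: once per prompt
        self._image_feats = None
        self._text_feats = {}

    def set_image(self, image):
        # Re-encode only when the image changes.
        self._image_feats = self.image_encoder(image)

    def segment(self, text):
        if self._image_feats is None:
            raise RuntimeError("call set_image() first")
        # Language features are cached per unique query string.
        if text not in self._text_feats:
            self._text_feats[text] = self.language_encoder(text)
        return self.decoder(self._image_feats, self._text_feats[text])
```

With this shape, querying the same image for several classes pays the image-encoder cost once and the language-encoder cost once per distinct query.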

## Re-export from Source

To re-export or customize the models using [samexporter](https://github.com/vietanhdev/samexporter):

```bash
pip install samexporter

# Export all three SAM 3 ONNX components
python -m samexporter.export_sam3 --output_dir output_models/sam3

# Or use the convenience script:
bash convert_sam3.sh
```

## Custom Model Config for AnyLabeling

To use a locally re-exported SAM 3 as a custom model in AnyLabeling, create a `config.yaml`:

```yaml
type: segment_anything
name: sam3_vit_h_custom
display_name: Segment Anything 3 (ViT-H)
encoder_model_path: sam3_image_encoder.onnx
decoder_model_path: sam3_decoder.onnx
language_encoder_path: sam3_language_encoder.onnx
input_size: 1008
max_height: 1008
max_width: 1008
```

Then load it via **Brain button → Load Custom Model** in AnyLabeling.
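Missing model files are a common failure mode when loading a custom config, so checking the referenced paths up front can save a round trip. A sketch using a naive line-based parse of the flat `key: value` layout shown above (avoids a YAML dependency; the helper is illustrative, not an AnyLabeling API):

```python
from pathlib import Path

def check_model_paths(config_path):
    """Return the *_path entries in a flat config.yaml that point at missing files."""
    cfg_dir = Path(config_path).parent
    missing = []
    for line in Path(config_path).read_text().splitlines():
        if ":" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition(":")
        # Model paths in the config are resolved relative to the config file.
        if key.strip().endswith("_path"):
            path = cfg_dir / value.strip()
            if not path.is_file():
                missing.append(str(path))
    return missing  # empty list means all referenced models were found
```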

## Related Repositories

| Repo | Description |
|------|-------------|
| [vietanhdev/samexporter](https://github.com/vietanhdev/samexporter) | Export scripts, inference code, conversion tools |
| [vietanhdev/anylabeling](https://github.com/vietanhdev/anylabeling) | Desktop annotation app powered by these models |
| [facebook/sam3](https://huggingface.co/facebook/sam3) | Original SAM 3 PyTorch checkpoint by Meta |

## License

The ONNX models are derived from Meta's SAM 3, released under the **[SAM License](https://github.com/facebookresearch/sam3/blob/main/LICENSE)**.
The export code is part of [samexporter](https://github.com/vietanhdev/samexporter), released under the **MIT** license.