---
license: apache-2.0
tags:
- image-segmentation
- segment-anything
- segment-anything-3
- open-vocabulary
- text-to-segmentation
- onnx
- onnxruntime
library_name: onnxruntime
base_model:
- facebook/sam3
---

# Segment Anything 3 (SAM 3) – ONNX Models

ONNX-exported version of Meta's **Segment Anything Model 3 (SAM 3)**, an open-vocabulary segmentation model that accepts **text prompts** in addition to points and rectangles.

SAM 3 uses a CLIP-based language encoder to let you describe objects in natural language (e.g., `"truck"`, `"person with hat"`) and segment them without task-specific training.

These models are used by **[AnyLabeling](https://github.com/vietanhdev/anylabeling)** for AI-assisted image annotation, and exported by **[samexporter](https://github.com/vietanhdev/samexporter)**.

## Available Models

| File | Contents | Description |
|------|----------|-------------|
| `sam3_vit_h.zip` | 3 ONNX files | SAM 3 ViT-H (all components) |

The zip contains three ONNX components that work together:

| ONNX File | Role | Runs |
|-----------|------|------|
| `sam3_image_encoder.onnx` | Extracts visual features from the input image | Once per image |
| `sam3_language_encoder.onnx` | Encodes text prompt tokens into feature vectors | Once per text query |
| `sam3_decoder.onnx` | Produces segmentation masks given image + language features | Per prompt |

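As a minimal sketch of wiring these components up yourself, the helper below creates one `onnxruntime.InferenceSession` per file from the table. The function name, the provider default, and the eager file check are illustrative assumptions, not samexporter's API; inspect each session's `get_inputs()` to learn the actual tensor names before running inference.

```python
import os

# Names of the three exported components (from the table above).
SAM3_COMPONENTS = {
    "image_encoder": "sam3_image_encoder.onnx",
    "language_encoder": "sam3_language_encoder.onnx",
    "decoder": "sam3_decoder.onnx",
}

def load_sam3_sessions(model_dir, providers=None):
    """Create one onnxruntime InferenceSession per SAM 3 component.

    `providers` defaults to CPU; pass e.g. ["CUDAExecutionProvider"]
    if your onnxruntime build supports it.
    """
    paths = {role: os.path.join(model_dir, name)
             for role, name in SAM3_COMPONENTS.items()}
    # Fail early with a clear error if any component is missing.
    for role, path in paths.items():
        if not os.path.exists(path):
            raise FileNotFoundError(f"Missing {role} model: {path}")
    # Imported lazily so the helper can be defined without onnxruntime installed.
    import onnxruntime as ort
    providers = providers or ["CPUExecutionProvider"]
    return {role: ort.InferenceSession(path, providers=providers)
            for role, path in paths.items()}
```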

## Prompt Types

SAM 3 supports **three prompt modalities**:

| Prompt | Description |
|--------|-------------|
| **Text** | Natural-language description, e.g. `"truck"` – unique to SAM 3 |
| **Point** | Click `+point` / `-point` to include/exclude regions |
| **Rectangle** | Draw a bounding box around the target object |

Text prompts are the recommended workflow: they enable open-vocabulary detection, so you can label **any object class** without retraining.

## Use with AnyLabeling (Recommended)

[AnyLabeling](https://github.com/vietanhdev/anylabeling) is a desktop annotation tool with a built-in model manager that downloads, caches, and runs these models automatically – no coding required.

1. Install: `pip install anylabeling`
2. Launch: `anylabeling`
3. Click the **Brain** button → select **Segment Anything 3 (ViT-H)** from the dropdown
4. Type a text description (e.g., `truck`) in the text prompt field
5. Optionally refine with point/rectangle prompts

## Use Programmatically with ONNX Runtime

```python
import urllib.request
import zipfile

# Download and extract the three ONNX components
url = "https://huggingface.co/vietanhdev/segment-anything-3-onnx-models/resolve/main/sam3_vit_h.zip"
urllib.request.urlretrieve(url, "sam3_vit_h.zip")
with zipfile.ZipFile("sam3_vit_h.zip") as z:
    z.extractall("sam3")
```


Then use [samexporter](https://github.com/vietanhdev/samexporter)'s inference module:

```bash
pip install samexporter

# Text prompt
python -m samexporter.inference \
    --sam_variant sam3 \
    --encoder_model sam3/sam3_image_encoder.onnx \
    --decoder_model sam3/sam3_decoder.onnx \
    --language_encoder_model sam3/sam3_language_encoder.onnx \
    --image photo.jpg \
    --prompt prompt.json \
    --text_prompt "truck" \
    --output result.png
```


Example `prompt.json` for a text-only query:

```json
[{"type": "text", "data": "truck"}]
```

## Model Architecture

SAM 3 follows the same encoder/decoder pattern as SAM and SAM 2, with an added CLIP-based language branch:

```
Input image ──► Image Encoder ───────────────────────┐
                                                     ▼
Text prompt ──► Language Encoder ──► Decoder ──► Masks + Scores + Boxes
                                        ▲
Optional: point / box prompts ──────────┘
```

The **image encoder** runs once per image and caches features. The **language encoder** runs once per text query. The **decoder** is lightweight and runs interactively for each prompt combination.

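The caching pattern described above can be sketched as follows, with stub functions standing in for the real ONNX sessions (class and function names here are illustrative, not samexporter's API):

```python
class Sam3Pipeline:
    """Caches encoder outputs so only the light decoder runs per prompt."""

    def __init__(self, encode_image, encode_text, decode):
        self._encode_image = encode_image
        self._encode_text = encode_text
        self._decode = decode
        self._image_cache = {}
        self._text_cache = {}

    def segment(self, image_id, image, text):
        # Heavy image encoder: run once per image, then reuse the features.
        if image_id not in self._image_cache:
            self._image_cache[image_id] = self._encode_image(image)
        # Language encoder: run once per distinct text query.
        if text not in self._text_cache:
            self._text_cache[text] = self._encode_text(text)
        # Lightweight decoder: runs for every prompt combination.
        return self._decode(self._image_cache[image_id], self._text_cache[text])

# Stub encoders that count invocations, to show the caching behavior.
calls = {"img": 0, "txt": 0}

def stub_encode_image(img):
    calls["img"] += 1
    return ("features", img)

def stub_encode_text(text):
    calls["txt"] += 1
    return ("embedding", text)

pipe = Sam3Pipeline(stub_encode_image, stub_encode_text, lambda f, e: (f, e))
pipe.segment("photo.jpg", b"...", "truck")
pipe.segment("photo.jpg", b"...", "car")    # same image, new text
pipe.segment("photo.jpg", b"...", "truck")  # fully cached
```

After the three calls, the image encoder has run once and the text encoder twice, matching the per-image / per-query costs in the table above.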
## Re-export from Source

To re-export or customize the models using [samexporter](https://github.com/vietanhdev/samexporter):

```bash
pip install samexporter

# Export all three SAM 3 ONNX components
python -m samexporter.export_sam3 --output_dir output_models/sam3

# Or use the convenience script:
bash convert_sam3.sh
```

## Custom Model Config for AnyLabeling

To use a locally re-exported SAM 3 as a custom model in AnyLabeling, create a `config.yaml`:

```yaml
type: segment_anything
name: sam3_vit_h_custom
display_name: Segment Anything 3 (ViT-H)
encoder_model_path: sam3_image_encoder.onnx
decoder_model_path: sam3_decoder.onnx
language_encoder_path: sam3_language_encoder.onnx
input_size: 1008
max_height: 1008
max_width: 1008
```

Then load it via **Brain button → Load Custom Model** in AnyLabeling.

## Related Repositories

| Repo | Description |
|------|-------------|
| [vietanhdev/samexporter](https://github.com/vietanhdev/samexporter) | Export scripts, inference code, conversion tools |
| [vietanhdev/anylabeling](https://github.com/vietanhdev/anylabeling) | Desktop annotation app powered by these models |
| [facebook/sam3](https://huggingface.co/facebook/sam3) | Original SAM 3 PyTorch checkpoint by Meta |

## License

The ONNX models are derived from Meta's SAM 3, released under the **[SAM License](https://github.com/facebookresearch/sam3/blob/main/LICENSE)**.
The export code is part of [samexporter](https://github.com/vietanhdev/samexporter), released under the **MIT** license.