---
license: apache-2.0
tags:
- image-segmentation
- segment-anything
- segment-anything-3
- open-vocabulary
- text-to-segmentation
- onnx
- onnxruntime
library_name: onnxruntime
base_model:
- facebook/sam3
---

# Segment Anything 3 (SAM 3) — ONNX Models

ONNX-exported version of Meta's **Segment Anything Model 3 (SAM 3)**, an open-vocabulary segmentation model that accepts **text prompts** in addition to points and rectangles. SAM 3 uses a CLIP-based language encoder, so you can describe objects in natural language (e.g., `"truck"`, `"person with hat"`) and segment them without task-specific training.

These models are used by **[AnyLabeling](https://github.com/vietanhdev/anylabeling)** for AI-assisted image annotation, and were exported with **[samexporter](https://github.com/vietanhdev/samexporter)**.

## Available Models

| File | Contents | Description |
|------|----------|-------------|
| `sam3_vit_h.zip` | 3 ONNX files | SAM 3 ViT-H (all components) |

The zip contains three ONNX components that work together:

| ONNX File | Role | Runs |
|-----------|------|------|
| `sam3_image_encoder.onnx` | Extracts visual features from the input image | Once per image |
| `sam3_language_encoder.onnx` | Encodes text prompt tokens into feature vectors | Once per text query |
| `sam3_decoder.onnx` | Produces segmentation masks given image + language features | Per prompt |

## Prompt Types

SAM 3 supports **three prompt modalities**:

| Prompt | Description |
|--------|-------------|
| **Text** | Natural-language description, e.g. `"truck"` — unique to SAM 3 |
| **Point** | Click `+point` / `-point` to include/exclude regions |
| **Rectangle** | Draw a bounding box around the target object |

Text prompts are the recommended workflow: detection is open-vocabulary, so you can label **any object class** without retraining.
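For reference, the prompt modalities above can be expressed as small JSON payloads. This is a minimal sketch: the text entry matches the `prompt.json` format used by samexporter's inference CLI, while the point and rectangle payload shapes shown here are illustrative assumptions, not a documented schema.

```python
import json

# Text prompt: this is the format samexporter's prompt.json uses.
text_prompt = [{"type": "text", "data": "truck"}]

# Point and rectangle payloads below are assumptions for illustration only;
# check samexporter's docs for the exact schema it expects.
point_prompt = [
    {"type": "point", "data": [320, 240], "label": 1},  # +point (include)
    {"type": "point", "data": [500, 120], "label": 0},  # -point (exclude)
]
rectangle_prompt = [{"type": "rectangle", "data": [100, 80, 400, 300]}]  # x1, y1, x2, y2

with open("prompt.json", "w") as f:
    json.dump(text_prompt, f)
```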
## Use with AnyLabeling (Recommended)

[AnyLabeling](https://github.com/vietanhdev/anylabeling) is a desktop annotation tool with a built-in model manager that downloads, caches, and runs these models automatically — no coding required.

1. Install: `pip install anylabeling`
2. Launch: `anylabeling`
3. Click the **Brain** button → select **Segment Anything 3 (ViT-H)** from the dropdown
4. Type a text description (e.g., `truck`) in the text prompt field
5. Optionally refine with point/rectangle prompts

[![AnyLabeling demo](https://user-images.githubusercontent.com/18329471/236625792-07f01838-3f69-48b0-a12e-30bad27bd921.gif)](https://github.com/vietanhdev/anylabeling)

## Use Programmatically with ONNX Runtime

```python
import urllib.request
import zipfile

url = "https://huggingface.co/vietanhdev/segment-anything-3-onnx-models/resolve/main/sam3_vit_h.zip"
urllib.request.urlretrieve(url, "sam3_vit_h.zip")
with zipfile.ZipFile("sam3_vit_h.zip") as z:
    z.extractall("sam3")
```

Then use [samexporter](https://github.com/vietanhdev/samexporter)'s inference module:

```bash
pip install samexporter

# Text prompt
python -m samexporter.inference \
    --sam_variant sam3 \
    --encoder_model sam3/sam3_image_encoder.onnx \
    --decoder_model sam3/sam3_decoder.onnx \
    --language_encoder_model sam3/sam3_language_encoder.onnx \
    --image photo.jpg \
    --prompt prompt.json \
    --text_prompt "truck" \
    --output result.png
```

Example `prompt.json` for a text-only query:

```json
[{"type": "text", "data": "truck"}]
```

## Model Architecture

SAM 3 follows the same encoder/decoder pattern as SAM and SAM 2, with an added CLIP-based language branch:

```
Input image ──► Image Encoder ──────────┐
                                        ▼
Text prompt ──► Language Encoder ──► Decoder ──► Masks + Scores + Boxes
                                        ▲
        Optional: point / box prompts ──┘
```

The **image encoder** runs once per image and caches features. The **language encoder** runs once per text query.
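This execution pattern — heavy encoders cached, light decoder re-run per prompt — can be sketched as a caching wrapper. A minimal sketch: `encode_image`, `encode_text`, and `decode` here are hypothetical stand-ins for ONNX Runtime sessions over the three exported files; real code would call `session.run(...)` with the tensor names defined by the export.

```python
from functools import lru_cache

# Hypothetical stand-ins for the three ONNX Runtime sessions.
def encode_image(image_path: str) -> str:
    return f"features({image_path})"

def encode_text(text: str) -> str:
    return f"embedding({text})"

def decode(image_features: str, text_features: str) -> str:
    return f"mask({image_features}, {text_features})"

@lru_cache(maxsize=8)   # image encoder: run once per image, then cached
def cached_image_features(image_path: str) -> str:
    return encode_image(image_path)

@lru_cache(maxsize=64)  # language encoder: run once per text query
def cached_text_features(text: str) -> str:
    return encode_text(text)

def segment(image_path: str, text: str) -> str:
    # Only the lightweight decoder runs on every call.
    return decode(cached_image_features(image_path), cached_text_features(text))
```

Labeling many objects in the same image then re-runs only the decoder, which is what makes interactive annotation responsive.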
The **decoder** is lightweight and runs interactively for each prompt combination.

## Re-export from Source

To re-export or customize the models using [samexporter](https://github.com/vietanhdev/samexporter):

```bash
pip install samexporter

# Export all three SAM 3 ONNX components
python -m samexporter.export_sam3 --output_dir output_models/sam3

# Or use the convenience script:
bash convert_sam3.sh
```

## Custom Model Config for AnyLabeling

To use a locally re-exported SAM 3 as a custom model in AnyLabeling, create a `config.yaml`:

```yaml
type: segment_anything
name: sam3_vit_h_custom
display_name: Segment Anything 3 (ViT-H)
encoder_model_path: sam3_image_encoder.onnx
decoder_model_path: sam3_decoder.onnx
language_encoder_path: sam3_language_encoder.onnx
input_size: 1008
max_height: 1008
max_width: 1008
```

Then load it via **Brain button → Load Custom Model** in AnyLabeling.

## Related Repositories

| Repo | Description |
|------|-------------|
| [vietanhdev/samexporter](https://github.com/vietanhdev/samexporter) | Export scripts, inference code, conversion tools |
| [vietanhdev/anylabeling](https://github.com/vietanhdev/anylabeling) | Desktop annotation app powered by these models |
| [facebook/sam3](https://huggingface.co/facebook/sam3) | Original SAM 3 PyTorch checkpoint by Meta |

## License

The ONNX models are derived from Meta's SAM 3, released under the **[SAM License](https://github.com/facebookresearch/sam3/blob/main/LICENSE)**. The export code is part of [samexporter](https://github.com/vietanhdev/samexporter), released under the **MIT** license.