---
license: apache-2.0
tags:
  - image-segmentation
  - segment-anything
  - segment-anything-3
  - open-vocabulary
  - text-to-segmentation
  - onnx
  - onnxruntime
library_name: onnxruntime
base_model:
  - facebook/sam3
---

# Segment Anything 3 (SAM 3) ONNX Models

ONNX-exported version of Meta's Segment Anything Model 3 (SAM 3), an open-vocabulary segmentation model that accepts text prompts in addition to points and rectangles.

SAM 3 uses a CLIP-based language encoder to let you describe objects in natural language (e.g., "truck", "person with hat") and segment them without task-specific training.

These models are used by AnyLabeling for AI-assisted image annotation and were exported with samexporter.

## Available Models

| File | Contents | Description |
|------|----------|-------------|
| `sam3_vit_h.zip` | 3 ONNX files | SAM 3 ViT-H (all components) |

The zip contains three ONNX components that work together:

| ONNX File | Role | Runs |
|-----------|------|------|
| `sam3_image_encoder.onnx` | Extracts visual features from the input image | Once per image |
| `sam3_language_encoder.onnx` | Encodes text prompt tokens into feature vectors | Once per text query |
| `sam3_decoder.onnx` | Produces segmentation masks given image + language features | Per prompt |
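To consume the components directly, each file can be loaded as an ONNX Runtime inference session. A minimal loader sketch (the helper name and the deferred import are conveniences for this example, not part of samexporter):

```python
import os

SAM3_COMPONENTS = [
    "sam3_image_encoder.onnx",
    "sam3_language_encoder.onnx",
    "sam3_decoder.onnx",
]

def load_sam3_sessions(model_dir):
    """Load the three SAM 3 ONNX components as onnxruntime sessions."""
    paths = [os.path.join(model_dir, name) for name in SAM3_COMPONENTS]
    missing = [p for p in paths if not os.path.exists(p)]
    if missing:
        raise FileNotFoundError(f"Missing ONNX files: {missing}")
    # Deferred import so the file check above runs even without onnxruntime installed
    import onnxruntime as ort
    return {
        name.rsplit(".", 1)[0]: ort.InferenceSession(path)
        for name, path in zip(SAM3_COMPONENTS, paths)
    }
```

The returned dict keys (`sam3_image_encoder`, etc.) make it easy to wire the three sessions together in the order shown in the architecture section below.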

## Prompt Types

SAM 3 supports three prompt modalities:

| Prompt | Description |
|--------|-------------|
| Text | Natural-language description, e.g. `"truck"` (unique to SAM 3) |
| Point | Click +point / -point to include or exclude regions |
| Rectangle | Draw a bounding box around the target object |

Text prompts are the recommended workflow: they drive open-vocabulary detection, so you can label any object class without retraining.

## Use with AnyLabeling (Recommended)

AnyLabeling is a desktop annotation tool with a built-in model manager that downloads, caches, and runs these models automatically, with no coding required.

1. Install: `pip install anylabeling`
2. Launch: `anylabeling`
3. Click the **Brain** button → select **Segment Anything 3 (ViT-H)** from the dropdown
4. Type a text description (e.g., `truck`) in the text prompt field
5. Optionally refine with point/rectangle prompts

*(AnyLabeling demo image)*

## Use Programmatically with ONNX Runtime

Download and extract the model zip:

```python
import urllib.request
import zipfile

# Download the packaged SAM 3 components from Hugging Face
url = "https://huggingface.co/vietanhdev/segment-anything-3-onnx-models/resolve/main/sam3_vit_h.zip"
urllib.request.urlretrieve(url, "sam3_vit_h.zip")

# Extract the three ONNX files into ./sam3
with zipfile.ZipFile("sam3_vit_h.zip") as z:
    z.extractall("sam3")
```

Then use samexporter's inference module:

```bash
pip install samexporter
```

```bash
# Run inference with a text prompt
python -m samexporter.inference \
    --sam_variant sam3 \
    --encoder_model sam3/sam3_image_encoder.onnx \
    --decoder_model sam3/sam3_decoder.onnx \
    --language_encoder_model sam3/sam3_language_encoder.onnx \
    --image photo.jpg \
    --prompt prompt.json \
    --text_prompt "truck" \
    --output result.png
```

Example `prompt.json` for a text-only query:

```json
[{"type": "text", "data": "truck"}]
```
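Prompts of different types can be combined in one list. A sketch of writing such a file: the text record follows the schema above, but the point record's field names (`data`, `label`) are assumptions for illustration, not a documented format:

```python
import json

prompts = [
    # Text prompt, using the documented type/data schema
    {"type": "text", "data": "truck"},
    # Refinement click; the field names here are assumed (label 1 = include, 0 = exclude)
    {"type": "point", "data": [450, 320], "label": 1},
]

with open("prompt.json", "w") as f:
    json.dump(prompts, f, indent=2)
```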

## Model Architecture

SAM 3 follows the same encoder/decoder pattern as SAM and SAM 2, with an added CLIP-based language branch:

```
Input image  ──► Image Encoder ──────────┐
                                         ▼
Text prompt  ──► Language Encoder ──► Decoder ──► Masks + Scores + Boxes
                                         ▲
Optional: point / box prompts ───────────┘
```

The image encoder runs once per image and caches features. The language encoder runs once per text query. The decoder is lightweight and runs interactively for each prompt combination.
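That run-frequency split can be sketched as a small caching wrapper. The encoder and decoder bodies below are stand-ins for the ONNX sessions; only the caching pattern is the point:

```python
class Sam3Pipeline:
    """Sketch of SAM 3's caching pattern: heavy encoders run once per
    image/text, the light decoder runs for every prompt combination."""

    def __init__(self):
        self._image_cache = {}  # image_id -> cached image features
        self._text_cache = {}   # text -> cached language features

    def _encode_image(self, image_id):
        # Stand-in for sam3_image_encoder.onnx (expensive, once per image)
        return f"img_features({image_id})"

    def _encode_text(self, text):
        # Stand-in for sam3_language_encoder.onnx (once per text query)
        return f"text_features({text})"

    def segment(self, image_id, text):
        # Reuse cached features when available, encode otherwise
        img = self._image_cache.setdefault(image_id, self._encode_image(image_id))
        txt = self._text_cache.setdefault(text, self._encode_text(text))
        # Stand-in for sam3_decoder.onnx (cheap, runs per prompt)
        return f"masks({img}, {txt})"
```

Changing the text prompt on the same image only re-runs the language encoder and decoder, which is why interactive labeling stays responsive.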

## Re-export from Source

To re-export or customize the models using samexporter:

```bash
pip install samexporter

# Export all three SAM 3 ONNX components
python -m samexporter.export_sam3 --output_dir output_models/sam3

# Or use the convenience script:
bash convert_sam3.sh
```

## Custom Model Config for AnyLabeling

To use a locally re-exported SAM 3 as a custom model in AnyLabeling, create a `config.yaml`:

```yaml
type: segment_anything
name: sam3_vit_h_custom
display_name: Segment Anything 3 (ViT-H)
encoder_model_path: sam3_image_encoder.onnx
decoder_model_path: sam3_decoder.onnx
language_encoder_path: sam3_language_encoder.onnx
input_size: 1008
max_height: 1008
max_width: 1008
```
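Before loading the config, it can help to sanity-check that the required keys are present and the referenced ONNX files exist next to it. A convenience sketch (this checker is not part of AnyLabeling):

```python
import os

REQUIRED_KEYS = {
    "type", "name", "display_name",
    "encoder_model_path", "decoder_model_path", "language_encoder_path",
    "input_size", "max_height", "max_width",
}

def check_sam3_config(config, base_dir="."):
    """Return a list of problems found in a SAM 3 custom-model config dict."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - config.keys())]
    if config.get("type") != "segment_anything":
        problems.append("type should be 'segment_anything'")
    # Model paths in the config are relative to the config file's directory
    for key in ("encoder_model_path", "decoder_model_path", "language_encoder_path"):
        path = config.get(key)
        if path and not os.path.exists(os.path.join(base_dir, path)):
            problems.append(f"{key} not found: {path}")
    return problems
```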

Then load it via **Brain** button → **Load Custom Model** in AnyLabeling.

## Related Repositories

| Repo | Description |
|------|-------------|
| `vietanhdev/samexporter` | Export scripts, inference code, conversion tools |
| `vietanhdev/anylabeling` | Desktop annotation app powered by these models |
| `facebook/sam3` | Original SAM 3 PyTorch checkpoint by Meta |

## License

The ONNX models are derived from Meta's SAM 3, released under the SAM License. The export code is part of samexporter, released under the MIT license.