---
license: apache-2.0
tags:
- image-segmentation
- segment-anything
- segment-anything-3
- open-vocabulary
- text-to-segmentation
- onnx
- onnxruntime
library_name: onnxruntime
base_model:
- facebook/sam3
---

# Segment Anything 3 (SAM 3) – ONNX Models

ONNX-exported version of Meta's **Segment Anything Model 3 (SAM 3)**, an open-vocabulary segmentation model that accepts **text prompts** in addition to points and rectangles.

SAM 3 uses a CLIP-based language encoder to let you describe objects in natural language (e.g., `"truck"`, `"person with hat"`) and segment them without task-specific training.

These models are used by **[AnyLabeling](https://github.com/vietanhdev/anylabeling)** for AI-assisted image annotation, and exported by **[samexporter](https://github.com/vietanhdev/samexporter)**.

## Available Models

| File | Contents | Description |
|------|----------|-------------|
| `sam3_vit_h.zip` | 3 ONNX files | SAM 3 ViT-H (all components) |

The zip contains three ONNX components that work together:

| ONNX File | Role | Runs |
|-----------|------|------|
| `sam3_image_encoder.onnx` | Extracts visual features from the input image | Once per image |
| `sam3_language_encoder.onnx` | Encodes text prompt tokens into feature vectors | Once per text query |
| `sam3_decoder.onnx` | Produces segmentation masks given image + language features | Per prompt |

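As a minimal sketch of wiring these components up yourself, the helper below creates one `onnxruntime.InferenceSession` per file from the table. The function name, the provider default, and the eager file check are illustrative assumptions, not samexporter's API; inspect each session's `get_inputs()` to learn the actual tensor names before running inference.

```python
import os

# Names of the three exported components (from the table above).
SAM3_COMPONENTS = {
    "image_encoder": "sam3_image_encoder.onnx",
    "language_encoder": "sam3_language_encoder.onnx",
    "decoder": "sam3_decoder.onnx",
}

def load_sam3_sessions(model_dir, providers=None):
    """Create one onnxruntime InferenceSession per SAM 3 component.

    `providers` defaults to CPU; pass e.g. ["CUDAExecutionProvider"]
    if your onnxruntime build supports it.
    """
    paths = {role: os.path.join(model_dir, name)
             for role, name in SAM3_COMPONENTS.items()}
    # Fail early with a clear error if any component is missing.
    for role, path in paths.items():
        if not os.path.exists(path):
            raise FileNotFoundError(f"Missing {role} model: {path}")
    # Imported lazily so the helper can be defined without onnxruntime installed.
    import onnxruntime as ort
    providers = providers or ["CPUExecutionProvider"]
    return {role: ort.InferenceSession(path, providers=providers)
            for role, path in paths.items()}
```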

## Prompt Types

SAM 3 supports **three prompt modalities**:

| Prompt | Description |
|--------|-------------|
| **Text** | Natural-language description, e.g. `"truck"` – unique to SAM 3 |
| **Point** | Click `+point` / `-point` to include/exclude regions |
| **Rectangle** | Draw a bounding box around the target object |

Text prompts are the recommended workflow: they enable open-vocabulary detection, so you can label **any object class** without retraining.

## Use with AnyLabeling (Recommended)

[AnyLabeling](https://github.com/vietanhdev/anylabeling) is a desktop annotation tool with a built-in model manager that downloads, caches, and runs these models automatically – no coding required.

1. Install: `pip install anylabeling`
2. Launch: `anylabeling`
3. Click the **Brain** button → select **Segment Anything 3 (ViT-H)** from the dropdown
4. Type a text description (e.g., `truck`) in the text prompt field
5. Optionally refine with point/rectangle prompts

## Use Programmatically with ONNX Runtime

```python
import urllib.request
import zipfile

# Download and extract the three ONNX components
url = "https://huggingface.co/vietanhdev/segment-anything-3-onnx-models/resolve/main/sam3_vit_h.zip"
urllib.request.urlretrieve(url, "sam3_vit_h.zip")
with zipfile.ZipFile("sam3_vit_h.zip") as z:
    z.extractall("sam3")
```


Then use [samexporter](https://github.com/vietanhdev/samexporter)'s inference module:

```bash
pip install samexporter

# Text prompt
python -m samexporter.inference \
    --sam_variant sam3 \
    --encoder_model sam3/sam3_image_encoder.onnx \
    --decoder_model sam3/sam3_decoder.onnx \
    --language_encoder_model sam3/sam3_language_encoder.onnx \
    --image photo.jpg \
    --prompt prompt.json \
    --text_prompt "truck" \
    --output result.png
```


Example `prompt.json` for a text-only query:

```json
[{"type": "text", "data": "truck"}]
```

## Model Architecture

SAM 3 follows the same encoder/decoder pattern as SAM and SAM 2, with an added CLIP-based language branch:

```
Input image ──► Image Encoder ───────────────────────┐
                                                     ▼
Text prompt ──► Language Encoder ──► Decoder ──► Masks + Scores + Boxes
                                        ▲
Optional: point / box prompts ──────────┘
```

The **image encoder** runs once per image and caches features. The **language encoder** runs once per text query. The **decoder** is lightweight and runs interactively for each prompt combination.

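The caching pattern described above can be sketched as follows, with stub functions standing in for the real ONNX sessions (class and function names here are illustrative, not samexporter's API):

```python
class Sam3Pipeline:
    """Caches encoder outputs so only the light decoder runs per prompt."""

    def __init__(self, encode_image, encode_text, decode):
        self._encode_image = encode_image
        self._encode_text = encode_text
        self._decode = decode
        self._image_cache = {}
        self._text_cache = {}

    def segment(self, image_id, image, text):
        # Heavy image encoder: run once per image, then reuse the features.
        if image_id not in self._image_cache:
            self._image_cache[image_id] = self._encode_image(image)
        # Language encoder: run once per distinct text query.
        if text not in self._text_cache:
            self._text_cache[text] = self._encode_text(text)
        # Lightweight decoder: runs for every prompt combination.
        return self._decode(self._image_cache[image_id], self._text_cache[text])

# Stub encoders that count invocations, to show the caching behavior.
calls = {"img": 0, "txt": 0}

def stub_encode_image(img):
    calls["img"] += 1
    return ("features", img)

def stub_encode_text(text):
    calls["txt"] += 1
    return ("embedding", text)

pipe = Sam3Pipeline(stub_encode_image, stub_encode_text, lambda f, e: (f, e))
pipe.segment("photo.jpg", b"...", "truck")
pipe.segment("photo.jpg", b"...", "car")    # same image, new text
pipe.segment("photo.jpg", b"...", "truck")  # fully cached
```

After the three calls, the image encoder has run once and the text encoder twice, matching the per-image / per-query costs in the table above.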
## Re-export from Source

To re-export or customize the models using [samexporter](https://github.com/vietanhdev/samexporter):

```bash
pip install samexporter

# Export all three SAM 3 ONNX components
python -m samexporter.export_sam3 --output_dir output_models/sam3

# Or use the convenience script:
bash convert_sam3.sh
```

## Custom Model Config for AnyLabeling

To use a locally re-exported SAM 3 as a custom model in AnyLabeling, create a `config.yaml`:

```yaml
type: segment_anything
name: sam3_vit_h_custom
display_name: Segment Anything 3 (ViT-H)
encoder_model_path: sam3_image_encoder.onnx
decoder_model_path: sam3_decoder.onnx
language_encoder_path: sam3_language_encoder.onnx
input_size: 1008
max_height: 1008
max_width: 1008
```

Then load it via **Brain button → Load Custom Model** in AnyLabeling.

## Related Repositories

| Repo | Description |
|------|-------------|
| [vietanhdev/samexporter](https://github.com/vietanhdev/samexporter) | Export scripts, inference code, conversion tools |
| [vietanhdev/anylabeling](https://github.com/vietanhdev/anylabeling) | Desktop annotation app powered by these models |
| [facebook/sam3](https://huggingface.co/facebook/sam3) | Original SAM 3 PyTorch checkpoint by Meta |

## License

The ONNX models are derived from Meta's SAM 3, released under the **[SAM License](https://github.com/facebookresearch/sam3/blob/main/LICENSE)**.
The export code is part of [samexporter](https://github.com/vietanhdev/samexporter), released under the **MIT** license.