---
license: apache-2.0
tags:
- image-segmentation
- segment-anything
- segment-anything-3
- open-vocabulary
- text-to-segmentation
- onnx
- onnxruntime
library_name: onnxruntime
base_model:
- facebook/sam3
---
# Segment Anything 3 (SAM 3) ONNX Models
ONNX-exported version of Meta's **Segment Anything Model 3 (SAM 3)**, an open-vocabulary segmentation model that accepts **text prompts** in addition to points and rectangles.
SAM 3 uses a CLIP-based language encoder to let you describe objects in natural language (e.g., `"truck"`, `"person with hat"`) and segment them without task-specific training.
These models are used by **[AnyLabeling](https://github.com/vietanhdev/anylabeling)** for AI-assisted image annotation, and exported by **[samexporter](https://github.com/vietanhdev/samexporter)**.
## Available Models
| File | Contents | Description |
|------|----------|-------------|
| `sam3_vit_h.zip` | 3 ONNX files | SAM 3 ViT-H (all components) |
The zip contains three ONNX components that work together:
| ONNX File | Role | Runs |
|-----------|------|------|
| `sam3_image_encoder.onnx` | Extracts visual features from the input image | Once per image |
| `sam3_language_encoder.onnx` | Encodes text prompt tokens into feature vectors | Once per text query |
| `sam3_decoder.onnx` | Produces segmentation masks given image + language features | Per prompt |
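The three components can be wired together with ONNX Runtime roughly as follows. This is a minimal sketch, not a tested pipeline: it reads input tensor names from the sessions rather than hard-coding them, and the real decoder likely takes additional inputs (points, boxes, original image size), so inspect each session's `get_inputs()` before relying on it.

```python
def load_sam3(model_dir: str):
    """Create one ONNX Runtime session per SAM 3 component (sketch)."""
    import onnxruntime as ort  # lazy import: the sketch stays importable without onnxruntime
    return (
        ort.InferenceSession(f"{model_dir}/sam3_image_encoder.onnx"),
        ort.InferenceSession(f"{model_dir}/sam3_language_encoder.onnx"),
        ort.InferenceSession(f"{model_dir}/sam3_decoder.onnx"),
    )


def segment(image_enc, text_enc, decoder, image, token_ids):
    """Image encoder once per image, language encoder once per query, decoder per prompt.

    Input names are read from the sessions instead of being hard-coded;
    the decoder may require more inputs than the two shown here.
    """
    image_feats = image_enc.run(None, {image_enc.get_inputs()[0].name: image})[0]
    text_feats = text_enc.run(None, {text_enc.get_inputs()[0].name: token_ids})[0]
    return decoder.run(None, {
        decoder.get_inputs()[0].name: image_feats,
        decoder.get_inputs()[1].name: text_feats,
    })
```

For interactive use, call the image encoder once, then re-run only the decoder as prompts change.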
## Prompt Types
SAM 3 supports **three prompt modalities**:
| Prompt | Description |
|--------|-------------|
| **Text** | Natural-language description, e.g. `"truck"` (unique to SAM 3) |
| **Point** | Click `+point` / `-point` to include/exclude regions |
| **Rectangle** | Draw a bounding box around the target object |
Text prompts are the recommended workflow: detection is open-vocabulary, so you can label **any object class** without retraining.
## Use with AnyLabeling (Recommended)
[AnyLabeling](https://github.com/vietanhdev/anylabeling) is a desktop annotation tool with a built-in model manager that downloads, caches, and runs these models automatically, with no coding required.
1. Install: `pip install anylabeling`
2. Launch: `anylabeling`
3. Click the **Brain** button → select **Segment Anything 3 (ViT-H)** from the dropdown
4. Type a text description (e.g., `truck`) in the text prompt field
5. Optionally refine with point/rectangle prompts
[![AnyLabeling demo](https://user-images.githubusercontent.com/18329471/236625792-07f01838-3f69-48b0-a12e-30bad27bd921.gif)](https://github.com/vietanhdev/anylabeling)
## Use Programmatically with ONNX Runtime
```python
import urllib.request, zipfile
url = "https://huggingface.co/vietanhdev/segment-anything-3-onnx-models/resolve/main/sam3_vit_h.zip"
urllib.request.urlretrieve(url, "sam3_vit_h.zip")
with zipfile.ZipFile("sam3_vit_h.zip") as z:
    z.extractall("sam3")
```
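After extraction, the directory should contain the three components from the table above. A quick sanity check (file names are taken from that table):

```python
import pathlib

# The three ONNX components shipped in sam3_vit_h.zip.
EXPECTED = {
    "sam3_image_encoder.onnx",
    "sam3_language_encoder.onnx",
    "sam3_decoder.onnx",
}


def missing_components(model_dir: str) -> set:
    """Return the expected SAM 3 ONNX files not present in model_dir."""
    found = {p.name for p in pathlib.Path(model_dir).glob("*.onnx")}
    return EXPECTED - found
```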
Then use [samexporter](https://github.com/vietanhdev/samexporter)'s inference module:
```bash
pip install samexporter
# Text prompt
python -m samexporter.inference \
    --sam_variant sam3 \
    --encoder_model sam3/sam3_image_encoder.onnx \
    --decoder_model sam3/sam3_decoder.onnx \
    --language_encoder_model sam3/sam3_language_encoder.onnx \
    --image photo.jpg \
    --prompt prompt.json \
    --text_prompt "truck" \
    --output result.png
```
Example `prompt.json` for a text-only query:
```json
[{"type": "text", "data": "truck"}]
```
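Since the prompt file is a JSON list, it can also be generated programmatically. A small sketch; note that the README only shows a single-entry list, so whether samexporter accepts several entries per file is an assumption to verify:

```python
import json

# Two open-vocabulary queries in one prompt file, following the
# {"type": "text", "data": ...} shape shown above. Multiple entries
# per file is an assumption; check samexporter's docs before relying on it.
prompts = [
    {"type": "text", "data": "truck"},
    {"type": "text", "data": "person with hat"},
]

with open("prompt.json", "w") as f:
    json.dump(prompts, f, indent=2)
```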
## Model Architecture
SAM 3 follows the same encoder/decoder pattern as SAM and SAM 2, with an added CLIP-based language branch:
```
Input image ──► Image Encoder ──────────┐
                                        ▼
Text prompt ──► Language Encoder ──► Decoder ──► Masks + Scores + Boxes
                                        ▲
Optional: point / box prompts ──────────┘
```
The **image encoder** runs once per image, and its features can be cached and reused. The **language encoder** runs once per text query. The **decoder** is lightweight and runs interactively for each prompt combination.
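Because the image encoder dominates runtime, annotation tools typically cache its output per image so that repeated prompts only re-run the decoder. A minimal sketch of that pattern; the function body is a stub standing in for an actual `sam3_image_encoder.onnx` run:

```python
import functools
import hashlib


@functools.lru_cache(maxsize=32)
def image_features(image_path: str):
    """Cache per-image encoder output so repeated prompts reuse it.

    Stub: a real implementation would run sam3_image_encoder.onnx here.
    The cache is keyed by path, so a real tool should key on content
    hash or mtime to avoid stale entries when a file changes on disk.
    """
    with open(image_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return ("image-features", digest)  # stand-in for the feature tensor
```

With this in place, each new text or point prompt on the same image pays only the decoder's cost.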
## Re-export from Source
To re-export or customize the models using [samexporter](https://github.com/vietanhdev/samexporter):
```bash
pip install samexporter
# Export all three SAM 3 ONNX components
python -m samexporter.export_sam3 --output_dir output_models/sam3
# Or use the convenience script:
bash convert_sam3.sh
```
## Custom Model Config for AnyLabeling
To use a locally re-exported SAM 3 as a custom model in AnyLabeling, create a `config.yaml`:
```yaml
type: segment_anything
name: sam3_vit_h_custom
display_name: Segment Anything 3 (ViT-H)
encoder_model_path: sam3_image_encoder.onnx
decoder_model_path: sam3_decoder.onnx
language_encoder_path: sam3_language_encoder.onnx
input_size: 1008
max_height: 1008
max_width: 1008
```
Then load it via **Brain button → Load Custom Model** in AnyLabeling.
## Related Repositories
| Repo | Description |
|------|-------------|
| [vietanhdev/samexporter](https://github.com/vietanhdev/samexporter) | Export scripts, inference code, conversion tools |
| [vietanhdev/anylabeling](https://github.com/vietanhdev/anylabeling) | Desktop annotation app powered by these models |
| [facebook/sam3](https://huggingface.co/facebook/sam3) | Original SAM 3 PyTorch checkpoint by Meta |
## License
The ONNX models are derived from Meta's SAM 3, released under the **[SAM License](https://github.com/facebookresearch/sam3/blob/main/LICENSE)**.
The export code is part of [samexporter](https://github.com/vietanhdev/samexporter), released under the **MIT** license.