---
tags:
- image-segmentation
- sam
- custom-docker
license: mit
task_categories:
- image-segmentation
library_name: transformers
pipeline_tag: image-segmentation
---
# SAM3 - Instance Segmentation for Road Damage Detection
SAM3 is an instance segmentation model deployed as a custom Docker container on HuggingFace Inference Endpoints. It detects and segments individual instances of road damage (potholes, cracks) using text prompts.
## πŸš€ Deployment
- **GitHub Repository**: https://github.com/logiroad/sam3
- **Inference Endpoint**: https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud
- **Docker Registry**: sam3acr4hf.azurecr.io/sam3-hf:latest
- **Model**: facebook/sam3 (Sam3Model for static images)
- **Hardware**: NVIDIA A10G (24GB VRAM)
## πŸ“Š Model Architecture
Built on Meta's SAM3 (Segment Anything Model 3) architecture for text-prompted **instance segmentation** of static images. SAM3 detects and segments all individual instances of specified object classes.
**Key features**:
- Multiple instances per class (e.g., 3 potholes in one image)
- Text-based prompting (natural language class names)
- High-quality segmentation masks
- Confidence scores per instance
## 🎯 Usage
### Basic Example
```python
import base64

import requests

# Read the image and base64-encode it
with open("road_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Call the endpoint
response = requests.post(
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["Pothole", "Road crack", "Road"]},
    },
)

# Get results: the endpoint returns a variable number of instances
instances = response.json()
print(f"Detected {len(instances)} instance(s)")

for instance in instances:
    label = instance["label"]
    score = instance["score"]
    instance_id = instance["instance_id"]
    mask_b64 = instance["mask"]
    print(f"{label} #{instance_id}: confidence={score:.2f}")
```
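If the endpoint is deployed as *protected* (the default for Hugging Face Inference Endpoints), requests must also carry a bearer token. A minimal sketch, assuming a valid token is available in an `HF_TOKEN` environment variable; `build_headers` and `segment` are illustrative helper names, not part of the endpoint API:

```python
import base64
import os

import requests

ENDPOINT_URL = "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud"

def build_headers(token=None):
    """Build request headers; add a bearer token for protected endpoints."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers

def segment(image_path, classes, token=os.environ.get("HF_TOKEN")):
    """Encode an image and call the endpoint, raising on HTTP errors."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = requests.post(
        ENDPOINT_URL,
        headers=build_headers(token),
        json={"inputs": image_b64, "parameters": {"classes": classes}},
    )
    response.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
    return response.json()
```

Calling `response.raise_for_status()` makes authentication failures (401/403) fail loudly rather than being silently parsed as a result.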
### Response Format
The endpoint returns a **list of instances**, not one entry per class; a single class can contribute several entries:
```json
[
{
"label": "Pothole",
"mask": "iVBORw0KG...",
"score": 0.92,
"instance_id": 0
},
{
"label": "Pothole",
"mask": "iVBORw0KG...",
"score": 0.71,
"instance_id": 1
},
{
"label": "Road crack",
"mask": "iVBORw0KG...",
"score": 0.38,
"instance_id": 0
},
{
"label": "Road",
"mask": "iVBORw0KG...",
"score": 0.89,
"instance_id": 0
}
]
```
**Fields**:
- `label`: Class name (from input prompts)
- `mask`: Base64-encoded PNG mask (grayscale, 0-255)
- `score`: Confidence score (0.0-1.0)
- `instance_id`: Instance number within the class (0, 1, 2...)
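Since each `mask` is a base64-encoded grayscale PNG, per-instance statistics such as pixel area can be computed after decoding. A sketch using Pillow and NumPy; the cutoff of 128 on the 0-255 grayscale values is an assumption, adjust it to your masks:

```python
import base64
import io

import numpy as np
from PIL import Image

def mask_area(mask_b64, threshold=128):
    """Decode a base64 PNG mask and count pixels at or above `threshold`."""
    mask = Image.open(io.BytesIO(base64.b64decode(mask_b64))).convert("L")
    return int((np.asarray(mask) >= threshold).sum())
```

The area is in pixels of the mask's own resolution; rescale if you need areas relative to the original image.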
### Processing Results
```python
import base64
import io
from collections import defaultdict

from PIL import Image

# Group instances by class
instances_by_class = defaultdict(list)
for instance in instances:
    instances_by_class[instance["label"]].append(instance)

# Count instances per class
for cls, insts in instances_by_class.items():
    print(f"{cls}: {len(insts)} instance(s)")

# Keep the highest-confidence instance per class
best_instances = {}
for cls, insts in instances_by_class.items():
    best_instances[cls] = max(insts, key=lambda x: x["score"])

# Decode and save masks
for instance in instances:
    mask_bytes = base64.b64decode(instance["mask"])
    mask_img = Image.open(io.BytesIO(mask_bytes))  # grayscale PIL Image
    mask_img.save(f"{instance['label']}_{instance['instance_id']}.png")
```
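For a quick visual check, a decoded mask can also be blended over the source image. A sketch with Pillow only; the red tint and 0.5 opacity are arbitrary choices, and the resize is a precaution in case mask and image resolutions differ:

```python
import base64
import io

from PIL import Image

def overlay_mask(image, mask_b64, color=(255, 0, 0), alpha=0.5):
    """Return `image` with a translucent `color` tint wherever the mask is set."""
    image = image.convert("RGBA")
    mask = Image.open(io.BytesIO(base64.b64decode(mask_b64))).convert("L")
    mask = mask.resize(image.size)
    tint = Image.new("RGBA", image.size, color + (int(255 * alpha),))
    clear = Image.new("RGBA", image.size, (0, 0, 0, 0))
    return Image.alpha_composite(image, Image.composite(tint, clear, mask))
```

Usage, for example: `overlay_mask(Image.open("road_image.jpg"), instance["mask"]).save("overlay.png")`.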
## βš™οΈ Model Parameters
- **Detection threshold**: 0.3 (instances with score < 0.3 are filtered out)
- **Mask threshold**: 0.5 (pixel probability threshold for mask generation)
- **Max instances**: Up to 200 per image (DETR architecture limit)
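The 0.3 detection threshold is applied server-side; if you need fewer false positives, a stricter cutoff can be applied on the client. A minimal sketch (the 0.5 default is an arbitrary example, not a recommended value):

```python
def filter_instances(instances, min_score=0.5):
    """Drop instances below a client-side confidence cutoff."""
    return [inst for inst in instances if inst["score"] >= min_score]
```

With the sample response above (scores 0.92, 0.71, 0.38, 0.89), `filter_instances` would drop the 0.38 "Road crack" instance.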
## 🎨 Use Cases
**Road Damage Detection**:
```python
classes = ["Pothole", "Road crack", "Road"]
# Detects: multiple potholes, multiple cracks, road surface
```
**Traffic Infrastructure**:
```python
classes = ["Traffic sign", "Traffic light", "Road marking"]
# Detects: all signs, all lights, all markings in view
```
**General Object Detection**:
```python
classes = ["car", "person", "bicycle"]
# Detects: all cars, all people, all bicycles
```
## πŸ“¦ Deployment
This model is deployed using a custom Docker image. See the [GitHub repository](https://github.com/logiroad/sam3) for full documentation and deployment instructions.
## πŸ“„ License
MIT License. This deployment uses Meta's SAM3 model - see the [facebook/sam3 model card](https://huggingface.co/facebook/sam3) for model license information.
## πŸ”— Resources
- **Paper**: [SAM 3: Segment Anything with Concepts](https://ai.meta.com/research/publications/sam-3/)
- **Full Documentation**: [GitHub Repository](https://github.com/logiroad/sam3)
- **Endpoint Console**: [HuggingFace Endpoints](https://ui.endpoints.huggingface.co/Logiroad/endpoints/sam3-segmentation)