|
|
--- |
|
|
tags: |
|
|
- image-segmentation |
|
|
- sam |
|
|
- custom-docker |
|
|
license: mit |
|
|
task_categories: |
|
|
- image-segmentation |
|
|
library_name: transformers |
|
|
pipeline_tag: image-segmentation |
|
|
--- |
|
|
|
|
|
# SAM3 - Instance Segmentation for Road Damage Detection |
|
|
|
|
|
SAM3 is an instance segmentation model deployed as a custom Docker container on HuggingFace Inference Endpoints. It detects and segments individual instances of road damage (potholes, cracks) using text prompts. |
|
|
|
|
|
## 🚀 Deployment
|
|
|
|
|
- **GitHub Repository**: https://github.com/logiroad/sam3 |
|
|
- **Inference Endpoint**: https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud |
|
|
- **Docker Registry**: sam3acr4hf.azurecr.io/sam3-hf:latest |
|
|
- **Model**: facebook/sam3 (Sam3Model for static images) |
|
|
- **Hardware**: NVIDIA A10G (24GB VRAM) |
|
|
|
|
|
## 📐 Model Architecture
|
|
|
|
|
Built on Meta's SAM3 (Segment Anything Model 3) architecture for text-prompted **instance segmentation** of static images. SAM3 detects and segments all individual instances of specified object classes. |
|
|
|
|
|
**Key features**: |
|
|
- Multiple instances per class (e.g., 3 potholes in one image) |
|
|
- Text-based prompting (natural language class names) |
|
|
- High-quality segmentation masks |
|
|
- Confidence scores per instance |
|
|
|
|
|
## 🎯 Usage
|
|
|
|
|
### Basic Example |
|
|
|
|
|
```python
import base64

import requests

# Read and base64-encode the image
with open("road_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Call the endpoint (if the endpoint is protected, also pass
# headers={"Authorization": f"Bearer {HF_TOKEN}"})
response = requests.post(
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["Pothole", "Road crack", "Road"]},
    },
)
response.raise_for_status()

# Get results - the endpoint returns a variable number of instances
instances = response.json()
print(f"Detected {len(instances)} instance(s)")

for instance in instances:
    label = instance['label']
    score = instance['score']
    instance_id = instance['instance_id']
    mask_b64 = instance['mask']
    print(f"{label} #{instance_id}: confidence={score:.2f}")
```
|
|
|
|
|
### Response Format |
|
|
|
|
|
The endpoint returns a **list of instances**, not one entry per class:
|
|
|
|
|
```json
[
  {
    "label": "Pothole",
    "mask": "iVBORw0KG...",
    "score": 0.92,
    "instance_id": 0
  },
  {
    "label": "Pothole",
    "mask": "iVBORw0KG...",
    "score": 0.71,
    "instance_id": 1
  },
  {
    "label": "Road crack",
    "mask": "iVBORw0KG...",
    "score": 0.38,
    "instance_id": 0
  },
  {
    "label": "Road",
    "mask": "iVBORw0KG...",
    "score": 0.89,
    "instance_id": 0
  }
]
```
|
|
|
|
|
**Fields**: |
|
|
- `label`: Class name (from input prompts) |
|
|
- `mask`: Base64-encoded PNG mask (grayscale, 0-255) |
|
|
- `score`: Confidence score (0.0-1.0) |
|
|
- `instance_id`: Instance number within the class (0, 1, 2...) |
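Since the `mask` field is a grayscale PNG, downstream code usually wants it as a boolean array. A minimal decoding sketch (the sample mask below is synthetic; the helper name `decode_mask` and the binarization cutoff of 128 are illustrative choices, not part of the API):

```python
import base64
import io

import numpy as np
from PIL import Image

def decode_mask(mask_b64: str, threshold: int = 128) -> np.ndarray:
    """Decode a base64 PNG mask into a boolean array (True = object pixel)."""
    mask_img = Image.open(io.BytesIO(base64.b64decode(mask_b64))).convert("L")
    return np.array(mask_img) >= threshold

# Synthetic example: a 4x4 mask with a 2x2 bright square
demo = Image.new("L", (4, 4), 0)
demo.paste(255, (1, 1, 3, 3))
buf = io.BytesIO()
demo.save(buf, format="PNG")
mask_b64 = base64.b64encode(buf.getvalue()).decode()

binary = decode_mask(mask_b64)
print(binary.sum())  # 4 object pixels
```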
|
|
|
|
|
### Processing Results |
|
|
|
|
|
```python
import base64
import io
from collections import defaultdict

from PIL import Image

# Group instances by class
instances_by_class = defaultdict(list)
for instance in instances:
    instances_by_class[instance['label']].append(instance)

# Count instances per class
for cls, insts in instances_by_class.items():
    print(f"{cls}: {len(insts)} instance(s)")

# Get the highest-confidence instance per class
best_instances = {}
for cls, insts in instances_by_class.items():
    best_instances[cls] = max(insts, key=lambda x: x['score'])

# Decode each mask into a grayscale PIL image and save it
for instance in instances:
    mask_bytes = base64.b64decode(instance['mask'])
    mask_img = Image.open(io.BytesIO(mask_bytes))
    mask_img.save(f"{instance['label']}_{instance['instance_id']}.png")
```
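For quick inspection, a decoded mask can be blended over the source image. A hedged sketch using PIL on synthetic data (the `overlay_mask` helper, the red color, and the 0.5 alpha are arbitrary choices, not part of this deployment):

```python
from PIL import Image

def overlay_mask(image: Image.Image, mask: Image.Image,
                 color=(255, 0, 0), alpha=0.5) -> Image.Image:
    """Blend a solid color over the image wherever the mask is set."""
    solid = Image.new("RGB", image.size, color)
    # Scale mask intensity by alpha so the overlay stays translucent
    blend_mask = mask.convert("L").point(lambda p: int(p * alpha))
    out = image.convert("RGB").copy()
    out.paste(solid, (0, 0), blend_mask)
    return out

# Demo: white 8x8 image, mask covering the left half
img = Image.new("RGB", (8, 8), (255, 255, 255))
m = Image.new("L", (8, 8), 0)
m.paste(255, (0, 0, 4, 8))
result = overlay_mask(img, m)  # left half tinted red, right half unchanged
```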
|
|
|
|
|
## ⚙️ Model Parameters
|
|
|
|
|
- **Detection threshold**: 0.3 (instances with score < 0.3 are filtered out) |
|
|
- **Mask threshold**: 0.5 (pixel probability threshold for mask generation) |
|
|
- **Max instances**: Up to 200 per image (DETR architecture limit) |
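The 0.3 server-side detection threshold is deliberately permissive, so clients may want to filter further. A minimal sketch on a hypothetical response (the 0.6 cutoff is an arbitrary example value):

```python
MIN_SCORE = 0.6  # stricter client-side cutoff (example value)

# Hypothetical endpoint response, masks omitted for brevity
instances = [
    {"label": "Pothole", "score": 0.92, "instance_id": 0},
    {"label": "Pothole", "score": 0.71, "instance_id": 1},
    {"label": "Road crack", "score": 0.38, "instance_id": 0},
]

# Keep only instances above the client-side threshold
kept = [i for i in instances if i["score"] >= MIN_SCORE]
print([i["score"] for i in kept])  # → [0.92, 0.71]
```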
|
|
|
|
|
## 🎨 Use Cases
|
|
|
|
|
**Road Damage Detection**: |
|
|
```python
classes = ["Pothole", "Road crack", "Road"]
# Detects: multiple potholes, multiple cracks, road surface
```
|
|
|
|
|
**Traffic Infrastructure**: |
|
|
```python
classes = ["Traffic sign", "Traffic light", "Road marking"]
# Detects: all signs, all lights, all markings in view
```
|
|
|
|
|
**General Object Detection**: |
|
|
```python
classes = ["car", "person", "bicycle"]
# Detects: all cars, all people, all bicycles
```
|
|
|
|
|
## 📦 Docker Deployment
|
|
|
|
|
This model is deployed using a custom Docker image. See the [GitHub repository](https://github.com/logiroad/sam3) for full documentation and deployment instructions. |
|
|
|
|
|
## 📄 License
|
|
|
|
|
MIT License. This deployment uses Meta's SAM3 model; see the [facebook/sam3 model card](https://huggingface.co/facebook/sam3) for the model's own license terms.
|
|
|
|
|
## 🔗 Resources
|
|
|
|
|
- **Paper**: [SAM 3: Segment Anything with Concepts](https://ai.meta.com/research/publications/sam-3/) |
|
|
- **Full Documentation**: [GitHub Repository](https://github.com/logiroad/sam3) |
|
|
- **Endpoint Console**: [HuggingFace Endpoints](https://ui.endpoints.huggingface.co/Logiroad/endpoints/sam3-segmentation) |
|
|
|