---
tags:
- image-segmentation
- sam
- custom-docker
license: mit
task_categories:
- image-segmentation
library_name: transformers
pipeline_tag: image-segmentation
---
# SAM3 - Instance Segmentation for Road Damage Detection
SAM3 is an instance segmentation model deployed as a custom Docker container on HuggingFace Inference Endpoints. It detects and segments individual instances of road damage (potholes, cracks) using text prompts.
## Deployment
- **GitHub Repository**: https://github.com/logiroad/sam3
- **Inference Endpoint**: https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud
- **Docker Registry**: sam3acr4hf.azurecr.io/sam3-hf:latest
- **Model**: facebook/sam3 (Sam3Model for static images)
- **Hardware**: NVIDIA A10G (24GB VRAM)
## Model Architecture
Built on Meta's SAM3 (Segment Anything Model 3) architecture for text-prompted **instance segmentation** of static images. SAM3 detects and segments all individual instances of specified object classes.
**Key features**:
- Multiple instances per class (e.g., 3 potholes in one image)
- Text-based prompting (natural language class names)
- High-quality segmentation masks
- Confidence scores per instance
## Usage
### Basic Example
```python
import base64

import requests

# Read the image and base64-encode it
with open("road_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Call the endpoint
response = requests.post(
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["Pothole", "Road crack", "Road"]},
    },
)

# Parse results (the endpoint returns a variable number of instances)
instances = response.json()
print(f"Detected {len(instances)} instance(s)")
for instance in instances:
    label = instance["label"]
    score = instance["score"]
    instance_id = instance["instance_id"]
    mask_b64 = instance["mask"]
    print(f"{label} #{instance_id}: confidence={score:.2f}")
```
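If the endpoint is protected or scales to zero, a thin wrapper around the call above is convenient. This is a sketch under our own assumptions: the `token` parameter and the `segment`/`build_payload` helper names are ours, not part of the endpoint; the JSON body simply mirrors the request shown above.

```python
import base64

import requests

ENDPOINT_URL = "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud"


def build_payload(image_path, classes):
    """Build the JSON body the endpoint expects (base64 image + class prompts)."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    return {"inputs": image_b64, "parameters": {"classes": list(classes)}}


def segment(image_path, classes, token=None):
    """POST an image to the endpoint; pass a Hugging Face token if it is protected."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.post(
        ENDPOINT_URL,
        json=build_payload(image_path, classes),
        headers=headers,
        timeout=120,  # generous timeout to allow for cold starts
    )
    response.raise_for_status()
    return response.json()
```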
### Response Format
The endpoint returns a **list of instances**, one entry per detected instance (not one per class):
```json
[
{
"label": "Pothole",
"mask": "iVBORw0KG...",
"score": 0.92,
"instance_id": 0
},
{
"label": "Pothole",
"mask": "iVBORw0KG...",
"score": 0.71,
"instance_id": 1
},
{
"label": "Road crack",
"mask": "iVBORw0KG...",
"score": 0.38,
"instance_id": 0
},
{
"label": "Road",
"mask": "iVBORw0KG...",
"score": 0.89,
"instance_id": 0
}
]
```
**Fields**:
- `label`: Class name (from input prompts)
- `mask`: Base64-encoded PNG mask (grayscale, 0-255)
- `score`: Confidence score (0.0-1.0)
- `instance_id`: Instance number within the class (0, 1, 2...)
### Processing Results
```python
import base64
import io
from collections import defaultdict

from PIL import Image

# Group instances by class
instances_by_class = defaultdict(list)
for instance in instances:
    instances_by_class[instance["label"]].append(instance)

# Count instances per class
for cls, insts in instances_by_class.items():
    print(f"{cls}: {len(insts)} instance(s)")

# Keep the highest-confidence instance per class
best_instances = {}
for cls, insts in instances_by_class.items():
    best_instances[cls] = max(insts, key=lambda x: x["score"])

# Decode and save masks
for instance in instances:
    mask_bytes = base64.b64decode(instance["mask"])
    mask_img = Image.open(io.BytesIO(mask_bytes))  # grayscale PIL Image
    mask_img.save(f"{instance['label']}_{instance['instance_id']}.png")
```
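For reporting, it can also help to turn a mask into a coverage number, e.g. the fraction of the frame a pothole occupies. A small helper along those lines (our own utility, not part of the endpoint API):

```python
import base64
import io

import numpy as np
from PIL import Image


def mask_area_fraction(mask_b64, threshold=128):
    """Fraction of pixels covered by a base64-encoded PNG mask (grayscale, 0-255)."""
    mask = np.array(Image.open(io.BytesIO(base64.b64decode(mask_b64))))
    return float((mask >= threshold).mean())
```

If the ground sampling distance of the camera is known, multiplying this fraction by the imaged area yields an absolute damage area.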
## Model Parameters
- **Detection threshold**: 0.3 (instances with score < 0.3 are filtered out)
- **Mask threshold**: 0.5 (pixel probability threshold for mask generation)
- **Max instances**: Up to 200 per image (DETR architecture limit)
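The 0.3 detection threshold is fixed in the deployed container, but stricter filtering can be applied client-side on the returned scores. A minimal sketch (the helper name is ours):

```python
def filter_instances(instances, min_score=0.5):
    """Client-side filter on top of the endpoint's fixed 0.3 threshold.

    The server already drops instances scoring below 0.3; raise min_score
    here when you want higher-precision detections without redeploying.
    """
    return [inst for inst in instances if inst["score"] >= min_score]
```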
## Use Cases
**Road Damage Detection**:
```python
classes = ["Pothole", "Road crack", "Road"]
# Detects: multiple potholes, multiple cracks, road surface
```
**Traffic Infrastructure**:
```python
classes = ["Traffic sign", "Traffic light", "Road marking"]
# Detects: all signs, all lights, all markings in view
```
**General Object Detection**:
```python
classes = ["car", "person", "bicycle"]
# Detects: all cars, all people, all bicycles
```
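To inspect results visually, the returned masks can be blended onto the original photo. A Pillow-only sketch; the color map and the resize fallback are our assumptions, not endpoint guarantees:

```python
import base64
import io

from PIL import Image

# Illustrative colors for the road-damage classes used above
COLORS = {"Pothole": (255, 0, 0), "Road crack": (255, 165, 0), "Road": (0, 128, 255)}


def overlay_masks(base_img, instances, alpha=0.5):
    """Blend each instance mask onto a PIL image as a translucent color."""
    out = base_img.convert("RGB")
    for inst in instances:
        mask = Image.open(io.BytesIO(base64.b64decode(inst["mask"]))).convert("L")
        if mask.size != out.size:
            mask = mask.resize(out.size)  # masks are assumed image-sized; resize just in case
        blend = mask.point(lambda p: int(p * alpha))  # scale intensity for translucency
        layer = Image.new("RGB", out.size, COLORS.get(inst["label"], (0, 255, 0)))
        out = Image.composite(layer, out, blend)
    return out
```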
## Docker Deployment
This model is deployed using a custom Docker image. See the [GitHub repository](https://github.com/logiroad/sam3) for full documentation and deployment instructions.
## License
MIT License. This deployment uses Meta's SAM3 model - see the [facebook/sam3 model card](https://huggingface.co/facebook/sam3) for model license information.
## Resources
- **Paper**: [SAM 3: Segment Anything with Concepts](https://ai.meta.com/research/publications/sam-3/)
- **Full Documentation**: [GitHub Repository](https://github.com/logiroad/sam3)
- **Endpoint Console**: [HuggingFace Endpoints](https://ui.endpoints.huggingface.co/Logiroad/endpoints/sam3-segmentation)