---
tags:
- image-segmentation
- sam
- custom-docker
license: mit
task_categories:
- image-segmentation
library_name: transformers
pipeline_tag: image-segmentation
---

# SAM3 - Instance Segmentation for Road Damage Detection

SAM3 is an instance segmentation model deployed as a custom Docker container on HuggingFace Inference Endpoints. It detects and segments individual instances of road damage (potholes, cracks) using text prompts.

## 🚀 Deployment

- **GitHub Repository**: https://github.com/logiroad/sam3
- **Inference Endpoint**: https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud
- **Docker Registry**: sam3acr4hf.azurecr.io/sam3-hf:latest
- **Model**: facebook/sam3 (Sam3Model for static images)
- **Hardware**: NVIDIA A10G (24GB VRAM)

## 📊 Model Architecture

This deployment builds on Meta's SAM3 (Segment Anything Model 3) architecture for text-prompted **instance segmentation** of static images. Given a list of class names, SAM3 detects and segments every individual instance of each class.

**Key features**:
- Multiple instances per class (e.g., 3 potholes in one image)
- Text-based prompting (natural language class names)
- High-quality segmentation masks
- Confidence scores per instance

## 🎯 Usage

### Basic Example

```python
import requests
import base64

# Read image
with open("road_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

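# Call endpoint (the timeout keeps the client from hanging on a cold or busy endpoint)
response = requests.post(
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["Pothole", "Road crack", "Road"]}
    },
    timeout=60,
)
response.raise_for_status()  # fail early on HTTP errors instead of parsing an error body

# Parse results: the list holds a variable number of instances
instances = response.json()
print(f"Detected {len(instances)} instance(s)")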

for instance in instances:
    label = instance['label']
    score = instance['score']
    instance_id = instance['instance_id']
    mask_b64 = instance['mask']

    print(f"{label} #{instance_id}: confidence={score:.2f}")
```

### Response Format

The endpoint returns a **list of instances**, with one entry per detected instance rather than one per class, so the list length varies from image to image:

```json
[
  {
    "label": "Pothole",
    "mask": "iVBORw0KG...",
    "score": 0.92,
    "instance_id": 0
  },
  {
    "label": "Pothole",
    "mask": "iVBORw0KG...",
    "score": 0.71,
    "instance_id": 1
  },
  {
    "label": "Road crack",
    "mask": "iVBORw0KG...",
    "score": 0.38,
    "instance_id": 0
  },
  {
    "label": "Road",
    "mask": "iVBORw0KG...",
    "score": 0.89,
    "instance_id": 0
  }
]
```

**Fields**:
- `label`: Class name (from input prompts)
- `mask`: Base64-encoded PNG mask (grayscale, 0-255)
- `score`: Confidence score (0.0-1.0)
- `instance_id`: Instance number within the class (0, 1, 2...)
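
The field list above maps naturally onto a small typed record. The following is a minimal sketch, assuming the response carries exactly the four documented fields; the `Instance` dataclass and the `parse_instances` helper are illustrative names, not part of the endpoint or of any library.

```python
import base64
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One detected instance (hypothetical client-side type)."""
    label: str
    score: float
    instance_id: int
    mask_png: bytes  # decoded PNG bytes of the grayscale mask


def parse_instances(payload):
    """Validate the raw JSON list and convert it into Instance records."""
    if not isinstance(payload, list):
        raise ValueError(f"expected a list of instances, got {type(payload).__name__}")
    parsed = []
    for item in payload:
        missing = {"label", "mask", "score", "instance_id"} - item.keys()
        if missing:
            raise ValueError(f"instance is missing fields: {sorted(missing)}")
        parsed.append(
            Instance(
                label=item["label"],
                score=float(item["score"]),
                instance_id=int(item["instance_id"]),
                mask_png=base64.b64decode(item["mask"]),
            )
        )
    return parsed
```

Calling `parse_instances(response.json())` would then raise a `ValueError` if the endpoint returned an error object instead of a list, rather than failing later with a less obvious `KeyError`.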

### Processing Results

```python
import base64
import io
from collections import defaultdict

from PIL import Image

# `instances` is the list returned by response.json() in the basic example

# Group instances by class
instances_by_class = defaultdict(list)
for instance in instances:
    instances_by_class[instance['label']].append(instance)

# Count instances per class
for cls, insts in instances_by_class.items():
    print(f"{cls}: {len(insts)} instance(s)")

# Get the highest-confidence instance per class
best_instances = {
    cls: max(insts, key=lambda x: x['score'])
    for cls, insts in instances_by_class.items()
}

# Decode each mask and save it as a PNG file
for instance in instances:
    mask_bytes = base64.b64decode(instance['mask'])
    mask_img = Image.open(io.BytesIO(mask_bytes))  # PIL Image (grayscale)
    mask_img.save(f"{instance['label']}_{instance['instance_id']}.png")
```
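
Once a mask is decoded, a common next step is to measure how much of the image it covers. The sketch below assumes that bright pixels mark the object and uses 128 as the cut-off between background and foreground; both are assumptions about the mask encoding, and `mask_coverage` is a hypothetical helper name.

```python
import base64
import io

from PIL import Image


def mask_coverage(mask_b64, threshold=128):
    """Return the fraction of pixels at or above `threshold` in a base64 PNG mask."""
    mask = Image.open(io.BytesIO(base64.b64decode(mask_b64))).convert("L")
    histogram = mask.histogram()  # 256 counts, one per gray level
    foreground = sum(histogram[threshold:])
    width, height = mask.size
    return foreground / (width * height)
```

For example, `mask_coverage(instance['mask'])` gives a rough per-instance damage area as a share of the frame, which can be compared across images of the same resolution.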

## βš™οΈ Model Parameters

- **Detection threshold**: 0.3 (instances with score < 0.3 are filtered out)
- **Mask threshold**: 0.5 (pixel probability threshold for mask generation)
- **Max instances**: Up to 200 per image (DETR architecture limit)
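
This page does not document a request parameter for changing these thresholds, so a stricter cut-off would have to be applied client-side. The following sketch shows one way to do that on the parsed response; `filter_instances`, `min_score`, and `max_per_class` are hypothetical names chosen for illustration.

```python
def filter_instances(instances, min_score=0.5, max_per_class=None):
    """Keep instances scoring at least `min_score`, best first within each class."""
    kept = sorted(
        (inst for inst in instances if inst["score"] >= min_score),
        key=lambda inst: (inst["label"], -inst["score"]),
    )
    if max_per_class is None:
        return kept
    result, seen = [], {}
    for inst in kept:
        count = seen.get(inst["label"], 0)
        if count < max_per_class:  # cap how many instances each class contributes
            result.append(inst)
            seen[inst["label"]] = count + 1
    return result
```

Because the endpoint already drops instances scoring below 0.3, a `min_score` at or below 0.3 has no effect; the helper can only tighten the server-side filter.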

## 🎨 Use Cases

**Road Damage Detection**:
```python
classes = ["Pothole", "Road crack", "Road"]
# Detects: multiple potholes, multiple cracks, road surface
```

**Traffic Infrastructure**:
```python
classes = ["Traffic sign", "Traffic light", "Road marking"]
# Detects: all signs, all lights, all markings in view
```

**General Object Detection**:
```python
classes = ["car", "person", "bicycle"]
# Detects: all cars, all people, all bicycles
```
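
Each class list above goes into the same request body used in the basic example. A minimal sketch of a payload builder follows; `build_payload` is a hypothetical helper, and the empty-list check is a client-side assumption rather than documented endpoint behavior.

```python
import base64


def build_payload(image_bytes, classes):
    """Build the JSON body: base64-encoded image plus the text prompts to segment."""
    if not classes:
        raise ValueError("at least one class name is required")
    return {
        "inputs": base64.b64encode(image_bytes).decode("ascii"),
        "parameters": {"classes": list(classes)},
    }
```

The result can be passed directly as the `json=` argument of `requests.post`, so switching use cases only means changing the `classes` list.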

## 📦 Deployment

This model is deployed using a custom Docker image. See the [GitHub repository](https://github.com/logiroad/sam3) for full documentation and deployment instructions.

## 📄 License

MIT License. This deployment uses Meta's SAM3 model - see the [facebook/sam3 model card](https://huggingface.co/facebook/sam3) for model license information.

## 🔗 Resources

- **Paper**: [SAM 3: Segment Anything with Concepts](https://ai.meta.com/research/publications/sam-3/)
- **Full Documentation**: [GitHub Repository](https://github.com/logiroad/sam3)
- **Endpoint Console**: [HuggingFace Endpoints](https://ui.endpoints.huggingface.co/Logiroad/endpoints/sam3-segmentation)