---
tags:
- image-segmentation
- sam
- custom-docker
license: mit
task_categories:
- image-segmentation
library_name: transformers
pipeline_tag: image-segmentation
---
# SAM3 - Instance Segmentation for Road Damage Detection
SAM3 is an instance segmentation model deployed as a custom Docker container on HuggingFace Inference Endpoints. It detects and segments individual instances of road damage (potholes, cracks) using text prompts.
## Deployment
- **GitHub Repository**: https://github.com/logiroad/sam3
- **Inference Endpoint**: https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud
- **Docker Registry**: sam3acr4hf.azurecr.io/sam3-hf:latest
- **Model**: facebook/sam3 (Sam3Model for static images)
- **Hardware**: NVIDIA A10G (24GB VRAM)
## Model Architecture
Built on Meta's SAM3 (Segment Anything Model 3) architecture for text-prompted **instance segmentation** of static images. SAM3 detects and segments all individual instances of specified object classes.
**Key features**:
- Multiple instances per class (e.g., 3 potholes in one image)
- Text-based prompting (natural language class names)
- High-quality segmentation masks
- Confidence scores per instance
## Usage
### Basic Example
```python
import base64

import requests

# Read the image and base64-encode it
with open("road_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Call the endpoint
response = requests.post(
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["Pothole", "Road crack", "Road"]},
    },
)

# Parse results (the endpoint returns a variable number of instances)
instances = response.json()
print(f"Detected {len(instances)} instance(s)")
for instance in instances:
    label = instance["label"]
    score = instance["score"]
    instance_id = instance["instance_id"]
    mask_b64 = instance["mask"]
    print(f"{label} #{instance_id}: confidence={score:.2f}")
```
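If the endpoint is protected or scales to zero, a thin wrapper around the call above is convenient. This is a sketch under our own assumptions: the `token` parameter and the `segment`/`build_payload` helper names are ours, not part of the endpoint; the JSON body simply mirrors the request shown above.

```python
import base64

import requests

ENDPOINT_URL = "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud"


def build_payload(image_path, classes):
    """Build the JSON body the endpoint expects (base64 image + class prompts)."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    return {"inputs": image_b64, "parameters": {"classes": list(classes)}}


def segment(image_path, classes, token=None):
    """POST an image to the endpoint; pass a Hugging Face token if it is protected."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.post(
        ENDPOINT_URL,
        json=build_payload(image_path, classes),
        headers=headers,
        timeout=120,  # generous timeout to allow for cold starts
    )
    response.raise_for_status()
    return response.json()
```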
### Response Format
The endpoint returns a **list of instances**, one entry per detected instance (not one per class):
```json
[
{
"label": "Pothole",
"mask": "iVBORw0KG...",
"score": 0.92,
"instance_id": 0
},
{
"label": "Pothole",
"mask": "iVBORw0KG...",
"score": 0.71,
"instance_id": 1
},
{
"label": "Road crack",
"mask": "iVBORw0KG...",
"score": 0.38,
"instance_id": 0
},
{
"label": "Road",
"mask": "iVBORw0KG...",
"score": 0.89,
"instance_id": 0
}
]
```
**Fields**:
- `label`: Class name (from input prompts)
- `mask`: Base64-encoded PNG mask (grayscale, 0-255)
- `score`: Confidence score (0.0-1.0)
- `instance_id`: Instance number within the class (0, 1, 2...)
### Processing Results
```python
import base64
import io
from collections import defaultdict

from PIL import Image

# Group instances by class
instances_by_class = defaultdict(list)
for instance in instances:
    instances_by_class[instance["label"]].append(instance)

# Count instances per class
for cls, insts in instances_by_class.items():
    print(f"{cls}: {len(insts)} instance(s)")

# Keep the highest-confidence instance per class
best_instances = {}
for cls, insts in instances_by_class.items():
    best_instances[cls] = max(insts, key=lambda x: x["score"])

# Decode and save masks
for instance in instances:
    mask_bytes = base64.b64decode(instance["mask"])
    mask_img = Image.open(io.BytesIO(mask_bytes))  # grayscale PIL Image
    mask_img.save(f"{instance['label']}_{instance['instance_id']}.png")
```
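For reporting, it can also help to turn a mask into a coverage number, e.g. the fraction of the frame a pothole occupies. A small helper along those lines (our own utility, not part of the endpoint API):

```python
import base64
import io

import numpy as np
from PIL import Image


def mask_area_fraction(mask_b64, threshold=128):
    """Fraction of pixels covered by a base64-encoded PNG mask (grayscale, 0-255)."""
    mask = np.array(Image.open(io.BytesIO(base64.b64decode(mask_b64))))
    return float((mask >= threshold).mean())
```

If the ground sampling distance of the camera is known, multiplying this fraction by the imaged area yields an absolute damage area.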
## Model Parameters
- **Detection threshold**: 0.3 (instances with score < 0.3 are filtered out)
- **Mask threshold**: 0.5 (pixel probability threshold for mask generation)
- **Max instances**: Up to 200 per image (DETR architecture limit)
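The 0.3 detection threshold is fixed in the deployed container, but stricter filtering can be applied client-side on the returned scores. A minimal sketch (the helper name is ours):

```python
def filter_instances(instances, min_score=0.5):
    """Client-side filter on top of the endpoint's fixed 0.3 threshold.

    The server already drops instances scoring below 0.3; raise min_score
    here when you want higher-precision detections without redeploying.
    """
    return [inst for inst in instances if inst["score"] >= min_score]
```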
## Use Cases
**Road Damage Detection**:
```python
classes = ["Pothole", "Road crack", "Road"]
# Detects: multiple potholes, multiple cracks, road surface
```
**Traffic Infrastructure**:
```python
classes = ["Traffic sign", "Traffic light", "Road marking"]
# Detects: all signs, all lights, all markings in view
```
**General Object Detection**:
```python
classes = ["car", "person", "bicycle"]
# Detects: all cars, all people, all bicycles
```
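To inspect results visually, the returned masks can be blended onto the original photo. A Pillow-only sketch; the color map and the resize fallback are our assumptions, not endpoint guarantees:

```python
import base64
import io

from PIL import Image

# Illustrative colors for the road-damage classes used above
COLORS = {"Pothole": (255, 0, 0), "Road crack": (255, 165, 0), "Road": (0, 128, 255)}


def overlay_masks(base_img, instances, alpha=0.5):
    """Blend each instance mask onto a PIL image as a translucent color."""
    out = base_img.convert("RGB")
    for inst in instances:
        mask = Image.open(io.BytesIO(base64.b64decode(inst["mask"]))).convert("L")
        if mask.size != out.size:
            mask = mask.resize(out.size)  # masks are assumed image-sized; resize just in case
        blend = mask.point(lambda p: int(p * alpha))  # scale intensity for translucency
        layer = Image.new("RGB", out.size, COLORS.get(inst["label"], (0, 255, 0)))
        out = Image.composite(layer, out, blend)
    return out
```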
## Docker Deployment
This model is deployed using a custom Docker image. See the [GitHub repository](https://github.com/logiroad/sam3) for full documentation and deployment instructions.
## License
MIT License. This deployment uses Meta's SAM3 model - see the [facebook/sam3 model card](https://huggingface.co/facebook/sam3) for model license information.
## Resources
- **Paper**: [SAM 3: Segment Anything with Concepts](https://ai.meta.com/research/publications/sam-3/)
- **Full Documentation**: [GitHub Repository](https://github.com/logiroad/sam3)
- **Endpoint Console**: [HuggingFace Endpoints](https://ui.endpoints.huggingface.co/Logiroad/endpoints/sam3-segmentation)