|
|
--- |
|
|
tags: |
|
|
- image-segmentation |
|
|
- sam |
|
|
- custom-docker |
|
|
license: mit |
|
|
task_categories: |
|
|
- image-segmentation |
|
|
library_name: transformers |
|
|
pipeline_tag: image-segmentation |
|
|
--- |
|
|
|
|
|
# SAM3 - Instance Segmentation for Road Damage Detection |
|
|
|
|
|
SAM3 is an instance segmentation model deployed as a custom Docker container on HuggingFace Inference Endpoints. It detects and segments individual instances of road damage (potholes, cracks) using text prompts. |
|
|
|
|
|
## 🚀 Deployment
|
|
|
|
|
- **GitHub Repository**: https://github.com/logiroad/sam3 |
|
|
- **Inference Endpoint**: https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud |
|
|
- **Docker Registry**: sam3acr4hf.azurecr.io/sam3-hf:latest |
|
|
- **Model**: facebook/sam3 (Sam3Model for static images) |
|
|
- **Hardware**: NVIDIA A10G (24GB VRAM) |
|
|
|
|
|
## 📐 Model Architecture
|
|
|
|
|
Built on Meta's SAM3 (Segment Anything Model 3) architecture for text-prompted **instance segmentation** of static images. SAM3 detects and segments all individual instances of specified object classes. |
|
|
|
|
|
**Key features**: |
|
|
- Multiple instances per class (e.g., 3 potholes in one image) |
|
|
- Text-based prompting (natural language class names) |
|
|
- High-quality segmentation masks |
|
|
- Confidence scores per instance |
|
|
|
|
|
## 🎯 Usage
|
|
|
|
|
### Basic Example |
|
|
|
|
|
```python
import base64

import requests

# Read and base64-encode the image
with open("road_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Call the endpoint (if the endpoint is protected, also pass
# headers={"Authorization": f"Bearer {HF_TOKEN}"})
response = requests.post(
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["Pothole", "Road crack", "Road"]},
    },
)
response.raise_for_status()

# Get results - the endpoint returns a variable number of instances
instances = response.json()
print(f"Detected {len(instances)} instance(s)")

for instance in instances:
    label = instance['label']
    score = instance['score']
    instance_id = instance['instance_id']
    mask_b64 = instance['mask']
    print(f"{label} #{instance_id}: confidence={score:.2f}")
```
|
|
|
|
|
### Response Format |
|
|
|
|
|
The endpoint returns a **list of instances**, not one entry per class:
|
|
|
|
|
```json
[
  {
    "label": "Pothole",
    "mask": "iVBORw0KG...",
    "score": 0.92,
    "instance_id": 0
  },
  {
    "label": "Pothole",
    "mask": "iVBORw0KG...",
    "score": 0.71,
    "instance_id": 1
  },
  {
    "label": "Road crack",
    "mask": "iVBORw0KG...",
    "score": 0.38,
    "instance_id": 0
  },
  {
    "label": "Road",
    "mask": "iVBORw0KG...",
    "score": 0.89,
    "instance_id": 0
  }
]
```
|
|
|
|
|
**Fields**: |
|
|
- `label`: Class name (from input prompts) |
|
|
- `mask`: Base64-encoded PNG mask (grayscale, 0-255) |
|
|
- `score`: Confidence score (0.0-1.0) |
|
|
- `instance_id`: Instance number within the class (0, 1, 2...) |
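Since the `mask` field is a grayscale PNG, downstream code usually wants it as a boolean array. A minimal decoding sketch (the sample mask below is synthetic; the helper name `decode_mask` and the binarization cutoff of 128 are illustrative choices, not part of the API):

```python
import base64
import io

import numpy as np
from PIL import Image

def decode_mask(mask_b64: str, threshold: int = 128) -> np.ndarray:
    """Decode a base64 PNG mask into a boolean array (True = object pixel)."""
    mask_img = Image.open(io.BytesIO(base64.b64decode(mask_b64))).convert("L")
    return np.array(mask_img) >= threshold

# Synthetic example: a 4x4 mask with a 2x2 bright square
demo = Image.new("L", (4, 4), 0)
demo.paste(255, (1, 1, 3, 3))
buf = io.BytesIO()
demo.save(buf, format="PNG")
mask_b64 = base64.b64encode(buf.getvalue()).decode()

binary = decode_mask(mask_b64)
print(binary.sum())  # 4 object pixels
```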
|
|
|
|
|
### Processing Results |
|
|
|
|
|
```python
import base64
import io
from collections import defaultdict

from PIL import Image

# Group instances by class
instances_by_class = defaultdict(list)
for instance in instances:
    instances_by_class[instance['label']].append(instance)

# Count instances per class
for cls, insts in instances_by_class.items():
    print(f"{cls}: {len(insts)} instance(s)")

# Get the highest-confidence instance per class
best_instances = {}
for cls, insts in instances_by_class.items():
    best_instances[cls] = max(insts, key=lambda x: x['score'])

# Decode each mask into a grayscale PIL image and save it
for instance in instances:
    mask_bytes = base64.b64decode(instance['mask'])
    mask_img = Image.open(io.BytesIO(mask_bytes))
    mask_img.save(f"{instance['label']}_{instance['instance_id']}.png")
```
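For quick inspection, a decoded mask can be blended over the source image. A hedged sketch using PIL on synthetic data (the `overlay_mask` helper, the red color, and the 0.5 alpha are arbitrary choices, not part of this deployment):

```python
from PIL import Image

def overlay_mask(image: Image.Image, mask: Image.Image,
                 color=(255, 0, 0), alpha=0.5) -> Image.Image:
    """Blend a solid color over the image wherever the mask is set."""
    solid = Image.new("RGB", image.size, color)
    # Scale mask intensity by alpha so the overlay stays translucent
    blend_mask = mask.convert("L").point(lambda p: int(p * alpha))
    out = image.convert("RGB").copy()
    out.paste(solid, (0, 0), blend_mask)
    return out

# Demo: white 8x8 image, mask covering the left half
img = Image.new("RGB", (8, 8), (255, 255, 255))
m = Image.new("L", (8, 8), 0)
m.paste(255, (0, 0, 4, 8))
result = overlay_mask(img, m)  # left half tinted red, right half unchanged
```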
|
|
|
|
|
## ⚙️ Model Parameters
|
|
|
|
|
- **Detection threshold**: 0.3 (instances with score < 0.3 are filtered out) |
|
|
- **Mask threshold**: 0.5 (pixel probability threshold for mask generation) |
|
|
- **Max instances**: Up to 200 per image (DETR architecture limit) |
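The 0.3 server-side detection threshold is deliberately permissive, so clients may want to filter further. A minimal sketch on a hypothetical response (the 0.6 cutoff is an arbitrary example value):

```python
MIN_SCORE = 0.6  # stricter client-side cutoff (example value)

# Hypothetical endpoint response, masks omitted for brevity
instances = [
    {"label": "Pothole", "score": 0.92, "instance_id": 0},
    {"label": "Pothole", "score": 0.71, "instance_id": 1},
    {"label": "Road crack", "score": 0.38, "instance_id": 0},
]

# Keep only instances above the client-side threshold
kept = [i for i in instances if i["score"] >= MIN_SCORE]
print([i["score"] for i in kept])  # → [0.92, 0.71]
```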
|
|
|
|
|
## 🎨 Use Cases
|
|
|
|
|
**Road Damage Detection**: |
|
|
```python
classes = ["Pothole", "Road crack", "Road"]
# Detects: multiple potholes, multiple cracks, road surface
```
|
|
|
|
|
**Traffic Infrastructure**: |
|
|
```python
classes = ["Traffic sign", "Traffic light", "Road marking"]
# Detects: all signs, all lights, all markings in view
```
|
|
|
|
|
**General Object Detection**: |
|
|
```python
classes = ["car", "person", "bicycle"]
# Detects: all cars, all people, all bicycles
```
|
|
|
|
|
## 📦 Docker Deployment
|
|
|
|
|
This model is deployed using a custom Docker image. See the [GitHub repository](https://github.com/logiroad/sam3) for full documentation and deployment instructions. |
|
|
|
|
|
## 📄 License
|
|
|
|
|
MIT License. This deployment uses Meta's SAM3 model; see the [facebook/sam3 model card](https://huggingface.co/facebook/sam3) for the model's own license terms.
|
|
|
|
|
## 🔗 Resources
|
|
|
|
|
- **Paper**: [SAM 3: Segment Anything with Concepts](https://ai.meta.com/research/publications/sam-3/) |
|
|
- **Full Documentation**: [GitHub Repository](https://github.com/logiroad/sam3) |
|
|
- **Endpoint Console**: [HuggingFace Endpoints](https://ui.endpoints.huggingface.co/Logiroad/endpoints/sam3-segmentation) |
|
|
|