---
tags:
- image-segmentation
- sam
- custom-docker
license: mit
task_categories:
- image-segmentation
library_name: transformers
pipeline_tag: image-segmentation
---

# SAM3 - Instance Segmentation for Road Damage Detection

SAM3 is an instance segmentation model deployed as a custom Docker container on HuggingFace Inference Endpoints. It detects and segments individual instances of road damage (potholes, cracks) using text prompts.

## 🚀 Deployment

- **GitHub Repository**: https://github.com/logiroad/sam3
- **Inference Endpoint**: https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud
- **Docker Registry**: sam3acr4hf.azurecr.io/sam3-hf:latest
- **Model**: facebook/sam3 (Sam3Model for static images)
- **Hardware**: NVIDIA A10G (24GB VRAM)

## 📊 Model Architecture

Built on Meta's SAM3 (Segment Anything Model 3) architecture for text-prompted **instance segmentation** of static images. SAM3 detects and segments all individual instances of the specified object classes.

**Key features**:
- Multiple instances per class (e.g., 3 potholes in one image)
- Text-based prompting (natural language class names)
- High-quality segmentation masks
- Confidence scores per instance

## 🎯 Usage

### Basic Example

```python
import base64

import requests

# Read image and encode it as base64
with open("road_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Call endpoint
response = requests.post(
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["Pothole", "Road crack", "Road"]}
    }
)

# Get results - returns a VARIABLE number of instances
instances = response.json()
print(f"Detected {len(instances)} instance(s)")

for instance in instances:
    label = instance['label']
    score = instance['score']
    instance_id = instance['instance_id']
    mask_b64 = instance['mask']
    print(f"{label} #{instance_id}: confidence={score:.2f}")
```

### Response Format

The endpoint returns a **list of instances** (NOT one per class):

```json
[
  {
    "label": "Pothole",
    "mask": "iVBORw0KG...",
    "score": 0.92,
    "instance_id": 0
  },
  {
    "label": "Pothole",
    "mask": "iVBORw0KG...",
    "score": 0.71,
    "instance_id": 1
  },
  {
    "label": "Road crack",
    "mask": "iVBORw0KG...",
    "score": 0.38,
    "instance_id": 0
  },
  {
    "label": "Road",
    "mask": "iVBORw0KG...",
    "score": 0.89,
    "instance_id": 0
  }
]
```

**Fields**:
- `label`: Class name (from the input prompts)
- `mask`: Base64-encoded PNG mask (grayscale, 0-255)
- `score`: Confidence score (0.0-1.0)
- `instance_id`: Instance number within the class (0, 1, 2, ...)

### Processing Results

```python
import base64
import io
from collections import defaultdict

from PIL import Image

# Group instances by class
instances_by_class = defaultdict(list)
for instance in instances:
    instances_by_class[instance['label']].append(instance)

# Count instances per class
for cls, insts in instances_by_class.items():
    print(f"{cls}: {len(insts)} instance(s)")

# Get the highest-confidence instance per class
best_instances = {}
for cls, insts in instances_by_class.items():
    best_instances[cls] = max(insts, key=lambda x: x['score'])

# Decode and save masks
for instance in instances:
    mask_bytes = base64.b64decode(instance['mask'])
    mask_img = Image.open(io.BytesIO(mask_bytes))
    # mask_img is now a PIL Image (grayscale)
    mask_img.save(f"{instance['label']}_{instance['instance_id']}.png")
```

## ⚙️ Model Parameters

- **Detection threshold**: 0.3 (instances with score < 0.3 are filtered out)
- **Mask threshold**: 0.5 (pixel probability threshold for mask generation)
- **Max instances**: Up to 200 per image (DETR architecture limit)

## 🎨 Use Cases

**Road Damage Detection**:
```python
classes = ["Pothole", "Road crack", "Road"]
# Detects: multiple potholes, multiple cracks, road surface
```

**Traffic Infrastructure**:
```python
classes = ["Traffic sign", "Traffic light", "Road marking"]
# Detects: all signs, all lights, all markings in view
```

**General Object Detection**:
```python
classes = ["car", "person", "bicycle"]
# Detects: all cars, all people, all bicycles
```

## 📦 Deployment

This model is deployed using a custom Docker image. See the [GitHub repository](https://github.com/logiroad/sam3) for full documentation and deployment instructions.

## 📄 License

MIT License. This deployment uses Meta's SAM3 model - see the [facebook/sam3 model card](https://huggingface.co/facebook/sam3) for model license information.

## 🔗 Resources

- **Paper**: [SAM 3: Segment Anything with Concepts](https://ai.meta.com/research/publications/sam-3/)
- **Full Documentation**: [GitHub Repository](https://github.com/logiroad/sam3)
- **Endpoint Console**: [HuggingFace Endpoints](https://ui.endpoints.huggingface.co/Logiroad/endpoints/sam3-segmentation)
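## 🖼️ Visualizing Masks

Because the endpoint returns masks as base64-encoded grayscale PNGs, a common next step is to overlay them on the source image for visual inspection. Below is a minimal sketch using Pillow; the `overlay_instances` helper and the per-class colors are illustrative choices, not part of the endpoint API:

```python
import base64
import io

from PIL import Image

def overlay_instances(image_path: str, instances: list, alpha: int = 110) -> Image.Image:
    """Tint each instance's mask region on top of the source image."""
    base = Image.open(image_path).convert("RGBA")
    # Illustrative per-class colors; unknown classes fall back to green.
    palette = {"Pothole": (255, 0, 0), "Road crack": (255, 165, 0), "Road": (0, 128, 255)}
    for inst in instances:
        # Decode the base64 PNG mask returned by the endpoint
        mask = Image.open(io.BytesIO(base64.b64decode(inst["mask"]))).convert("L")
        if mask.size != base.size:
            mask = mask.resize(base.size)  # guard against a size mismatch
        color = palette.get(inst["label"], (0, 255, 0))
        tint = Image.new("RGBA", base.size, color + (255,))
        # Binarize the mask at 127 and use it as a translucent paste mask
        base.paste(tint, (0, 0), mask.point(lambda p: alpha if p > 127 else 0))
    return base
```

Usage: `overlay_instances("road_image.jpg", response.json()).save("overlay.png")`. Feeding it all instances at once draws overlapping detections on top of each other; filter to `best_instances` first if you only want one overlay per class.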