Logiroad/sam3

@@ -1,378 +1,66 @@
-# SAM3 Static Image Segmentation - HuggingFace Deployment
-Production-ready deployment of Meta's SAM3 (Segment Anything Model 3) for text-prompted static image segmentation on HuggingFace Inference Endpoints with Azure Container Registry.
-## 🚀 Quick Start
-### Deployments
-This repository supports deployment to **both HuggingFace and Azure AI Foundry**. See [DEPLOYMENT.md](DEPLOYMENT.md) for dual-deployment guide.
-#### HuggingFace (Current)
-**URL**: `https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud`
-**Status**: ✅ Running
-**Model**: `facebook/sam3` (Sam3Model for static images)
-**Hardware**: NVIDIA A10G GPU (24GB VRAM)
-#### Azure AI Foundry (Pending GPU Quota)
-**Registry**: `sam3acr.azurecr.io`
-**Status**: ⏳ Waiting for GPU quota approval
-**See**: [DEPLOYMENT.md](DEPLOYMENT.md) for deployment instructions
-### Basic Usage
 ```python
 import requests
 import base64
-from PIL import Image
-import io
-# Load and encode image
 with open("image.jpg", "rb") as f:
     image_b64 = base64.b64encode(f.read()).decode()
-# Request segmentation masks
-response = requests.post(
-    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
-    json={
-        "inputs": image_b64,
-        "parameters": {
-            "classes": ["pothole", "asphalt", "yellow line", "shadow"]
-        }
-    }
-)
-# Process results
-results = response.json()
-for result in results:
-    label = result["label"]
-    score = result["score"]
-    mask_b64 = result["mask"]
-    # Decode mask (PNG image as base64)
-    mask_bytes = base64.b64decode(mask_b64)
-    mask_image = Image.open(io.BytesIO(mask_bytes))
-    print(f"Class: {label}, Score: {score}")
-    mask_image.save(f"mask_{label}.png")
-```
-## 📋 API Reference
-### POST `/`
-Segment objects in an image using text prompts.
-**Request Body**:
-```json
-{
-  "inputs": "<base64 encoded JPEG/PNG image>",
-  "parameters": {
-    "classes": ["object1", "object2", "object3"]
-  }
-}
-```
-**Response**:
-```json
-[
-  {
-    "label": "object1",
-    "score": 1.0,
-    "mask": "<base64 encoded PNG mask>"
-  },
-  {
-    "label": "object2",
-    "score": 1.0,
-    "mask": "<base64 encoded PNG mask>"
-  }
-]
-```
-**Mask Format**:
-- PNG grayscale image (base64 encoded)
-- White pixels (255) = object present
-- Black pixels (0) = background
-- Same dimensions as input image
-### GET `/health`
-Check endpoint health and GPU status.
-**Response**:
-```json
-{
-  "status": "healthy",
-  "model": "Sam3Model",
-  "gpu_available": true,
-  "vram": {
-    "total_gb": 23.95,
-    "allocated_gb": 1.72,
-    "free_gb": 22.20,
-    "processing_now": 0
-  }
-}
-```
-### GET `/metrics`
-Get VRAM metrics.
-**Response**:
-```json
-{
-  "total_gb": 23.95,
-  "allocated_gb": 1.72,
-  "free_gb": 22.20,
-  "processing_now": 0
-}
-```
-## 🛠️ Deployment Architecture
-### Components
-- **Model**: `facebook/sam3` (Sam3Model - 3.4GB)
-- **Container**: NVIDIA CUDA 12.9.1 + Ubuntu 24.04
-- **Registry**: Azure Container Registry `sam3acr4hf.azurecr.io`
-- **Endpoint**: HuggingFace Inference Endpoints (Logiroad organization)
-- **GPU**: NVIDIA A10G (24GB VRAM)
-### Repository Structure
-```
-sam3_huggingface/
-├── src/                        # Source code
-│   ├── app.py                  # FastAPI inference server
-│   └── utils/                  # Utility modules
-├── docker/                     # Docker configurations
-│   ├── Dockerfile             # Container definition
-│   └── requirements.txt       # Python dependencies
-├── deployments/               # Platform-specific deployments
-│   ├── huggingface/          # HuggingFace configuration
-│   └── azure/                # Azure AI Foundry configuration
-├── scripts/                   # Automation scripts
-│   ├── deploy_all.sh         # Unified deployment
-│   └── test/                 # Test scripts
-├── docs/                      # Documentation
-│   └── DEPLOYMENT.md         # Deployment guide
-├── assets/                    # Static assets
-│   ├── test_images/          # Test images
-│   └── examples/             # Usage examples
-├── model/                     # SAM3 model files (3.4GB)
-└── README.md                  # This file
-```
-## 🔧 Local Development
-### Prerequisites
-- Docker with NVIDIA GPU support
-- Azure CLI (for ACR access)
-- Python 3.11+
-- CUDA-compatible GPU (optional, for local testing)
-### Build Docker Image
-```bash
-docker build -t sam3acr4hf.azurecr.io/sam3-hf:latest -f docker/Dockerfile .
-```
-### Run Locally (with GPU)
-```bash
-docker run -p 7860:7860 --gpus all \
-  sam3acr4hf.azurecr.io/sam3-hf:latest
-```
-### Test Locally
-```bash
-# Using test script
-python3 scripts/test/test_api.py
-# Or using example
-python3 assets/examples/usage_example.py
-```
-## 🚢 Deployment
-### Quick Deploy (Recommended)
-Use the provided deployment script for easy deployment to one or both platforms:
-```bash
-# Deploy to HuggingFace only (default)
-./deploy_all.sh --hf
-# Deploy to Azure AI Foundry only
-./deploy_all.sh --azure
-# Deploy to both platforms
-./deploy_all.sh --all
-```
-The script handles building, tagging, and pushing to both registries automatically.
-### Manual Deployment
-#### HuggingFace
-```bash
-./deployments/huggingface/deploy.sh
-```
-See [`deployments/huggingface/README.md`](deployments/huggingface/README.md) for details.
-#### Azure AI Foundry
-```bash
-./deployments/azure/deploy.sh
-```
-See [`deployments/azure/README.md`](deployments/azure/README.md) for details.
-For complete deployment instructions, see [`docs/DEPLOYMENT.md`](docs/DEPLOYMENT.md).
-## 📊 Performance
-- **Inference Time**: ~2-3 seconds for 4 classes
-- **Throughput**: Limited by GPU (24GB VRAM)
-- **Concurrency**: 2 concurrent requests (configurable)
-- **Image Size**: Supports up to ~2000x2000 pixels
-## 🔍 Key Implementation Details
-### SAM3 Model Selection
-⚠️ **Important**: Use `Sam3Model` (static images), not `Sam3VideoModel` (video tracking).
-```python
-from transformers import Sam3Model, Sam3Processor
-# ✅ Correct for static images
-model = Sam3Model.from_pretrained("facebook/sam3")
-processor = Sam3Processor.from_pretrained("facebook/sam3")
-# ❌ Wrong - for video tracking
-# model = Sam3VideoModel.from_pretrained("facebook/sam3")
-```
-### Batch Processing
-To segment multiple objects in ONE image, repeat the image for each text prompt:
-```python
-# For multiple classes in one image
-images_batch = [image] * len(classes)  # Repeat image
-inputs = processor(
-    images=images_batch,
-    text=classes,
-    return_tensors="pt"
-)
-```
-### Dtype Handling
-Only convert floating-point tensors to match model dtype (float16):
-```python
-model_dtype = next(model.parameters()).dtype
-inputs = {
-    k: v.cuda().to(model_dtype) if v.dtype.is_floating_point
-    else v.cuda()
-    for k, v in inputs.items()
-    if isinstance(v, torch.Tensor)
-}
-```
-## 📦 Dependencies
-```txt
-fastapi==0.121.3
-uvicorn==0.38.0
-torch==2.9.1
-torchvision
-git+https://github.com/huggingface/transformers.git  # SAM3 support
-huggingface_hub>=1.0.0,<2.0
-numpy>=2.3.0
-pillow>=12.0.0
-```
-## 🐛 Troubleshooting
-### Endpoint Stuck Initializing
-The 15.7GB Docker image takes 5-10 minutes to pull and initialize. Wait patiently.
-### "shape is invalid for input" Error
-Ensure you're repeating the image for each class:
-```python
-images_batch = [image] * len(classes)
-```
-### "dtype mismatch" Error
-Don't convert integer tensors (input_ids, attention_mask) to float16.
-### Empty/Wrong Masks
-Ensure text prompts match actual image content. SAM3 will try to find matches even for non-existent objects.
-## 📝 Example: Road Defect Detection
-```python
-import requests
-import base64
-from PIL import Image
-import io
-# Load road image
-with open("road.jpg", "rb") as f:
-    image_b64 = base64.b64encode(f.read()).decode()
-# Segment road defects
 response = requests.post(
     "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
     json={
         "inputs": image_b64,
-        "parameters": {
-            "classes": ["pothole", "crack", "debris", "patch"]
-        }
     }
 )
-# Save masks
-results = response.json()
-for result in results:
-    mask_bytes = base64.b64decode(result["mask"])
-    mask_img = Image.open(io.BytesIO(mask_bytes))
-    mask_img.save(f"defect_{result['label']}.png")
-    print(f"Found {result['label']} (score: {result['score']:.2f})")
 ```
-## 📚 Resources
-- **Model**: [facebook/sam3 on HuggingFace](https://huggingface.co/facebook/sam3)
-- **Paper**: [SAM 3: Segment Anything with Concepts](https://ai.meta.com/research/publications/sam-3/)
-- **Endpoint Management**: [HuggingFace Console](https://ui.endpoints.huggingface.co/Logiroad/endpoints/sam3-segmentation)
 ## 📄 License
-This deployment uses Meta's SAM3 model. See the [model card](https://huggingface.co/facebook/sam3) for license information.
-## 🤝 Support
-For issues with:
-- **Model/Inference**: Check SAM3 documentation
-- **Deployment**: Contact HuggingFace support
-- **Azure Registry**: Check ACR credentials and permissions
----
-**Last Updated**: 2025-11-22
-**Status**: ✅ Production Ready
-**Endpoint**: https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud

+---
+tags:
+- image-segmentation
+- sam
+- custom-docker
+license: mit
+task_categories:
+- image-segmentation
+library_name: transformers
+pipeline_tag: image-segmentation
+---
+# SAM3 - Semantic Segmentation Model
+SAM3 is a semantic segmentation model deployed as a custom Docker container on HuggingFace Inference Endpoints.
+## 🚀 Deployment
+- **GitHub Repository**: https://github.com/logiroad/sam3
+- **Inference Endpoint**: https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud
+- **Docker Registry**: sam3acr4hf.azurecr.io/sam3-hf:latest
+- **Model**: facebook/sam3 (Sam3Model for static images)
+- **Hardware**: NVIDIA A10G (24GB VRAM)
+## 📊 Model Architecture
+Built on Meta's SAM3 (Segment Anything Model 3) architecture for text-prompted semantic segmentation of static images.
+## 🎯 Usage
 ```python
 import requests
 import base64
+# Read image
 with open("image.jpg", "rb") as f:
     image_b64 = base64.b64encode(f.read()).decode()
+# Call endpoint
 response = requests.post(
     "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
     json={
         "inputs": image_b64,
+        "parameters": {"classes": ["pothole", "asphalt"]}
     }
 )
+# Get results
+masks = response.json()
+for result in masks:
+    print(f"Class: {result['label']}, Score: {result['score']}")
 ```
+## 📦 Deployment
+This model is deployed using a custom Docker image. See the [GitHub repository](https://github.com/logiroad/sam3) for full documentation and deployment instructions.
 ## 📄 License
+MIT License. This deployment uses Meta's SAM3 model - see the [facebook/sam3 model card](https://huggingface.co/facebook/sam3) for model license information.
+## 🔗 Resources
+- **Paper**: [SAM 3: Segment Anything with Concepts](https://ai.meta.com/research/publications/sam-3/)
+- **Full Documentation**: [GitHub Repository](https://github.com/logiroad/sam3)
+- **Endpoint Console**: [HuggingFace Endpoints](https://ui.endpoints.huggingface.co/Logiroad/endpoints/sam3-segmentation)
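
The usage snippet in the updated README sends the request and prints the results without a timeout or an HTTP status check. The following is a minimal sketch of a slightly more defensive client. It assumes the request and response shapes shown in the diff above (`inputs`, `parameters.classes`, and a list of items with `label`, `score`, and a base64 `mask`). The helper names `build_payload`, `summarize_results`, and `segment`, the `min_score` filter, and the 60-second timeout are illustrative assumptions, not part of the repository.

```python
import base64


def build_payload(image_bytes, classes):
    """Build the JSON body from the README: base64 image plus class prompts."""
    if not classes:
        raise ValueError("at least one class prompt is required")
    return {
        "inputs": base64.b64encode(image_bytes).decode("ascii"),
        "parameters": {"classes": list(classes)},
    }


def summarize_results(results, min_score=0.0):
    """Return (label, score, mask_bytes) tuples for items at or above min_score.

    min_score is a hypothetical client-side filter, not an endpoint parameter.
    """
    summary = []
    for item in results:
        if item["score"] < min_score:
            continue
        # The README describes each mask as a base64-encoded PNG.
        mask_bytes = base64.b64decode(item["mask"])
        summary.append((item["label"], item["score"], mask_bytes))
    return summary


def segment(endpoint, image_path, classes, timeout=60):
    """Send one image to the endpoint and return the summarized results."""
    import requests  # third-party; imported here so the helpers above stay stdlib-only

    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), classes)
    response = requests.post(endpoint, json=payload, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return summarize_results(response.json())
```

A call might look like `segment("https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud", "image.jpg", ["pothole", "asphalt"])`. The sketch does not handle authentication, which the README does not describe.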