# Dual Deployment Guide - HuggingFace & Azure AI Foundry
This repository supports deployment to both HuggingFace Inference Endpoints and Azure AI Foundry using the same codebase and Docker image.
## 📋 Deployment Overview

| Platform | Status | Container Registry | Endpoint |
|---|---|---|---|
| HuggingFace | ✅ Running | `sam3acr4hf.azurecr.io` | https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud |
| Azure AI Foundry | ⏳ Pending GPU Quota | `sam3acr.azurecr.io` | To be deployed |
Both deployments use the same Docker image with SAM3Model for static image segmentation.
## 🤗 HuggingFace Deployment (Current)

### Status

✅ DEPLOYED AND RUNNING

### Registry

```
sam3acr4hf.azurecr.io/sam3-hf:latest
```
### Quick Deploy

```bash
# Build and push
docker build -t sam3acr4hf.azurecr.io/sam3-hf:latest .
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

# Restart the endpoint so it pulls the new image
python3 << 'EOF'
from huggingface_hub import HfApi
api = HfApi()
endpoint = api.get_inference_endpoint('sam3-segmentation', namespace='Logiroad')
endpoint.pause()
endpoint.resume()
EOF
```
### Configuration
- Hardware: NVIDIA A10G (24GB VRAM)
- Organization: Logiroad
- Access: Public
- Auto-scaling: Enabled (0-5 replicas)
## ☁️ Azure AI Foundry Deployment (Future)

### Status

⏳ WAITING FOR GPU QUOTA

Once GPU quota is approved, deploy using the same Docker image:

### Registry

```
sam3acr.azurecr.io/sam3-foundry:latest
```
### Deployment Steps

#### 1. Build and Push to Azure ACR

```bash
# Login to the Azure AI Foundry ACR
az acr login --name sam3acr

# Build with the Azure AI Foundry tag
docker build -t sam3acr.azurecr.io/sam3-foundry:latest .

# Push to Azure ACR
docker push sam3acr.azurecr.io/sam3-foundry:latest
```
#### 2. Deploy to Azure AI Foundry

Using Azure CLI:

```bash
# Create the Azure AI Foundry endpoint
az ml online-endpoint create \
  --name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace>

# Create the deployment. In the ml CLI (v2), custom-container settings
# such as the image, instance type (Standard_NC6s_v3), and instance
# count (1) are specified in a YAML deployment spec.
az ml online-deployment create \
  --name sam3-deployment \
  --endpoint-name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace> \
  --file deployment.yaml
```
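A minimal `deployment.yaml` sketch for the command above. The port and route paths are assumptions based on the health-check contract described later in this guide (`GET /health`, `POST /`); adjust them to match what app.py actually serves.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: sam3-deployment
endpoint_name: sam3-foundry
environment:
  name: sam3-foundry-env
  image: sam3acr.azurecr.io/sam3-foundry:latest
  inference_config:
    liveness_route:
      port: 8080
      path: /health
    readiness_route:
      port: 8080
      path: /health
    scoring_route:
      port: 8080
      path: /
instance_type: Standard_NC6s_v3
instance_count: 1
```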
Or using Azure Portal:

- Navigate to the Azure AI Foundry workspace
- Go to Endpoints → Real-time endpoints
- Click Create
- Select Custom container
- Image: `sam3acr.azurecr.io/sam3-foundry:latest`
- Instance type: Standard_NC6s_v3 (Tesla V100)
- Deploy
#### 3. Test Azure AI Foundry Endpoint

```python
import requests
import base64

# Get the endpoint URL and key from the Azure Portal
ENDPOINT_URL = "https://<your-endpoint>.azureml.net/score"
API_KEY = "<your-api-key>"

with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    ENDPOINT_URL,
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["pothole", "asphalt"]},
    },
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.json())
```
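Since both platforms accept the same JSON schema, the request body can be factored into a small helper. This is a hypothetical convenience function, not part of the repository; the `inputs` and `parameters.classes` field names come from the request examples in this guide.

```python
import base64


def build_payload(image_bytes, classes):
    """Build the JSON body shared by the HuggingFace and Azure endpoints."""
    return {
        "inputs": base64.b64encode(image_bytes).decode(),
        "parameters": {"classes": classes},
    }
```

Usage: `requests.post(endpoint_url, json=build_payload(f.read(), ["pothole", "asphalt"]))`.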
## 🔄 Unified Deployment Workflow

Since both platforms use the same Docker image, you can deploy to both simultaneously:

### Option 1: Separate Tags (Recommended)

```bash
# Build once
docker build -t sam3-base:latest .

# Tag for HuggingFace
docker tag sam3-base:latest sam3acr4hf.azurecr.io/sam3-hf:latest

# Tag for Azure AI Foundry
docker tag sam3-base:latest sam3acr.azurecr.io/sam3-foundry:latest

# Push to both registries
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest
```
### Option 2: Deploy Script

Create `deploy_all.sh`:

```bash
#!/bin/bash
set -e

echo "Building Docker image..."
docker build -t sam3:latest .

echo "Pushing to HuggingFace ACR..."
docker tag sam3:latest sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

echo "Pushing to Azure AI Foundry ACR..."
docker tag sam3:latest sam3acr.azurecr.io/sam3-foundry:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest

echo "✅ Deployed to both registries!"
```
## 📊 Platform Comparison

| Feature | HuggingFace | Azure AI Foundry |
|---|---|---|
| GPU | NVIDIA A10G (24GB) | Tesla V100 (16GB) or A100 |
| Auto-scaling | ✅ Yes (0-5 replicas) | ✅ Yes (configurable) |
| Authentication | Public or Token | API Key required |
| Pricing | Per-second billing | Per-hour billing |
| Scale to Zero | ✅ Yes | ⚠️ Limited support |
| Integration | HuggingFace ecosystem | Azure ML ecosystem |
| Monitoring | HF Dashboard | Azure Monitor |
## 🔧 Configuration Differences

### API Authentication

HuggingFace (current - public):

```python
response = requests.post(endpoint_url, json=payload)
```

Azure AI Foundry (requires key):

```python
response = requests.post(
    endpoint_url,
    json=payload,
    headers={"Authorization": f"Bearer {api_key}"},
)
```
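The difference is small enough to fold into one hypothetical helper (not part of the repository), so the same client code can call either platform:

```python
def auth_headers(api_key=None):
    """Bearer header for Azure AI Foundry; empty dict for the public HF endpoint."""
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}
```

Usage: `requests.post(endpoint_url, json=payload, headers=auth_headers(api_key))` works for both, passing `api_key=None` for HuggingFace.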
### Environment Variables

For Azure AI Foundry, you may need to add environment variables:

```dockerfile
# Add to Dockerfile if needed for Azure
ENV AZURE_AI_FOUNDRY=true
ENV MLFLOW_TRACKING_URI=<your-mlflow-uri>
```
### Health Check Endpoints

Both platforms expect:

- `GET /health` - Health check
- `POST /` - Inference endpoint

Our current app.py already supports both! ✅
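For reference, that route contract can be sketched with nothing but the standard library. This is a minimal stand-in, not the real app.py (which runs SAM3Model behind the same routes and may use a different framework); the port and response shapes are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class SegmentationHandler(BaseHTTPRequestHandler):
    """Stand-in for app.py's contract: GET /health and POST / for inference."""

    def _send_json(self, status, obj):
        body = json.dumps(obj).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # The real app would decode payload["inputs"] and run SAM3Model here;
        # this sketch just echoes the requested classes back.
        classes = payload.get("parameters", {}).get("classes", [])
        self._send_json(200, {"classes": classes})
```

Serving it with `HTTPServer(("0.0.0.0", 8080), SegmentationHandler).serve_forever()` exposes the same two routes locally for smoke testing.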
## 🧪 Testing Both Deployments

Create `test_both_platforms.py`:

```python
import requests
import base64


def test_endpoint(name, url, api_key=None):
    """Test an endpoint"""
    print(f"\n{'=' * 60}")
    print(f"Testing {name}")
    print(f"{'=' * 60}")

    # Health check
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    health = requests.get(f"{url}/health", headers=headers)
    print(f"Health: {health.status_code}")

    # Inference
    with open("test.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = requests.post(
        url,
        json={
            "inputs": image_b64,
            "parameters": {"classes": ["pothole", "asphalt"]},
        },
        headers=headers,
    )
    print(f"Inference: {response.status_code}")
    if response.status_code == 200:
        results = response.json()
        print(f"✅ Generated {len(results)} masks")
    else:
        print(f"❌ Error: {response.text}")


# Test HuggingFace
test_endpoint(
    "HuggingFace",
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
)

# Test Azure AI Foundry (when deployed)
# test_endpoint(
#     "Azure AI Foundry",
#     "https://<your-endpoint>.azureml.net/score",
#     api_key="<your-key>",
# )
```
## 📋 Deployment Checklist

### HuggingFace (Complete) ✅

- [x] Azure Container Registry created (`sam3acr4hf`)
- [x] Docker image built and pushed
- [x] HuggingFace endpoint created
- [x] Model validated with test image
- [x] Documentation complete

### Azure AI Foundry (Pending GPU Quota) ⏳

- [x] Azure Container Registry exists (`sam3acr`)
- [ ] GPU quota approved
- [ ] Azure AI Foundry workspace created
- [ ] Docker image pushed to `sam3acr`
- [ ] Endpoint deployed
- [ ] API key obtained
- [ ] Endpoint validated
## 🔍 Troubleshooting

### HuggingFace Issues

See the main README.md troubleshooting section.

### Azure AI Foundry Issues

**Issue**: GPU quota not available

- Solution: Request a quota increase in Azure Portal → Quotas → ML quotas

**Issue**: Container registry authentication failed

```bash
az acr login --name sam3acr --expose-token
```

**Issue**: Endpoint deployment fails

- Check the Azure Activity Log for the detailed error
- Verify the image is accessible:

```bash
az acr repository show --name sam3acr --image sam3-foundry:latest
```

**Issue**: Model loading timeout

- Increase the deployment timeout in Azure ML Studio
- Consider using a smaller instance for testing
## 💡 Best Practices

- Use the same Docker image for both platforms to ensure consistency
- Tag images with versions (e.g., `v1.0.0`) for rollback capability
- Test locally first before pushing to registries
- Monitor costs on both platforms (HF per-second, Azure per-hour)
- Set up alerts for endpoint health on both platforms
- Keep API keys secure (use Azure Key Vault for Azure AI Foundry)
## 📚 Resources

### HuggingFace

### Azure AI Foundry

---

**Last Updated**: 2025-11-22
**Next Step**: Deploy to Azure AI Foundry once GPU quota is approved