
Dual Deployment Guide - HuggingFace & Azure AI Foundry

This repository supports deployment to both HuggingFace Inference Endpoints and Azure AI Foundry using the same codebase and Docker image.

📋 Deployment Overview

| Platform | Status | Container Registry | Endpoint |
|---|---|---|---|
| HuggingFace | ✅ Running | sam3acr4hf.azurecr.io | https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud |
| Azure AI Foundry | ⏳ Pending GPU quota | sam3acr.azurecr.io | To be deployed |

Both deployments use the same Docker image with SAM3Model for static image segmentation.
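Both platforms accept the same request body: a base64-encoded image plus a `parameters` dict. A minimal client sketch, assuming the payload keys shown in the test examples later in this guide (`build_payload` is our own helper name, not part of the repo); the live call is commented out because it needs network access and a local test.jpg:

```python
import base64

# URL of the running HuggingFace endpoint (from the table above).
HF_URL = "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud"

def build_payload(image_bytes: bytes, classes: list) -> dict:
    """Build the JSON body both deployments accept."""
    return {
        "inputs": base64.b64encode(image_bytes).decode(),
        "parameters": {"classes": classes},
    }

# Example (needs a running endpoint and a local test.jpg):
# import requests
# with open("test.jpg", "rb") as f:
#     payload = build_payload(f.read(), ["pothole", "asphalt"])
# print(requests.post(HF_URL, json=payload).json())
```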


🚀 HuggingFace Deployment (Current)

Status

✅ DEPLOYED AND RUNNING

Registry

sam3acr4hf.azurecr.io/sam3-hf:latest

Quick Deploy

# Build and push
docker build -t sam3acr4hf.azurecr.io/sam3-hf:latest .
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

# Restart endpoint
python3 << 'EOF'
from huggingface_hub import HfApi
api = HfApi()
endpoint = api.get_inference_endpoint('sam3-segmentation', namespace='Logiroad')
endpoint.pause()
endpoint.resume()
EOF

Configuration

  • Hardware: NVIDIA A10G (24GB VRAM)
  • Organization: Logiroad
  • Access: Public
  • Auto-scaling: Enabled (0-5 replicas)

🔷 Azure AI Foundry Deployment (Future)

Status

⏳ WAITING FOR GPU QUOTA

Once GPU quota is approved, deploy using the same Docker image:

Registry

sam3acr.azurecr.io/sam3-foundry:latest

Deployment Steps

1. Build and Push to Azure ACR

# Login to Azure AI Foundry ACR
az acr login --name sam3acr

# Build with Azure AI Foundry tag
docker build -t sam3acr.azurecr.io/sam3-foundry:latest .

# Push to Azure ACR
docker push sam3acr.azurecr.io/sam3-foundry:latest

2. Deploy to Azure AI Foundry

Using Azure CLI:

# Create Azure AI Foundry endpoint
az ml online-endpoint create \
  --name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace>

# Create deployment (az ml v2 defines custom-container deployments in a YAML spec)
az ml online-deployment create \
  --name sam3-deployment \
  --endpoint-name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace> \
  --file deployment.yml \
  --all-traffic
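Azure ML v2 describes a custom-container deployment in a YAML spec. A hedged sketch of such a `deployment.yml` follows; the schema URL and route structure follow Azure ML's managed online deployment format, but the port and exact field values are assumptions to adapt:

```yaml
# deployment.yml — sketch, not a verified spec; adjust names, ports, and sizes.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: sam3-deployment
endpoint_name: sam3-foundry
environment:
  image: sam3acr.azurecr.io/sam3-foundry:latest
  inference_config:
    liveness_route:  {path: /health, port: 8080}   # port is an assumption
    readiness_route: {path: /health, port: 8080}
    scoring_route:   {path: /, port: 8080}
instance_type: Standard_NC6s_v3
instance_count: 1
```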

Or using Azure Portal:

  1. Navigate to Azure AI Foundry workspace
  2. Go to Endpoints → Real-time endpoints
  3. Click Create
  4. Select Custom container
  5. Image: sam3acr.azurecr.io/sam3-foundry:latest
  6. Instance type: Standard_NC6s_v3 (Tesla V100)
  7. Deploy

3. Test Azure AI Foundry Endpoint

import requests
import base64

# Get endpoint URL and key from Azure Portal
ENDPOINT_URL = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
API_KEY = "<your-api-key>"

with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    ENDPOINT_URL,
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["pothole", "asphalt"]}
    },
    headers={"Authorization": f"Bearer {API_KEY}"}
)

print(response.json())

🔄 Unified Deployment Workflow

Since both platforms use the same Docker image, you can deploy to both simultaneously:

Option 1: Separate Tags (Recommended)

# Build once
docker build -t sam3-base:latest .

# Tag for HuggingFace
docker tag sam3-base:latest sam3acr4hf.azurecr.io/sam3-hf:latest

# Tag for Azure AI Foundry
docker tag sam3-base:latest sam3acr.azurecr.io/sam3-foundry:latest

# Push to both registries
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest

Option 2: Deploy Script

Create deploy_all.sh:

#!/bin/bash
set -e

echo "Building Docker image..."
docker build -t sam3:latest .

echo "Pushing to HuggingFace ACR..."
docker tag sam3:latest sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

echo "Pushing to Azure AI Foundry ACR..."
docker tag sam3:latest sam3acr.azurecr.io/sam3-foundry:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest

echo "✅ Deployed to both registries!"

📊 Platform Comparison

| Feature | HuggingFace | Azure AI Foundry |
|---|---|---|
| GPU | NVIDIA A10G (24GB) | Tesla V100 (16GB) or A100 |
| Auto-scaling | ✅ Yes (0-5 replicas) | ✅ Yes (configurable) |
| Authentication | Public or token | API key required |
| Pricing | Per-second billing | Per-hour billing |
| Scale to zero | ✅ Yes | ⚠️ Limited support |
| Integration | HuggingFace ecosystem | Azure ML ecosystem |
| Monitoring | HF Dashboard | Azure Monitor |

🔧 Configuration Differences

API Authentication

HuggingFace (current - public):

response = requests.post(endpoint_url, json=payload)

Azure AI Foundry (requires key):

response = requests.post(
    endpoint_url,
    json=payload,
    headers={"Authorization": f"Bearer {api_key}"}
)
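A small hypothetical helper (our own name, not part of the repo) folds the two styles together so the same client code can call either platform:

```python
def auth_headers(api_key=None) -> dict:
    """Return request headers: empty for the public HuggingFace endpoint,
    a Bearer token for Azure AI Foundry."""
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}
```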

Environment Variables

For Azure AI Foundry, you may need to add environment variables:

# Add to Dockerfile if needed for Azure
ENV AZURE_AI_FOUNDRY=true
ENV MLFLOW_TRACKING_URI=<your-mlflow-uri>
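Inside the container, application code could then branch on that flag. `detect_platform` and its behavior are illustrative assumptions, not part of the current app.py:

```python
import os

def detect_platform(env=None) -> str:
    """Return "azure" when the AZURE_AI_FOUNDRY flag from the Dockerfile
    is set to "true"; otherwise assume the HuggingFace deployment."""
    env = os.environ if env is None else env
    return "azure" if env.get("AZURE_AI_FOUNDRY", "").lower() == "true" else "huggingface"
```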

Health Check Endpoints

Both platforms expect:

  • GET /health - Health check
  • POST / - Inference endpoint

Our current app.py already supports both! ✅
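As a rough sketch of that contract (not the actual app.py), the two routes can be modeled as a plain dispatch function:

```python
def handle(method: str, path: str, body=None):
    """Toy router mirroring the two routes both platforms probe."""
    if method == "GET" and path == "/health":
        return 200, {"status": "ok"}
    if method == "POST" and path == "/":
        if not body or "inputs" not in body:
            return 400, {"error": "missing 'inputs'"}
        return 200, {"masks": []}  # a real handler would run SAM3 inference here
    return 404, {"error": "not found"}
```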


🧪 Testing Both Deployments

Create test_both_platforms.py:

import requests
import base64

def test_endpoint(name, url, api_key=None):
    """Test an endpoint"""
    print(f"\n{'='*60}")
    print(f"Testing {name}")
    print(f"{'='*60}")

    # Health check
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    health = requests.get(f"{url}/health", headers=headers)
    print(f"Health: {health.status_code}")

    # Inference
    with open("test.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = requests.post(
        url,
        json={
            "inputs": image_b64,
            "parameters": {"classes": ["pothole", "asphalt"]}
        },
        headers=headers
    )

    print(f"Inference: {response.status_code}")
    if response.status_code == 200:
        results = response.json()
        print(f"✅ Generated {len(results)} masks")
    else:
        print(f"❌ Error: {response.text}")

# Test HuggingFace
test_endpoint(
    "HuggingFace",
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud"
)

# Test Azure AI Foundry (when deployed)
# test_endpoint(
#     "Azure AI Foundry",
#     "https://<your-endpoint>.azureml.net/score",
#     api_key="<your-key>"
# )

πŸ“ Deployment Checklist

HuggingFace (Complete) ✅

  • Azure Container Registry created (sam3acr4hf)
  • Docker image built and pushed
  • HuggingFace endpoint created
  • Model validated with test image
  • Documentation complete

Azure AI Foundry (Pending GPU Quota) ⏳

  • Azure Container Registry exists (sam3acr)
  • GPU quota approved
  • Azure AI Foundry workspace created
  • Docker image pushed to sam3acr
  • Endpoint deployed
  • API key obtained
  • Endpoint validated

🆘 Troubleshooting

HuggingFace Issues

See main README.md troubleshooting section.

Azure AI Foundry Issues

Issue: GPU quota not available

  • Solution: Request quota increase in Azure Portal → Quotas → ML quotas

Issue: Container registry authentication failed

az acr login --name sam3acr --expose-token
docker login sam3acr.azurecr.io -u 00000000-0000-0000-0000-000000000000 -p <accessToken>

Issue: Endpoint deployment fails

  • Check Azure Activity Log for detailed error
  • Verify image is accessible: az acr repository show --name sam3acr --image sam3-foundry:latest

Issue: Model loading timeout

  • Increase deployment timeout in Azure ML Studio
  • Consider using smaller instance for testing

💡 Best Practices

  1. Use same Docker image for both platforms to ensure consistency
  2. Tag images with versions (e.g., v1.0.0) for rollback capability
  3. Test locally first before pushing to registries
  4. Monitor costs on both platforms (HF per-second, Azure per-hour)
  5. Set up alerts for endpoint health on both platforms
  6. Keep API keys secure (use Azure Key Vault for Azure AI Foundry)


Last Updated: 2025-11-22
Next Step: Deploy to Azure AI Foundry once GPU quota is approved