# Dual Deployment Guide - HuggingFace & Azure AI Foundry
This repository supports deployment to both HuggingFace Inference Endpoints and Azure AI Foundry using the same codebase and Docker image.
## 📋 Deployment Overview

| Platform | Status | Container Registry | Endpoint |
|---|---|---|---|
| HuggingFace | ✅ Running | `sam3acr4hf.azurecr.io` | https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud |
| Azure AI Foundry | ⏳ Pending GPU Quota | `sam3acr.azurecr.io` | To be deployed |
Both deployments use the same Docker image with SAM3Model for static image segmentation.
## 🤗 HuggingFace Deployment (Current)

### Status

✅ DEPLOYED AND RUNNING

### Registry

```
sam3acr4hf.azurecr.io/sam3-hf:latest
```
### Quick Deploy

```bash
# Build and push
docker build -t sam3acr4hf.azurecr.io/sam3-hf:latest .
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

# Restart the endpoint so it pulls the new image
python3 << 'EOF'
from huggingface_hub import HfApi
api = HfApi()
endpoint = api.get_inference_endpoint('sam3-segmentation', namespace='Logiroad')
endpoint.pause()
endpoint.resume()
EOF
```
### Configuration
- Hardware: NVIDIA A10G (24GB VRAM)
- Organization: Logiroad
- Access: Public
- Auto-scaling: Enabled (0-5 replicas)
## ☁️ Azure AI Foundry Deployment (Future)

### Status

⏳ WAITING FOR GPU QUOTA

Once GPU quota is approved, deploy using the same Docker image:

### Registry

```
sam3acr.azurecr.io/sam3-foundry:latest
```
### Deployment Steps

#### 1. Build and Push to Azure ACR

```bash
# Login to the Azure AI Foundry ACR
az acr login --name sam3acr

# Build with the Azure AI Foundry tag
docker build -t sam3acr.azurecr.io/sam3-foundry:latest .

# Push to Azure ACR
docker push sam3acr.azurecr.io/sam3-foundry:latest
```
#### 2. Deploy to Azure AI Foundry

Using Azure CLI:

```bash
# Create the Azure AI Foundry endpoint
az ml online-endpoint create \
  --name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace>

# Create the deployment. In the ml CLI (v2), custom-container settings
# such as the image, instance type (Standard_NC6s_v3), and instance
# count (1) are specified in a YAML deployment spec.
az ml online-deployment create \
  --name sam3-deployment \
  --endpoint-name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace> \
  --file deployment.yaml
```
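A minimal `deployment.yaml` sketch for the command above. The port and route paths are assumptions based on the health-check contract described later in this guide (`GET /health`, `POST /`); adjust them to match what app.py actually serves.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: sam3-deployment
endpoint_name: sam3-foundry
environment:
  name: sam3-foundry-env
  image: sam3acr.azurecr.io/sam3-foundry:latest
  inference_config:
    liveness_route:
      port: 8080
      path: /health
    readiness_route:
      port: 8080
      path: /health
    scoring_route:
      port: 8080
      path: /
instance_type: Standard_NC6s_v3
instance_count: 1
```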
Or using Azure Portal:

- Navigate to the Azure AI Foundry workspace
- Go to Endpoints → Real-time endpoints
- Click Create
- Select Custom container
- Image: `sam3acr.azurecr.io/sam3-foundry:latest`
- Instance type: Standard_NC6s_v3 (Tesla V100)
- Deploy
#### 3. Test Azure AI Foundry Endpoint

```python
import requests
import base64

# Get the endpoint URL and key from the Azure Portal
ENDPOINT_URL = "https://<your-endpoint>.azureml.net/score"
API_KEY = "<your-api-key>"

with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    ENDPOINT_URL,
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["pothole", "asphalt"]},
    },
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.json())
```
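Since both platforms accept the same JSON schema, the request body can be factored into a small helper. This is a hypothetical convenience function, not part of the repository; the `inputs` and `parameters.classes` field names come from the request examples in this guide.

```python
import base64


def build_payload(image_bytes, classes):
    """Build the JSON body shared by the HuggingFace and Azure endpoints."""
    return {
        "inputs": base64.b64encode(image_bytes).decode(),
        "parameters": {"classes": classes},
    }
```

Usage: `requests.post(endpoint_url, json=build_payload(f.read(), ["pothole", "asphalt"]))`.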
## 🔄 Unified Deployment Workflow

Since both platforms use the same Docker image, you can deploy to both simultaneously:

### Option 1: Separate Tags (Recommended)

```bash
# Build once
docker build -t sam3-base:latest .

# Tag for HuggingFace
docker tag sam3-base:latest sam3acr4hf.azurecr.io/sam3-hf:latest

# Tag for Azure AI Foundry
docker tag sam3-base:latest sam3acr.azurecr.io/sam3-foundry:latest

# Push to both registries
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest
```
### Option 2: Deploy Script

Create `deploy_all.sh`:

```bash
#!/bin/bash
set -e

echo "Building Docker image..."
docker build -t sam3:latest .

echo "Pushing to HuggingFace ACR..."
docker tag sam3:latest sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

echo "Pushing to Azure AI Foundry ACR..."
docker tag sam3:latest sam3acr.azurecr.io/sam3-foundry:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest

echo "✅ Deployed to both registries!"
```
## 📊 Platform Comparison

| Feature | HuggingFace | Azure AI Foundry |
|---|---|---|
| GPU | NVIDIA A10G (24GB) | Tesla V100 (16GB) or A100 |
| Auto-scaling | ✅ Yes (0-5 replicas) | ✅ Yes (configurable) |
| Authentication | Public or Token | API Key required |
| Pricing | Per-second billing | Per-hour billing |
| Scale to Zero | ✅ Yes | ⚠️ Limited support |
| Integration | HuggingFace ecosystem | Azure ML ecosystem |
| Monitoring | HF Dashboard | Azure Monitor |
## 🔧 Configuration Differences

### API Authentication

HuggingFace (current - public):

```python
response = requests.post(endpoint_url, json=payload)
```

Azure AI Foundry (requires key):

```python
response = requests.post(
    endpoint_url,
    json=payload,
    headers={"Authorization": f"Bearer {api_key}"},
)
```
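The difference is small enough to fold into one hypothetical helper (not part of the repository), so the same client code can call either platform:

```python
def auth_headers(api_key=None):
    """Bearer header for Azure AI Foundry; empty dict for the public HF endpoint."""
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}
```

Usage: `requests.post(endpoint_url, json=payload, headers=auth_headers(api_key))` works for both, passing `api_key=None` for HuggingFace.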
### Environment Variables

For Azure AI Foundry, you may need to add environment variables:

```dockerfile
# Add to Dockerfile if needed for Azure
ENV AZURE_AI_FOUNDRY=true
ENV MLFLOW_TRACKING_URI=<your-mlflow-uri>
```
### Health Check Endpoints

Both platforms expect:

- `GET /health` - Health check
- `POST /` - Inference endpoint

Our current app.py already supports both! ✅
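For reference, that route contract can be sketched with nothing but the standard library. This is a minimal stand-in, not the real app.py (which runs SAM3Model behind the same routes and may use a different framework); the port and response shapes are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class SegmentationHandler(BaseHTTPRequestHandler):
    """Stand-in for app.py's contract: GET /health and POST / for inference."""

    def _send_json(self, status, obj):
        body = json.dumps(obj).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # The real app would decode payload["inputs"] and run SAM3Model here;
        # this sketch just echoes the requested classes back.
        classes = payload.get("parameters", {}).get("classes", [])
        self._send_json(200, {"classes": classes})
```

Serving it with `HTTPServer(("0.0.0.0", 8080), SegmentationHandler).serve_forever()` exposes the same two routes locally for smoke testing.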
## 🧪 Testing Both Deployments

Create `test_both_platforms.py`:

```python
import requests
import base64


def test_endpoint(name, url, api_key=None):
    """Test an endpoint"""
    print(f"\n{'=' * 60}")
    print(f"Testing {name}")
    print(f"{'=' * 60}")

    # Health check
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    health = requests.get(f"{url}/health", headers=headers)
    print(f"Health: {health.status_code}")

    # Inference
    with open("test.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = requests.post(
        url,
        json={
            "inputs": image_b64,
            "parameters": {"classes": ["pothole", "asphalt"]},
        },
        headers=headers,
    )
    print(f"Inference: {response.status_code}")
    if response.status_code == 200:
        results = response.json()
        print(f"✅ Generated {len(results)} masks")
    else:
        print(f"❌ Error: {response.text}")


# Test HuggingFace
test_endpoint(
    "HuggingFace",
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
)

# Test Azure AI Foundry (when deployed)
# test_endpoint(
#     "Azure AI Foundry",
#     "https://<your-endpoint>.azureml.net/score",
#     api_key="<your-key>",
# )
```
## 📋 Deployment Checklist

### HuggingFace (Complete) ✅

- [x] Azure Container Registry created (`sam3acr4hf`)
- [x] Docker image built and pushed
- [x] HuggingFace endpoint created
- [x] Model validated with test image
- [x] Documentation complete

### Azure AI Foundry (Pending GPU Quota) ⏳

- [x] Azure Container Registry exists (`sam3acr`)
- [ ] GPU quota approved
- [ ] Azure AI Foundry workspace created
- [ ] Docker image pushed to `sam3acr`
- [ ] Endpoint deployed
- [ ] API key obtained
- [ ] Endpoint validated
## 🔍 Troubleshooting

### HuggingFace Issues

See the main README.md troubleshooting section.

### Azure AI Foundry Issues

**Issue**: GPU quota not available

- Solution: Request a quota increase in Azure Portal → Quotas → ML quotas

**Issue**: Container registry authentication failed

```bash
az acr login --name sam3acr --expose-token
```

**Issue**: Endpoint deployment fails

- Check the Azure Activity Log for the detailed error
- Verify the image is accessible:

```bash
az acr repository show --name sam3acr --image sam3-foundry:latest
```

**Issue**: Model loading timeout

- Increase the deployment timeout in Azure ML Studio
- Consider using a smaller instance for testing
## 💡 Best Practices

- Use the same Docker image for both platforms to ensure consistency
- Tag images with versions (e.g., `v1.0.0`) for rollback capability
- Test locally first before pushing to registries
- Monitor costs on both platforms (HF per-second, Azure per-hour)
- Set up alerts for endpoint health on both platforms
- Keep API keys secure (use Azure Key Vault for Azure AI Foundry)
## 📚 Resources

### HuggingFace

### Azure AI Foundry

---

**Last Updated**: 2025-11-22
**Next Step**: Deploy to Azure AI Foundry once GPU quota is approved