# Dual Deployment Guide - HuggingFace & Azure AI Foundry

This repository supports deployment to both **HuggingFace Inference Endpoints** and **Azure AI Foundry** using the same codebase and Docker image.

## 📋 Deployment Overview

| Platform | Status | Container Registry | Endpoint |
|----------|--------|-------------------|----------|
| **HuggingFace** | ✅ Running | `sam3acr4hf.azurecr.io` | https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud |
| **Azure AI Foundry** | ⏳ Pending GPU quota | `sam3acr.azurecr.io` | To be deployed |

Both deployments use the **same Docker image** with SAM3Model for static image segmentation.

---

## 🚀 HuggingFace Deployment (Current)

### Status

✅ **DEPLOYED AND RUNNING**

### Registry

```bash
sam3acr4hf.azurecr.io/sam3-hf:latest
```

### Quick Deploy

```bash
# Build and push
docker build -t sam3acr4hf.azurecr.io/sam3-hf:latest .
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

# Restart the endpoint so it pulls the new image
python3 << 'EOF'
from huggingface_hub import HfApi

api = HfApi()
endpoint = api.get_inference_endpoint('sam3-segmentation', namespace='Logiroad')
endpoint.pause()
endpoint.resume()
EOF
```

### Configuration

- **Hardware**: NVIDIA A10G (24 GB VRAM)
- **Organization**: Logiroad
- **Access**: Public
- **Auto-scaling**: Enabled (0-5 replicas)

---

## 🔷 Azure AI Foundry Deployment (Future)

### Status

⏳ **WAITING FOR GPU QUOTA**

Once GPU quota is approved, deploy using the same Docker image.

### Registry

```bash
sam3acr.azurecr.io/sam3-foundry:latest
```

### Deployment Steps

#### 1. Build and Push to Azure ACR

```bash
# Log in to the Azure AI Foundry ACR
az acr login --name sam3acr

# Build with the Azure AI Foundry tag
docker build -t sam3acr.azurecr.io/sam3-foundry:latest .

# Push to Azure ACR
docker push sam3acr.azurecr.io/sam3-foundry:latest
```

#### 2. Deploy to Azure AI Foundry

Using the Azure CLI. `<workspace-name>` is a placeholder, and `sam3-deployment.yml` is an illustrative deployment YAML: with the v2 CLI, custom-container details (the image `sam3acr.azurecr.io/sam3-foundry:latest`, `instance_type: Standard_NC6s_v3`, `instance_count: 1`) are set in that YAML rather than via command-line flags.

```bash
# Create the Azure AI Foundry endpoint
az ml online-endpoint create \
  --name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <workspace-name>

# Create the deployment from the YAML definition
az ml online-deployment create \
  --name sam3-deployment \
  --endpoint-name sam3-foundry \
  --file sam3-deployment.yml \
  --resource-group productionline-test \
  --workspace-name <workspace-name>
```

Or using the Azure Portal:

1. Navigate to the Azure AI Foundry workspace
2. Go to **Endpoints** → **Real-time endpoints**
3. Click **Create**
4. Select **Custom container**
5. Image: `sam3acr.azurecr.io/sam3-foundry:latest`
6. Instance type: **Standard_NC6s_v3** (Tesla V100)
7. Deploy

#### 3. Test the Azure AI Foundry Endpoint

```python
import base64

import requests

# Get the endpoint URL and key from the Azure Portal (placeholders below)
ENDPOINT_URL = "https://<your-endpoint>.azureml.net/score"
API_KEY = "<your-api-key>"

with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    ENDPOINT_URL,
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["pothole", "asphalt"]},
    },
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.json())
```

---

## 🔄 Unified Deployment Workflow

Since both platforms use the **same Docker image**, you can deploy to both simultaneously.

### Option 1: Separate Tags (Recommended)

```bash
# Build once
docker build -t sam3-base:latest .

# Tag for HuggingFace
docker tag sam3-base:latest sam3acr4hf.azurecr.io/sam3-hf:latest

# Tag for Azure AI Foundry
docker tag sam3-base:latest sam3acr.azurecr.io/sam3-foundry:latest

# Push to both registries
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest
```

### Option 2: Deploy Script

Create `deploy_all.sh`:

```bash
#!/bin/bash
set -e

echo "Building Docker image..."
docker build -t sam3:latest .

echo "Pushing to HuggingFace ACR..."
docker tag sam3:latest sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

echo "Pushing to Azure AI Foundry ACR..."
docker tag sam3:latest sam3acr.azurecr.io/sam3-foundry:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest

echo "✅ Deployed to both registries!"
```

---

## 📊 Platform Comparison

| Feature | HuggingFace | Azure AI Foundry |
|---------|-------------|------------------|
| **GPU** | NVIDIA A10G (24 GB) | Tesla V100 (16 GB) or A100 |
| **Auto-scaling** | ✅ Yes (0-5 replicas) | ✅ Yes (configurable) |
| **Authentication** | Public or token | API key required |
| **Pricing** | Per-second billing | Per-hour billing |
| **Scale to Zero** | ✅ Yes | ⚠️ Limited support |
| **Integration** | HuggingFace ecosystem | Azure ML ecosystem |
| **Monitoring** | HF Dashboard | Azure Monitor |

---

## 🔧 Configuration Differences

### API Authentication

**HuggingFace** (the current deployment is public):

```python
response = requests.post(endpoint_url, json=payload)
```

**Azure AI Foundry** (requires a key):

```python
response = requests.post(
    endpoint_url,
    json=payload,
    headers={"Authorization": f"Bearer {api_key}"},
)
```

### Environment Variables

For Azure AI Foundry, you may need to add environment variables:

```dockerfile
# Add to the Dockerfile if needed for Azure (placeholder value)
ENV AZURE_AI_FOUNDRY=true
ENV MLFLOW_TRACKING_URI=<tracking-uri>
```

### Health Check Endpoints

Both platforms expect:

- `GET /health` - health check
- `POST /` - inference endpoint

Our current `app.py` already supports both!
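As a concrete illustration of that two-route contract, here is a stdlib-only sketch of a server honoring `GET /health` and `POST /`. This is **not** the repository's actual `app.py` (which runs the real model, presumably behind a production server); the handler names and the echoed response are illustrative only.

```python
# Stdlib-only sketch of the request contract both platforms expect.
# Not the real app.py; it only mirrors the two routes.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            self._reply(200, {"status": "ok"})
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/":
            self._reply(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # A real server would decode payload["inputs"] and run SAM3Model here;
        # this sketch just echoes the requested classes back.
        classes = payload.get("parameters", {}).get("classes", [])
        self._reply(200, {"received_classes": classes})

    def _reply(self, code, body):
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):  # silence per-request logging
        pass

def serve(port=0):
    """Bind the handler; port=0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), InferenceHandler)
```

Any container answering these two routes with JSON satisfies both platforms' probes, which is why a single image can serve HuggingFace and Azure AI Foundry unchanged.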
---

## 🧪 Testing Both Deployments

Create `test_both_platforms.py`:

```python
import base64

import requests

def test_endpoint(name, url, api_key=None):
    """Run a health check and one inference request against an endpoint."""
    print(f"\n{'=' * 60}")
    print(f"Testing {name}")
    print(f"{'=' * 60}")

    # Health check
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    health = requests.get(f"{url}/health", headers=headers)
    print(f"Health: {health.status_code}")

    # Inference
    with open("test.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = requests.post(
        url,
        json={
            "inputs": image_b64,
            "parameters": {"classes": ["pothole", "asphalt"]},
        },
        headers=headers,
    )
    print(f"Inference: {response.status_code}")
    if response.status_code == 200:
        results = response.json()
        print(f"✅ Generated {len(results)} masks")
    else:
        print(f"❌ Error: {response.text}")

# Test HuggingFace
test_endpoint(
    "HuggingFace",
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
)

# Test Azure AI Foundry (when deployed; placeholders below)
# test_endpoint(
#     "Azure AI Foundry",
#     "https://<your-endpoint>.azureml.net/score",
#     api_key="<your-api-key>",
# )
```

---

## 📝 Deployment Checklist

### HuggingFace (Complete) ✅

- [x] Azure Container Registry created (`sam3acr4hf`)
- [x] Docker image built and pushed
- [x] HuggingFace endpoint created
- [x] Model validated with test image
- [x] Documentation complete

### Azure AI Foundry (Pending GPU Quota) ⏳

- [x] Azure Container Registry exists (`sam3acr`)
- [ ] GPU quota approved
- [ ] Azure AI Foundry workspace created
- [ ] Docker image pushed to `sam3acr`
- [ ] Endpoint deployed
- [ ] API key obtained
- [ ] Endpoint validated

---

## 🆘 Troubleshooting

### HuggingFace Issues

See the troubleshooting section in the main README.md.
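For either platform, a first diagnostic step is separating networking failures from slow model loading. A small liveness probe (illustrative, not part of this repository) that retries `GET /health` until a deadline:

```python
# Illustrative helper (not part of this repo): poll an endpoint's /health
# route until it answers HTTP 200 or the deadline passes.
import time
import urllib.error
import urllib.request

def wait_for_healthy(url, timeout=60.0, interval=0.25):
    """Return True once GET `url` answers HTTP 200, False if `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # endpoint not reachable yet; retry
        time.sleep(interval)
    return False
```

If the probe never succeeds, the problem is networking, authentication, or container startup rather than the model itself; if it succeeds but inference fails, look at the model-loading and request-handling logs instead.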
### Azure AI Foundry Issues

**Issue**: GPU quota not available

- **Solution**: Request a quota increase in the Azure Portal → Quotas → ML quotas

**Issue**: Container registry authentication failed

```bash
az acr login --name sam3acr --expose-token
```

**Issue**: Endpoint deployment fails

- Check the Azure Activity Log for the detailed error
- Verify the image is accessible: `az acr repository show --name sam3acr --image sam3-foundry:latest`

**Issue**: Model loading timeout

- Increase the deployment timeout in Azure ML Studio
- Consider using a smaller instance for testing

---

## 💡 Best Practices

1. **Use the same Docker image** for both platforms to ensure consistency
2. **Tag images with versions** (e.g., `v1.0.0`) for rollback capability
3. **Test locally first** before pushing to the registries
4. **Monitor costs** on both platforms (HF bills per second, Azure per hour)
5. **Set up alerts** for endpoint health on both platforms
6. **Keep API keys secure** (use Azure Key Vault for Azure AI Foundry)

---

## 📚 Resources

### HuggingFace

- [Inference Endpoints Docs](https://huggingface.co/docs/inference-endpoints)
- [Custom Docker Images](https://huggingface.co/docs/inference-endpoints/guides/custom_container)

### Azure AI Foundry

- [Azure ML Endpoints](https://learn.microsoft.com/azure/machine-learning/concept-endpoints)
- [Deploy Custom Containers](https://learn.microsoft.com/azure/machine-learning/how-to-deploy-custom-container)
- [GPU Quota Requests](https://learn.microsoft.com/azure/machine-learning/how-to-manage-quotas)

---

**Last Updated**: 2025-11-22
**Next Step**: Deploy to Azure AI Foundry once GPU quota is approved