# Dual Deployment Guide - HuggingFace & Azure AI Foundry

This repository supports deployment to both **HuggingFace Inference Endpoints** and **Azure AI Foundry** using the same codebase and Docker image.

## Deployment Overview

| Platform | Status | Container Registry | Endpoint |
|----------|--------|-------------------|----------|
| **HuggingFace** | ✅ Running | `sam3acr4hf.azurecr.io` | https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud |
| **Azure AI Foundry** | ⏳ Pending GPU quota | `sam3acr.azurecr.io` | To be deployed |

Both deployments use the **same Docker image**, which serves SAM3Model for static image segmentation.

---
## HuggingFace Deployment (Current)

### Status

✅ **DEPLOYED AND RUNNING**

### Registry

```bash
sam3acr4hf.azurecr.io/sam3-hf:latest
```
### Quick Deploy

```bash
# Build and push
docker build -t sam3acr4hf.azurecr.io/sam3-hf:latest .
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

# Restart the endpoint so it picks up the new image
python3 << 'EOF'
from huggingface_hub import HfApi
api = HfApi()
endpoint = api.get_inference_endpoint('sam3-segmentation', namespace='Logiroad')
endpoint.pause()
endpoint.resume()
EOF
```
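After a pause/resume cycle the endpoint takes a little while before it answers requests again. If you script deployments, a small polling helper can wrap that wait; this is a sketch with an injected `check` callable (the function name and defaults are ours), so it works against any endpoint's `/health` route:

```python
import time

def wait_until_healthy(check, timeout=300.0, interval=5.0):
    """Poll check() until it returns True or the timeout elapses.

    `check` is any zero-argument callable, e.g. a lambda that does
    requests.get(f"{url}/health").ok against the endpoint.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False
```

For example: `wait_until_healthy(lambda: requests.get(f"{url}/health").ok)` after the `endpoint.resume()` call above.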
### Configuration

- **Hardware**: NVIDIA A10G (24 GB VRAM)
- **Organization**: Logiroad
- **Access**: Public
- **Auto-scaling**: Enabled (0-5 replicas)

---
## Azure AI Foundry Deployment (Future)

### Status

⏳ **WAITING FOR GPU QUOTA**

Once GPU quota is approved, deploy using the same Docker image.

### Registry

```bash
sam3acr.azurecr.io/sam3-foundry:latest
```
### Deployment Steps

#### 1. Build and Push to Azure ACR

```bash
# Log in to the Azure AI Foundry ACR
az acr login --name sam3acr

# Build with the Azure AI Foundry tag
docker build -t sam3acr.azurecr.io/sam3-foundry:latest .

# Push to Azure ACR
docker push sam3acr.azurecr.io/sam3-foundry:latest
```
#### 2. Deploy to Azure AI Foundry

Using the Azure CLI (v2 `az ml` extension):

```bash
# Create the endpoint
az ml online-endpoint create \
  --name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace>

# Create the deployment. Custom-container deployments are described in a
# YAML file that references the image (sam3acr.azurecr.io/sam3-foundry:latest),
# the instance type (Standard_NC6s_v3) and the instance count (1).
az ml online-deployment create \
  --name sam3-deployment \
  --endpoint-name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace> \
  --file deployment.yml
```
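The v2 CLI expects custom-container deployments to be described in a YAML file rather than inline flags. A sketch of what that file could look like for this image (field names follow the Azure ML managed online deployment schema; the port value is an assumption to adjust to whatever the container actually listens on, and the routes mirror the `/health` and `/` endpoints this guide's `app.py` exposes):

```yaml
# deployment.yml - sketch; adjust the port and paths to match the container
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: sam3-deployment
endpoint_name: sam3-foundry
environment:
  image: sam3acr.azurecr.io/sam3-foundry:latest
  inference_config:
    liveness_route:
      path: /health
      port: 8080
    readiness_route:
      path: /health
      port: 8080
    scoring_route:
      path: /
      port: 8080
instance_type: Standard_NC6s_v3
instance_count: 1
```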
Or using the Azure Portal:

1. Navigate to your Azure AI Foundry workspace
2. Go to **Endpoints** → **Real-time endpoints**
3. Click **Create**
4. Select **Custom container**
5. Image: `sam3acr.azurecr.io/sam3-foundry:latest`
6. Instance type: **Standard_NC6s_v3** (Tesla V100)
7. Deploy
#### 3. Test the Azure AI Foundry Endpoint

```python
import base64

import requests

# Get the endpoint URL and key from the Azure Portal
ENDPOINT_URL = "https://<your-endpoint>.azureml.net/score"
API_KEY = "<your-api-key>"

with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    ENDPOINT_URL,
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["pothole", "asphalt"]},
    },
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.json())
```
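The test scripts in this guide all assemble the same request body; factored out, that is just a few lines (a sketch, the helper name is ours):

```python
import base64

def build_payload(image_path, classes):
    """Build the JSON body both platforms accept in this guide:
    a base64-encoded image plus the class prompts."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    return {"inputs": image_b64, "parameters": {"classes": classes}}
```

With this, the request above becomes `requests.post(ENDPOINT_URL, json=build_payload("test.jpg", ["pothole", "asphalt"]), headers=...)`.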
---
## Unified Deployment Workflow

Since both platforms use the **same Docker image**, you can deploy to both from a single build:

### Option 1: Separate Tags (Recommended)

```bash
# Build once
docker build -t sam3-base:latest .

# Tag for HuggingFace
docker tag sam3-base:latest sam3acr4hf.azurecr.io/sam3-hf:latest

# Tag for Azure AI Foundry
docker tag sam3-base:latest sam3acr.azurecr.io/sam3-foundry:latest

# Push to both registries
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest
```
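If you script releases in Python rather than bash, the tag fan-out above reduces to a small helper; a sketch (the registry map mirrors the two ACRs used in this guide, the function name is ours):

```python
# Repositories for each platform, as used throughout this guide
REGISTRIES = {
    "huggingface": "sam3acr4hf.azurecr.io/sam3-hf",
    "foundry": "sam3acr.azurecr.io/sam3-foundry",
}

def platform_tags(version="latest"):
    """Return the fully qualified image tag for each platform."""
    return {platform: f"{repo}:{version}" for platform, repo in REGISTRIES.items()}
```

`platform_tags("v1.0.0")` yields the two tags to apply with `docker tag` and push.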
### Option 2: Deploy Script

Create `deploy_all.sh`:

```bash
#!/bin/bash
set -e

echo "Building Docker image..."
docker build -t sam3:latest .

echo "Pushing to HuggingFace ACR..."
docker tag sam3:latest sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

echo "Pushing to Azure AI Foundry ACR..."
docker tag sam3:latest sam3acr.azurecr.io/sam3-foundry:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest

echo "✅ Deployed to both registries!"
```

---
## Platform Comparison

| Feature | HuggingFace | Azure AI Foundry |
|---------|-------------|------------------|
| **GPU** | NVIDIA A10G (24 GB) | Tesla V100 (16 GB) or A100 |
| **Auto-scaling** | ✅ Yes (0-5 replicas) | ✅ Yes (configurable) |
| **Authentication** | Public or token | API key required |
| **Pricing** | Per-second billing | Per-hour billing |
| **Scale to zero** | ✅ Yes | ⚠️ Limited support |
| **Integration** | HuggingFace ecosystem | Azure ML ecosystem |
| **Monitoring** | HF Dashboard | Azure Monitor |

---
## Configuration Differences

### API Authentication

**HuggingFace** (current deployment is public):

```python
response = requests.post(endpoint_url, json=payload)
```

**Azure AI Foundry** (requires a key):

```python
response = requests.post(
    endpoint_url,
    json=payload,
    headers={"Authorization": f"Bearer {api_key}"},
)
```
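The two cases can share a single call path if the headers are built conditionally; a tiny sketch (the helper name is ours):

```python
def auth_headers(api_key=None):
    """Request headers for either platform: empty for public HuggingFace
    endpoints, a Bearer token when an Azure AI Foundry key is supplied."""
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}
```

Then `requests.post(url, json=payload, headers=auth_headers(key))` works unchanged against both platforms.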
### Environment Variables

For Azure AI Foundry, you may need to add environment variables:

```dockerfile
# Add to the Dockerfile if needed for Azure
ENV AZURE_AI_FOUNDRY=true
ENV MLFLOW_TRACKING_URI=<your-mlflow-uri>
```

### Health Check Endpoints

Both platforms expect:

- `GET /health` - health check
- `POST /` - inference endpoint

Our current `app.py` already supports both! ✅
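For illustration, that route contract can be stubbed with nothing but the standard library (this stub is ours and only mirrors the two routes; the real `app.py` wraps SAM3Model behind `POST /`):

```python
import json
from http.server import BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    """Minimal stub of the contract both platforms probe."""

    def _send(self, code, body):
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        if self.path == "/health":
            self._send(200, {"status": "ok"})
        else:
            self._send(404, {"error": "not found"})

    def do_POST(self):
        if self.path == "/":
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            # A real handler would decode payload["inputs"] and run segmentation.
            self._send(200, {"received": sorted(payload)})
        else:
            self._send(404, {"error": "not found"})

    def log_message(self, *args):  # keep request logging quiet
        pass
```

Serve it with `HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()` to smoke-test a platform's health probes locally.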
---
## Testing Both Deployments

Create `test_both_platforms.py`:

```python
import base64

import requests

def test_endpoint(name, url, api_key=None):
    """Run a health check and one inference against an endpoint."""
    print(f"\n{'=' * 60}")
    print(f"Testing {name}")
    print(f"{'=' * 60}")

    # Health check
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    health = requests.get(f"{url}/health", headers=headers)
    print(f"Health: {health.status_code}")

    # Inference
    with open("test.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = requests.post(
        url,
        json={
            "inputs": image_b64,
            "parameters": {"classes": ["pothole", "asphalt"]},
        },
        headers=headers,
    )
    print(f"Inference: {response.status_code}")
    if response.status_code == 200:
        results = response.json()
        print(f"✅ Generated {len(results)} masks")
    else:
        print(f"❌ Error: {response.text}")

# Test HuggingFace
test_endpoint(
    "HuggingFace",
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud"
)

# Test Azure AI Foundry (when deployed)
# test_endpoint(
#     "Azure AI Foundry",
#     "https://<your-endpoint>.azureml.net/score",
#     api_key="<your-key>"
# )
```

---
## Deployment Checklist

### HuggingFace (Complete) ✅

- [x] Azure Container Registry created (`sam3acr4hf`)
- [x] Docker image built and pushed
- [x] HuggingFace endpoint created
- [x] Model validated with a test image
- [x] Documentation complete

### Azure AI Foundry (Pending GPU Quota) ⏳

- [x] Azure Container Registry exists (`sam3acr`)
- [ ] GPU quota approved
- [ ] Azure AI Foundry workspace created
- [ ] Docker image pushed to `sam3acr`
- [ ] Endpoint deployed
- [ ] API key obtained
- [ ] Endpoint validated

---
## Troubleshooting

### HuggingFace Issues

See the troubleshooting section in the main README.md.

### Azure AI Foundry Issues

**Issue**: GPU quota not available

- **Solution**: Request a quota increase in Azure Portal → Quotas → ML quotas

**Issue**: Container registry authentication failed

```bash
az acr login --name sam3acr --expose-token
```

**Issue**: Endpoint deployment fails

- Check the Azure Activity Log for the detailed error
- Verify the image is accessible: `az acr repository show --name sam3acr --image sam3-foundry:latest`

**Issue**: Model loading timeout

- Increase the deployment timeout in Azure ML Studio
- Consider a smaller instance for testing
## Best Practices

1. **Use the same Docker image** for both platforms to ensure consistency
2. **Tag images with versions** (e.g., `v1.0.0`) for rollback capability
3. **Test locally first** before pushing to the registries
4. **Monitor costs** on both platforms (HF bills per second, Azure per hour)
5. **Set up alerts** for endpoint health on both platforms
6. **Keep API keys secure** (use Azure Key Vault for Azure AI Foundry)

---
## Resources

### HuggingFace

- [Inference Endpoints Docs](https://huggingface.co/docs/inference-endpoints)
- [Custom Docker Images](https://huggingface.co/docs/inference-endpoints/guides/custom_container)

### Azure AI Foundry

- [Azure ML Endpoints](https://learn.microsoft.com/azure/machine-learning/concept-endpoints)
- [Deploy Custom Containers](https://learn.microsoft.com/azure/machine-learning/how-to-deploy-custom-container)
- [GPU Quota Requests](https://learn.microsoft.com/azure/machine-learning/how-to-manage-quotas)

---

**Last Updated**: 2025-11-22
**Next Step**: Deploy to Azure AI Foundry once GPU quota is approved