# Dual Deployment Guide - HuggingFace & Azure AI Foundry

This repository supports deployment to both **HuggingFace Inference Endpoints** and **Azure AI Foundry** using the same codebase and Docker image.

## Deployment Overview

| Platform | Status | Container Registry | Endpoint |
|----------|--------|-------------------|----------|
| **HuggingFace** | ✅ Running | `sam3acr4hf.azurecr.io` | https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud |
| **Azure AI Foundry** | ⏳ Pending GPU quota | `sam3acr.azurecr.io` | To be deployed |

Both deployments use the **same Docker image**, which serves SAM3Model for static image segmentation.

---
## HuggingFace Deployment (Current)

### Status

✅ **DEPLOYED AND RUNNING**

### Registry

```bash
sam3acr4hf.azurecr.io/sam3-hf:latest
```
### Quick Deploy

```bash
# Build and push
docker build -t sam3acr4hf.azurecr.io/sam3-hf:latest .
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

# Restart the endpoint so it picks up the new image
python3 << 'EOF'
from huggingface_hub import HfApi
api = HfApi()
endpoint = api.get_inference_endpoint('sam3-segmentation', namespace='Logiroad')
endpoint.pause()
endpoint.resume()
EOF
```
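After a pause/resume cycle the endpoint takes a little while before it answers requests again. If you script deployments, a small polling helper can wrap that wait; this is a sketch with an injected `check` callable (the function name and defaults are ours), so it works against any endpoint's `/health` route:

```python
import time

def wait_until_healthy(check, timeout=300.0, interval=5.0):
    """Poll check() until it returns True or the timeout elapses.

    `check` is any zero-argument callable, e.g. a lambda that does
    requests.get(f"{url}/health").ok against the endpoint.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False
```

For example: `wait_until_healthy(lambda: requests.get(f"{url}/health").ok)` after the `endpoint.resume()` call above.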
### Configuration

- **Hardware**: NVIDIA A10G (24 GB VRAM)
- **Organization**: Logiroad
- **Access**: Public
- **Auto-scaling**: Enabled (0-5 replicas)

---
## Azure AI Foundry Deployment (Future)

### Status

⏳ **WAITING FOR GPU QUOTA**

Once GPU quota is approved, deploy using the same Docker image.

### Registry

```bash
sam3acr.azurecr.io/sam3-foundry:latest
```
### Deployment Steps

#### 1. Build and Push to Azure ACR

```bash
# Log in to the Azure AI Foundry ACR
az acr login --name sam3acr

# Build with the Azure AI Foundry tag
docker build -t sam3acr.azurecr.io/sam3-foundry:latest .

# Push to Azure ACR
docker push sam3acr.azurecr.io/sam3-foundry:latest
```
#### 2. Deploy to Azure AI Foundry

Using the Azure CLI (v2 `az ml` extension):

```bash
# Create the endpoint
az ml online-endpoint create \
  --name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace>

# Create the deployment. Custom-container deployments are described in a
# YAML file that references the image (sam3acr.azurecr.io/sam3-foundry:latest),
# the instance type (Standard_NC6s_v3) and the instance count (1).
az ml online-deployment create \
  --name sam3-deployment \
  --endpoint-name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace> \
  --file deployment.yml
```
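The v2 CLI expects custom-container deployments to be described in a YAML file rather than inline flags. A sketch of what that file could look like for this image (field names follow the Azure ML managed online deployment schema; the port value is an assumption to adjust to whatever the container actually listens on, and the routes mirror the `/health` and `/` endpoints this guide's `app.py` exposes):

```yaml
# deployment.yml - sketch; adjust the port and paths to match the container
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: sam3-deployment
endpoint_name: sam3-foundry
environment:
  image: sam3acr.azurecr.io/sam3-foundry:latest
  inference_config:
    liveness_route:
      path: /health
      port: 8080
    readiness_route:
      path: /health
      port: 8080
    scoring_route:
      path: /
      port: 8080
instance_type: Standard_NC6s_v3
instance_count: 1
```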
Or using the Azure Portal:

1. Navigate to your Azure AI Foundry workspace
2. Go to **Endpoints** → **Real-time endpoints**
3. Click **Create**
4. Select **Custom container**
5. Image: `sam3acr.azurecr.io/sam3-foundry:latest`
6. Instance type: **Standard_NC6s_v3** (Tesla V100)
7. Deploy
#### 3. Test the Azure AI Foundry Endpoint

```python
import base64

import requests

# Get the endpoint URL and key from the Azure Portal
ENDPOINT_URL = "https://<your-endpoint>.azureml.net/score"
API_KEY = "<your-api-key>"

with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    ENDPOINT_URL,
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["pothole", "asphalt"]},
    },
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.json())
```
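The test scripts in this guide all assemble the same request body; factored out, that is just a few lines (a sketch, the helper name is ours):

```python
import base64

def build_payload(image_path, classes):
    """Build the JSON body both platforms accept in this guide:
    a base64-encoded image plus the class prompts."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    return {"inputs": image_b64, "parameters": {"classes": classes}}
```

With this, the request above becomes `requests.post(ENDPOINT_URL, json=build_payload("test.jpg", ["pothole", "asphalt"]), headers=...)`.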
---
## Unified Deployment Workflow

Since both platforms use the **same Docker image**, you can deploy to both from a single build:

### Option 1: Separate Tags (Recommended)

```bash
# Build once
docker build -t sam3-base:latest .

# Tag for HuggingFace
docker tag sam3-base:latest sam3acr4hf.azurecr.io/sam3-hf:latest

# Tag for Azure AI Foundry
docker tag sam3-base:latest sam3acr.azurecr.io/sam3-foundry:latest

# Push to both registries
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest
```
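If you script releases in Python rather than bash, the tag fan-out above reduces to a small helper; a sketch (the registry map mirrors the two ACRs used in this guide, the function name is ours):

```python
# Repositories for each platform, as used throughout this guide
REGISTRIES = {
    "huggingface": "sam3acr4hf.azurecr.io/sam3-hf",
    "foundry": "sam3acr.azurecr.io/sam3-foundry",
}

def platform_tags(version="latest"):
    """Return the fully qualified image tag for each platform."""
    return {platform: f"{repo}:{version}" for platform, repo in REGISTRIES.items()}
```

`platform_tags("v1.0.0")` yields the two tags to apply with `docker tag` and push.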
### Option 2: Deploy Script

Create `deploy_all.sh`:

```bash
#!/bin/bash
set -e

echo "Building Docker image..."
docker build -t sam3:latest .

echo "Pushing to HuggingFace ACR..."
docker tag sam3:latest sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

echo "Pushing to Azure AI Foundry ACR..."
docker tag sam3:latest sam3acr.azurecr.io/sam3-foundry:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest

echo "✅ Deployed to both registries!"
```

---
## Platform Comparison

| Feature | HuggingFace | Azure AI Foundry |
|---------|-------------|------------------|
| **GPU** | NVIDIA A10G (24 GB) | Tesla V100 (16 GB) or A100 |
| **Auto-scaling** | ✅ Yes (0-5 replicas) | ✅ Yes (configurable) |
| **Authentication** | Public or token | API key required |
| **Pricing** | Per-second billing | Per-hour billing |
| **Scale to zero** | ✅ Yes | ⚠️ Limited support |
| **Integration** | HuggingFace ecosystem | Azure ML ecosystem |
| **Monitoring** | HF Dashboard | Azure Monitor |

---
## Configuration Differences

### API Authentication

**HuggingFace** (current deployment is public):

```python
response = requests.post(endpoint_url, json=payload)
```

**Azure AI Foundry** (requires a key):

```python
response = requests.post(
    endpoint_url,
    json=payload,
    headers={"Authorization": f"Bearer {api_key}"},
)
```
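The two cases can share a single call path if the headers are built conditionally; a tiny sketch (the helper name is ours):

```python
def auth_headers(api_key=None):
    """Request headers for either platform: empty for public HuggingFace
    endpoints, a Bearer token when an Azure AI Foundry key is supplied."""
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}
```

Then `requests.post(url, json=payload, headers=auth_headers(key))` works unchanged against both platforms.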
### Environment Variables

For Azure AI Foundry, you may need to add environment variables:

```dockerfile
# Add to the Dockerfile if needed for Azure
ENV AZURE_AI_FOUNDRY=true
ENV MLFLOW_TRACKING_URI=<your-mlflow-uri>
```

### Health Check Endpoints

Both platforms expect:

- `GET /health` - health check
- `POST /` - inference endpoint

Our current `app.py` already supports both! ✅
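For illustration, that route contract can be stubbed with nothing but the standard library (this stub is ours and only mirrors the two routes; the real `app.py` wraps SAM3Model behind `POST /`):

```python
import json
from http.server import BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    """Minimal stub of the contract both platforms probe."""

    def _send(self, code, body):
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        if self.path == "/health":
            self._send(200, {"status": "ok"})
        else:
            self._send(404, {"error": "not found"})

    def do_POST(self):
        if self.path == "/":
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            # A real handler would decode payload["inputs"] and run segmentation.
            self._send(200, {"received": sorted(payload)})
        else:
            self._send(404, {"error": "not found"})

    def log_message(self, *args):  # keep request logging quiet
        pass
```

Serve it with `HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()` to smoke-test a platform's health probes locally.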
---
## Testing Both Deployments

Create `test_both_platforms.py`:

```python
import base64

import requests

def test_endpoint(name, url, api_key=None):
    """Run a health check and one inference against an endpoint."""
    print(f"\n{'=' * 60}")
    print(f"Testing {name}")
    print(f"{'=' * 60}")

    # Health check
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    health = requests.get(f"{url}/health", headers=headers)
    print(f"Health: {health.status_code}")

    # Inference
    with open("test.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = requests.post(
        url,
        json={
            "inputs": image_b64,
            "parameters": {"classes": ["pothole", "asphalt"]},
        },
        headers=headers,
    )
    print(f"Inference: {response.status_code}")
    if response.status_code == 200:
        results = response.json()
        print(f"✅ Generated {len(results)} masks")
    else:
        print(f"❌ Error: {response.text}")

# Test HuggingFace
test_endpoint(
    "HuggingFace",
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud"
)

# Test Azure AI Foundry (when deployed)
# test_endpoint(
#     "Azure AI Foundry",
#     "https://<your-endpoint>.azureml.net/score",
#     api_key="<your-key>"
# )
```

---
## Deployment Checklist

### HuggingFace (Complete) ✅

- [x] Azure Container Registry created (`sam3acr4hf`)
- [x] Docker image built and pushed
- [x] HuggingFace endpoint created
- [x] Model validated with a test image
- [x] Documentation complete

### Azure AI Foundry (Pending GPU Quota) ⏳

- [x] Azure Container Registry exists (`sam3acr`)
- [ ] GPU quota approved
- [ ] Azure AI Foundry workspace created
- [ ] Docker image pushed to `sam3acr`
- [ ] Endpoint deployed
- [ ] API key obtained
- [ ] Endpoint validated

---
## Troubleshooting

### HuggingFace Issues

See the troubleshooting section in the main README.md.

### Azure AI Foundry Issues

**Issue**: GPU quota not available

- **Solution**: Request a quota increase in Azure Portal → Quotas → ML quotas

**Issue**: Container registry authentication failed

```bash
az acr login --name sam3acr --expose-token
```

**Issue**: Endpoint deployment fails

- Check the Azure Activity Log for the detailed error
- Verify the image is accessible: `az acr repository show --name sam3acr --image sam3-foundry:latest`

**Issue**: Model loading timeout

- Increase the deployment timeout in Azure ML Studio
- Consider a smaller instance for testing
## Best Practices

1. **Use the same Docker image** for both platforms to ensure consistency
2. **Tag images with versions** (e.g., `v1.0.0`) for rollback capability
3. **Test locally first** before pushing to the registries
4. **Monitor costs** on both platforms (HF bills per second, Azure per hour)
5. **Set up alerts** for endpoint health on both platforms
6. **Keep API keys secure** (use Azure Key Vault for Azure AI Foundry)

---
## Resources

### HuggingFace

- [Inference Endpoints Docs](https://huggingface.co/docs/inference-endpoints)
- [Custom Docker Images](https://huggingface.co/docs/inference-endpoints/guides/custom_container)

### Azure AI Foundry

- [Azure ML Endpoints](https://learn.microsoft.com/azure/machine-learning/concept-endpoints)
- [Deploy Custom Containers](https://learn.microsoft.com/azure/machine-learning/how-to-deploy-custom-container)
- [GPU Quota Requests](https://learn.microsoft.com/azure/machine-learning/how-to-manage-quotas)

---

**Last Updated**: 2025-11-22
**Next Step**: Deploy to Azure AI Foundry once GPU quota is approved