# Dual Deployment Guide - HuggingFace & Azure AI Foundry
This repository supports deployment to both **HuggingFace Inference Endpoints** and **Azure AI Foundry** using the same codebase and Docker image.
## πŸ“‹ Deployment Overview
| Platform | Status | Container Registry | Endpoint |
|----------|--------|-------------------|----------|
| **HuggingFace** | βœ… Running | `sam3acr4hf.azurecr.io` | https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud |
| **Azure AI Foundry** | ⏳ Pending GPU Quota | `sam3acr.azurecr.io` | To be deployed |
Both deployments use the **same Docker image** with SAM3Model for static image segmentation.
---
## πŸš€ HuggingFace Deployment (Current)
### Status
βœ… **DEPLOYED AND RUNNING**
### Registry
```bash
sam3acr4hf.azurecr.io/sam3-hf:latest
```
### Quick Deploy
```bash
# Build and push
docker build -t sam3acr4hf.azurecr.io/sam3-hf:latest .
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest
# Restart endpoint
python3 << 'EOF'
from huggingface_hub import HfApi

api = HfApi()
endpoint = api.get_inference_endpoint('sam3-segmentation', namespace='Logiroad')
endpoint.pause()
endpoint.resume()
endpoint.wait()  # resume() returns immediately; block until status is "running"
EOF
```
### Configuration
- **Hardware**: NVIDIA A10G (24GB VRAM)
- **Organization**: Logiroad
- **Access**: Public
- **Auto-scaling**: Enabled (0-5 replicas)
---
## πŸ”· Azure AI Foundry Deployment (Future)
### Status
⏳ **WAITING FOR GPU QUOTA**
Once GPU quota is approved, deploy using the same Docker image:
### Registry
```bash
sam3acr.azurecr.io/sam3-foundry:latest
```
### Deployment Steps
#### 1. Build and Push to Azure ACR
```bash
# Login to Azure AI Foundry ACR
az acr login --name sam3acr
# Build with Azure AI Foundry tag
docker build -t sam3acr.azurecr.io/sam3-foundry:latest .
# Push to Azure ACR
docker push sam3acr.azurecr.io/sam3-foundry:latest
```
#### 2. Deploy to Azure AI Foundry
Using Azure CLI:
```bash
# Create the Azure ML online endpoint
az ml online-endpoint create \
  --name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace>

# Create the deployment from a YAML spec referencing the custom container image
az ml online-deployment create \
  --name sam3-deployment \
  --endpoint-name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace> \
  --file deployment.yml

# New deployments receive 0% traffic by default; route traffic to it
az ml online-endpoint update \
  --name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace> \
  --traffic "sam3-deployment=100"
```
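Custom-container deployments in Azure ML are described with a YAML spec. A minimal sketch of `deployment.yml` (the route paths and port assume the container serves `/health` and `/` on port 80 — verify against the actual Dockerfile):

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: sam3-deployment
endpoint_name: sam3-foundry
environment:
  image: sam3acr.azurecr.io/sam3-foundry:latest
  inference_config:
    liveness_route:
      port: 80
      path: /health
    readiness_route:
      port: 80
      path: /health
    scoring_route:
      port: 80
      path: /
instance_type: Standard_NC6s_v3
instance_count: 1
```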
Or using Azure Portal:
1. Navigate to Azure AI Foundry workspace
2. Go to **Endpoints** β†’ **Real-time endpoints**
3. Click **Create**
4. Select **Custom container**
5. Image: `sam3acr.azurecr.io/sam3-foundry:latest`
6. Instance type: **Standard_NC6s_v3** (Tesla V100)
7. Deploy
#### 3. Test Azure AI Foundry Endpoint
```python
import requests
import base64

# Get endpoint URL and key from Azure Portal
ENDPOINT_URL = "https://<your-endpoint>.azureml.net/score"
API_KEY = "<your-api-key>"

with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    ENDPOINT_URL,
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["pothole", "asphalt"]},
    },
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.json())
```
---
## πŸ”„ Unified Deployment Workflow
Since both platforms use the **same Docker image**, you can deploy to both simultaneously:
### Option 1: Separate Tags (Recommended)
```bash
# Build once
docker build -t sam3-base:latest .
# Tag for HuggingFace
docker tag sam3-base:latest sam3acr4hf.azurecr.io/sam3-hf:latest
# Tag for Azure AI Foundry
docker tag sam3-base:latest sam3acr.azurecr.io/sam3-foundry:latest
# Push to both registries
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest
```
### Option 2: Deploy Script
Create `deploy_all.sh`:
```bash
#!/bin/bash
set -e
echo "Building Docker image..."
docker build -t sam3:latest .
echo "Pushing to HuggingFace ACR..."
docker tag sam3:latest sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest
echo "Pushing to Azure AI Foundry ACR..."
docker tag sam3:latest sam3acr.azurecr.io/sam3-foundry:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest
echo "βœ… Deployed to both registries!"
```
---
## πŸ“Š Platform Comparison
| Feature | HuggingFace | Azure AI Foundry |
|---------|-------------|------------------|
| **GPU** | NVIDIA A10G (24GB) | Tesla V100 (16GB) or A100 |
| **Auto-scaling** | βœ… Yes (0-5 replicas) | βœ… Yes (configurable) |
| **Authentication** | Public or Token | API Key required |
| **Pricing** | Per-second billing | Per-hour billing |
| **Scale to Zero** | βœ… Yes | ⚠️ Limited support |
| **Integration** | HuggingFace ecosystem | Azure ML ecosystem |
| **Monitoring** | HF Dashboard | Azure Monitor |
---
## πŸ”§ Configuration Differences
### API Authentication
**HuggingFace** (current - public):
```python
response = requests.post(endpoint_url, json=payload)
```
**Azure AI Foundry** (requires key):
```python
response = requests.post(
    endpoint_url,
    json=payload,
    headers={"Authorization": f"Bearer {api_key}"},
)
```
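Since the only difference between the two call patterns is the auth header, a single helper can serve both platforms. A sketch (the `build_headers` and `call_endpoint` names are ours, not from the codebase):

```python
import requests

def build_headers(api_key=None):
    """Empty headers for public HF endpoints; Bearer token for Azure."""
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}

def call_endpoint(url, payload, api_key=None):
    """POST a JSON payload to either platform with the right headers."""
    return requests.post(url, json=payload, headers=build_headers(api_key))
```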
### Environment Variables
For Azure AI Foundry, you may need to add environment variables:
```dockerfile
# Add to Dockerfile if needed for Azure
ENV AZURE_AI_FOUNDRY=true
ENV MLFLOW_TRACKING_URI=<your-mlflow-uri>
```
### Health Check Endpoints
Both platforms expect:
- `GET /health` - Health check
- `POST /` - Inference endpoint
Our current `app.py` already supports both! βœ…
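For reference, the route contract both platforms expect can be sketched with the standard library alone. This is an illustration of the contract, not the actual `app.py` (which may use a web framework and returns real masks):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def _send_json(self, status, body):
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        # GET /health -> liveness/readiness probe
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        # POST / -> inference; a real handler decodes the image and runs SAM3
        if self.path == "/":
            length = int(self.headers.get("Content-Length", 0))
            _ = self.rfile.read(length)  # request body (unused in this sketch)
            self._send_json(200, {"masks": []})
        else:
            self._send_json(404, {"error": "not found"})

# To serve: HTTPServer(("0.0.0.0", 80), Handler).serve_forever()
```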
---
## πŸ§ͺ Testing Both Deployments
Create `test_both_platforms.py`:
```python
import requests
import base64

def test_endpoint(name, url, api_key=None):
    """Test an endpoint's health and inference routes."""
    print(f"\n{'='*60}")
    print(f"Testing {name}")
    print(f"{'='*60}")

    # Health check
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    health = requests.get(f"{url}/health", headers=headers)
    print(f"Health: {health.status_code}")

    # Inference
    with open("test.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = requests.post(
        url,
        json={
            "inputs": image_b64,
            "parameters": {"classes": ["pothole", "asphalt"]},
        },
        headers=headers,
    )
    print(f"Inference: {response.status_code}")
    if response.status_code == 200:
        results = response.json()
        print(f"βœ… Generated {len(results)} masks")
    else:
        print(f"❌ Error: {response.text}")

# Test HuggingFace
test_endpoint(
    "HuggingFace",
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud",
)

# Test Azure AI Foundry (when deployed)
# test_endpoint(
#     "Azure AI Foundry",
#     "https://<your-endpoint>.azureml.net/score",
#     api_key="<your-key>",
# )
```
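If the endpoint follows the usual image-segmentation response shape — a list of `{label, score, mask}` entries with the mask as a base64-encoded PNG (an assumption; check the handler's actual output) — the masks can be decoded like this:

```python
import base64

def decode_masks(results):
    """Decode base64 PNG masks from a segmentation response.

    Assumes entries shaped like {"label": str, "score": float, "mask": b64-PNG}.
    Returns (label, score, png_bytes) tuples, skipping malformed entries.
    """
    decoded = []
    for entry in results:
        try:
            png = base64.b64decode(entry["mask"])
        except (KeyError, ValueError):
            continue
        # Valid PNG files start with an 8-byte signature
        if png.startswith(b"\x89PNG\r\n\x1a\n"):
            decoded.append((entry.get("label"), entry.get("score"), png))
    return decoded
```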
---
## πŸ“ Deployment Checklist
### HuggingFace (Complete) βœ…
- [x] Azure Container Registry created (`sam3acr4hf`)
- [x] Docker image built and pushed
- [x] HuggingFace endpoint created
- [x] Model validated with test image
- [x] Documentation complete
### Azure AI Foundry (Pending GPU Quota) ⏳
- [x] Azure Container Registry exists (`sam3acr`)
- [ ] GPU quota approved
- [ ] Azure AI Foundry workspace created
- [ ] Docker image pushed to `sam3acr`
- [ ] Endpoint deployed
- [ ] API key obtained
- [ ] Endpoint validated
---
## πŸ†˜ Troubleshooting
### HuggingFace Issues
See main README.md troubleshooting section.
### Azure AI Foundry Issues
**Issue**: GPU quota not available
- **Solution**: Request quota increase in Azure Portal β†’ Quotas β†’ ML quotas
**Issue**: Container registry authentication failed
```bash
# Get an access token (useful when the Docker daemon can't be reached directly),
# then log in to Docker manually with it
az acr login --name sam3acr --expose-token
docker login sam3acr.azurecr.io \
  -u 00000000-0000-0000-0000-000000000000 \
  -p <access-token-from-previous-command>
```
**Issue**: Endpoint deployment fails
- Check Azure Activity Log for detailed error
- Verify image is accessible: `az acr repository show --name sam3acr --image sam3-foundry:latest`
**Issue**: Model loading timeout
- Increase deployment timeout in Azure ML Studio
- Consider using smaller instance for testing
---
## πŸ’‘ Best Practices
1. **Use same Docker image** for both platforms to ensure consistency
2. **Tag images with versions** (e.g., `v1.0.0`) for rollback capability
3. **Test locally first** before pushing to registries
4. **Monitor costs** on both platforms (HF per-second, Azure per-hour)
5. **Set up alerts** for endpoint health on both platforms
6. **Keep API keys secure** (use Azure Key Vault for Azure AI Foundry)
---
## πŸ“š Resources
### HuggingFace
- [Inference Endpoints Docs](https://huggingface.co/docs/inference-endpoints)
- [Custom Docker Images](https://huggingface.co/docs/inference-endpoints/guides/custom_container)
### Azure AI Foundry
- [Azure ML Endpoints](https://learn.microsoft.com/azure/machine-learning/concept-endpoints)
- [Deploy Custom Containers](https://learn.microsoft.com/azure/machine-learning/how-to-deploy-custom-container)
- [GPU Quota Requests](https://learn.microsoft.com/azure/machine-learning/how-to-manage-quotas)
---
**Last Updated**: 2025-11-22
**Next Step**: Deploy to Azure AI Foundry once GPU quota is approved