sam3

File size: 9,112 Bytes

# Dual Deployment Guide - HuggingFace & Azure AI Foundry

This repository supports deployment to both **HuggingFace Inference Endpoints** and **Azure AI Foundry** using the same codebase and Docker image.

## 📋 Deployment Overview

| Platform | Status | Container Registry | Endpoint |
|----------|--------|-------------------|----------|
| **HuggingFace** | ✅ Running | `sam3acr4hf.azurecr.io` | https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud |
| **Azure AI Foundry** | ⏳ Pending GPU Quota | `sam3acr.azurecr.io` | To be deployed |

Both deployments use the **same Docker image** with SAM3Model for static image segmentation.

---

## 🚀 HuggingFace Deployment (Current)

### Status
✅ **DEPLOYED AND RUNNING**

### Registry
```bash
sam3acr4hf.azurecr.io/sam3-hf:latest
```

### Quick Deploy
```bash
# Build and push
docker build -t sam3acr4hf.azurecr.io/sam3-hf:latest .
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

# Restart endpoint
python3 << 'EOF'
from huggingface_hub import HfApi
api = HfApi()
endpoint = api.get_inference_endpoint('sam3-segmentation', namespace='Logiroad')
endpoint.pause()
endpoint.resume()
EOF
```

### Configuration
- **Hardware**: NVIDIA A10G (24GB VRAM)
- **Organization**: Logiroad
- **Access**: Public
- **Auto-scaling**: Enabled (0-5 replicas)

---

## 🔷 Azure AI Foundry Deployment (Future)

### Status
⏳ **WAITING FOR GPU QUOTA**

Once GPU quota is approved, deploy using the same Docker image:

### Registry
```bash
sam3acr.azurecr.io/sam3-foundry:latest
```

### Deployment Steps

#### 1. Build and Push to Azure ACR

```bash
# Login to Azure AI Foundry ACR
az acr login --name sam3acr

# Build with Azure AI Foundry tag
docker build -t sam3acr.azurecr.io/sam3-foundry:latest .

# Push to Azure ACR
docker push sam3acr.azurecr.io/sam3-foundry:latest
```

#### 2. Deploy to Azure AI Foundry

Using Azure CLI:

```bash
# Create Azure AI Foundry endpoint
az ml online-endpoint create \
  --name sam3-foundry \
  --resource-group productionline-test \
  --workspace-name <your-workspace>

# Create deployment
az ml online-deployment create \
  --name sam3-deployment \
  --endpoint sam3-foundry \
  --model-uri sam3acr.azurecr.io/sam3-foundry:latest \
  --instance-type Standard_NC6s_v3 \
  --instance-count 1
```

Or using Azure Portal:
1. Navigate to Azure AI Foundry workspace
2. Go to **Endpoints** → **Real-time endpoints**
3. Click **Create**
4. Select **Custom container**
5. Image: `sam3acr.azurecr.io/sam3-foundry:latest`
6. Instance type: **Standard_NC6s_v3** (Tesla V100)
7. Deploy

#### 3. Test Azure AI Foundry Endpoint

```python
import requests
import base64

# Get endpoint URL and key from Azure Portal
ENDPOINT_URL = "https://<your-endpoint>.azureml.net/score"
API_KEY = "<your-api-key>"

with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    ENDPOINT_URL,
    json={
        "inputs": image_b64,
        "parameters": {"classes": ["pothole", "asphalt"]}
    },
    headers={"Authorization": f"Bearer {API_KEY}"}
)

print(response.json())
```

---

## 🔄 Unified Deployment Workflow

Since both platforms use the **same Docker image**, you can deploy to both simultaneously:

### Option 1: Separate Tags (Recommended)

```bash
# Build once
docker build -t sam3-base:latest .

# Tag for HuggingFace
docker tag sam3-base:latest sam3acr4hf.azurecr.io/sam3-hf:latest

# Tag for Azure AI Foundry
docker tag sam3-base:latest sam3acr.azurecr.io/sam3-foundry:latest

# Push to both registries
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest
```

### Option 2: Deploy Script

Create `deploy_all.sh`:

```bash
#!/bin/bash
set -e

echo "Building Docker image..."
docker build -t sam3:latest .

echo "Pushing to HuggingFace ACR..."
docker tag sam3:latest sam3acr4hf.azurecr.io/sam3-hf:latest
az acr login --name sam3acr4hf
docker push sam3acr4hf.azurecr.io/sam3-hf:latest

echo "Pushing to Azure AI Foundry ACR..."
docker tag sam3:latest sam3acr.azurecr.io/sam3-foundry:latest
az acr login --name sam3acr
docker push sam3acr.azurecr.io/sam3-foundry:latest

echo "✅ Deployed to both registries!"
```

---

## 📊 Platform Comparison

| Feature | HuggingFace | Azure AI Foundry |
|---------|-------------|------------------|
| **GPU** | NVIDIA A10G (24GB) | Tesla V100 (16GB) or A100 |
| **Auto-scaling** | ✅ Yes (0-5 replicas) | ✅ Yes (configurable) |
| **Authentication** | Public or Token | API Key required |
| **Pricing** | Per-second billing | Per-hour billing |
| **Scale to Zero** | ✅ Yes | ⚠️ Limited support |
| **Integration** | HuggingFace ecosystem | Azure ML ecosystem |
| **Monitoring** | HF Dashboard | Azure Monitor |

---

## 🔧 Configuration Differences

### API Authentication

**HuggingFace** (current - public):
```python
response = requests.post(endpoint_url, json=payload)
```

**Azure AI Foundry** (requires key):
```python
response = requests.post(
    endpoint_url,
    json=payload,
    headers={"Authorization": f"Bearer {api_key}"}
)
```

### Environment Variables

For Azure AI Foundry, you may need to add environment variables:

```dockerfile
# Add to Dockerfile if needed for Azure
ENV AZURE_AI_FOUNDRY=true
ENV MLFLOW_TRACKING_URI=<your-mlflow-uri>
```

### Health Check Endpoints

Both platforms expect:
- `GET /health` - Health check
- `POST /` - Inference endpoint

Our current `app.py` already supports both! ✅

---

## 🧪 Testing Both Deployments

Create `test_both_platforms.py`:

```python
import requests
import base64

def test_endpoint(name, url, api_key=None):
    """Test an endpoint"""
    print(f"\n{'='*60}")
    print(f"Testing {name}")
    print(f"{'='*60}")

    # Health check
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    health = requests.get(f"{url}/health", headers=headers)
    print(f"Health: {health.status_code}")

    # Inference
    with open("test.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = requests.post(
        url,
        json={
            "inputs": image_b64,
            "parameters": {"classes": ["pothole", "asphalt"]}
        },
        headers=headers
    )

    print(f"Inference: {response.status_code}")
    if response.status_code == 200:
        results = response.json()
        print(f"✅ Generated {len(results)} masks")
    else:
        print(f"❌ Error: {response.text}")

# Test HuggingFace
test_endpoint(
    "HuggingFace",
    "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud"
)

# Test Azure AI Foundry (when deployed)
# test_endpoint(
#     "Azure AI Foundry",
#     "https://<your-endpoint>.azureml.net/score",
#     api_key="<your-key>"
# )
```

---

## 📝 Deployment Checklist

### HuggingFace (Complete) ✅
- [x] Azure Container Registry created (`sam3acr4hf`)
- [x] Docker image built and pushed
- [x] HuggingFace endpoint created
- [x] Model validated with test image
- [x] Documentation complete

### Azure AI Foundry (Pending GPU Quota) ⏳
- [x] Azure Container Registry exists (`sam3acr`)
- [ ] GPU quota approved
- [ ] Azure AI Foundry workspace created
- [ ] Docker image pushed to `sam3acr`
- [ ] Endpoint deployed
- [ ] API key obtained
- [ ] Endpoint validated

---

## 🆘 Troubleshooting

### HuggingFace Issues
See main README.md troubleshooting section.

### Azure AI Foundry Issues

**Issue**: GPU quota not available
- **Solution**: Request quota increase in Azure Portal → Quotas → ML quotas

**Issue**: Container registry authentication failed
```bash
az acr login --name sam3acr --expose-token
```

**Issue**: Endpoint deployment fails
- Check Azure Activity Log for detailed error
- Verify image is accessible: `az acr repository show --name sam3acr --image sam3-foundry:latest`

**Issue**: Model loading timeout
- Increase deployment timeout in Azure ML Studio
- Consider using smaller instance for testing

---

## 💡 Best Practices

1. **Use same Docker image** for both platforms to ensure consistency
2. **Tag images with versions** (e.g., `v1.0.0`) for rollback capability
3. **Test locally first** before pushing to registries
4. **Monitor costs** on both platforms (HF per-second, Azure per-hour)
5. **Set up alerts** for endpoint health on both platforms
6. **Keep API keys secure** (use Azure Key Vault for Azure AI Foundry)

---

## 📚 Resources

### HuggingFace
- [Inference Endpoints Docs](https://huggingface.co/docs/inference-endpoints)
- [Custom Docker Images](https://huggingface.co/docs/inference-endpoints/guides/custom_container)

### Azure AI Foundry
- [Azure ML Endpoints](https://learn.microsoft.com/azure/machine-learning/concept-endpoints)
- [Deploy Custom Containers](https://learn.microsoft.com/azure/machine-learning/how-to-deploy-custom-container)
- [GPU Quota Requests](https://learn.microsoft.com/azure/machine-learning/how-to-manage-quotas)

---

**Last Updated**: 2025-11-22
**Next Step**: Deploy to Azure AI Foundry once GPU quota is approved