
Complete Deployment Guide

Table of Contents

  1. Local Development
  2. Docker Deployment
  3. HuggingFace Spaces
  4. AWS Deployment
  5. Google Cloud
  6. Azure Deployment
  7. Production Considerations
  8. Troubleshooting
  9. Next Steps

Local Development

Prerequisites

# System requirements
- Python 3.10+
- FFmpeg
- 4GB+ RAM
- (Optional) CUDA-capable GPU

Setup

# 1. Clone repository
git clone https://github.com/YOUR_USERNAME/whisper-german-asr.git
cd whisper-german-asr

# 2. Run quick start script
chmod +x scripts/quick_start.sh
./scripts/quick_start.sh

# 3. Start services
# Option A: Gradio Demo
python demo/app.py

# Option B: FastAPI
uvicorn api.main:app --reload

# Option C: Both (run in the background, or use separate terminals)
python demo/app.py &
uvicorn api.main:app --port 8000 &

Testing

# Test API
curl -X POST "http://localhost:8000/transcribe" \
  -F "file=@test_audio.wav"

# Test Demo
# Open http://localhost:7860 in browser
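
The same request can also be sent from Python, assuming the API exposes the multipart field named file used in the curl call above and that the requests package is installed:

import requests

# Same request as the curl example, sent from Python
with open("test_audio.wav", "rb") as f:
    response = requests.post(
        "http://localhost:8000/transcribe",
        files={"file": ("test_audio.wav", f, "audio/wav")},
    )
print(response.status_code, response.json())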

Docker Deployment

Quick Start

# Build and run with docker-compose
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

Manual Docker Build

# Build image
docker build -t whisper-asr .

# Run API
docker run -d \
  -p 8000:8000 \
  -v $(pwd)/whisper_test_tuned:/app/whisper_test_tuned:ro \
  --name whisper-api \
  whisper-asr

# Run Demo
docker run -d \
  -p 7860:7860 \
  -v $(pwd)/whisper_test_tuned:/app/whisper_test_tuned:ro \
  --name whisper-demo \
  whisper-asr python demo/app.py

Docker with GPU

# Install nvidia-docker2
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

# Run with GPU
docker run -d \
  --gpus all \
  -p 8000:8000 \
  -v $(pwd)/whisper_test_tuned:/app/whisper_test_tuned:ro \
  whisper-asr
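
To confirm the GPU is actually visible inside the container, a quick check (run it with docker exec -it <container> python, for example):

import torch

# Should print True plus the device name when --gpus all took effect
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))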

HuggingFace Spaces

Method 1: Gradio Space (Recommended)

Step 1: Create Space

  1. Go to https://huggingface.co/spaces
  2. Click "Create new Space"
  3. Settings:
    • Name: whisper-german-asr
    • SDK: Gradio
    • Hardware: CPU Basic (free) or GPU T4 (paid)
    • Visibility: Public

Step 2: Prepare Files

# Create a new directory for Space
mkdir hf-space
cd hf-space

# Copy demo app
cp ../demo/app.py app.py

# Create requirements.txt
cat > requirements.txt << EOF
torch>=2.2.0
transformers>=4.42.0
librosa>=0.10.1
gradio>=4.0.0
soundfile>=0.12.1
EOF

# Create README.md with frontmatter
cat > README.md << EOF
---
title: Whisper German ASR
emoji: 🎙️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---

# Whisper German ASR

Fine-tuned Whisper model for German speech recognition.

Try it out by recording or uploading German audio!
EOF

Step 3: Update app.py

# Modify model loading to use HF Hub
def load_model(model_path="YOUR_USERNAME/whisper-small-german"):
    model = WhisperForConditionalGeneration.from_pretrained(model_path)
    processor = WhisperProcessor.from_pretrained(model_path)
    # ... rest of code
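
Putting it together, a minimal sketch of what the Space's app.py can look like, assuming the model is available on the Hub as YOUR_USERNAME/whisper-small-german after Step 4 (adapt the existing demo/app.py rather than copying this verbatim):

import gradio as gr
import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

MODEL_ID = "YOUR_USERNAME/whisper-small-german"  # the repo pushed in Step 4

model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)
processor = WhisperProcessor.from_pretrained(MODEL_ID)
model.eval()

def transcribe(audio_path):
    # Load and resample to 16 kHz, the rate Whisper expects
    audio, _ = librosa.load(audio_path, sr=16000)
    features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
    with torch.no_grad():
        predicted_ids = model.generate(features)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="Whisper German ASR",
)

if __name__ == "__main__":
    demo.launch()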

Step 4: Push Model to HF Hub (First Time)

# In Python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("./whisper_test_tuned")
processor = WhisperProcessor.from_pretrained("openai/whisper-small")

# Push to Hub
model.push_to_hub("YOUR_USERNAME/whisper-small-german")
processor.push_to_hub("YOUR_USERNAME/whisper-small-german")
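
Pushing requires a HuggingFace account and a write token (created at https://huggingface.co/settings/tokens); one way to authenticate before calling push_to_hub:

from huggingface_hub import login

# Prompts for a token interactively; alternatively pass login(token="hf_...")
login()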

Step 5: Deploy to Space

# Clone Space repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/whisper-german-asr
cd whisper-german-asr

# Copy files
cp ../hf-space/* .

# Push to Space
git add .
git commit -m "Initial deployment"
git push

Method 2: Docker Space

# Create Dockerfile in Space
FROM python:3.10-slim

WORKDIR /app

RUN apt-get update && apt-get install -y ffmpeg libsndfile1

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY app.py .

CMD ["python", "app.py"]
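
For a Docker Space, the README frontmatter should declare sdk: docker instead of sdk: gradio, and HuggingFace routes traffic to the port given by the app_port field (7860 by default), so the app inside the container should listen on that port.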

AWS Deployment

Option 1: ECS Fargate

Step 1: Push Docker Image to ECR

# Create ECR repository
aws ecr create-repository --repository-name whisper-asr

# Login to ECR
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin \
  YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com

# Tag and push
docker tag whisper-asr:latest \
  YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/whisper-asr:latest
docker push YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/whisper-asr:latest

Step 2: Create ECS Task Definition

{
  "family": "whisper-asr",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "whisper-api",
      "image": "YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/whisper-asr:latest",
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "MODEL_PATH",
          "value": "/app/whisper_test_tuned"
        }
      ]
    }
  ]
}
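
Save the JSON above as task-definition.json and register it with aws ecs register-task-definition --cli-input-json file://task-definition.json before creating the service in the next step.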

Step 3: Create ECS Service

aws ecs create-service \
  --cluster default \
  --service-name whisper-asr \
  --task-definition whisper-asr \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-xxx],securityGroups=[sg-xxx],assignPublicIp=ENABLED}"

Option 2: Lambda + API Gateway

# lambda_function.py
import json
import base64
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import librosa
import io

model = None
processor = None

def load_model():
    global model, processor
    if model is None:
        model = WhisperForConditionalGeneration.from_pretrained("/tmp/model")
        processor = WhisperProcessor.from_pretrained("openai/whisper-small")

def lambda_handler(event, context):
    load_model()
    
    # Decode base64 audio
    audio_data = base64.b64decode(event['body'])
    audio, sr = librosa.load(io.BytesIO(audio_data), sr=16000)
    
    # Transcribe
    input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
    predicted_ids = model.generate(input_features)
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
    
    return {
        'statusCode': 200,
        'body': json.dumps({'transcription': transcription})
    }
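
Two notes on this sketch: the handler assumes the model files are already present under /tmp/model (for example downloaded at cold start or baked into a container-image Lambda, since torch and transformers typically exceed the standard zip deployment limits), and the request body is expected to be base64-encoded audio. A matching client call, with API_URL standing in for your API Gateway endpoint:

import base64
import requests

API_URL = "https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/prod/transcribe"  # placeholder

with open("test_audio.wav", "rb") as f:
    payload = base64.b64encode(f.read())

# The handler above reads event['body'] as base64-encoded audio
response = requests.post(API_URL, data=payload)
print(response.json())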

Google Cloud

Cloud Run Deployment

Step 1: Build and Push to GCR

# Enable APIs
gcloud services enable run.googleapis.com
gcloud services enable containerregistry.googleapis.com

# Build image
gcloud builds submit --tag gcr.io/PROJECT_ID/whisper-asr

# Or use Docker
docker tag whisper-asr gcr.io/PROJECT_ID/whisper-asr
docker push gcr.io/PROJECT_ID/whisper-asr

Step 2: Deploy to Cloud Run

gcloud run deploy whisper-asr \
  --image gcr.io/PROJECT_ID/whisper-asr \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --memory 2Gi \
  --cpu 2 \
  --timeout 300

Step 3: Get Service URL

gcloud run services describe whisper-asr \
  --platform managed \
  --region us-central1 \
  --format 'value(status.url)'

Azure Deployment

Azure Container Instances

Step 1: Push to Azure Container Registry

# Create ACR
az acr create --resource-group myResourceGroup \
  --name whisperasr --sku Basic

# Login
az acr login --name whisperasr

# Tag and push
docker tag whisper-asr whisperasr.azurecr.io/whisper-asr:latest
docker push whisperasr.azurecr.io/whisper-asr:latest

Step 2: Deploy Container Instance

az container create \
  --resource-group myResourceGroup \
  --name whisper-asr \
  --image whisperasr.azurecr.io/whisper-asr:latest \
  --cpu 2 \
  --memory 4 \
  --registry-login-server whisperasr.azurecr.io \
  --registry-username <username> \
  --registry-password <password> \
  --dns-name-label whisper-asr \
  --ports 8000

Production Considerations

Security

  • Use HTTPS (SSL/TLS certificates)
  • Implement rate limiting
  • Add authentication/API keys (see the sketch below)
  • Validate file uploads
  • Set CORS policies
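
A minimal sketch of API-key checking with FastAPI, assuming the key is supplied in an X-API-Key header (in production, load the expected key from a secret store rather than hard-coding it):

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import APIKeyHeader

app = FastAPI()
api_key_header = APIKeyHeader(name="X-API-Key")

def verify_api_key(api_key: str = Depends(api_key_header)) -> None:
    # Placeholder comparison; read the expected key from configuration in production
    if api_key != "CHANGE_ME":
        raise HTTPException(status_code=403, detail="Invalid API key")

@app.post("/transcribe", dependencies=[Depends(verify_api_key)])
def transcribe() -> dict:
    # Placeholder body; attach this dependency to the real endpoint in api/main.py
    return {"transcription": "..."}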

Monitoring

  • Setup logging (CloudWatch, Stackdriver, etc.)
  • Add health checks (see the sketch below)
  • Monitor latency and errors
  • Track usage metrics
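
A sketch of a health check plus per-request latency logging in FastAPI (names here are illustrative, not part of the existing api/main.py):

import logging
import time

from fastapi import FastAPI, Request

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("whisper-api")

app = FastAPI()

@app.get("/health")
def health() -> dict:
    # Lightweight liveness probe for load balancers and orchestrators
    return {"status": "ok"}

@app.middleware("http")
async def log_requests(request: Request, call_next):
    # Log method, path, status code and latency for every request
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("%s %s -> %s (%.1f ms)", request.method, request.url.path,
                response.status_code, elapsed_ms)
    return response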

Scaling

  • Configure auto-scaling
  • Use load balancer
  • Implement caching
  • Consider CDN for static assets

Cost Optimization

  • Use spot/preemptible instances
  • Implement request batching
  • Cache model in memory (see the sketch below)
  • Monitor and optimize resource usage
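
Keeping the model in memory is mostly a matter of loading it once per process and reusing it; a sketch, assuming the processor files were saved alongside the fine-tuned model:

from functools import lru_cache

from transformers import WhisperForConditionalGeneration, WhisperProcessor

@lru_cache(maxsize=1)
def get_model(model_path: str = "./whisper_test_tuned"):
    # Loaded on the first call, then reused for every later request in this process
    model = WhisperForConditionalGeneration.from_pretrained(model_path)
    processor = WhisperProcessor.from_pretrained(model_path)
    model.eval()
    return model, processor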

Troubleshooting

Common Issues

Model Not Loading

# Check model path
ls -la whisper_test_tuned/

# Check permissions
chmod -R 755 whisper_test_tuned/

Out of Memory

  • Reduce batch size
  • Use CPU instead of GPU
  • Increase container memory

Slow Inference

  • Use a GPU
  • Reduce beam size (see the sketch below)
  • Use a smaller model
  • Implement caching
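
A sketch combining the first two points: half precision on a GPU when one is available, and greedy decoding (num_beams=1) instead of beam search. Paths and names are illustrative:

import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

model = WhisperForConditionalGeneration.from_pretrained(
    "./whisper_test_tuned", torch_dtype=dtype
).to(device)
processor = WhisperProcessor.from_pretrained("./whisper_test_tuned")
model.eval()

def fast_transcribe(audio, sampling_rate=16000):
    features = processor(audio, sampling_rate=sampling_rate,
                         return_tensors="pt").input_features
    features = features.to(device, dtype=dtype)
    with torch.no_grad():
        # num_beams=1 means greedy decoding, the cheapest search strategy
        ids = model.generate(features, num_beams=1)
    return processor.batch_decode(ids, skip_special_tokens=True)[0]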

Port Already in Use

# Find process
lsof -i :8000

# Kill process
kill -9 <PID>

# Use different port
uvicorn api.main:app --port 8001

Next Steps

  1. Choose deployment platform
  2. Setup CI/CD pipeline
  3. Configure monitoring
  4. Test in production
  5. Optimize performance
  6. Scale as needed

For more help, see: