# Complete Deployment Guide

## Table of Contents

- [Local Development](#local-development)
- [Docker Deployment](#docker-deployment)
- [HuggingFace Spaces](#huggingface-spaces)
- [AWS Deployment](#aws-deployment)
- [Google Cloud](#google-cloud)
- [Azure Deployment](#azure-deployment)
- [Production Considerations](#production-considerations)
- [Troubleshooting](#troubleshooting)
- [Next Steps](#next-steps)
## Local Development

### Prerequisites
- Python 3.10+
- FFmpeg
- 4 GB+ RAM
- (Optional) CUDA-capable GPU
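A quick sanity check for the requirements above (a sketch; package names and install commands vary by distro):

```shell
# Report whether the interpreter and ffmpeg meet the prerequisites
python3 -c 'import sys; print("Python OK" if sys.version_info >= (3, 10) else "Python too old: need 3.10+")'
command -v ffmpeg >/dev/null && echo "ffmpeg: found" || echo "ffmpeg: missing - install it first"
```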
### Setup

```bash
# 1. Clone repository
git clone https://github.com/YOUR_USERNAME/whisper-german-asr.git
cd whisper-german-asr

# 2. Run quick start script
chmod +x scripts/quick_start.sh
./scripts/quick_start.sh

# 3. Start services
# Option A: Gradio demo
python demo/app.py

# Option B: FastAPI
uvicorn api.main:app --reload

# Option C: Both (backgrounded in one shell, or run each in its own terminal)
python demo/app.py &
uvicorn api.main:app --port 8000 &
```
### Testing

```bash
# Test the API
curl -X POST "http://localhost:8000/transcribe" \
  -F "file=@test_audio.wav"

# Test the demo: open http://localhost:7860 in a browser
```
## Docker Deployment

### Quick Start

```bash
# Build and run with docker-compose
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down
```
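The compose file itself is not shown in this guide; a minimal sketch that mirrors the manual `docker run` commands below might look like this (service names and the model-mount path are assumptions):

```yaml
services:
  api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./whisper_test_tuned:/app/whisper_test_tuned:ro
  demo:
    build: .
    command: python demo/app.py
    ports:
      - "7860:7860"
    volumes:
      - ./whisper_test_tuned:/app/whisper_test_tuned:ro
```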
### Manual Docker Build

```bash
# Build image
docker build -t whisper-asr .

# Run API (default container command)
docker run -d \
  -p 8000:8000 \
  -v $(pwd)/whisper_test_tuned:/app/whisper_test_tuned:ro \
  --name whisper-api \
  whisper-asr

# Run demo (override the command)
docker run -d \
  -p 7860:7860 \
  -v $(pwd)/whisper_test_tuned:/app/whisper_test_tuned:ro \
  --name whisper-demo \
  whisper-asr python demo/app.py
```
### Docker with GPU

```bash
# Install the NVIDIA Container Toolkit first:
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

# Run with GPU access
docker run -d \
  --gpus all \
  -p 8000:8000 \
  -v $(pwd)/whisper_test_tuned:/app/whisper_test_tuned:ro \
  whisper-asr
```
## HuggingFace Spaces

### Method 1: Gradio Space (Recommended)

#### Step 1: Create a Space

1. Go to https://huggingface.co/spaces
2. Click "Create new Space"
3. Settings:
   - Name: whisper-german-asr
   - SDK: Gradio
   - Hardware: CPU Basic (free) or GPU T4 (paid)
   - Visibility: Public
#### Step 2: Prepare Files

```bash
# Create a new directory for the Space
mkdir hf-space
cd hf-space

# Copy the demo app
cp ../demo/app.py app.py

# Create requirements.txt
cat > requirements.txt << EOF
torch>=2.2.0
transformers>=4.42.0
librosa>=0.10.1
gradio>=4.0.0
soundfile>=0.12.1
EOF

# Create README.md with Space frontmatter
cat > README.md << EOF
---
title: Whisper German ASR
emoji: 🎙️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---

# Whisper German ASR

Fine-tuned Whisper model for German speech recognition.
Try it out by recording or uploading German audio!
EOF
```
#### Step 3: Update app.py

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Modify model loading to pull from the HF Hub instead of a local path
def load_model(model_path="YOUR_USERNAME/whisper-small-german"):
    model = WhisperForConditionalGeneration.from_pretrained(model_path)
    processor = WhisperProcessor.from_pretrained(model_path)
    # ... rest of code
    return model, processor
```
#### Step 4: Push the Model to the HF Hub (First Time)

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("./whisper_test_tuned")
processor = WhisperProcessor.from_pretrained("openai/whisper-small")

# Push to Hub
model.push_to_hub("YOUR_USERNAME/whisper-small-german")
processor.push_to_hub("YOUR_USERNAME/whisper-small-german")
```
#### Step 5: Deploy to the Space

```bash
# Clone the Space repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/whisper-german-asr
cd whisper-german-asr

# Copy the prepared files
cp ../hf-space/* .

# Push to the Space
git add .
git commit -m "Initial deployment"
git push
```
### Method 2: Docker Space

```dockerfile
# Dockerfile in the Space repository
FROM python:3.10-slim

WORKDIR /app

RUN apt-get update && apt-get install -y ffmpeg libsndfile1

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY app.py .

CMD ["python", "app.py"]
```
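A Docker Space also needs its README frontmatter to declare the Docker SDK, and the Space routes traffic to the port named by `app_port` (7860 here, matching the Gradio demo). A sketch of that frontmatter:

```yaml
---
title: Whisper German ASR
emoji: 🎙️
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
---
```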
## AWS Deployment

### Option 1: ECS Fargate

#### Step 1: Push the Docker Image to ECR

```bash
# Create an ECR repository
aws ecr create-repository --repository-name whisper-asr

# Log in to ECR
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin \
  YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com

# Tag and push
docker tag whisper-asr:latest \
  YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/whisper-asr:latest
docker push YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/whisper-asr:latest
```
#### Step 2: Create an ECS Task Definition

```json
{
  "family": "whisper-asr",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "whisper-api",
      "image": "YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/whisper-asr:latest",
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "MODEL_PATH",
          "value": "/app/whisper_test_tuned"
        }
      ]
    }
  ]
}
```
#### Step 3: Register the Task Definition and Create the Service

```bash
# Register the task definition (save the JSON above as task-definition.json first)
aws ecs register-task-definition --cli-input-json file://task-definition.json

# Create the service
aws ecs create-service \
  --cluster default \
  --service-name whisper-asr \
  --task-definition whisper-asr \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-xxx],securityGroups=[sg-xxx],assignPublicIp=ENABLED}"
```
### Option 2: Lambda + API Gateway

```python
# lambda_function.py
import base64
import io
import json

import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = None
processor = None

def load_model():
    global model, processor
    if model is None:
        model = WhisperForConditionalGeneration.from_pretrained("/tmp/model")
        processor = WhisperProcessor.from_pretrained("openai/whisper-small")

def lambda_handler(event, context):
    load_model()

    # Decode base64 audio
    audio_data = base64.b64decode(event['body'])
    audio, sr = librosa.load(io.BytesIO(audio_data), sr=16000)

    # Transcribe
    input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
    predicted_ids = model.generate(input_features)
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

    return {
        'statusCode': 200,
        'body': json.dumps({'transcription': transcription})
    }
```
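On the client side, the handler above expects the request body to be base64-encoded audio bytes. A minimal sketch of building that event (the raw bytes here are placeholders; in practice read them from a file such as `test_audio.wav`):

```python
import base64

def build_lambda_payload(audio_bytes: bytes) -> dict:
    """Wrap raw audio bytes into the event shape the handler decodes."""
    return {"body": base64.b64encode(audio_bytes).decode("ascii")}

# Round-trip check with placeholder bytes
raw = b"\x00\x01fake-wav-bytes"
event = build_lambda_payload(raw)
assert base64.b64decode(event["body"]) == raw
```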
## Google Cloud

### Cloud Run Deployment

#### Step 1: Build and Push to GCR

```bash
# Enable APIs
gcloud services enable run.googleapis.com
gcloud services enable containerregistry.googleapis.com

# Build with Cloud Build
gcloud builds submit --tag gcr.io/PROJECT_ID/whisper-asr

# Or use Docker locally
docker tag whisper-asr gcr.io/PROJECT_ID/whisper-asr
docker push gcr.io/PROJECT_ID/whisper-asr
```
#### Step 2: Deploy to Cloud Run

```bash
gcloud run deploy whisper-asr \
  --image gcr.io/PROJECT_ID/whisper-asr \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --memory 2Gi \
  --cpu 2 \
  --timeout 300
```
#### Step 3: Get the Service URL

```bash
gcloud run services describe whisper-asr \
  --platform managed \
  --region us-central1 \
  --format 'value(status.url)'
```
## Azure Deployment

### Azure Container Instances

#### Step 1: Push to Azure Container Registry

```bash
# Create ACR
az acr create --resource-group myResourceGroup \
  --name whisperasr --sku Basic

# Log in
az acr login --name whisperasr

# Tag and push
docker tag whisper-asr whisperasr.azurecr.io/whisper-asr:latest
docker push whisperasr.azurecr.io/whisper-asr:latest
```
#### Step 2: Deploy a Container Instance

```bash
az container create \
  --resource-group myResourceGroup \
  --name whisper-asr \
  --image whisperasr.azurecr.io/whisper-asr:latest \
  --cpu 2 \
  --memory 4 \
  --registry-login-server whisperasr.azurecr.io \
  --registry-username <username> \
  --registry-password <password> \
  --dns-name-label whisper-asr \
  --ports 8000
```
## Production Considerations

### Security
- Use HTTPS (SSL/TLS certificates)
- Implement rate limiting
- Add authentication/API keys
- Validate file uploads
- Set CORS policies
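As one concrete example of upload validation, checking the WAV magic bytes server-side rather than trusting the file extension (a sketch; a real service should also cap audio duration and sanitize filenames):

```python
def looks_like_wav(data: bytes, max_bytes: int = 25 * 1024 * 1024) -> bool:
    """Cheap check: RIFF/WAVE header plus a size cap before any decoding."""
    if len(data) > max_bytes or len(data) < 12:
        return False
    return data[0:4] == b"RIFF" and data[8:12] == b"WAVE"

# A WAV file starts with b"RIFF<size>WAVE"; anything else is rejected
assert looks_like_wav(b"RIFF\x24\x00\x00\x00WAVEfmt ")
assert not looks_like_wav(b"<html>not audio</html>")
```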
### Monitoring

- Set up logging (CloudWatch, Stackdriver, etc.)
- Add health checks
- Monitor latency and error rates
- Track usage metrics
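Latency tracking can start as simply as a decorator that records wall-clock time per call before graduating to a real metrics backend; a stdlib-only sketch (the metric sink is a plain list here, and `transcribe_stub` is a stand-in for the real transcription function):

```python
import functools
import time

def timed(metric_log: list):
    """Append (function name, seconds elapsed) to metric_log on every call."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                metric_log.append((fn.__name__, time.perf_counter() - start))
        return inner
    return wrap

latencies: list = []

@timed(latencies)
def transcribe_stub(text):
    return text.upper()

transcribe_stub("hallo")
```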
Scaling
- Configure auto-scaling
- Use load balancer
- Implement caching
- Consider CDN for static assets
### Cost Optimization
- Use spot/preemptible instances
- Implement request batching
- Cache model in memory
- Monitor and optimize resource usage
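"Cache model in memory" means loading the weights once per process, not once per request. A sketch using `functools.lru_cache` (the loader body is a stub; swap in `WhisperForConditionalGeneration.from_pretrained(model_path)` in practice):

```python
import functools

@functools.lru_cache(maxsize=1)
def get_model(model_path: str = "/app/whisper_test_tuned"):
    # Stub for the expensive from_pretrained() load; runs once,
    # later calls return the cached object.
    return {"path": model_path, "loaded": True}

a = get_model()
b = get_model()
assert a is b  # same object: the model was loaded only once
```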
## Troubleshooting

### Common Issues

#### Model Not Loading

```bash
# Check the model path exists and is readable
ls -la whisper_test_tuned/

# Fix permissions if needed
chmod -R 755 whisper_test_tuned/
```
#### Out of Memory

- Reduce the batch size
- Fall back to CPU instead of GPU
- Increase container memory
#### Slow Inference

- Use a GPU
- Reduce the beam size
- Use a smaller model
- Implement caching
#### Port Already in Use

```bash
# Find the process using the port
lsof -i :8000

# Kill it
kill -9 <PID>

# Or start on a different port
uvicorn api.main:app --port 8001
```
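The same check can be scripted; a small stdlib helper that reports whether a TCP port is free before starting uvicorn (a sketch, IPv4 localhost only):

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind; success means nothing is listening on that port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

# Collect free candidates in the usual range before launching the server
free_ports = [p for p in range(8000, 8010) if port_is_free(p)]
```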
## Next Steps
- Choose deployment platform
- Setup CI/CD pipeline
- Configure monitoring
- Test in production
- Optimize performance
- Scale as needed
For more help, see: