# Complete Deployment Guide

## Table of Contents

1. [Local Development](#local-development)
2. [Docker Deployment](#docker-deployment)
3. [HuggingFace Spaces](#huggingface-spaces)
4. [AWS Deployment](#aws-deployment)
5. [Google Cloud](#google-cloud)
6. [Azure Deployment](#azure-deployment)

---

## Local Development

### Prerequisites

- Python 3.10+
- FFmpeg
- 4 GB+ RAM
- (Optional) CUDA-capable GPU

### Setup

```bash
# 1. Clone repository
git clone https://github.com/YOUR_USERNAME/whisper-german-asr.git
cd whisper-german-asr

# 2. Run quick start script
chmod +x scripts/quick_start.sh
./scripts/quick_start.sh

# 3. Start services
# Option A: Gradio demo
python demo/app.py

# Option B: FastAPI
uvicorn api.main:app --reload

# Option C: Both (separate terminals)
python demo/app.py &
uvicorn api.main:app --port 8000 &
```

### Testing

```bash
# Test the API
curl -X POST "http://localhost:8000/transcribe" \
  -F "file=@test_audio.wav"

# Test the demo: open http://localhost:7860 in a browser
```

---

## Docker Deployment

### Quick Start

```bash
# Build and run with docker-compose
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down
```

### Manual Docker Build

```bash
# Build image
docker build -t whisper-asr .

# Run API
docker run -d \
  -p 8000:8000 \
  -v $(pwd)/whisper_test_tuned:/app/whisper_test_tuned:ro \
  --name whisper-api \
  whisper-asr

# Run demo
docker run -d \
  -p 7860:7860 \
  -v $(pwd)/whisper_test_tuned:/app/whisper_test_tuned:ro \
  --name whisper-demo \
  whisper-asr python demo/app.py
```

### Docker with GPU

```bash
# Install the NVIDIA Container Toolkit (nvidia-docker2):
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

# Run with GPU
docker run -d \
  --gpus all \
  -p 8000:8000 \
  -v $(pwd)/whisper_test_tuned:/app/whisper_test_tuned:ro \
  whisper-asr
```

---

## HuggingFace Spaces

### Method 1: Gradio Space (Recommended)

#### Step 1: Create Space

1. Go to https://huggingface.co/spaces
2. Click "Create new Space"
3. Settings:
   - **Name:** whisper-german-asr
   - **SDK:** Gradio
   - **Hardware:** CPU Basic (free) or GPU T4 (paid)
   - **Visibility:** Public

#### Step 2: Prepare Files

```bash
# Create a new directory for the Space
mkdir hf-space
cd hf-space

# Copy demo app
cp ../demo/app.py app.py

# Create requirements.txt
cat > requirements.txt << EOF
torch>=2.2.0
transformers>=4.42.0
librosa>=0.10.1
gradio>=4.0.0
soundfile>=0.12.1
EOF

# Create README.md with frontmatter
cat > README.md << EOF
---
title: Whisper German ASR
emoji: 🎙️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---

# Whisper German ASR

Fine-tuned Whisper model for German speech recognition.
Try it out by recording or uploading German audio!
EOF
```

#### Step 3: Update app.py

```python
# Modify model loading to pull from the HF Hub
def load_model(model_path="YOUR_USERNAME/whisper-small-german"):
    model = WhisperForConditionalGeneration.from_pretrained(model_path)
    processor = WhisperProcessor.from_pretrained(model_path)
    # ... rest of code
```

#### Step 4: Push Model to HF Hub (First Time)

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("./whisper_test_tuned")
processor = WhisperProcessor.from_pretrained("openai/whisper-small")

# Push to Hub
model.push_to_hub("YOUR_USERNAME/whisper-small-german")
processor.push_to_hub("YOUR_USERNAME/whisper-small-german")
```

#### Step 5: Deploy to Space

```bash
# Clone Space repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/whisper-german-asr
cd whisper-german-asr

# Copy files
cp ../hf-space/* .

# Push to Space
git add .
git commit -m "Initial deployment"
git push
```

### Method 2: Docker Space

```dockerfile
# Dockerfile in the Space repository
FROM python:3.10-slim

WORKDIR /app

RUN apt-get update && apt-get install -y ffmpeg libsndfile1

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY app.py .

CMD ["python", "app.py"]
```

---

## AWS Deployment

### Option 1: ECS Fargate

#### Step 1: Push Docker Image to ECR

```bash
# Create ECR repository
aws ecr create-repository --repository-name whisper-asr

# Login to ECR
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin \
  YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com

# Tag and push
docker tag whisper-asr:latest \
  YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/whisper-asr:latest
docker push YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/whisper-asr:latest
```

#### Step 2: Create ECS Task Definition

```json
{
  "family": "whisper-asr",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "whisper-api",
      "image": "YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/whisper-asr:latest",
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "MODEL_PATH",
          "value": "/app/whisper_test_tuned"
        }
      ]
    }
  ]
}
```

#### Step 3: Create ECS Service

```bash
aws ecs create-service \
  --cluster default \
  --service-name whisper-asr \
  --task-definition whisper-asr \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-xxx],securityGroups=[sg-xxx],assignPublicIp=ENABLED}"
```

### Option 2: Lambda + API Gateway

```python
# lambda_function.py
import base64
import io
import json

import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = None
processor = None

def load_model():
    global model, processor
    if model is None:
        model = WhisperForConditionalGeneration.from_pretrained("/tmp/model")
        processor = WhisperProcessor.from_pretrained("openai/whisper-small")

def lambda_handler(event, context):
    load_model()

    # Decode base64 audio
    audio_data = base64.b64decode(event['body'])
    audio, sr = librosa.load(io.BytesIO(audio_data), sr=16000)

    # Transcribe
    input_features = processor(audio, sampling_rate=16000,
                               return_tensors="pt").input_features
    predicted_ids = model.generate(input_features)
    transcription = processor.batch_decode(predicted_ids,
                                           skip_special_tokens=True)[0]

    return {
        'statusCode': 200,
        'body': json.dumps({'transcription': transcription})
    }
```

---

## Google Cloud

### Cloud Run Deployment

#### Step 1: Build and Push to GCR

```bash
# Enable APIs
gcloud services enable run.googleapis.com
gcloud services enable containerregistry.googleapis.com

# Build image
gcloud builds submit --tag gcr.io/PROJECT_ID/whisper-asr

# Or use Docker
docker tag whisper-asr gcr.io/PROJECT_ID/whisper-asr
docker push gcr.io/PROJECT_ID/whisper-asr
```

#### Step 2: Deploy to Cloud Run

```bash
gcloud run deploy whisper-asr \
  --image gcr.io/PROJECT_ID/whisper-asr \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --memory 2Gi \
  --cpu 2 \
  --timeout 300
```

#### Step 3: Get Service URL

```bash
gcloud run services describe whisper-asr \
  --platform managed \
  --region us-central1 \
  --format 'value(status.url)'
```

---

## Azure Deployment

### Azure Container Instances

#### Step 1: Push to Azure Container Registry

```bash
# Create ACR
az acr create --resource-group myResourceGroup \
  --name whisperasr --sku Basic

# Login
az acr login --name whisperasr

# Tag and push
docker tag whisper-asr whisperasr.azurecr.io/whisper-asr:latest
docker push whisperasr.azurecr.io/whisper-asr:latest
```

#### Step 2: Deploy Container Instance

```bash
az container create \
  --resource-group myResourceGroup \
  --name whisper-asr \
  --image whisperasr.azurecr.io/whisper-asr:latest \
  --cpu 2 \
  --memory 4 \
  --registry-login-server whisperasr.azurecr.io \
  --registry-username <username> \
  --registry-password <password> \
  --dns-name-label whisper-asr \
  --ports 8000
```

---

## Production Considerations

### Security

- [ ] Use HTTPS (SSL/TLS certificates)
- [ ] Implement rate limiting
- [ ] Add authentication/API keys
- [ ] Validate file uploads
- [ ] Set CORS policies

### Monitoring

- [ ] Set up logging (CloudWatch, Stackdriver, etc.)
- [ ] Add health checks
- [ ] Monitor latency and error rates
- [ ] Track usage metrics

### Scaling

- [ ] Configure auto-scaling
- [ ] Use a load balancer
- [ ] Implement caching
- [ ] Consider a CDN for static assets

### Cost Optimization

- [ ] Use spot/preemptible instances
- [ ] Implement request batching
- [ ] Cache the model in memory
- [ ] Monitor and optimize resource usage

---

## Troubleshooting

### Common Issues

**Model Not Loading**

```bash
# Check model path
ls -la whisper_test_tuned/

# Check permissions
chmod -R 755 whisper_test_tuned/
```

**Out of Memory**

- Reduce batch size
- Use CPU instead of GPU
- Increase container memory

**Slow Inference**

- Use a GPU
- Reduce beam size
- Use a smaller model
- Implement caching

**Port Already in Use**

```bash
# Find the process
lsof -i :8000

# Kill the process (replace <PID> with the PID reported by lsof)
kill -9 <PID>

# Or use a different port
uvicorn api.main:app --port 8001
```

---

## Next Steps

1. Choose a deployment platform
2. Set up a CI/CD pipeline
3. Configure monitoring
4. Test in production
5. Optimize performance
6. Scale as needed

For more help, see:

- [README.md](README.md)
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md)
- [CONTRIBUTING.md](CONTRIBUTING.md)
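
## Appendix: Smoke-Testing a Deployed Endpoint

Whichever platform you deploy to, the `/transcribe` endpoint can be exercised from Python using only the standard library. This is a minimal sketch: the `file` multipart field name and the localhost URL follow the curl example earlier in this guide, and the 300-second timeout mirrors the Cloud Run setting; treat all three as assumptions to adjust for your deployment.

```python
# Stdlib-only smoke test for the /transcribe endpoint.
# Assumptions: the API accepts a multipart "file" field (as in the
# curl example in this guide) and the default URL is the local dev server.
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "audio/wav"):
    """Build a multipart/form-data body containing a single file part."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path: str,
               url: str = "http://localhost:8000/transcribe") -> str:
    """POST an audio file to the API and return the raw JSON response body."""
    with open(audio_path, "rb") as f:
        body, ctype = build_multipart("file", audio_path, f.read())
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": ctype},
                                 method="POST")
    with urllib.request.urlopen(req, timeout=300) as resp:
        return resp.read().decode()


# Example (requires a running API):
# print(transcribe("test_audio.wav"))
```

Because `build_multipart` is separate from the network call, the request encoding can be verified offline before pointing the script at a production URL.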