# Deployment Guide

This guide covers various deployment options for the Quran Transcription API.

## Table of Contents

- [Local Development](#local-development)
- [Production with Gunicorn](#production-with-gunicorn)
- [Docker Deployment](#docker-deployment)
- [Cloud Deployment](#cloud-deployment)
- [Monitoring and Maintenance](#monitoring-and-maintenance)
- [Performance Tuning](#performance-tuning)
- [Troubleshooting](#troubleshooting)
- [Security](#security)
- [Maintenance](#maintenance)

## Local Development
### Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# Create environment file
cp .env.example .env

# Start development server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

Access the API at: http://localhost:8000/docs
### Development with GPU

```bash
# Check GPU availability
python -c "import torch; print(torch.cuda.is_available())"

# Start server (GPU will be auto-detected)
uvicorn main:app --reload
```
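The auto-detection logic can be sketched as follows; `pick_device` is a hypothetical helper for illustration, not a function in this codebase:

```python
# Device auto-detection sketch: prefer CUDA when torch reports a usable GPU,
# fall back to CPU otherwise (including when torch is not installed at all).
def pick_device() -> str:
    try:
        import torch  # optional dependency in this sketch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```

On a CPU-only machine this returns `"cpu"`, so the same startup command works in both environments.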
## Production with Gunicorn

Gunicorn is recommended for production deployments because it provides better process management.

### Installation

```bash
pip install gunicorn
```

### Configuration

Create `gunicorn.conf.py`:
```python
# Server socket
bind = "0.0.0.0:8000"
backlog = 2048

# Worker processes
workers = 1  # The model is loaded per worker; for a single GPU/CPU, use 1 worker
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000

# Timeouts (important for large audio files)
timeout = 300
graceful_timeout = 30
keepalive = 2

# Logging
accesslog = "-"
errorlog = "-"
loglevel = "info"

# Process naming
proc_name = "quran-api"

# Server mechanics
daemon = False
pidfile = None
umask = 0
user = None
group = None
tmp_upload_dir = None

# SSL (if needed)
# keyfile = "/path/to/keyfile"
# certfile = "/path/to/certfile"
# ca_certs = "/path/to/ca_certs"
```
### Running Gunicorn

```bash
# Single worker (recommended)
gunicorn -c gunicorn.conf.py main:app

# With a specific GPU selected via environment variable
export CUDA_VISIBLE_DEVICES=0
gunicorn -c gunicorn.conf.py main:app
```
## Docker Deployment

### Build and Run

```bash
# Build image
docker build -t quran-api:latest .

# Run container
docker run -p 8000:8000 \
  -e CUDA_VISIBLE_DEVICES=0 \
  -e COMPUTE_TYPE=float16 \
  quran-api:latest
```
### Docker Compose

```bash
# Start services
docker-compose up -d

# View logs
docker-compose logs -f quran-api

# Stop services
docker-compose down

# Stop services and remove volumes
docker-compose down -v
```
### GPU Support in Docker

For GPU support, install the NVIDIA Container Toolkit (formerly nvidia-docker):

```bash
# Install the NVIDIA Container Toolkit
# https://github.com/NVIDIA/nvidia-docker

# Update docker-compose.yml to enable GPU access
# (see docker-compose.yml for GPU configuration)

# Run with GPU
docker-compose up -d
```
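The GPU configuration referenced above typically uses the Compose `deploy.resources` device reservation, as in the sketch below; the service and image names follow this guide and may differ from your actual docker-compose.yml:

```yaml
services:
  quran-api:
    image: quran-api:latest
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```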
## Cloud Deployment

### AWS EC2

#### Instance Requirements

- Type: g4dn.xlarge (GPU) or t3.medium (CPU-only)
- GPU: NVIDIA T4 for cost-effectiveness
- Storage: 50GB+ SSD
- RAM: 16GB+
#### Setup Steps

```bash
# 1. SSH into instance
ssh -i your-key.pem ec2-user@your-instance-ip

# 2. Install dependencies
sudo yum update -y
sudo yum install -y python3 python3-pip

# 3. Install NVIDIA drivers (for GPU instances)
sudo yum install -y gcc kernel-devel
# Download the NVIDIA driver from https://www.nvidia.com/Download/driverDetails.aspx

# 4. Clone project
git clone https://github.com/your-repo/quran-app-ai.git
cd quran-app-ai/whisper-backend

# 5. Install application
python -m pip install -r requirements.txt

# 6. Create environment file
cp .env.example .env
nano .env  # Edit with your settings

# 7. Create systemd service
sudo nano /etc/systemd/system/quran-api.service
```
Systemd Service File
[Unit]
Description=Quran Transcription API
After=network.target
[Service]
Type=notify
User=ec2-user
WorkingDirectory=/home/ec2-user/quran-app-ai/whisper-backend
Environment="PATH=/home/ec2-user/.local/bin"
Environment="CUDA_VISIBLE_DEVICES=0"
ExecStart=/usr/local/bin/gunicorn -c gunicorn.conf.py main:app
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
```bash
# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable quran-api
sudo systemctl start quran-api

# Check status
sudo systemctl status quran-api
sudo journalctl -u quran-api -f
```
### Google Cloud Run

```bash
# 1. Ensure you have the gcloud CLI installed
gcloud init

# 2. Build and push Docker image
gcloud builds submit --tag gcr.io/PROJECT_ID/quran-api

# 3. Deploy to Cloud Run
gcloud run deploy quran-api \
  --image gcr.io/PROJECT_ID/quran-api \
  --platform managed \
  --region us-central1 \
  --memory 8Gi \
  --cpu 4 \
  --timeout 600 \
  --set-env-vars COMPUTE_TYPE=int8,CORS_ORIGINS=https://yourdomain.com
```
### Heroku Deployment

Note: The Heroku free tier may not have sufficient resources. Consider paid dynos.

```bash
# 1. Install the Heroku CLI
# https://devcenter.heroku.com/articles/heroku-cli

# 2. Login
heroku login

# 3. Create app
heroku create your-app-name

# 4. Create Procfile
echo 'web: gunicorn -c gunicorn.conf.py main:app' > Procfile

# 5. Set environment variables
heroku config:set COMPUTE_TYPE=int8
heroku config:set CUDA_VISIBLE_DEVICES=""

# 6. Deploy
git push heroku main
```
## Monitoring and Maintenance

### Health Monitoring

```bash
# Check API health
curl http://localhost:8000/health

# Monitor logs (Docker)
docker-compose logs -f quran-api

# Monitor logs (systemd)
journalctl -u quran-api -f
```
### Database/Cache (Optional)

For scaling, add Redis for caching:

```yaml
# In docker-compose.yml
redis:
  image: redis:7-alpine
  ports:
    - "6379:6379"
  volumes:
    - redis_data:/data

volumes:
  redis_data:
```
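A sketch of what the caching layer might look like on the application side; `cached_transcribe` is hypothetical, and a plain dict stands in for the Redis client (with Redis you would use `get`/`setex` on the same keys):

```python
import hashlib

# Hypothetical cache layer: key transcription results by a SHA-256 hash
# of the uploaded audio bytes, so identical uploads hit the cache.
_cache = {}

def cached_transcribe(audio_bytes, transcribe_fn):
    key = hashlib.sha256(audio_bytes).hexdigest()
    if key in _cache:
        return _cache[key]          # cache hit: skip the expensive model call
    text = transcribe_fn(audio_bytes)
    _cache[key] = text              # cache miss: store for next time
    return text
```

With this in place, repeated uploads of the same audio cost one model invocation instead of many.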
### Backup Strategy

```bash
# Backup model cache
tar -czf quran-models-backup.tar.gz ~/.cache/huggingface/

# Upload to S3
aws s3 cp quran-models-backup.tar.gz s3://your-bucket/backups/
```
## Performance Tuning

### Environment Variables

```bash
# Reduce memory footprint
COMPUTE_TYPE=int8

# Optimize processing
WORKERS=1
TIMEOUT=300

# GPU configuration
CUDA_VISIBLE_DEVICES=0,1  # Multiple GPUs

# Logging
LOG_LEVEL=WARNING  # Reduce logging overhead
```
### Load Testing

```bash
# Install locust
pip install locust

# Create locustfile.py

# Run tests
locust -f locustfile.py -u 10 -r 1 --headless -t 1m
```
## Troubleshooting

### Out of Memory

```bash
# Reduce workers
WORKERS=1

# Use a smaller compute type
COMPUTE_TYPE=int8

# Check memory usage
free -h  # Linux
Get-Process | Sort-Object WorkingSet64 -Descending | Select -First 10  # Windows
```
### Slow Requests

```bash
# Check GPU utilization
nvidia-smi

# Check CPU
top  # Linux
Get-Process | Sort-Object CPU -Descending | Select -First 10  # Windows

# Profile application
pip install py-spy
py-spy record -o profile.svg --pid <pid>
```
### Model Download Issues

```bash
# Pre-download model
python -c "from faster_whisper import WhisperModel; WhisperModel('OdyAsh/faster-whisper-base-ar-quran')"

# Specify cache directory
export HF_HOME=/path/to/cache
```
## Security

### HTTPS/TLS

```bash
# Generate self-signed certificate
openssl req -x509 -newkey rsa:4096 -nodes -out cert.pem -keyout key.pem -days 365

# Use with Gunicorn
gunicorn --certfile=cert.pem --keyfile=key.pem -c gunicorn.conf.py main:app
```
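In production, TLS is often terminated at a reverse proxy in front of Gunicorn instead. A minimal nginx sketch, with placeholder server name and certificate paths:

```nginx
server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate     /etc/ssl/certs/cert.pem;
    ssl_certificate_key /etc/ssl/private/key.pem;

    # Allow large audio uploads and long transcription times
    client_max_body_size 100M;
    proxy_read_timeout   300s;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```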
### Rate Limiting

```bash
# Install slowapi
pip install slowapi
```

Add to `main.py`:

```python
from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/transcribe")
@limiter.limit("10/minute")
async def transcribe(request: Request, file: UploadFile = File(...)):
    ...
```
### API Key Authentication

```python
from fastapi import Depends, File, HTTPException, UploadFile
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

security = HTTPBearer()

@app.post("/transcribe")
async def transcribe(
    credentials: HTTPAuthorizationCredentials = Depends(security),
    file: UploadFile = File(...)
):
    if credentials.credentials != "YOUR_SECRET_KEY":
        raise HTTPException(status_code=403, detail="Invalid API key")
    ...
```
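Comparing keys with `!=` leaks timing information. A sketch of a constant-time check using the standard library; `API_KEY` is an assumed environment variable, not something this project defines:

```python
import os
import secrets
from typing import Optional

def is_valid_key(presented: str, expected: Optional[str] = None) -> bool:
    # API_KEY is a hypothetical environment variable for this sketch.
    if expected is None:
        expected = os.environ.get("API_KEY", "")
    # compare_digest runs in time independent of where the strings differ,
    # which prevents an attacker from guessing the key byte by byte.
    return bool(expected) and secrets.compare_digest(presented, expected)
```

This also avoids hardcoding the secret in source, as `"YOUR_SECRET_KEY"` above would.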
## Maintenance

### Update Model

```bash
# Clear cache
rm -rf ~/.cache/huggingface/

# Model will be re-downloaded on next request
```
### View Logs

```bash
# Docker
docker-compose logs --tail 100 quran-api

# systemd
journalctl -u quran-api --since "2 hours ago"

# Gunicorn access log
tail -f /var/log/gunicorn/access.log
```

For more information, see the main README.md file.