
Deployment Guide

This guide covers various deployment options for the Quran Transcription API.

Table of Contents

  1. Local Development
  2. Production with Gunicorn
  3. Docker Deployment
  4. Cloud Deployment

Local Development

Quick Start

# Install dependencies
python setup.py

# Create environment file
cp .env.example .env

# Start development server
uvicorn main:app --reload --host 0.0.0.0 --port 8000

The interactive API docs are available at: http://localhost:8000/docs

Development with GPU

# Check GPU availability
python -c "import torch; print(torch.cuda.is_available())"

# Start server (GPU will be auto-detected)
uvicorn main:app --reload

Production with Gunicorn

Gunicorn is recommended for production deployments: it adds robust process management (worker supervision, graceful restarts) on top of Uvicorn.

Installation

pip install gunicorn

Configuration

Create gunicorn.conf.py:

# Server socket
bind = "0.0.0.0:8000"
backlog = 2048

# Worker processes
workers = 1  # For single GPU/CPU, use 1 worker
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000

# Timeouts (important for large audio files)
timeout = 300
graceful_timeout = 30
keepalive = 2

# Logging
accesslog = "-"
errorlog = "-"
loglevel = "info"

# Process naming
proc_name = "quran-api"

# Server mechanics
daemon = False
pidfile = None
umask = 0
user = None
group = None
tmp_upload_dir = None

# SSL (if needed)
# keyfile = "/path/to/keyfile"
# certfile = "/path/to/certfile"
# ca_certs = "/path/to/ca_certs"

Running Gunicorn

# Single worker (recommended)
gunicorn -c gunicorn.conf.py main:app

# With environment variables (Gunicorn runs on Linux/macOS, so use export, not set)
export CUDA_VISIBLE_DEVICES=0
gunicorn -c gunicorn.conf.py main:app

Docker Deployment

Build and Run

# Build image
docker build -t quran-api:latest .

# Run container
docker run -p 8000:8000 \
  -e CUDA_VISIBLE_DEVICES=0 \
  -e COMPUTE_TYPE=float16 \
  quran-api:latest

Docker Compose

# Start services
docker-compose up -d

# View logs
docker-compose logs -f quran-api

# Stop services
docker-compose down

# Remove volumes
docker-compose down -v

GPU Support in Docker

For GPU support, install NVIDIA Docker runtime:

# Install nvidia-docker
# https://github.com/NVIDIA/nvidia-docker

# Update docker-compose.yml to enable GPU
# (see docker-compose.yml for GPU configuration)

# Run with GPU
docker-compose up -d
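The GPU configuration the comment points to is not reproduced in this guide. With Docker Compose v2 and the NVIDIA Container Toolkit installed, a sketch of the relevant service stanza might look like the following (the service and image names are assumptions matching the build commands above):

```yaml
services:
  quran-api:
    image: quran-api:latest
    ports:
      - "8000:8000"
    environment:
      - CUDA_VISIBLE_DEVICES=0
    deploy:
      resources:
        reservations:
          devices:
            # Requires the NVIDIA Container Toolkit on the host
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

After starting, verify the container actually sees the GPU with `docker exec <container> nvidia-smi`.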

Cloud Deployment

AWS EC2

Instance Requirements

  • Type: g4dn.xlarge (GPU) or t3.medium (CPU-only)
  • GPU: NVIDIA T4 for cost-effectiveness
  • Storage: 50GB+ SSD
  • RAM: 16GB+

Setup Steps

# 1. SSH into instance
ssh -i your-key.pem ec2-user@your-instance-ip

# 2. Install dependencies
sudo yum update -y
sudo yum install -y python3.10 python3-pip

# 3. Install NVIDIA drivers (for GPU instances)
sudo yum install -y gcc kernel-devel
# Download NVIDIA driver from https://www.nvidia.com/Download/driverDetails.aspx

# 4. Clone project
git clone https://github.com/your-repo/quran-app-ai.git
cd quran-app-ai/whisper-backend

# 5. Install application (python3, matching the interpreter installed above)
python3 -m pip install --user -r requirements.txt

# 6. Create environment file
cp .env.example .env
nano .env  # Edit with your settings

# 7. Create systemd service
sudo nano /etc/systemd/system/quran-api.service

Systemd Service File

[Unit]
Description=Quran Transcription API
After=network.target

[Service]
Type=notify
User=ec2-user
WorkingDirectory=/home/ec2-user/quran-app-ai/whisper-backend
Environment="PATH=/home/ec2-user/.local/bin:/usr/local/bin:/usr/bin:/bin"
Environment="CUDA_VISIBLE_DEVICES=0"
ExecStart=/home/ec2-user/.local/bin/gunicorn -c gunicorn.conf.py main:app
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable quran-api
sudo systemctl start quran-api

# Check status
sudo systemctl status quran-api
sudo journalctl -u quran-api -f

Google Cloud Run

# 1. Ensure you have gcloud CLI installed
gcloud init

# 2. Build and push Docker image
gcloud builds submit --tag gcr.io/PROJECT_ID/quran-api

# 3. Deploy to Cloud Run
gcloud run deploy quran-api \
  --image gcr.io/PROJECT_ID/quran-api \
  --platform managed \
  --region us-central1 \
  --memory 8Gi \
  --cpu 4 \
  --timeout 600 \
  --set-env-vars COMPUTE_TYPE=int8,CORS_ORIGINS=https://yourdomain.com

Heroku Deployment

Note: Heroku no longer offers a free tier, and basic dynos may not have enough memory for the Whisper model. Consider Performance dynos.

# 1. Install Heroku CLI
# https://devcenter.heroku.com/articles/heroku-cli

# 2. Login
heroku login

# 3. Create app
heroku create your-app-name

# 4. Create Procfile
echo 'web: gunicorn -c gunicorn.conf.py main:app' > Procfile

# 5. Set environment variables
heroku config:set COMPUTE_TYPE=int8
heroku config:set CUDA_VISIBLE_DEVICES=""

# 6. Deploy
git push heroku main

Monitoring and Maintenance

Health Monitoring

# Check API health
curl http://localhost:8000/health

# Monitor logs (Docker)
docker-compose logs -f quran-api

# Monitor logs (Systemd)
journalctl -u quran-api -f

Database/Cache (Optional)

For scaling, add Redis for caching:

# In docker-compose.yml
redis:
  image: redis:7-alpine
  ports:
    - "6379:6379"
  volumes:
    - redis_data:/data

volumes:
  redis_data:
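Compose only starts the Redis service; the application still has to use it. A minimal sketch of result caching, keyed by a hash of the uploaded audio bytes so repeated uploads of the same file skip re-transcription (the redis-py client and the `transcribe` callable are illustrative assumptions, not part of this codebase):

```python
import hashlib
import json


def audio_cache_key(audio_bytes: bytes) -> str:
    """Deterministic cache key: identical audio always maps to the same key."""
    return "transcript:" + hashlib.sha256(audio_bytes).hexdigest()


def cached_transcribe(redis_client, audio_bytes: bytes, transcribe):
    """Return a cached transcription if present; otherwise compute and cache it."""
    key = audio_cache_key(audio_bytes)
    cached = redis_client.get(key)
    if cached is not None:
        return json.loads(cached)
    result = transcribe(audio_bytes)
    # Expire after 24 hours so stale results do not accumulate
    redis_client.setex(key, 86400, json.dumps(result))
    return result
```

In the app this would wrap the model call inside the /transcribe handler, with `redis_client = redis.Redis(host="redis", port=6379)` pointing at the Compose service.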

Backup Strategy

# Backup model cache
tar -czf quran-models-backup.tar.gz ~/.cache/huggingface/

# Upload to S3
aws s3 cp quran-models-backup.tar.gz s3://your-bucket/backups/

Performance Tuning

Environment Variables

# Reduce memory footprint
COMPUTE_TYPE=int8

# Optimize processing
WORKERS=1
TIMEOUT=300

# GPU Configuration
CUDA_VISIBLE_DEVICES=0,1  # Multiple GPUs

# Logging
LOG_LEVEL=WARNING  # Reduce logging overhead

Load Testing

# Install locust
pip install locust

# Create locustfile.py
# Run tests
locust -f locustfile.py -u 10 -r 1 --headless -t 1m

Troubleshooting

Out of Memory

# Reduce workers
WORKERS=1

# Use smaller compute type
COMPUTE_TYPE=int8

# Check memory usage
free -h  # Linux
Get-Process | Sort-Object WorkingSet64 -Descending | Select -First 10  # Windows

Slow Requests

# Check GPU utilization
nvidia-smi

# Check CPU
top  # Linux
Get-Process | Sort-Object CPU -Descending | Select-Object -First 10  # Windows

# Profile application
pip install py-spy
py-spy record -o profile.svg --pid <pid>

Model Download Issues

# Pre-download model
python -c "from faster_whisper import WhisperModel; WhisperModel('OdyAsh/faster-whisper-base-ar-quran')"

# Specify cache directory
export HF_HOME=/path/to/cache

Security

HTTPS/TLS

# Generate self-signed certificate (for testing only; use a CA-issued cert in production)
openssl req -x509 -newkey rsa:4096 -nodes -out cert.pem -keyout key.pem -days 365

# Use with Gunicorn
gunicorn --certfile=cert.pem --keyfile=key.pem -c gunicorn.conf.py main:app

Rate Limiting

# Install slowapi
pip install slowapi

# Add to main.py (the exception handler registration is required for slowapi to work)
from fastapi import File, Request, UploadFile
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/transcribe")
@limiter.limit("10/minute")  # 10 requests per minute per client IP
async def transcribe(request: Request, file: UploadFile = File(...)):
    ...

API Key Authentication

from fastapi import Depends, File, HTTPException, UploadFile
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

security = HTTPBearer()

@app.post("/transcribe")
async def transcribe(
    credentials: HTTPAuthorizationCredentials = Depends(security),
    file: UploadFile = File(...)
):
    # In production, load the expected key from an environment variable
    if credentials.credentials != "YOUR_SECRET_KEY":
        raise HTTPException(status_code=403, detail="Invalid API key")
    ...

Maintenance

Update Model

# Clear cache
rm -rf ~/.cache/huggingface/

# Model will be re-downloaded on next request

View Logs

# Docker
docker-compose logs --tail 100 quran-api

# Systemd
journalctl -u quran-api --since "2 hours ago"

# Gunicorn access log
tail -f /var/log/gunicorn/access.log

For more information, see the main README.md file.