# Deployment Strategy ## Current Target: Local Development The MVP focuses on getting the system working locally before worrying about production infrastructure. ### Local Development Architecture ```mermaid graph TB subgraph DevMachine["Developer Machine (MacOS/Linux)"] Frontend["Frontend (React + Vite)
http://localhost:5173"] API["Backend API (FastAPI)
http://localhost:8000"] Redis["Redis (Queue + Cache)
localhost:6379"] Worker["Celery Worker (GPU-enabled)
- Demucs model (~350MB)
- basic-pitch model (~30MB)"] Storage["Local Storage
- /tmp/rescored/audio/ (temp)
- /tmp/rescored/outputs/ (MusicXML)"] Frontend -->|HTTP/WS| API API --> Redis Redis --> Worker Worker -.-> Storage end ``` ### Setup Requirements **Hardware**: - **GPU**: Apple Silicon (M1/M2/M3/M4 with MPS) OR NVIDIA GPU with 4GB+ VRAM - Alternative: Run on CPU (10-15x slower, acceptable for development) - **RAM**: 16GB+ recommended - **Disk**: 10GB for models and temp files **Software**: - **Python 3.10** (required for madmom compatibility) - **Node.js 18+** - **Redis 7+** - **FFmpeg** - **YouTube cookies** (required as of December 2024) ### Docker Compose Setup (Recommended) ```yaml # docker-compose.yml version: '3.8' services: redis: image: redis:7-alpine ports: - "6379:6379" volumes: - redis_data:/data api: build: ./backend ports: - "8000:8000" environment: - REDIS_URL=redis://redis:6379 - STORAGE_PATH=/app/storage volumes: - ./backend:/app - storage:/app/storage depends_on: - redis worker: build: ./backend command: celery -A tasks worker --loglevel=info environment: - REDIS_URL=redis://redis:6379 - STORAGE_PATH=/app/storage volumes: - ./backend:/app - storage:/app/storage deploy: resources: reservations: devices: - driver: nvidia count: 1 capabilities: [gpu] depends_on: - redis frontend: build: ./frontend ports: - "5173:5173" volumes: - ./frontend:/app environment: - VITE_API_URL=http://localhost:8000 volumes: redis_data: storage: ``` **Benefits**: - One command to start everything: `docker-compose up` - Consistent environment across developers - GPU passthrough handled automatically - Easy cleanup **Limitations**: - Slower hot reload than native - GPU support requires Docker Desktop on Mac (experimental) ### Quick Start (Recommended) Use the provided shell scripts to start/stop all services: ```bash # From project root ./start.sh # View logs tail -f logs/api.log # Backend API tail -f logs/worker.log # Celery worker tail -f logs/frontend.log # Frontend # Stop all services ./stop.sh ``` **What `start.sh` does:** 1. Starts Redis (if not already running via Homebrew) 2. Activates Python 3.10 venv 3. Starts Backend API (uvicorn) in background 4. Starts Celery Worker (--pool=solo for macOS) in background 5. Starts Frontend (npm run dev) in background 6. Writes all logs to `logs/` directory **Services available at:** - Frontend: http://localhost:5173 - Backend API: http://localhost:8000 - API Docs: http://localhost:8000/docs ### Manual Setup (Alternative) If you prefer to run services manually in separate terminals: **Terminal 1 - Redis (macOS with Homebrew)**: ```bash brew services start redis redis-cli ping # Should return PONG ``` **Terminal 2 - Backend API**: ```bash cd backend source .venv/bin/activate uvicorn main:app --host 0.0.0.0 --port 8000 --reload ``` **Terminal 3 - Celery Worker**: ```bash cd backend source .venv/bin/activate # Use --pool=solo on macOS to avoid fork() crashes with ML libraries celery -A tasks worker --loglevel=info --pool=solo ``` **Terminal 4 - Frontend**: ```bash cd frontend npm run dev ``` **Benefits**: - Easier debugging (separate terminal per service) - More control over each service - See output in real-time **Limitations**: - Managing 4 terminals - Need to manually stop each service --- ## Future: Production Deployment ### Phase 2 - Proof of Concept Deployment **Goal**: Share with friends, small beta test (< 100 users) **Architecture**: ```mermaid graph TB Users[Users] Users --> Vercel["Vercel
Frontend (React SPA)"] Vercel -->|HTTPS| Render["Render/Railway
Backend API (FastAPI)"] Render --> Upstash["Upstash Redis
Job queue"] Upstash --> Modal["Modal
GPU workers (Demucs + basic-pitch)"] Modal --> R2["Cloudflare R2
Audio/MusicXML storage"] ``` **Components**: | Service | Provider | Why | Cost (est.) | |---------|----------|-----|-------------| | Frontend | Vercel | Free tier, great DX | $0 | | Backend API | Render/Railway | Easy deploy, free tier | $0-7/month | | Redis | Upstash | Serverless, free tier | $0 | | GPU Workers | Modal | Pay-per-use GPU | $0.50/hour GPU time | | Storage | Cloudflare R2 | Cheap, S3-compatible | $0.015/GB | **Estimated Monthly Cost**: $10-50 for 100 users doing ~5 transcriptions/month - 500 jobs/month × 2 min/job = 16 GPU hours/month = ~$8 - Storage: 100GB = ~$1.50 - Backend: Free tier (Render) or $7/month **Deployment Flow**: 1. Push to `main` branch 2. Vercel auto-deploys frontend 3. Render auto-deploys API from Dockerfile 4. Modal workers pull latest image on invocation **Limitations**: - Cold starts (workers take 10-20s to start) - No auto-scaling of API (single instance) - Limited monitoring --- ### Phase 3 - Production Scale **Goal**: Support 1000+ users, high availability **Architecture**: ```mermaid graph TB Users[Users] Users --> CDN["Cloudflare CDN
Cached static assets"] CDN --> ALB["AWS ALB
Load balancer"] ALB --> ECS["ECS Fargate
API servers (auto-scaling 2-10)"] ECS --> ElastiCache["ElastiCache Redis
Job queue (HA, multi-AZ)"] ElastiCache --> GPU["Modal/Runpod
GPU workers (auto-scaling 1-20)"] GPU --> S3["S3
Audio/MusicXML storage"] ECS --> RDS["RDS PostgreSQL
User accounts, job history"] ``` **Infrastructure**: | Component | Service | Scaling | Cost | |-----------|---------|---------|------| | CDN | Cloudflare | Global edge caching | $20/month | | API | ECS Fargate | 2-10 instances, CPU-based autoscaling | $50-200/month | | Redis | ElastiCache | Multi-AZ, 2 nodes | $50/month | | Workers | Modal | 1-20 GPU instances, queue-depth scaling | $500-2000/month | | Storage | S3 | Lifecycle policies (delete after 30 days) | $50-100/month | | DB | RDS PostgreSQL | Multi-AZ, auto-scaling storage | $50-100/month | | Monitoring | Datadog/Sentry | Error tracking, metrics | $50/month | **Estimated Monthly Cost**: $800-2500 for 10k transcriptions/month **Features**: - **Auto-scaling**: API scales on CPU, workers scale on queue depth - **High availability**: Multi-AZ for DB and Redis - **Monitoring**: Full observability (logs, metrics, traces) - **Security**: VPC, encryption at rest, HTTPS everywhere - **CI/CD**: GitHub Actions, blue-green deployments - **Rate limiting**: Per-user quotas, IP-based throttling **Deployment Pipeline**: 1. PR opened → GitHub Actions runs tests 2. Merge to `main` → Docker images built and pushed to ECR 3. ECS updates task definitions (rolling update) 4. Modal pulls new worker image on next invocation 5. Cloudflare cache invalidated for frontend assets --- ## GPU Infrastructure Deep Dive ### Local GPU (Development) **Supported**: - NVIDIA GPUs with CUDA 11.8+ support - Apple Silicon (MPS backend) - experimental, slower **Setup**: ```bash # Check GPU nvidia-smi # Install PyTorch with CUDA pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 ``` **Fallback to CPU**: ```python # In worker code device = "cuda" if torch.cuda.is_available() else "cpu" ``` **Processing Time**: - GPU (RTX 3080): ~45 seconds per 3-minute song - CPU (M1 Max): ~8 minutes per 3-minute song --- ### Serverless GPU (Production) **Option 1: Modal** (Recommended) **Pros**: - Fast cold starts (10-20 seconds) - Per-second billing - No idle GPU cost - Great Python support - Volumes for model caching **Cons**: - Newer platform (less proven) - US-only regions currently **Example Worker**: ```python import modal stub = modal.Stub("rescored") @stub.function( gpu="A10G", # NVIDIA A10G (24GB VRAM) timeout=600, volumes={"/models": modal.Volume.from_name("model-cache")} ) def process_audio(job_id: str, audio_url: str): # Demucs + basic-pitch processing pass ``` **Cost**: ~$0.60/hour for A10G GPU --- **Option 2: RunPod Serverless** **Pros**: - Cheaper than Modal ($0.30-0.50/hour) - More GPU options - Global regions **Cons**: - Slower cold starts (30-60 seconds) - More manual setup --- **Option 3: AWS SageMaker/Lambda** **Pros**: - AWS ecosystem integration - Well-documented **Cons**: - Expensive for small workloads - Slow cold starts - More complex setup --- **Decision for Production**: Start with Modal, evaluate RunPod if cost becomes issue. --- ## Storage Strategy ### MVP (Local) - Temp files: `/tmp/rescored/` - Cleanup: Manual or cron job ### Production - **Temp audio**: S3 with 1-day lifecycle policy (delete after processing) - **Output files**: S3 Standard for 30 days, then delete OR: - S3 Intelligent-Tiering if keeping long-term - **Model files**: Baked into Docker image or cached volume (Modal) **S3 Bucket Structure**: ``` s3://rescored-prod/ temp-audio/ {job_id}.wav # Delete after 1 day separated-stems/ {job_id}/ drums.wav bass.wav ... # Delete after 1 day outputs/ {job_id}.musicxml # Keep for 30 days {job_id}.midi # Keep for 30 days ``` --- ## Scaling Bottlenecks ### What Scales Easily - Frontend (static assets, CDN) - API servers (stateless, horizontal scaling) - Redis (managed service auto-scaling) ### What Doesn't Scale Easily - **GPU workers**: Expensive, limited availability - **Source separation**: CPU/GPU bound, can't optimize much - **Model loading**: Large models (350MB) slow cold starts ### Mitigation Strategies 1. **Pre-warm workers**: Keep 1-2 GPU workers hot during peak hours 2. **Model caching**: Use Modal volumes or Docker layers 3. **Queue prioritization**: Premium users get faster processing 4. **Job batching**: Process multiple songs on same GPU instance (future) 5. **Progressive results**: Return piano transcription first, other instruments later --- ## Cost Optimization ### Development - Use CPU for small tests (slower but free) - Limit worker parallelism to 1 ### Production - **Lifecycle policies**: Delete temp files after 1 day, outputs after 30 days - **Reserved capacity**: If consistent load, reserve GPU instances (50% cheaper) - **Spot instances**: Use for non-urgent jobs (70% cheaper, can be interrupted) - **CDN caching**: Aggressive caching for static assets (frontend, model files) - **Compression**: Gzip API responses, compress audio files before storage --- ## Monitoring & Observability ### Metrics to Track - **API**: Request rate, latency, error rate - **Workers**: Queue depth, processing time per stage, GPU utilization - **Costs**: GPU hours used, storage size, API requests ### Logging - **Structured logs**: JSON format with job_id, user_id, stage - **Centralized**: CloudWatch, Datadog, or Loki ### Alerting - Worker failures exceeding 5% of jobs - Queue depth over 100 jobs (need more workers) - GPU utilization below 50% (over-provisioned) - API error rate over 1% --- ## Security Considerations ### Local Development - No auth needed - Redis on localhost only - CORS enabled for `localhost:5173` ### Production - **HTTPS only**: Enforce TLS for API and WebSocket - **API authentication**: JWT tokens for user sessions - **Rate limiting**: 10 jobs per user per hour - **Input validation**: Check YouTube URL format, max video length - **Secrets management**: Use environment variables or AWS Secrets Manager - **VPC**: API and workers in private subnets - **File scanning**: Check uploaded files for malware (if allowing file uploads) --- ## Disaster Recovery ### Backups - **Redis**: Daily snapshots to S3 - **PostgreSQL**: Automated daily backups (RDS), 7-day retention - **Code**: GitHub (already version controlled) - **Models**: Re-downloadable, no backup needed ### Incident Response - **Worker failure**: Job retried automatically (Celery) - **API crash**: ECS restarts container, ALB routes to healthy instance - **Redis failure**: ElastiCache auto-failover to standby - **Complete outage**: Deploy from last known good commit, restore DB from backup --- ## Next Steps 1. Get local development working with Docker Compose 2. Test full pipeline end-to-end with sample YouTube videos 3. Deploy proof-of-concept to Vercel + Modal for beta testing 4. Collect metrics on processing time, costs, user feedback 5. Scale to production architecture if product gains traction See [Audio Processing Pipeline](../backend/pipeline.md) for implementation details.