# Deployment Strategy

## Current Target: Local Development

The MVP focuses on getting the system working locally before worrying about production infrastructure.

### Local Development Architecture

```mermaid
graph TB
    subgraph DevMachine["Developer Machine (macOS/Linux)"]
        Frontend["Frontend (React + Vite)<br/>http://localhost:5173"]
        API["Backend API (FastAPI)<br/>http://localhost:8000"]
        Redis["Redis (Queue + Cache)<br/>localhost:6379"]
        Worker["Celery Worker (GPU-enabled)<br/>- Demucs model (~350MB)<br/>- basic-pitch model (~30MB)"]
        Storage["Local Storage<br/>- /tmp/rescored/audio/ (temp)<br/>- /tmp/rescored/outputs/ (MusicXML)"]
        Frontend -->|HTTP/WS| API
        API --> Redis
        Redis --> Worker
        Worker -.-> Storage
    end
```

### Setup Requirements

**Hardware**:
- **GPU**: Apple Silicon (M1/M2/M3/M4 with MPS) OR NVIDIA GPU with 4GB+ VRAM
  - Alternative: run on CPU (10-15x slower, acceptable for development)
- **RAM**: 16GB+ recommended
- **Disk**: 10GB for models and temp files

**Software**:
- **Python 3.10** (required for madmom compatibility)
- **Node.js 18+**
- **Redis 7+**
- **FFmpeg**
- **YouTube cookies** (required as of December 2024)
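
As a quick sanity check before first run, a short script can confirm the interpreter version and that the CLI tools above are on `PATH`. This is a sketch, not part of the repo; the `check_tools`/`check_python` helpers and the exact binary names are illustrative:

```python
import shutil
import sys

# Binaries corresponding to the software list above (names are typical, not guaranteed)
REQUIRED_TOOLS = ["node", "redis-server", "ffmpeg"]

def check_tools(tools):
    """Return the subset of `tools` not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

def check_python(major=3, minor=10):
    """madmom currently requires Python 3.10 exactly."""
    return sys.version_info[:2] == (major, minor)

if __name__ == "__main__":
    missing = check_tools(REQUIRED_TOOLS)
    print("Missing tools:", ", ".join(missing) if missing else "none")
    print("Python 3.10:", "yes" if check_python()
          else f"no (found {sys.version_info[0]}.{sys.version_info[1]})")
```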

### Docker Compose Setup (Recommended)

```yaml
# docker-compose.yml
version: '3.8'

services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  api:
    build: ./backend
    ports:
      - "8000:8000"
    environment:
      - REDIS_URL=redis://redis:6379
      - STORAGE_PATH=/app/storage
    volumes:
      - ./backend:/app
      - storage:/app/storage
    depends_on:
      - redis

  worker:
    build: ./backend
    command: celery -A tasks worker --loglevel=info
    environment:
      - REDIS_URL=redis://redis:6379
      - STORAGE_PATH=/app/storage
    volumes:
      - ./backend:/app
      - storage:/app/storage
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    depends_on:
      - redis

  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
    volumes:
      - ./frontend:/app
    environment:
      - VITE_API_URL=http://localhost:8000

volumes:
  redis_data:
  storage:
```

**Benefits**:
- One command to start everything: `docker-compose up`
- Consistent environment across developers
- GPU passthrough handled automatically (on Linux hosts)
- Easy cleanup

**Limitations**:
- Slower hot reload than native
- No GPU on macOS: containers cannot use Apple Silicon (MPS), and the `nvidia` device reservation only works on Linux hosts with the NVIDIA Container Toolkit

### Quick Start (Recommended)

Use the provided shell scripts to start/stop all services:

```bash
# From project root
./start.sh

# View logs
tail -f logs/api.log       # Backend API
tail -f logs/worker.log    # Celery worker
tail -f logs/frontend.log  # Frontend

# Stop all services
./stop.sh
```

**What `start.sh` does:**
1. Starts Redis (if not already running via Homebrew)
2. Activates the Python 3.10 venv
3. Starts the backend API (uvicorn) in the background
4. Starts the Celery worker (`--pool=solo` for macOS) in the background
5. Starts the frontend (`npm run dev`) in the background
6. Writes all logs to the `logs/` directory

**Services available at:**
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs

### Manual Setup (Alternative)

If you prefer to run services manually in separate terminals:

**Terminal 1 - Redis (macOS with Homebrew)**:
```bash
brew services start redis
redis-cli ping  # Should return PONG
```

**Terminal 2 - Backend API**:
```bash
cd backend
source .venv/bin/activate
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

**Terminal 3 - Celery Worker**:
```bash
cd backend
source .venv/bin/activate
# Use --pool=solo on macOS to avoid fork() crashes with ML libraries
celery -A tasks worker --loglevel=info --pool=solo
```

**Terminal 4 - Frontend**:
```bash
cd frontend
npm run dev
```

**Benefits**:
- Easier debugging (separate terminal per service)
- More control over each service
- See output in real time

**Limitations**:
- Managing 4 terminals
- Need to manually stop each service

---

## Future: Production Deployment

### Phase 2 - Proof of Concept Deployment

**Goal**: Share with friends, small beta test (< 100 users)

**Architecture**:

```mermaid
graph TB
    Users[Users]
    Users --> Vercel["Vercel<br/>Frontend (React SPA)"]
    Vercel -->|HTTPS| Render["Render/Railway<br/>Backend API (FastAPI)"]
    Render --> Upstash["Upstash Redis<br/>Job queue"]
    Upstash --> Modal["Modal<br/>GPU workers (Demucs + basic-pitch)"]
    Modal --> R2["Cloudflare R2<br/>Audio/MusicXML storage"]
```

**Components**:

| Service | Provider | Why | Cost (est.) |
|---------|----------|-----|-------------|
| Frontend | Vercel | Free tier, great DX | $0 |
| Backend API | Render/Railway | Easy deploy, free tier | $0-7/month |
| Redis | Upstash | Serverless, free tier | $0 |
| GPU Workers | Modal | Pay-per-use GPU | $0.50/hour GPU time |
| Storage | Cloudflare R2 | Cheap, S3-compatible | $0.015/GB |

**Estimated Monthly Cost**: $10-50 for 100 users doing ~5 transcriptions/month
- 500 jobs/month × 2 min/job ≈ 16.7 GPU hours/month ≈ $8
- Storage: 100GB = ~$1.50
- Backend: Free tier (Render) or $7/month
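
The GPU line item can be reproduced with simple arithmetic (a sketch; the rates are the table's estimates, not quoted prices):

```python
def monthly_gpu_cost(jobs_per_month: int, minutes_per_job: float, dollars_per_gpu_hour: float):
    """Return (GPU hours, dollars) for a month of transcription jobs."""
    gpu_hours = jobs_per_month * minutes_per_job / 60
    return gpu_hours, gpu_hours * dollars_per_gpu_hour

# 100 users × ~5 transcriptions/month = 500 jobs
hours, cost = monthly_gpu_cost(500, 2, 0.50)
print(f"{hours:.1f} GPU hours ≈ ${cost:.2f}/month")  # 16.7 GPU hours ≈ $8.33/month
```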

**Deployment Flow**:
1. Push to `main` branch
2. Vercel auto-deploys frontend
3. Render auto-deploys API from Dockerfile
4. Modal workers pull latest image on invocation

**Limitations**:
- Cold starts (workers take 10-20s to start)
- No auto-scaling of API (single instance)
- Limited monitoring

---

### Phase 3 - Production Scale

**Goal**: Support 1000+ users, high availability

**Architecture**:

```mermaid
graph TB
    Users[Users]
    Users --> CDN["Cloudflare CDN<br/>Cached static assets"]
    CDN --> ALB["AWS ALB<br/>Load balancer"]
    ALB --> ECS["ECS Fargate<br/>API servers (auto-scaling 2-10)"]
    ECS --> ElastiCache["ElastiCache Redis<br/>Job queue (HA, multi-AZ)"]
    ElastiCache --> GPU["Modal/Runpod<br/>GPU workers (auto-scaling 1-20)"]
    GPU --> S3["S3<br/>Audio/MusicXML storage"]
    ECS --> RDS["RDS PostgreSQL<br/>User accounts, job history"]
```

**Infrastructure**:

| Component | Service | Scaling | Cost |
|-----------|---------|---------|------|
| CDN | Cloudflare | Global edge caching | $20/month |
| API | ECS Fargate | 2-10 instances, CPU-based autoscaling | $50-200/month |
| Redis | ElastiCache | Multi-AZ, 2 nodes | $50/month |
| Workers | Modal | 1-20 GPU instances, queue-depth scaling | $500-2000/month |
| Storage | S3 | Lifecycle policies (delete after 30 days) | $50-100/month |
| DB | RDS PostgreSQL | Multi-AZ, auto-scaling storage | $50-100/month |
| Monitoring | Datadog/Sentry | Error tracking, metrics | $50/month |

**Estimated Monthly Cost**: $800-2500 for 10k transcriptions/month

**Features**:
- **Auto-scaling**: API scales on CPU, workers scale on queue depth
- **High availability**: Multi-AZ for DB and Redis
- **Monitoring**: Full observability (logs, metrics, traces)
- **Security**: VPC, encryption at rest, HTTPS everywhere
- **CI/CD**: GitHub Actions, blue-green deployments
- **Rate limiting**: Per-user quotas, IP-based throttling

**Deployment Pipeline**:
1. PR opened → GitHub Actions runs tests
2. Merge to `main` → Docker images built and pushed to ECR
3. ECS updates task definitions (rolling update)
4. Modal pulls new worker image on next invocation
5. Cloudflare cache invalidated for frontend assets

---

## GPU Infrastructure Deep Dive

### Local GPU (Development)

**Supported**:
- NVIDIA GPUs with CUDA 11.8+ support
- Apple Silicon (MPS backend) - experimental, slower

**Setup**:
```bash
# Check GPU
nvidia-smi

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

**Fallback when CUDA is unavailable** (MPS on Apple Silicon, else CPU):
```python
import torch

# Prefer CUDA, then Apple Silicon (MPS), then CPU
device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
```

**Processing Time**:
- GPU (RTX 3080): ~45 seconds per 3-minute song
- CPU (M1 Max): ~8 minutes per 3-minute song

---

### Serverless GPU (Production)

**Option 1: Modal** (Recommended)

**Pros**:
- Fast cold starts (10-20 seconds)
- Per-second billing
- No idle GPU cost
- Great Python support
- Volumes for model caching

**Cons**:
- Newer platform (less proven)
- US-only regions currently

**Example Worker**:
```python
import modal

# Note: modal.Stub was renamed to modal.App in recent SDK versions
app = modal.App("rescored")

@app.function(
    gpu="A10G",  # NVIDIA A10G (24GB VRAM)
    timeout=600,
    volumes={"/models": modal.Volume.from_name("model-cache")},
)
def process_audio(job_id: str, audio_url: str):
    # Demucs + basic-pitch processing
    pass
```

**Cost**: ~$0.60/hour for A10G GPU

---

**Option 2: RunPod Serverless**

**Pros**:
- Cheaper than Modal ($0.30-0.50/hour)
- More GPU options
- Global regions

**Cons**:
- Slower cold starts (30-60 seconds)
- More manual setup

---

**Option 3: AWS SageMaker/Lambda**

**Pros**:
- AWS ecosystem integration
- Well-documented

**Cons**:
- Expensive for small workloads
- Slow cold starts
- More complex setup

---

**Decision for Production**: Start with Modal; evaluate RunPod if cost becomes an issue.

---

## Storage Strategy

### MVP (Local)
- Temp files: `/tmp/rescored/`
- Cleanup: manual or cron job
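
The cron cleanup can be a few lines of Python (a sketch; the 1-day threshold and `/tmp/rescored` root follow the MVP layout above, and the `cleanup` helper is illustrative):

```python
import time
from pathlib import Path

def cleanup(root: Path, max_age_seconds: float = 24 * 3600) -> int:
    """Delete files under `root` older than `max_age_seconds`; return how many were removed."""
    if not root.exists():
        return 0
    now = time.time()
    removed = 0
    for path in root.rglob("*"):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed += 1
    return removed

if __name__ == "__main__":
    print(f"removed {cleanup(Path('/tmp/rescored'))} stale files")
```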

### Production
- **Temp audio**: S3 with 1-day lifecycle policy (delete after processing)
- **Output files**: S3 Standard for 30 days, then delete (or S3 Intelligent-Tiering if keeping long-term)
- **Model files**: baked into the Docker image or a cached volume (Modal)

**S3 Bucket Structure**:
```
s3://rescored-prod/
  temp-audio/
    {job_id}.wav          # Delete after 1 day
  separated-stems/
    {job_id}/
      drums.wav
      bass.wav
      ...                 # Delete after 1 day
  outputs/
    {job_id}.musicxml     # Keep for 30 days
    {job_id}.midi         # Keep for 30 days
```
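
To keep that layout from drifting across services, the object keys can be built in one place. A hypothetical helper (not existing code) mirroring the structure above:

```python
def s3_keys(job_id: str, stems=("drums", "bass")) -> dict:
    """Build the S3 object keys for one job under the bucket layout above."""
    return {
        "temp_audio": f"temp-audio/{job_id}.wav",
        "stems": [f"separated-stems/{job_id}/{s}.wav" for s in stems],
        "musicxml": f"outputs/{job_id}.musicxml",
        "midi": f"outputs/{job_id}.midi",
    }
```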

---

## Scaling Bottlenecks

### What Scales Easily
- Frontend (static assets, CDN)
- API servers (stateless, horizontal scaling)
- Redis (managed service auto-scaling)

### What Doesn't Scale Easily
- **GPU workers**: expensive, limited availability
- **Source separation**: CPU/GPU bound, little room for optimization
- **Model loading**: large models (350MB) slow cold starts

### Mitigation Strategies
1. **Pre-warm workers**: keep 1-2 GPU workers hot during peak hours
2. **Model caching**: use Modal volumes or Docker layers
3. **Queue prioritization**: premium users get faster processing
4. **Job batching**: process multiple songs on the same GPU instance (future)
5. **Progressive results**: return the piano transcription first, other instruments later
| ## Cost Optimization | |
| ### Development | |
| - Use CPU for small tests (slower but free) | |
| - Limit worker parallelism to 1 | |
| ### Production | |
| - **Lifecycle policies**: Delete temp files after 1 day, outputs after 30 days | |
| - **Reserved capacity**: If consistent load, reserve GPU instances (50% cheaper) | |
| - **Spot instances**: Use for non-urgent jobs (70% cheaper, can be interrupted) | |
| - **CDN caching**: Aggressive caching for static assets (frontend, model files) | |
| - **Compression**: Gzip API responses, compress audio files before storage | |
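
For the compression point, stdlib `gzip` is enough to sketch the idea; text-heavy outputs like MusicXML compress very well (exact ratios depend on content, and the `compress` helper here is illustrative):

```python
import gzip

def compress(data: bytes, level: int = 6) -> bytes:
    """Gzip-compress a payload before storing or sending it."""
    return gzip.compress(data, compresslevel=level)

# MusicXML is highly repetitive markup, so it compresses well
sample = b"<note><pitch><step>C</step><octave>4</octave></pitch></note>" * 1000
packed = compress(sample)
print(f"{len(sample)} -> {len(packed)} bytes")
```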

---

## Monitoring & Observability

### Metrics to Track
- **API**: request rate, latency, error rate
- **Workers**: queue depth, processing time per stage, GPU utilization
- **Costs**: GPU hours used, storage size, API requests

### Logging
- **Structured logs**: JSON format with job_id, user_id, stage
- **Centralized**: CloudWatch, Datadog, or Loki
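
A minimal formatter producing those structured fields, as a sketch using stdlib `logging` (a library such as `structlog` would serve the same purpose):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per record, carrying job/user/stage context."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Context fields attached via logger.info(..., extra={...})
            "job_id": getattr(record, "job_id", None),
            "user_id": getattr(record, "user_id", None),
            "stage": getattr(record, "stage", None),
        }
        return json.dumps(payload)

logger = logging.getLogger("rescored")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("separation done", extra={"job_id": "abc123", "stage": "demucs"})
```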

### Alerting
- Worker failures exceeding 5% of jobs
- Queue depth over 100 jobs (need more workers)
- GPU utilization below 50% (over-provisioned)
- API error rate over 1%
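
Whatever the alerting backend, the thresholds above can be encoded directly (the function name and metric keys are illustrative):

```python
def evaluate_alerts(metrics: dict) -> list[str]:
    """Return alert names whose thresholds (from the list above) are breached."""
    alerts = []
    if metrics.get("worker_failure_rate", 0) > 0.05:    # > 5% of jobs failing
        alerts.append("worker-failures")
    if metrics.get("queue_depth", 0) > 100:             # need more workers
        alerts.append("queue-backlog")
    if metrics.get("gpu_utilization", 1.0) < 0.50:      # over-provisioned
        alerts.append("gpu-overprovisioned")
    if metrics.get("api_error_rate", 0) > 0.01:         # > 1% errors
        alerts.append("api-errors")
    return alerts

print(evaluate_alerts({"queue_depth": 250, "gpu_utilization": 0.8}))  # ['queue-backlog']
```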

---

## Security Considerations

### Local Development
- No auth needed
- Redis on localhost only
- CORS enabled for `localhost:5173`

### Production
- **HTTPS only**: enforce TLS for API and WebSocket
- **API authentication**: JWT tokens for user sessions
- **Rate limiting**: 10 jobs per user per hour
- **Input validation**: check YouTube URL format, max video length
- **Secrets management**: use environment variables or AWS Secrets Manager
- **VPC**: API and workers in private subnets
- **File scanning**: check uploaded files for malware (if allowing file uploads)

---

## Disaster Recovery

### Backups
- **Redis**: daily snapshots to S3
- **PostgreSQL**: automated daily backups (RDS), 7-day retention
- **Code**: GitHub (already version controlled)
- **Models**: re-downloadable, no backup needed

### Incident Response
- **Worker failure**: job retried automatically (Celery)
- **API crash**: ECS restarts the container, ALB routes to a healthy instance
- **Redis failure**: ElastiCache auto-failover to standby
- **Complete outage**: deploy from the last known good commit, restore the DB from backup

---

## Next Steps

1. Get local development working with Docker Compose
2. Test the full pipeline end-to-end with sample YouTube videos
3. Deploy a proof of concept to Vercel + Modal for beta testing
4. Collect metrics on processing time, costs, and user feedback
5. Scale to the production architecture if the product gains traction

See [Audio Processing Pipeline](../backend/pipeline.md) for implementation details.