Deployment Strategy
Current Target: Local Development
The MVP focuses on getting the system working locally before worrying about production infrastructure.
Local Development Architecture
```mermaid
graph TB
    subgraph DevMachine["Developer Machine (macOS/Linux)"]
        Frontend["Frontend (React + Vite)<br/>http://localhost:5173"]
        API["Backend API (FastAPI)<br/>http://localhost:8000"]
        Redis["Redis (Queue + Cache)<br/>localhost:6379"]
        Worker["Celery Worker (GPU-enabled)<br/>- Demucs model (~350MB)<br/>- basic-pitch model (~30MB)"]
        Storage["Local Storage<br/>- /tmp/rescored/audio/ (temp)<br/>- /tmp/rescored/outputs/ (MusicXML)"]
        Frontend -->|HTTP/WS| API
        API --> Redis
        Redis --> Worker
        Worker -.-> Storage
    end
```
Setup Requirements
Hardware:
- GPU: Apple Silicon (M1/M2/M3/M4 with MPS) OR NVIDIA GPU with 4GB+ VRAM
- Alternative: Run on CPU (10-15x slower, acceptable for development)
- RAM: 16GB+ recommended
- Disk: 10GB for models and temp files
Software:
- Python 3.10 (required for madmom compatibility)
- Node.js 18+
- Redis 7+
- FFmpeg
- YouTube cookies (required as of December 2024)
Docker Compose Setup (Recommended)
```yaml
# docker-compose.yml
version: '3.8'

services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  api:
    build: ./backend
    ports:
      - "8000:8000"
    environment:
      - REDIS_URL=redis://redis:6379
      - STORAGE_PATH=/app/storage
    volumes:
      - ./backend:/app
      - storage:/app/storage
    depends_on:
      - redis

  worker:
    build: ./backend
    command: celery -A tasks worker --loglevel=info
    environment:
      - REDIS_URL=redis://redis:6379
      - STORAGE_PATH=/app/storage
    volumes:
      - ./backend:/app
      - storage:/app/storage
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    depends_on:
      - redis

  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
    volumes:
      - ./frontend:/app
    environment:
      - VITE_API_URL=http://localhost:8000

volumes:
  redis_data:
  storage:
```
Benefits:
- One command to start everything: docker-compose up
- Consistent environment across developers
- GPU passthrough handled automatically
- Easy cleanup
Limitations:
- Slower hot reload than native
- No GPU acceleration inside Docker on macOS (Apple Silicon MPS is not exposed to Linux containers), so containerized workers fall back to CPU
Quick Start (Recommended)
Use the provided shell scripts to start/stop all services:
```bash
# From project root
./start.sh

# View logs
tail -f logs/api.log       # Backend API
tail -f logs/worker.log    # Celery worker
tail -f logs/frontend.log  # Frontend

# Stop all services
./stop.sh
```
What start.sh does:
- Starts Redis (if not already running via Homebrew)
- Activates Python 3.10 venv
- Starts Backend API (uvicorn) in background
- Starts Celery Worker (--pool=solo for macOS) in background
- Starts Frontend (npm run dev) in background
- Writes all logs to the logs/ directory
Services available at:
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
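Once all three are up, a quick reachability check can save debugging time. A minimal sketch using the requests library (the script name and the choice of /docs as the API health endpoint are illustrative):

```python
# check_services.py — confirm the local frontend and API respond before testing the pipeline.
import requests

SERVICES = {
    "frontend": "http://localhost:5173",
    "api docs": "http://localhost:8000/docs",
}

for name, url in SERVICES.items():
    try:
        status = requests.get(url, timeout=5).status_code
        print(f"{name}: {'OK' if status == 200 else f'HTTP {status}'}")
    except requests.RequestException as exc:
        print(f"{name}: not reachable ({exc.__class__.__name__})")
```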
Manual Setup (Alternative)
If you prefer to run services manually in separate terminals:
Terminal 1 - Redis (macOS with Homebrew):
```bash
brew services start redis
redis-cli ping  # Should return PONG
```
Terminal 2 - Backend API:
```bash
cd backend
source .venv/bin/activate
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```
Terminal 3 - Celery Worker:
```bash
cd backend
source .venv/bin/activate
# Use --pool=solo on macOS to avoid fork() crashes with ML libraries
celery -A tasks worker --loglevel=info --pool=solo
```
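The -A tasks flag assumes a Celery app defined in a tasks module. A minimal sketch of what that module might contain (the task name and retry settings are illustrative, not the project's actual implementation):

```python
# tasks.py — sketch of the Celery app that `celery -A tasks worker` expects to import.
import os

from celery import Celery

# Broker and result backend both point at the local Redis instance.
redis_url = os.environ.get("REDIS_URL", "redis://localhost:6379")
app = Celery("tasks", broker=redis_url, backend=redis_url)

@app.task(bind=True, max_retries=3)
def process_audio(self, job_id: str, youtube_url: str) -> str:
    """Download audio, run Demucs + basic-pitch, write MusicXML, return its path."""
    ...  # actual pipeline lives in the audio-processing modules
```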
Terminal 4 - Frontend:
```bash
cd frontend
npm run dev
```
Benefits:
- Easier debugging (separate terminal per service)
- More control over each service
- See output in real-time
Limitations:
- Managing 4 terminals
- Need to manually stop each service
Future: Production Deployment
Phase 2 - Proof of Concept Deployment
Goal: Share with friends, small beta test (< 100 users)
Architecture:
```mermaid
graph TB
    Users[Users]
    Users --> Vercel["Vercel<br/>Frontend (React SPA)"]
    Vercel -->|HTTPS| Render["Render/Railway<br/>Backend API (FastAPI)"]
    Render --> Upstash["Upstash Redis<br/>Job queue"]
    Upstash --> Modal["Modal<br/>GPU workers (Demucs + basic-pitch)"]
    Modal --> R2["Cloudflare R2<br/>Audio/MusicXML storage"]
```
Components:
| Service | Provider | Why | Cost (est.) |
|---|---|---|---|
| Frontend | Vercel | Free tier, great DX | $0 |
| Backend API | Render/Railway | Easy deploy, free tier | $0-7/month |
| Redis | Upstash | Serverless, free tier | $0 |
| GPU Workers | Modal | Pay-per-use GPU | $0.50/hour GPU time |
| Storage | Cloudflare R2 | Cheap, S3-compatible | $0.015/GB |
Estimated Monthly Cost: $10-50 for 100 users doing ~5 transcriptions/month
- 500 jobs/month × 2 min/job ≈ 17 GPU hours/month ≈ $8-10 of GPU time
- Storage: 100GB = ~$1.50
- Backend: Free tier (Render) or $7/month
Deployment Flow:
- Push to main branch
- Vercel auto-deploys frontend
- Render auto-deploys API from Dockerfile
- Modal workers pull latest image on invocation
Limitations:
- Cold starts (workers take 10-20s to start)
- No auto-scaling of API (single instance)
- Limited monitoring
Phase 3 - Production Scale
Goal: Support 1000+ users, high availability
Architecture:
```mermaid
graph TB
    Users[Users]
    Users --> CDN["Cloudflare CDN<br/>Cached static assets"]
    CDN --> ALB["AWS ALB<br/>Load balancer"]
    ALB --> ECS["ECS Fargate<br/>API servers (auto-scaling 2-10)"]
    ECS --> ElastiCache["ElastiCache Redis<br/>Job queue (HA, multi-AZ)"]
    ElastiCache --> GPU["Modal/RunPod<br/>GPU workers (auto-scaling 1-20)"]
    GPU --> S3["S3<br/>Audio/MusicXML storage"]
    ECS --> RDS["RDS PostgreSQL<br/>User accounts, job history"]
```
Infrastructure:
| Component | Service | Scaling | Cost |
|---|---|---|---|
| CDN | Cloudflare | Global edge caching | $20/month |
| API | ECS Fargate | 2-10 instances, CPU-based autoscaling | $50-200/month |
| Redis | ElastiCache | Multi-AZ, 2 nodes | $50/month |
| Workers | Modal | 1-20 GPU instances, queue-depth scaling | $500-2000/month |
| Storage | S3 | Lifecycle policies (delete after 30 days) | $50-100/month |
| DB | RDS PostgreSQL | Multi-AZ, auto-scaling storage | $50-100/month |
| Monitoring | Datadog/Sentry | Error tracking, metrics | $50/month |
Estimated Monthly Cost: $800-2500 for 10k transcriptions/month
Features:
- Auto-scaling: API scales on CPU, workers scale on queue depth
- High availability: Multi-AZ for DB and Redis
- Monitoring: Full observability (logs, metrics, traces)
- Security: VPC, encryption at rest, HTTPS everywhere
- CI/CD: GitHub Actions, blue-green deployments
- Rate limiting: Per-user quotas, IP-based throttling
Deployment Pipeline:
- PR opened → GitHub Actions runs tests
- Merge to main → Docker images built and pushed to ECR
- ECS updates task definitions (rolling update)
- Modal pulls new worker image on next invocation
- Cloudflare cache invalidated for frontend assets
GPU Infrastructure Deep Dive
Local GPU (Development)
Supported:
- NVIDIA GPUs with CUDA 11.8+ support
- Apple Silicon (MPS backend) - experimental, slower
Setup:
```bash
# Check GPU
nvidia-smi

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
Fallback to CPU:
```python
# In worker code: prefer CUDA, fall back to Apple Silicon MPS, else CPU
import torch
device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
```
Processing Time:
- GPU (RTX 3080): ~45 seconds per 3-minute song
- CPU (M1 Max): ~8 minutes per 3-minute song
Serverless GPU (Production)
Option 1: Modal (Recommended)
Pros:
- Fast cold starts (10-20 seconds)
- Per-second billing
- No idle GPU cost
- Great Python support
- Volumes for model caching
Cons:
- Newer platform (less proven)
- US-only regions currently
Example Worker:
```python
import modal

stub = modal.Stub("rescored")

@stub.function(
    gpu="A10G",  # NVIDIA A10G (24GB VRAM)
    timeout=600,
    volumes={"/models": modal.Volume.from_name("model-cache")},
)
def process_audio(job_id: str, audio_url: str):
    # Demucs + basic-pitch processing
    pass
```
Cost: ~$0.60/hour for A10G GPU
Option 2: RunPod Serverless
Pros:
- Cheaper than Modal ($0.30-0.50/hour)
- More GPU options
- Global regions
Cons:
- Slower cold starts (30-60 seconds)
- More manual setup
Option 3: AWS SageMaker/Lambda
Pros:
- AWS ecosystem integration
- Well-documented
Cons:
- Expensive for small workloads
- Slow cold starts
- More complex setup
Decision for Production: Start with Modal, evaluate RunPod if cost becomes an issue.
Storage Strategy
MVP (Local)
- Temp files: /tmp/rescored/
- Cleanup: Manual or cron job
Production
- Temp audio: S3 with 1-day lifecycle policy (delete after processing)
- Output files: S3 Standard for 30 days, then delete (or S3 Intelligent-Tiering if keeping long-term)
- Model files: Baked into Docker image or cached volume (Modal)
S3 Bucket Structure:
```
s3://rescored-prod/
  temp-audio/
    {job_id}.wav          # Delete after 1 day
  separated-stems/
    {job_id}/
      drums.wav
      bass.wav
      ...                 # Delete after 1 day
  outputs/
    {job_id}.musicxml     # Keep for 30 days
    {job_id}.midi         # Keep for 30 days
```
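The retention noted in the layout above maps directly onto S3 lifecycle rules. A sketch using boto3 (rule IDs are illustrative; prefixes and expiry days follow the comments above):

```python
# lifecycle.py — sketch of the expiration rules implied by the bucket layout.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="rescored-prod",
    LifecycleConfiguration={
        "Rules": [
            {"ID": "expire-temp-audio", "Filter": {"Prefix": "temp-audio/"},
             "Status": "Enabled", "Expiration": {"Days": 1}},
            {"ID": "expire-stems", "Filter": {"Prefix": "separated-stems/"},
             "Status": "Enabled", "Expiration": {"Days": 1}},
            {"ID": "expire-outputs", "Filter": {"Prefix": "outputs/"},
             "Status": "Enabled", "Expiration": {"Days": 30}},
        ]
    },
)
```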
Scaling Bottlenecks
What Scales Easily
- Frontend (static assets, CDN)
- API servers (stateless, horizontal scaling)
- Redis (managed service auto-scaling)
What Doesn't Scale Easily
- GPU workers: Expensive, limited availability
- Source separation: CPU/GPU bound, can't optimize much
- Model loading: Large models (350MB) slow cold starts
Mitigation Strategies
- Pre-warm workers: Keep 1-2 GPU workers hot during peak hours
- Model caching: Use Modal volumes or Docker layers
- Queue prioritization: Premium users get faster processing (see the sketch after this list)
- Job batching: Process multiple songs on same GPU instance (future)
- Progressive results: Return piano transcription first, other instruments later
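The queue-prioritization idea could be implemented with separate Celery queues served by separate worker pools. A sketch (queue names, the is_premium flag, and the worker commands in the comment are assumptions, not the project's current setup):

```python
# enqueue.py — sketch of routing jobs to different Celery queues by user tier.
from tasks import process_audio  # the Celery task defined in the worker module

def enqueue_job(job_id: str, youtube_url: str, is_premium: bool) -> None:
    # Premium jobs go to a dedicated queue with its own (or more) workers, e.g.:
    #   celery -A tasks worker -Q premium --concurrency=2
    #   celery -A tasks worker -Q default --concurrency=1
    queue = "premium" if is_premium else "default"
    process_audio.apply_async(args=[job_id, youtube_url], queue=queue)
```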
Cost Optimization
Development
- Use CPU for small tests (slower but free)
- Limit worker parallelism to 1
Production
- Lifecycle policies: Delete temp files after 1 day, outputs after 30 days
- Reserved capacity: If consistent load, reserve GPU instances (50% cheaper)
- Spot instances: Use for non-urgent jobs (70% cheaper, can be interrupted)
- CDN caching: Aggressive caching for static assets (frontend, model files)
- Compression: Gzip API responses, compress audio files before storage
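For the response-compression point, FastAPI's built-in GZip middleware covers API responses; a minimal sketch (the 1 KB threshold is an assumption):

```python
# main.py (excerpt) — gzip JSON responses larger than ~1 KB.
from fastapi import FastAPI
from fastapi.middleware.gzip import GZipMiddleware

app = FastAPI()
app.add_middleware(GZipMiddleware, minimum_size=1000)
```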
Monitoring & Observability
Metrics to Track
- API: Request rate, latency, error rate
- Workers: Queue depth, processing time per stage, GPU utilization
- Costs: GPU hours used, storage size, API requests
Logging
- Structured logs: JSON format with job_id, user_id, stage
- Centralized: CloudWatch, Datadog, or Loki
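A sketch of the structured-log format using only the standard library (field names mirror the bullet above; values are passed via the logger's extra argument):

```python
# logging_setup.py — sketch of JSON-formatted logs carrying job context.
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "job_id": getattr(record, "job_id", None),
            "user_id": getattr(record, "user_id", None),
            "stage": getattr(record, "stage", None),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("rescored")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Usage: logger.info("separation complete", extra={"job_id": "abc123", "stage": "demucs"})
```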
Alerting
- Worker failures exceeding 5% of jobs
- Queue depth over 100 jobs (need more workers; see the sketch after this list)
- GPU utilization below 50% (over-provisioned)
- API error rate over 1%
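With the Redis broker, queue depth is straightforward to read, since Celery's default queue is stored as a Redis list. A sketch of the check behind the queue-depth alert (the 100-job threshold matches the alert above; the default queue name "celery" is an assumption if custom queues are configured):

```python
# queue_depth.py — sketch of a queue-depth check against the Redis broker.
import os

import redis

r = redis.Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))
depth = r.llen("celery")  # Celery's default queue is a Redis list named "celery"
if depth > 100:
    print(f"ALERT: {depth} jobs queued, consider adding workers")
```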
Security Considerations
Local Development
- No auth needed
- Redis on localhost only
- CORS enabled for localhost:5173
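The CORS setup is a one-liner in FastAPI; a minimal sketch (the origin matches the Vite dev server above):

```python
# main.py (excerpt) — allow the Vite dev server to call the API during development.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],
    allow_methods=["*"],
    allow_headers=["*"],
)
```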
Production
- HTTPS only: Enforce TLS for API and WebSocket
- API authentication: JWT tokens for user sessions
- Rate limiting: 10 jobs per user per hour (see the sketch after this list)
- Input validation: Check YouTube URL format, max video length
- Secrets management: Use environment variables or AWS Secrets Manager
- VPC: API and workers in private subnets
- File scanning: Check uploaded files for malware (if allowing file uploads)
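A sketch of the per-user quota using a Redis counter with a one-hour TTL (the key format and the helper function are illustrative, not the project's actual implementation):

```python
# rate_limit.py — sketch of a 10-jobs-per-user-per-hour quota backed by Redis.
import os

import redis
from fastapi import HTTPException

r = redis.Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))

def check_quota(user_id: str, limit: int = 10, window_seconds: int = 3600) -> None:
    key = f"quota:{user_id}"
    count = r.incr(key)                # atomically count this job request
    if count == 1:
        r.expire(key, window_seconds)  # start the one-hour window on the first job
    if count > limit:
        raise HTTPException(status_code=429, detail="Job quota exceeded, try again later")
```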
Disaster Recovery
Backups
- Redis: Daily snapshots to S3
- PostgreSQL: Automated daily backups (RDS), 7-day retention
- Code: GitHub (already version controlled)
- Models: Re-downloadable, no backup needed
Incident Response
- Worker failure: Job retried automatically (Celery)
- API crash: ECS restarts container, ALB routes to healthy instance
- Redis failure: ElastiCache auto-failover to standby
- Complete outage: Deploy from last known good commit, restore DB from backup
Next Steps
- Get local development working with Docker Compose
- Test full pipeline end-to-end with sample YouTube videos
- Deploy proof-of-concept to Vercel + Modal for beta testing
- Collect metrics on processing time, costs, user feedback
- Scale to production architecture if product gains traction
See Audio Processing Pipeline for implementation details.