# Deployment Strategy

## Current Target: Local Development

The MVP focuses on getting the system working locally before worrying about production infrastructure.

### Local Development Architecture

```mermaid
graph TB
    subgraph DevMachine["Developer Machine (MacOS/Linux)"]
        Frontend["Frontend (React + Vite)<br/>http://localhost:5173"]
        API["Backend API (FastAPI)<br/>http://localhost:8000"]
        Redis["Redis (Queue + Cache)<br/>localhost:6379"]
        Worker["Celery Worker (GPU-enabled)<br/>- Demucs model (~350MB)<br/>- basic-pitch model (~30MB)"]
        Storage["Local Storage<br/>- /tmp/rescored/audio/ (temp)<br/>- /tmp/rescored/outputs/ (MusicXML)"]

        Frontend -->|HTTP/WS| API
        API --> Redis
        Redis --> Worker
        Worker -.-> Storage
    end
```

### Setup Requirements

**Hardware**:
- **GPU**: Apple Silicon (M1/M2/M3/M4 with MPS) OR NVIDIA GPU with 4GB+ VRAM
  - Alternative: Run on CPU (10-15x slower, acceptable for development)
- **RAM**: 16GB+ recommended
- **Disk**: 10GB for models and temp files

**Software**:
- **Python 3.10** (required for madmom compatibility)
- **Node.js 18+**
- **Redis 7+**
- **FFmpeg**
- **YouTube cookies** (required as of December 2024)

### Docker Compose Setup (Recommended)

```yaml
# docker-compose.yml
version: '3.8'

services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  api:
    build: ./backend
    ports:
      - "8000:8000"
    environment:
      - REDIS_URL=redis://redis:6379
      - STORAGE_PATH=/app/storage
    volumes:
      - ./backend:/app
      - storage:/app/storage
    depends_on:
      - redis

  worker:
    build: ./backend
    command: celery -A tasks worker --loglevel=info
    environment:
      - REDIS_URL=redis://redis:6379
      - STORAGE_PATH=/app/storage
    volumes:
      - ./backend:/app
      - storage:/app/storage
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    depends_on:
      - redis

  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
    volumes:
      - ./frontend:/app
    environment:
      - VITE_API_URL=http://localhost:8000

volumes:
  redis_data:
  storage:
```

**Benefits**:
- One command to start everything: `docker-compose up`
- Consistent environment across developers
- GPU passthrough handled automatically
- Easy cleanup

**Limitations**:
- Slower hot reload than native
- GPU support requires Docker Desktop on Mac (experimental)

### Quick Start (Recommended)

Use the provided shell scripts to start/stop all services:

```bash
# From project root
./start.sh

# View logs
tail -f logs/api.log      # Backend API
tail -f logs/worker.log   # Celery worker
tail -f logs/frontend.log # Frontend

# Stop all services
./stop.sh
```

**What `start.sh` does:**
1. Starts Redis (if not already running via Homebrew)
2. Activates Python 3.10 venv
3. Starts Backend API (uvicorn) in background
4. Starts Celery Worker (--pool=solo for macOS) in background
5. Starts Frontend (npm run dev) in background
6. Writes all logs to `logs/` directory

**Services available at:**
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs

### Manual Setup (Alternative)

If you prefer to run services manually in separate terminals:

**Terminal 1 - Redis (macOS with Homebrew)**:
```bash
brew services start redis
redis-cli ping  # Should return PONG
```

**Terminal 2 - Backend API**:
```bash
cd backend
source .venv/bin/activate
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

**Terminal 3 - Celery Worker**:
```bash
cd backend
source .venv/bin/activate
# Use --pool=solo on macOS to avoid fork() crashes with ML libraries
celery -A tasks worker --loglevel=info --pool=solo
```

**Terminal 4 - Frontend**:
```bash
cd frontend
npm run dev
```

**Benefits**:
- Easier debugging (separate terminal per service)
- More control over each service
- See output in real-time

**Limitations**:
- Managing 4 terminals
- Need to manually stop each service

---

## Future: Production Deployment

### Phase 2 - Proof of Concept Deployment

**Goal**: Share with friends, small beta test (< 100 users)

**Architecture**:
```mermaid
graph TB
    Users[Users]
    Users --> Vercel["Vercel<br/>Frontend (React SPA)"]
    Vercel -->|HTTPS| Render["Render/Railway<br/>Backend API (FastAPI)"]
    Render --> Upstash["Upstash Redis<br/>Job queue"]
    Upstash --> Modal["Modal<br/>GPU workers (Demucs + basic-pitch)"]
    Modal --> R2["Cloudflare R2<br/>Audio/MusicXML storage"]
```

**Components**:

| Service | Provider | Why | Cost (est.) |
|---------|----------|-----|-------------|
| Frontend | Vercel | Free tier, great DX | $0 |
| Backend API | Render/Railway | Easy deploy, free tier | $0-7/month |
| Redis | Upstash | Serverless, free tier | $0 |
| GPU Workers | Modal | Pay-per-use GPU | $0.50/hour GPU time |
| Storage | Cloudflare R2 | Cheap, S3-compatible | $0.015/GB |

**Estimated Monthly Cost**: $10-50 for 100 users doing ~5 transcriptions/month
- 500 jobs/month × 2 min/job = 16 GPU hours/month = ~$8
- Storage: 100GB = ~$1.50
- Backend: Free tier (Render) or $7/month

**Deployment Flow**:
1. Push to `main` branch
2. Vercel auto-deploys frontend
3. Render auto-deploys API from Dockerfile
4. Modal workers pull latest image on invocation

**Limitations**:
- Cold starts (workers take 10-20s to start)
- No auto-scaling of API (single instance)
- Limited monitoring

---

### Phase 3 - Production Scale

**Goal**: Support 1000+ users, high availability

**Architecture**:
```mermaid
graph TB
    Users[Users]
    Users --> CDN["Cloudflare CDN<br/>Cached static assets"]
    CDN --> ALB["AWS ALB<br/>Load balancer"]
    ALB --> ECS["ECS Fargate<br/>API servers (auto-scaling 2-10)"]
    ECS --> ElastiCache["ElastiCache Redis<br/>Job queue (HA, multi-AZ)"]
    ElastiCache --> GPU["Modal/Runpod<br/>GPU workers (auto-scaling 1-20)"]
    GPU --> S3["S3<br/>Audio/MusicXML storage"]
    ECS --> RDS["RDS PostgreSQL<br/>User accounts, job history"]
```

**Infrastructure**:

| Component | Service | Scaling | Cost |
|-----------|---------|---------|------|
| CDN | Cloudflare | Global edge caching | $20/month |
| API | ECS Fargate | 2-10 instances, CPU-based autoscaling | $50-200/month |
| Redis | ElastiCache | Multi-AZ, 2 nodes | $50/month |
| Workers | Modal | 1-20 GPU instances, queue-depth scaling | $500-2000/month |
| Storage | S3 | Lifecycle policies (delete after 30 days) | $50-100/month |
| DB | RDS PostgreSQL | Multi-AZ, auto-scaling storage | $50-100/month |
| Monitoring | Datadog/Sentry | Error tracking, metrics | $50/month |

**Estimated Monthly Cost**: $800-2500 for 10k transcriptions/month

**Features**:
- **Auto-scaling**: API scales on CPU, workers scale on queue depth
- **High availability**: Multi-AZ for DB and Redis
- **Monitoring**: Full observability (logs, metrics, traces)
- **Security**: VPC, encryption at rest, HTTPS everywhere
- **CI/CD**: GitHub Actions, blue-green deployments
- **Rate limiting**: Per-user quotas, IP-based throttling

**Deployment Pipeline**:
1. PR opened → GitHub Actions runs tests
2. Merge to `main` → Docker images built and pushed to ECR
3. ECS updates task definitions (rolling update)
4. Modal pulls new worker image on next invocation
5. Cloudflare cache invalidated for frontend assets

---

## GPU Infrastructure Deep Dive

### Local GPU (Development)

**Supported**:
- NVIDIA GPUs with CUDA 11.8+ support
- Apple Silicon (MPS backend) - experimental, slower

**Setup**:
```bash
# Check GPU
nvidia-smi

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

**Fallback to CPU**:
```python
# In worker code
device = "cuda" if torch.cuda.is_available() else "cpu"
```

**Processing Time**:
- GPU (RTX 3080): ~45 seconds per 3-minute song
- CPU (M1 Max): ~8 minutes per 3-minute song

---

### Serverless GPU (Production)

**Option 1: Modal** (Recommended)

**Pros**:
- Fast cold starts (10-20 seconds)
- Per-second billing
- No idle GPU cost
- Great Python support
- Volumes for model caching

**Cons**:
- Newer platform (less proven)
- US-only regions currently

**Example Worker**:
```python
import modal

stub = modal.Stub("rescored")

@stub.function(
    gpu="A10G",  # NVIDIA A10G (24GB VRAM)
    timeout=600,
    volumes={"/models": modal.Volume.from_name("model-cache")}
)
def process_audio(job_id: str, audio_url: str):
    # Demucs + basic-pitch processing
    pass
```

**Cost**: ~$0.60/hour for A10G GPU

---

**Option 2: RunPod Serverless**

**Pros**:
- Cheaper than Modal ($0.30-0.50/hour)
- More GPU options
- Global regions

**Cons**:
- Slower cold starts (30-60 seconds)
- More manual setup

---

**Option 3: AWS SageMaker/Lambda**

**Pros**:
- AWS ecosystem integration
- Well-documented

**Cons**:
- Expensive for small workloads
- Slow cold starts
- More complex setup

---

**Decision for Production**: Start with Modal, evaluate RunPod if cost becomes issue.

---

## Storage Strategy

### MVP (Local)
- Temp files: `/tmp/rescored/`
- Cleanup: Manual or cron job

### Production
- **Temp audio**: S3 with 1-day lifecycle policy (delete after processing)
- **Output files**: S3 Standard for 30 days, then delete OR:
  - S3 Intelligent-Tiering if keeping long-term
- **Model files**: Baked into Docker image or cached volume (Modal)

**S3 Bucket Structure**:
```
s3://rescored-prod/
  temp-audio/
    {job_id}.wav          # Delete after 1 day
  separated-stems/
    {job_id}/
      drums.wav
      bass.wav
      ...                 # Delete after 1 day
  outputs/
    {job_id}.musicxml     # Keep for 30 days
    {job_id}.midi         # Keep for 30 days
```

---

## Scaling Bottlenecks

### What Scales Easily
- Frontend (static assets, CDN)
- API servers (stateless, horizontal scaling)
- Redis (managed service auto-scaling)

### What Doesn't Scale Easily
- **GPU workers**: Expensive, limited availability
- **Source separation**: CPU/GPU bound, can't optimize much
- **Model loading**: Large models (350MB) slow cold starts

### Mitigation Strategies
1. **Pre-warm workers**: Keep 1-2 GPU workers hot during peak hours
2. **Model caching**: Use Modal volumes or Docker layers
3. **Queue prioritization**: Premium users get faster processing
4. **Job batching**: Process multiple songs on same GPU instance (future)
5. **Progressive results**: Return piano transcription first, other instruments later

---

## Cost Optimization

### Development
- Use CPU for small tests (slower but free)
- Limit worker parallelism to 1

### Production
- **Lifecycle policies**: Delete temp files after 1 day, outputs after 30 days
- **Reserved capacity**: If consistent load, reserve GPU instances (50% cheaper)
- **Spot instances**: Use for non-urgent jobs (70% cheaper, can be interrupted)
- **CDN caching**: Aggressive caching for static assets (frontend, model files)
- **Compression**: Gzip API responses, compress audio files before storage

---

## Monitoring & Observability

### Metrics to Track
- **API**: Request rate, latency, error rate
- **Workers**: Queue depth, processing time per stage, GPU utilization
- **Costs**: GPU hours used, storage size, API requests

### Logging
- **Structured logs**: JSON format with job_id, user_id, stage
- **Centralized**: CloudWatch, Datadog, or Loki

### Alerting
- Worker failures exceeding 5% of jobs
- Queue depth over 100 jobs (need more workers)
- GPU utilization below 50% (over-provisioned)
- API error rate over 1%

---

## Security Considerations

### Local Development
- No auth needed
- Redis on localhost only
- CORS enabled for `localhost:5173`

### Production
- **HTTPS only**: Enforce TLS for API and WebSocket
- **API authentication**: JWT tokens for user sessions
- **Rate limiting**: 10 jobs per user per hour
- **Input validation**: Check YouTube URL format, max video length
- **Secrets management**: Use environment variables or AWS Secrets Manager
- **VPC**: API and workers in private subnets
- **File scanning**: Check uploaded files for malware (if allowing file uploads)

---

## Disaster Recovery

### Backups
- **Redis**: Daily snapshots to S3
- **PostgreSQL**: Automated daily backups (RDS), 7-day retention
- **Code**: GitHub (already version controlled)
- **Models**: Re-downloadable, no backup needed

### Incident Response
- **Worker failure**: Job retried automatically (Celery)
- **API crash**: ECS restarts container, ALB routes to healthy instance
- **Redis failure**: ElastiCache auto-failover to standby
- **Complete outage**: Deploy from last known good commit, restore DB from backup

---

## Next Steps

1. Get local development working with Docker Compose
2. Test full pipeline end-to-end with sample YouTube videos
3. Deploy proof-of-concept to Vercel + Modal for beta testing
4. Collect metrics on processing time, costs, user feedback
5. Scale to production architecture if product gains traction

See [Audio Processing Pipeline](../backend/pipeline.md) for implementation details.