# Technology Stack & Decisions

## Overview

This document details the technology choices for Rescored, including alternatives considered and trade-offs that informed each decision.

## Frontend Technologies

### UI Framework: React

**Chosen**: React 18+

**Why**:
- Largest ecosystem for music-related JavaScript libraries
- VexFlow and Tone.js have good React integration patterns
- Component model fits notation editing (each measure/staff as component)
- Excellent dev tooling (React DevTools, Fast Refresh)
- Familiarity and hiring pool

**Alternatives Considered**:

| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| Vue 3 | Simpler API, lighter weight | Smaller ecosystem for music libraries | Less community support for music notation |
| Svelte | Excellent performance, less boilerplate | Immature ecosystem | Risk for complex audio/notation needs |
| Vanilla JS | Full control, no framework overhead | Much more code to manage state | Notation editing is complex, need good state management |

**Decision**: React's ecosystem and component model outweigh its learning curve.

---

### Notation Rendering: VexFlow

**Chosen**: VexFlow 4.x

**Why**:
- Pure JavaScript, runs entirely in browser
- Programmatic API for rendering notation (good for editing)
- Generates clean SVG that we can attach event listeners to
- Active maintenance, good documentation
- Used in production by Flat.io, Soundslice

**Alternatives Considered**:

| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| OpenSheetMusicDisplay (OSMD) | Better MusicXML support, prettier output | Harder to build editing on top, heavier bundle | Optimized for display, not editing |
| music21.js | Pythonic API, good theory support | Limited rendering, not designed for web | Better as backend tool |
| abcjs | Lightweight, simple syntax | ABC notation less standard than MusicXML | MusicXML is industry standard |
| Custom renderer | Full control | Months of work to match VexFlow quality | Not worth reinventing wheel |

**Decision**: VexFlow strikes the best balance between rendering quality and editability.

---

### Audio Playback: Tone.js

**Chosen**: Tone.js 14+

**Why**:
- High-level abstractions over Web Audio API
- Built-in scheduling for precise timing
- Multiple synthesis methods (samples, FM, AM)
- Transport controls (play, pause, seek, loop)
- Sample-based playback of MIDI note events via `Tone.Sampler` (parsing `.mid` files requires a helper such as `@tonejs/midi`)

**Alternatives Considered**:

| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| Web Audio API (raw) | Maximum control, no dependencies | Requires lots of boilerplate | Too low-level for quick MVP |
| Howler.js | Simple API, good for sound effects | Not designed for music, no MIDI | No timing control for notation sync |
| MIDIjs | Simple MIDI playback | Limited synthesis, GM soundfonts | Lower quality sound than Tone.js samplers |
| SoundFont2.js | Authentic GM sounds | Large file sizes, older API | `Tone.Sampler` can load individual instrument samples if needed |

**Decision**: Tone.js provides the right abstraction level for MIDI playback with good sound quality.

---

### State Management: Zustand

**Chosen**: Zustand (tentative)

**Why**:
- Minimal boilerplate compared to Redux
- Works well with React hooks
- Good for global state (notation data, playback state)
- Small bundle size (~1KB)

**Alternatives Considered**:

| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| Redux Toolkit | Battle-tested, great DevTools | More boilerplate, steeper learning curve | Overkill for MVP |
| React Context | Built-in, no deps | Performance issues with frequent updates | Notation editing has lots of updates |
| Jotai/Recoil | Atomic state, very modern | Newer, smaller ecosystem | Zustand more proven |
| Local state only | Simplest | Hard to share state across components | Need global notation state |

**Decision**: Zustand for the MVP; we can migrate to Redux later if needed.

---

## Backend Technologies

### API Framework: FastAPI

**Chosen**: FastAPI (Python 3.11+)

**Why**:
- Async Python (critical for WebSocket connections)
- Auto-generated OpenAPI docs (Swagger UI)
- Native WebSocket support
- Type hints for better code quality
- Integrates well with Python ML libraries (Demucs, basic-pitch)
- Strong performance for a Python framework (ASGI-based, built on Starlette)

**Alternatives Considered**:

| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| Node.js (Express) | Async by default, JavaScript everywhere | Worse ML library support | ML models are Python-first |
| Flask | Simple, well-known | Limited async support (WSGI-based), manual WebSocket setup | FastAPI offers similar simplicity with async built in |
| Django | Full-featured, admin panel | Heavy, slower, less async support | Overkill for API-only service |
| Go (Gin/Fiber) | Excellent performance | Weaker ML ecosystem, FFI overhead | Python has better audio/ML tools |

**Decision**: FastAPI combines async support with Python's ML ecosystem.

---

### Task Queue: Celery + Redis

**Chosen**: Celery 5.x with Redis as broker

**Why**:
- Industry standard for async Python tasks
- Reliable, battle-tested in production
- Priority queues (transcription vs. export jobs)
- Automatic retries and error handling
- Redis is fast, simple, good for both queue and caching

**Alternatives Considered**:

| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| RQ (Redis Queue) | Simpler API than Celery | Fewer features, less ecosystem | Need advanced features (priorities, chaining) |
| Dramatiq | Modern, better API than Celery | Smaller community, less mature | Celery's ecosystem worth the complexity |
| BullMQ (Node) | Excellent, modern | Requires Node backend | Using Python for ML libraries |
| Cloud tasks (GCP/AWS) | Managed service, no infrastructure | Vendor lock-in, cold starts | Local dev first |

**Decision**: Celery's maturity and feature set justify the learning curve.
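
The split between transcription and export jobs could be expressed as a Celery configuration module. This is a minimal sketch, not the project's actual configuration: it uses dedicated queues rather than per-message priorities, and the task names (`rescored.tasks.transcribe`, `rescored.tasks.export`), queue names, and Redis URLs are assumptions for illustration. The setting names themselves are standard Celery settings.

```python
# celeryconfig.py: hypothetical module, loaded with
# app.config_from_object("celeryconfig")

# Redis serves as both broker and result backend (assumed local dev URLs).
broker_url = "redis://localhost:6379/0"
result_backend = "redis://localhost:6379/1"

# Tasks without an explicit route land in this queue.
task_default_queue = "default"

# Route long-running transcription work and quick export jobs to separate
# queues so each can be served by its own workers. Task names are hypothetical.
task_routes = {
    "rescored.tasks.transcribe": {"queue": "transcription"},
    "rescored.tasks.export": {"queue": "export"},
}

# Acknowledge a message only after the task finishes, so a crashed worker
# does not silently drop a job.
task_acks_late = True

# Fetch one task at a time; a long job should not hoard queued messages.
worker_prefetch_multiplier = 1
```

With this layout, a worker started with `-Q transcription` would consume only transcription jobs, leaving export jobs to a separate worker.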

---

## ML/Audio Technologies

### Source Separation: Demucs

**Chosen**: Demucs v4 (Meta Research)

**Why**:
- State-of-the-art audio separation quality (MDX leaderboard winner)
- 4-stem model (drums, bass, vocals, other) is a good default
- 6-stem model available (drums, bass, vocals, guitar, piano, other)
- Open-source, MIT license
- PyTorch model, runs on GPU

**Alternatives Considered**:

| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| Spleeter | Faster, lighter | Lower quality, no longer actively developed | Quality matters more than speed |
| X-UMX | Open-source, good quality | Slower than Demucs | Demucs quality worth extra time |
| Commercial APIs | No GPU needed, potentially better quality | Costly ($0.10+/song), privacy concerns | Local processing preferred for MVP |

**Decision**: Demucs offers best quality for a self-hosted solution.
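
As a rough illustration of how a worker might invoke Demucs and locate its output, the sketch below builds the command line and the expected stem paths without running the model. The helper names (`demucs_command`, `expected_stems`) are hypothetical, and the output layout (`<out>/<model>/<track>/<stem>.wav`) is an assumption based on the Demucs CLI's default behavior; verify it against the installed version.

```python
from pathlib import Path

# Model names as published with Demucs v4; stem lists follow the bullets above.
MODEL_STEMS = {
    "htdemucs": ["drums", "bass", "other", "vocals"],
    "htdemucs_6s": ["drums", "bass", "other", "vocals", "guitar", "piano"],
}


def demucs_command(audio: Path, out_dir: Path, model: str = "htdemucs") -> list[str]:
    """Build the argv for one separation run (does not execute it)."""
    if model not in MODEL_STEMS:
        raise ValueError(f"unknown model: {model}")
    return ["python", "-m", "demucs", "-n", model, "-o", str(out_dir), str(audio)]


def expected_stems(audio: Path, out_dir: Path, model: str = "htdemucs") -> dict[str, Path]:
    """Map each stem name to the WAV path Demucs is expected to write (assumed layout)."""
    track_dir = out_dir / model / audio.stem
    return {stem: track_dir / f"{stem}.wav" for stem in MODEL_STEMS[model]}
```

The returned list can be passed to `subprocess.run(..., check=True)`, after which the paths from `expected_stems` can be handed to the transcription step.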

---

### Transcription: YourMT3+ (Primary) + basic-pitch (Fallback)

**Chosen**: YourMT3+ (KAIST) with automatic fallback to basic-pitch (Spotify)

**Why YourMT3+**:
- **80-85% accuracy** vs 70% for basic-pitch
- State-of-the-art multi-instrument transcription model
- Mixture of Experts architecture for better quality
- Perceiver-TF encoder with RoPE position encoding
- Trained on diverse datasets (30k+ songs, 13 instrument classes)
- Open-source, actively maintained
- Optimized for Apple Silicon (MPS) with float16 precision (14x speedup)

**Why basic-pitch as Fallback**:
- Polyphonic transcription (multiple notes at once)
- Lighter weight, faster inference
- Simple setup, no model download required
- Good baseline quality (70% accuracy)
- Automatically used if YourMT3+ unavailable

**Alternatives Considered**:

| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| MT3 (Multi-Task Multitrack Music Transcription) | From Google, multi-instrument aware | Slower, larger model, harder to run | YourMT3+ more accurate |
| Omnizart | Multi-instrument, good documentation | Lower accuracy than YourMT3+, slower | Removed in favor of YourMT3+ |
| Tony (pYIN) | Excellent for monophonic | Only monophonic | Need polyphonic support |
| Commercial APIs | Potentially better quality | Expensive, privacy concerns | Local processing preferred |

**Decision**: YourMT3+ offers the best accuracy for a self-hosted solution, with automatic fallback to basic-pitch for reliability.
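
The primary-plus-fallback arrangement can be sketched as an ordered list of backends tried in turn. This is an illustrative pattern only, not the project's implementation: the `Transcriber` type and `transcribe_with_fallback` function are hypothetical names, and each backend is assumed to take an audio path and return MIDI bytes.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("transcription")

# Hypothetical backend signature: audio file path in, MIDI bytes out.
Transcriber = Callable[[str], bytes]


def transcribe_with_fallback(
    audio_path: str,
    backends: Sequence[tuple[str, Transcriber]],
) -> tuple[str, bytes]:
    """Try each backend in order; return (backend name, MIDI bytes) from the first success."""
    errors: list[str] = []
    for name, backend in backends:
        try:
            return name, backend(audio_path)
        except Exception as exc:  # model missing, out of memory, bad input, ...
            logger.warning("backend %s failed: %s", name, exc)
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all transcription backends failed: " + "; ".join(errors))
```

A caller would pass the backends in preference order, for example `[("yourmt3", run_yourmt3), ("basic-pitch", run_basic_pitch)]`, where both callables are placeholders for the real model wrappers. Returning the backend name lets the API report which model produced the result.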

---

## File Formats

### Primary Format: MusicXML

**Chosen**: MusicXML 4.0

**Why**:
- Industry-standard interchange format
- Supported by all major notation software (Finale, Sibelius, MuseScore, Dorico)
- Preserves notation semantics (clefs, articulations, lyrics)
- Human-readable XML (good for debugging)
- Maps cleanly onto VexFlow's rendering API (VexFlow does not parse MusicXML itself, so a conversion layer is needed)

**Alternatives Considered**:

| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| MIDI | Universal, compact, great for playback | No notation info (clefs, staff layout) | Complementary, not replacement |
| MEI (Music Encoding Initiative) | More expressive than MusicXML | Less tool support, steeper learning curve | MusicXML more widely adopted |
| ABC Notation | Human-readable text | Limited notation features, less standard | Better for folk music than general use |
| Proprietary (Finale .musx) | Native to notation software | Requires specific tools to read | MusicXML is open standard |

**Decision**: MusicXML is the universal standard for notation exchange.
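
To make the "human-readable XML" point concrete, the sketch below builds a one-measure MusicXML score with Python's standard library and reads the clef back out. It is a minimal illustration under assumptions: the function name `build_minimal_score` and the single-part "Piano" score are invented for the example, and a production pipeline would more likely use a dedicated library.

```python
import xml.etree.ElementTree as ET


def build_minimal_score() -> ET.Element:
    """Build a one-measure score: treble clef, C major, 4/4, one whole-note C4."""
    score = ET.Element("score-partwise", version="4.0")

    # Every part must be declared in the part list before it appears.
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Piano"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")

    # Notation semantics that MIDI cannot carry: key, time signature, clef.
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = "1"  # 1 division per quarter note
    key = ET.SubElement(attributes, "key")
    ET.SubElement(key, "fifths").text = "0"
    time = ET.SubElement(attributes, "time")
    ET.SubElement(time, "beats").text = "4"
    ET.SubElement(time, "beat-type").text = "4"
    clef = ET.SubElement(attributes, "clef")
    ET.SubElement(clef, "sign").text = "G"
    ET.SubElement(clef, "line").text = "2"

    note = ET.SubElement(measure, "note")
    pitch = ET.SubElement(note, "pitch")
    ET.SubElement(pitch, "step").text = "C"
    ET.SubElement(pitch, "octave").text = "4"
    ET.SubElement(note, "duration").text = "4"  # 4 divisions = a whole note
    ET.SubElement(note, "type").text = "whole"
    return score


# Round-trip: serialize to text, parse again, and query the clef.
xml_text = ET.tostring(build_minimal_score(), encoding="unicode")
parsed = ET.fromstring(xml_text)
clef_sign = parsed.findtext("part/measure/attributes/clef/sign")
```

Because the structure is plain XML, a failed conversion can be inspected in any text editor or diffed between pipeline runs.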

---

### Intermediate Format: MIDI

**Chosen**: MIDI 1.0 (SMF Type 1)

**Why**:
- Universal output format from transcription models
- Easy to convert to MusicXML
- Useful for export option
- Tone.js can schedule its note events for playback once the file is parsed (e.g., with `@tonejs/midi`)

**Why Not Sufficient Alone**:
- Lacks notation semantics (clefs, key signatures, measure boundaries)
- No staff layout information
- Ambiguous rhythmic notation
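
The rhythmic ambiguity can be shown with a small quantization sketch: the same MIDI duration maps to different note values depending on the grid chosen. The function name `quantize_duration`, the 480 ticks-per-quarter resolution, and the sample tick values are assumptions for illustration, not values taken from the project.

```python
from fractions import Fraction


def quantize_duration(ticks: int, ppq: int = 480, steps_per_quarter: int = 4) -> Fraction:
    """Snap a MIDI duration in ticks to the nearest grid step, in quarter notes."""
    # Integer round-half-up of ticks / (ppq / steps_per_quarter).
    steps = (2 * ticks * steps_per_quarter + ppq) // (2 * ppq)
    steps = max(steps, 1)  # never collapse a sounding note to zero length
    return Fraction(steps, steps_per_quarter)


# A performed note lasting 360 ticks at 480 ticks per quarter note:
on_sixteenth_grid = quantize_duration(360, steps_per_quarter=4)  # 3/4 quarter: dotted eighth
on_eighth_grid = quantize_duration(360, steps_per_quarter=2)     # 1 quarter: plain quarter note
```

Neither reading is wrong from the MIDI data alone, which is why the conversion to MusicXML has to make explicit choices about grid resolution, ties, and dots.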

---

## Development Tools

### Python Package Manager: uv or Poetry

**Chosen**: uv (recommended) or Poetry

**Why**:
- Reproducible builds with lock files
- Virtual environment management
- Faster than pip for large dependencies (PyTorch, etc.)

---

### Frontend Build Tool: Vite

**Chosen**: Vite

**Why**:
- Fast dev server with HMR
- Modern, best-in-class DX
- Great for React apps
- Production builds with tree-shaking and code splitting out of the box

---

### Containerization: Docker

**Chosen**: Docker + Docker Compose

**Why**:
- Consistent dev environment across machines
- Easy GPU passthrough for Demucs
- Simplifies Redis, API, worker orchestration

---

## Infrastructure (Future)

### Frontend Hosting: Vercel

**Recommended**: Vercel

**Why**:
- Excellent React/Vite support
- Global CDN
- Preview deployments for PRs
- Free tier is generous

**Alternative**: Netlify, Cloudflare Pages, AWS S3 + CloudFront

---

### Backend Hosting: Cloud Run or Modal

**Recommended**: Modal (for GPU workers)

**Why**:
- Serverless GPU containers
- Pay-per-use (no idle GPU cost)
- Fast cold starts
- Good Python support

**Alternative**: AWS ECS with GPU instances, GCP Cloud Run (CPU only, need separate GPU service)

---

### Database: PostgreSQL (future)

**Not needed for MVP** (using Redis for job state)

**When to add**:
- User accounts and auth
- Persistent job history
- Sharing features

---

## Decision Criteria Summary

When evaluating technologies, we prioritized:

1. **Quality Over Speed**: Better transcription/rendering > faster processing
2. **Open Source First**: Avoid vendor lock-in, control costs
3. **Python for ML**: Ecosystem too strong to ignore
4. **Standard Formats**: MusicXML/MIDI over proprietary
5. **Proven Tech**: Prefer mature libraries over bleeding edge
6. **Developer Experience**: Good docs and tooling matter

## Trade-off Examples

### Demucs vs. Spleeter
- **Chose Demucs**: Better quality is worth roughly 2x the processing time
- **Rationale**: Users wait minutes anyway, quality is paramount

### VexFlow vs. OSMD
- **Chose VexFlow**: Editing capability > slightly better rendering
- **Rationale**: Users will edit output, need programmatic access

### FastAPI vs. Django
- **Chose FastAPI**: Async WebSocket support > admin panel
- **Rationale**: Real-time updates critical, don't need admin UI

## Next Steps

See [Deployment Strategy](deployment.md) for how these technologies are deployed.