# Technology Stack & Decisions
## Overview
This document details the technology choices for Rescored, including alternatives considered and trade-offs that informed each decision.
## Frontend Technologies
### UI Framework: React
**Chosen**: React 18+
**Why**:
- Largest ecosystem for music-related JavaScript libraries
- VexFlow and Tone.js have good React integration patterns
- Component model fits notation editing (each measure/staff as component)
- Excellent dev tooling (React DevTools, Fast Refresh)
- Familiarity and hiring pool
**Alternatives Considered**:
| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| Vue 3 | Simpler API, lighter weight | Smaller ecosystem for music libraries | Less community support for music notation |
| Svelte | Excellent performance, less boilerplate | Immature ecosystem | Risk for complex audio/notation needs |
| Vanilla JS | Full control, no framework overhead | Much more code to manage state | Notation editing is complex, need good state management |
**Decision**: React's ecosystem and component model outweigh its learning curve.
---
### Notation Rendering: VexFlow
**Chosen**: VexFlow 4.x
**Why**:
- Pure JavaScript, runs entirely in browser
- Programmatic API for rendering notation (good for editing)
- Generates clean SVG that we can attach event listeners to
- Active maintenance, good documentation
- Used in production by Flat.io, Soundslice
**Alternatives Considered**:
| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| OpenSheetMusicDisplay (OSMD) | Better MusicXML support, prettier output | Harder to build editing on top, heavier bundle | Optimized for display, not editing |
| music21.js | Pythonic API, good theory support | Limited rendering, not designed for web | Better as backend tool |
| abcjs | Lightweight, simple syntax | ABC notation less standard than MusicXML | MusicXML is industry standard |
| Custom renderer | Full control | Months of work to match VexFlow quality | Not worth reinventing wheel |
**Decision**: VexFlow strikes the best balance between rendering quality and edit-ability.
---
### Audio Playback: Tone.js
**Chosen**: Tone.js 14+
**Why**:
- High-level abstractions over Web Audio API
- Built-in scheduling for precise timing
- Multiple synthesis methods (samples, FM, AM)
- Transport controls (play, pause, seek, loop)
- MIDI playback by scheduling parsed note events on `Tone.Sampler` (parsing MIDI files requires a separate library such as `@tonejs/midi`)
**Alternatives Considered**:
| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| Web Audio API (raw) | Maximum control, no dependencies | Requires lots of boilerplate | Too low-level for quick MVP |
| Howler.js | Simple API, good for sound effects | Not designed for music, no MIDI | No timing control for notation sync |
| MIDIjs | Simple MIDI playback | Limited synthesis, GM soundfonts | Lower quality sound than Tone.js samplers |
| SoundFont2.js | Authentic GM sounds | Large file sizes, older API | `Tone.Sampler` can play sampled instruments if needed |
**Decision**: Tone.js provides the right abstraction level for MIDI playback with good sound quality.
---
### State Management: Zustand
**Chosen**: Zustand (tentative)
**Why**:
- Minimal boilerplate compared to Redux
- Works well with React hooks
- Good for global state (notation data, playback state)
- Small bundle size (~1KB)
**Alternatives Considered**:
| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| Redux Toolkit | Battle-tested, great DevTools | More boilerplate, steeper learning curve | Overkill for MVP |
| React Context | Built-in, no deps | Performance issues with frequent updates | Notation editing has lots of updates |
| Jotai/Recoil | Atomic state, very modern | Newer, smaller ecosystem | Zustand more proven |
| Local state only | Simplest | Hard to share state across components | Need global notation state |
**Decision**: Zustand for the MVP; we can migrate to Redux later if needed.
---
## Backend Technologies
### API Framework: FastAPI
**Chosen**: FastAPI (Python 3.11+)
**Why**:
- Async Python (critical for WebSocket connections)
- Auto-generated OpenAPI docs (Swagger UI)
- Native WebSocket support
- Type hints for better code quality
- Integrates well with Python ML libraries (Demucs, basic-pitch)
- Excellent performance (on par with Node.js)
**Alternatives Considered**:
| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| Node.js (Express) | Async by default, JavaScript everywhere | Worse ML library support | ML models are Python-first |
| Flask | Simple, well-known | Limited async support (WSGI-based), manual WebSocket setup | FastAPI offers a similar API with native async |
| Django | Full-featured, admin panel | Heavy, slower, less async support | Overkill for API-only service |
| Go (Gin/Fiber) | Excellent performance | Weaker ML ecosystem, FFI overhead | Python has better audio/ML tools |
**Decision**: FastAPI combines async support with Python's ML ecosystem.
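The value of async for real-time updates can be illustrated without any framework. The sketch below uses only `asyncio` from the standard library: one coroutine stands in for a job reporting progress, another for a WebSocket handler forwarding those updates. The names `run_job` and `stream_updates` and the message shape are hypothetical, not part of Rescored or FastAPI.

```python
import asyncio


async def run_job(job_id: str, updates: asyncio.Queue) -> None:
    """Stand-in for a long-running job that reports progress."""
    for percent in (0, 50, 100):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        await updates.put({"job_id": job_id, "progress": percent})
    await updates.put(None)  # sentinel: no more updates


async def stream_updates(updates: asyncio.Queue) -> list[dict]:
    """Stand-in for a WebSocket handler that forwards updates to a client."""
    sent = []
    while (message := await updates.get()) is not None:
        sent.append(message)  # a real handler would send this over the socket
    return sent


async def main() -> list[dict]:
    updates: asyncio.Queue = asyncio.Queue()
    producer = asyncio.create_task(run_job("job-1", updates))
    sent = await stream_updates(updates)
    await producer
    return sent


messages = asyncio.run(main())
```

Because both coroutines share one event loop, waiting for the next update does not block other connections, which is the property the WebSocket requirement depends on.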
---
### Task Queue: Celery + Redis
**Chosen**: Celery 5.x with Redis as broker
**Why**:
- Industry standard for async Python tasks
- Reliable, battle-tested in production
- Priority queues (transcription vs. export jobs)
- Automatic retries and error handling
- Redis is fast, simple, good for both queue and caching
**Alternatives Considered**:
| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| RQ (Redis Queue) | Simpler API than Celery | Fewer features, less ecosystem | Need advanced features (priorities, chaining) |
| Dramatiq | Modern, better API than Celery | Smaller community, less mature | Celery's ecosystem worth the complexity |
| BullMQ (Node) | Excellent, modern | Requires Node backend | Using Python for ML libraries |
| Cloud tasks (GCP/AWS) | Managed service, no infrastructure | Vendor lock-in, cold starts | Local dev first |
**Decision**: Celery's maturity and feature set justify the learning curve.
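To make the priority-queue requirement concrete, here is a minimal in-memory sketch using only the standard library. It is not Celery's API; it only illustrates the ordering behaviour we want from the broker. The priority values, and the choice to serve transcription jobs before export jobs, are assumptions for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels: a lower number is served first.
PRIORITY_TRANSCRIPTION = 0
PRIORITY_EXPORT = 5


@dataclass(order=True)
class QueuedJob:
    priority: int
    sequence: int  # tie-breaker: keeps FIFO order within one priority level
    job_id: str = field(compare=False)
    kind: str = field(compare=False)


class PriorityJobQueue:
    """In-memory stand-in for a broker with priority queues."""

    def __init__(self) -> None:
        self._heap: list[QueuedJob] = []
        self._counter = itertools.count()

    def enqueue(self, job_id: str, kind: str, priority: int) -> None:
        job = QueuedJob(priority, next(self._counter), job_id, kind)
        heapq.heappush(self._heap, job)

    def dequeue(self) -> QueuedJob:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityJobQueue()
queue.enqueue("job-1", "export", PRIORITY_EXPORT)
queue.enqueue("job-2", "transcription", PRIORITY_TRANSCRIPTION)
queue.enqueue("job-3", "transcription", PRIORITY_TRANSCRIPTION)

# Drain the queue and record the order in which jobs are served.
order = [queue.dequeue().job_id for _ in range(len(queue))]
```

Both transcription jobs are served before the export job even though the export was enqueued first, and jobs of equal priority keep their arrival order.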
---
## ML/Audio Technologies
### Source Separation: Demucs
**Chosen**: Demucs v4 (Meta Research)
**Why**:
- State-of-the-art audio separation quality (MDX leaderboard winner)
- 4-stem model (drums, bass, vocals, other) is a good default
- 6-stem model available (drums, bass, vocals, guitar, piano, other)
- Open-source, MIT license
- PyTorch model, runs on GPU
**Alternatives Considered**:
| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| Spleeter | Faster, lighter | Lower quality, no longer actively developed | Quality matters more than speed |
| X-UMX | Open-source, good quality | Slower than Demucs | Demucs quality worth extra time |
| commercial APIs | No GPU needed, better quality | Costly ($0.10+/song), privacy concerns | Local processing preferred for MVP |
**Decision**: Demucs offers the best quality for a self-hosted solution.
---
### Transcription: YourMT3+ (Primary) + basic-pitch (Fallback)
**Chosen**: YourMT3+ (Queen Mary University of London) with automatic fallback to basic-pitch (Spotify)
**Why YourMT3+**:
- **80-85% accuracy** vs 70% for basic-pitch
- State-of-the-art multi-instrument transcription model
- Mixture of Experts architecture for better quality
- Perceiver-TF encoder with RoPE position encoding
- Trained on diverse datasets (30k+ songs, 13 instrument classes)
- Open-source, actively maintained
- Optimized for Apple Silicon (MPS) with float16 precision (14x speedup)
**Why basic-pitch as Fallback**:
- Polyphonic transcription (multiple notes at once)
- Lighter weight, faster inference
- Simple setup, no model download required
- Good baseline quality (70% accuracy)
- Automatically used if YourMT3+ unavailable
**Alternatives Considered**:
| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| MT3 (Multi-Task Multitrack Music Transcription) | Google's multi-instrument transcription model | Slower, larger model, harder to run | YourMT3+ more accurate |
| Omnizart | Multi-instrument, good documentation | Lower accuracy than YourMT3+, slower | Removed in favor of YourMT3+ |
| Tony (pYIN) | Excellent for monophonic | Only monophonic | Need polyphonic support |
| commercial APIs | Better quality | Expensive, privacy concerns | Local processing preferred |
**Decision**: YourMT3+ offers the best accuracy for a self-hosted solution, with automatic fallback to basic-pitch for reliability.
---
## File Formats
### Primary Format: MusicXML
**Chosen**: MusicXML 4.0
**Why**:
- Industry-standard interchange format
- Supported by all major notation software (Finale, Sibelius, MuseScore, Dorico)
- Preserves notation semantics (clefs, articulations, lyrics)
- Human-readable XML (good for debugging)
- Maps cleanly onto VexFlow's rendering API through a parsing layer (VexFlow has no built-in MusicXML importer)
**Alternatives Considered**:
| Option | Pros | Cons | Why Not Chosen |
|--------|------|------|----------------|
| MIDI | Universal, compact, great for playback | No notation info (clefs, staff layout) | Complementary, not replacement |
| MEI (Music Encoding Initiative) | More expressive than MusicXML | Less tool support, steeper learning curve | MusicXML more widely adopted |
| ABC Notation | Human-readable text | Limited notation features, less standard | Better for folk music than general use |
| Proprietary (Finale .musx) | Native to notation software | Requires specific tools to read | MusicXML is open standard |
**Decision**: MusicXML is the universal standard for notation exchange.
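As an illustration of how readable the format is, the sketch below builds a one-measure MusicXML document containing a single whole note, using only the standard library. The function name `build_single_note_score` and the part name are hypothetical; the element names follow the standard `score-partwise` layout.

```python
import xml.etree.ElementTree as ET


def build_single_note_score(step: str = "C", octave: int = 4) -> ET.Element:
    """Build a minimal score-partwise tree: one part, one measure, one note."""
    score = ET.Element("score-partwise", version="4.0")

    # Part list: declares the parts that appear in the score.
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Piano"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")

    # Notation semantics that MIDI cannot carry: key, time signature, clef.
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = "1"  # 1 division per quarter
    key = ET.SubElement(attributes, "key")
    ET.SubElement(key, "fifths").text = "0"  # C major / A minor
    time = ET.SubElement(attributes, "time")
    ET.SubElement(time, "beats").text = "4"
    ET.SubElement(time, "beat-type").text = "4"
    clef = ET.SubElement(attributes, "clef")
    ET.SubElement(clef, "sign").text = "G"
    ET.SubElement(clef, "line").text = "2"

    # A whole note lasting four divisions.
    note = ET.SubElement(measure, "note")
    pitch = ET.SubElement(note, "pitch")
    ET.SubElement(pitch, "step").text = step
    ET.SubElement(pitch, "octave").text = str(octave)
    ET.SubElement(note, "duration").text = "4"
    ET.SubElement(note, "type").text = "whole"
    return score


score = build_single_note_score()
ET.indent(score)  # pretty-print; requires Python 3.9+
xml_text = ET.tostring(score, encoding="unicode")
```

Printing `xml_text` gives a document that can be inspected by eye, which is what makes the format convenient for debugging conversion issues.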
---
### Intermediate Format: MIDI
**Chosen**: MIDI 1.0 (SMF Type 1)
**Why**:
- Universal output format from transcription models
- Easy to convert to MusicXML
- Useful for export option
- Playable in the browser with Tone.js once parsed into note events (e.g. with `@tonejs/midi`)
**Why Not Sufficient Alone**:
- Lacks notation semantics (clefs, key signatures, measure boundaries)
- No staff layout information
- Ambiguous rhythmic notation
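Two of these gaps are easy to show in code. In the sketch below, the same MIDI note number can be spelled with a sharp or a flat depending on a key context that MIDI does not store, and a performed duration in ticks has to be snapped to a grid before it can be written as a note value. The function names and the sixteenth-note default grid are assumptions for illustration, not the project's conversion logic.

```python
# (step, alter) spellings for each pitch class, in sharp and flat variants.
SHARP_SPELLINGS = [
    ("C", 0), ("C", 1), ("D", 0), ("D", 1), ("E", 0), ("F", 0),
    ("F", 1), ("G", 0), ("G", 1), ("A", 0), ("A", 1), ("B", 0),
]
FLAT_SPELLINGS = [
    ("C", 0), ("D", -1), ("D", 0), ("E", -1), ("E", 0), ("F", 0),
    ("G", -1), ("G", 0), ("A", -1), ("A", 0), ("B", -1), ("B", 0),
]


def spell_midi_note(note_number: int, prefer_flats: bool = False) -> tuple[str, int, int]:
    """Return (step, alter, octave) for a MIDI note number (60 = C4)."""
    table = FLAT_SPELLINGS if prefer_flats else SHARP_SPELLINGS
    step, alter = table[note_number % 12]
    octave = note_number // 12 - 1
    return step, alter, octave


def quantize_ticks(ticks: int, ticks_per_quarter: int, grid_per_quarter: int = 4) -> int:
    """Snap a duration in ticks to the nearest grid step (default: sixteenths)."""
    grid = ticks_per_quarter // grid_per_quarter
    return (ticks + grid // 2) // grid * grid


# MIDI note 61 is C#4 in a sharp key but Db4 in a flat key.
as_sharp = spell_midi_note(61)
as_flat = spell_midi_note(61, prefer_flats=True)

# A note held for 455 ticks at 480 ticks per quarter is most likely a quarter note.
snapped = quantize_ticks(455, 480)
```

The `prefer_flats` flag stands in for key-signature inference, and the grid size stands in for rhythmic interpretation; both decisions have to be made by the converter because the MIDI file itself does not record them.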
---
## Development Tools
### Python Package Manager: uv or Poetry
**Chosen**: uv (recommended) or Poetry
**Why**:
- Reproducible builds with lock files
- Virtual environment management
- Faster than pip for large dependencies (PyTorch, etc.)
---
### Frontend Build Tool: Vite
**Chosen**: Vite
**Why**:
- Fast dev server with HMR
- Modern, best-in-class DX
- Great for React apps
- Optimized production builds via Rollup, with tree-shaking and code splitting out of the box
---
### Containerization: Docker
**Chosen**: Docker + Docker Compose
**Why**:
- Consistent dev environment across machines
- Easy GPU passthrough for Demucs
- Simplifies Redis, API, worker orchestration
---
## Infrastructure (Future)
### Frontend Hosting: Vercel
**Recommended**: Vercel
**Why**:
- Excellent React/Vite support
- Global CDN
- Preview deployments for PRs
- Free tier is generous
**Alternative**: Netlify, Cloudflare Pages, AWS S3 + CloudFront
---
### Backend Hosting: Cloud Run or Modal
**Recommended**: Modal (for GPU workers)
**Why**:
- Serverless GPU containers
- Pay-per-use (no idle GPU cost)
- Fast cold starts
- Good Python support
**Alternative**: AWS ECS with GPU instances, GCP Cloud Run (CPU only, need separate GPU service)
---
### Database: PostgreSQL (future)
**Not needed for MVP** (using Redis for job state)
**When to add**:
- User accounts and auth
- Persistent job history
- Sharing features
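For the MVP, job state kept in Redis could be as simple as one JSON value per job. The sketch below shows one possible shape, with a plain `dict` standing in for the Redis client so it runs without a server. The field names, the `job:<id>` key scheme, and the status values are assumptions, not the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import MutableMapping, Optional

# Hypothetical status values for a transcription job.
VALID_STATUSES = ("queued", "processing", "completed", "failed")


@dataclass
class JobState:
    job_id: str
    status: str = "queued"
    progress: int = 0
    error: Optional[str] = None

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "JobState":
        return cls(**json.loads(raw))


def save_job(store: MutableMapping[str, str], job: JobState) -> None:
    """Write the job under a namespaced key, as a Redis SET would."""
    if job.status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {job.status}")
    store[f"job:{job.job_id}"] = job.to_json()


def load_job(store: MutableMapping[str, str], job_id: str) -> JobState:
    """Read the job back, as a Redis GET would."""
    return JobState.from_json(store[f"job:{job_id}"])


store: dict[str, str] = {}  # stand-in for a Redis client
save_job(store, JobState(job_id="abc123"))

job = load_job(store, "abc123")
job.status, job.progress = "processing", 40
save_job(store, job)

reloaded = load_job(store, "abc123")
```

A flat record like this has no relations or history, which is why the features listed above (accounts, persistent history, sharing) are the point at which a relational database becomes worthwhile.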
---
## Decision Criteria Summary
When evaluating technologies, we prioritized:
1. **Quality Over Speed**: Better transcription/rendering > faster processing
2. **Open Source First**: Avoid vendor lock-in, control costs
3. **Python for ML**: Ecosystem too strong to ignore
4. **Standard Formats**: MusicXML/MIDI over proprietary
5. **Proven Tech**: Prefer mature libraries over bleeding edge
6. **Developer Experience**: Good docs and tooling matter
## Trade-off Examples
### Demucs vs. Spleeter
- **Chose Demucs**: Better quality worth 2x processing time
- **Rationale**: Users wait minutes anyway, quality is paramount
### VexFlow vs. OSMD
- **Chose VexFlow**: Editing capability > slightly better rendering
- **Rationale**: Users will edit output, need programmatic access
### FastAPI vs. Django
- **Chose FastAPI**: Async WebSocket support > admin panel
- **Rationale**: Real-time updates critical, don't need admin UI
## Next Steps
See [Deployment Strategy](deployment.md) for how these technologies deploy.