| # Technology Stack & Decisions | |
| ## Overview | |
| This document details the technology choices for Rescored, including alternatives considered and trade-offs that informed each decision. | |
| ## Frontend Technologies | |
| ### UI Framework: React | |
| **Chosen**: React 18+ | |
| **Why**: | |
| - Largest ecosystem for music-related JavaScript libraries | |
| - VexFlow and Tone.js have good React integration patterns | |
| - Component model fits notation editing (each measure/staff as component) | |
| - Excellent dev tooling (React DevTools, Fast Refresh) | |
| - Familiarity and hiring pool | |
| **Alternatives Considered**: | |
| | Option | Pros | Cons | Why Not Chosen | | |
| |--------|------|------|----------------| | |
| | Vue 3 | Simpler API, lighter weight | Smaller ecosystem for music libraries | Less community support for music notation | | |
| | Svelte | Excellent performance, less boilerplate | Immature ecosystem | Risk for complex audio/notation needs | | |
| | Vanilla JS | Full control, no framework overhead | Much more code to manage state | Notation editing is complex, need good state management | | |
| **Decision**: React's ecosystem and component model outweigh its learning curve. | |
| --- | |
| ### Notation Rendering: VexFlow | |
| **Chosen**: VexFlow 4.x | |
| **Why**: | |
| - Pure JavaScript, runs entirely in browser | |
| - Programmatic API for rendering notation (good for editing) | |
| - Generates clean SVG that we can attach event listeners to | |
| - Active maintenance, good documentation | |
| - Used in production by Flat.io, Soundslice | |
| **Alternatives Considered**: | |
| | Option | Pros | Cons | Why Not Chosen | | |
| |--------|------|------|----------------| | |
| | OpenSheetMusicDisplay (OSMD) | Better MusicXML support, prettier output | Harder to build editing on top, heavier bundle | Optimized for display, not editing | | |
| | music21.js | Pythonic API, good theory support | Limited rendering, not designed for web | Better as backend tool | | |
| | abcjs | Lightweight, simple syntax | ABC notation less standard than MusicXML | MusicXML is industry standard | | |
| | Custom renderer | Full control | Months of work to match VexFlow quality | Not worth reinventing wheel | | |
| **Decision**: VexFlow strikes the best balance between rendering quality and editability. | |
| --- | |
| ### Audio Playback: Tone.js | |
| **Chosen**: Tone.js 14+ | |
| **Why**: | |
| - High-level abstractions over Web Audio API | |
| - Built-in scheduling for precise timing | |
| - Multiple synthesis methods (samples, FM, AM) | |
| - Transport controls (play, pause, seek, loop) | |
| - Sample-based playback of MIDI note data via `Tone.Sampler` (parsing `.mid` files requires a companion library such as `@tonejs/midi`) | |
| **Alternatives Considered**: | |
| | Option | Pros | Cons | Why Not Chosen | | |
| |--------|------|------|----------------| | |
| | Web Audio API (raw) | Maximum control, no dependencies | Requires lots of boilerplate | Too low-level for quick MVP | | |
| | Howler.js | Simple API, good for sound effects | Not designed for music, no MIDI | No timing control for notation sync | | |
| | MIDIjs | Simple MIDI playback | Limited synthesis, GM soundfonts | Lower quality sound than Tone.js samplers | | |
| | SoundFont2.js | Authentic GM sounds | Large file sizes, older API | `Tone.Sampler` can load per-note instrument samples if needed | |
| **Decision**: Tone.js provides the right abstraction level for MIDI playback with good sound quality. | |
| --- | |
| ### State Management: Zustand | |
| **Chosen**: Zustand (tentative) | |
| **Why**: | |
| - Minimal boilerplate compared to Redux | |
| - Works well with React hooks | |
| - Good for global state (notation data, playback state) | |
| - Small bundle size (~1KB) | |
| **Alternatives Considered**: | |
| | Option | Pros | Cons | Why Not Chosen | | |
| |--------|------|------|----------------| | |
| | Redux Toolkit | Battle-tested, great DevTools | More boilerplate, steeper learning curve | Overkill for MVP | | |
| | React Context | Built-in, no deps | Performance issues with frequent updates | Notation editing has lots of updates | | |
| | Jotai/Recoil | Atomic state, very modern | Newer, smaller ecosystem | Zustand more proven | | |
| | Local state only | Simplest | Hard to share state across components | Need global notation state | | |
| **Decision**: Zustand for the MVP; we can migrate to Redux later if needed. | |
| --- | |
| ## Backend Technologies | |
| ### API Framework: FastAPI | |
| **Chosen**: FastAPI (Python 3.11+) | |
| **Why**: | |
| - Async Python (critical for WebSocket connections) | |
| - Auto-generated OpenAPI docs (Swagger UI) | |
| - Native WebSocket support | |
| - Type hints for better code quality | |
| - Integrates well with Python ML libraries (Demucs, basic-pitch) | |
| - Excellent performance (on par with Node.js) | |
| **Alternatives Considered**: | |
| | Option | Pros | Cons | Why Not Chosen | | |
| |--------|------|------|----------------| | |
| | Node.js (Express) | Async by default, JavaScript everywhere | Worse ML library support | ML models are Python-first | | |
| | Flask | Simple, well-known | Limited async support (WSGI-based), WebSockets need extensions | FastAPI offers a similarly lightweight style with async and WebSockets built in | |
| | Django | Full-featured, admin panel | Heavy, slower, less async support | Overkill for API-only service | | |
| | Go (Gin/Fiber) | Excellent performance | Weaker ML ecosystem, FFI overhead | Python has better audio/ML tools | | |
| **Decision**: FastAPI combines async support with Python's ML ecosystem. | |
| --- | |
| ### Task Queue: Celery + Redis | |
| **Chosen**: Celery 5.x with Redis as broker | |
| **Why**: | |
| - Industry standard for async Python tasks | |
| - Reliable, battle-tested in production | |
| - Priority queues (transcription vs. export jobs) | |
| - Automatic retries and error handling | |
| - Redis is fast, simple, good for both queue and caching | |
| **Alternatives Considered**: | |
| | Option | Pros | Cons | Why Not Chosen | | |
| |--------|------|------|----------------| | |
| | RQ (Redis Queue) | Simpler API than Celery | Fewer features, less ecosystem | Need advanced features (priorities, chaining) | | |
| | Dramatiq | Modern, better API than Celery | Smaller community, less mature | Celery's ecosystem worth the complexity | | |
| | BullMQ (Node) | Excellent, modern | Requires Node backend | Using Python for ML libraries | | |
| | Cloud tasks (GCP/AWS) | Managed service, no infrastructure | Vendor lock-in, cold starts | Local dev first | | |
| **Decision**: Celery's maturity and feature set justify the learning curve. | |
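The queue separation described above could be sketched as a Celery settings module. This is only an illustration: the setting names (`broker_url`, `task_routes`, `task_acks_late`, `worker_prefetch_multiplier`) are standard Celery options, but the task names, queue names, and Redis URLs are assumptions, not taken from the Rescored codebase.

```python
# Hypothetical celeryconfig.py, loadable with app.config_from_object("celeryconfig").
# Task names, queue names, and URLs below are illustrative assumptions.

# Redis serves as both the broker and the result backend.
broker_url = "redis://localhost:6379/0"
result_backend = "redis://localhost:6379/1"

# Route long GPU-bound transcription jobs and short export jobs to
# separate queues so that each can be served by dedicated workers.
task_default_queue = "default"
task_routes = {
    "tasks.transcribe": {"queue": "transcription"},
    "tasks.export_musicxml": {"queue": "export"},
}

# Acknowledge a task only after it finishes, so a job held by a crashed
# worker is redelivered, and prefetch one task at a time because each
# job may run for minutes.
task_acks_late = True
worker_prefetch_multiplier = 1
```

A worker dedicated to one queue could then be started with something like `celery -A tasks worker -Q transcription`, where `tasks` is a placeholder for the application module.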
| --- | |
| ## ML/Audio Technologies | |
| ### Source Separation: Demucs | |
| **Chosen**: Demucs v4 (Meta Research) | |
| **Why**: | |
| - State-of-the-art audio separation quality (MDX leaderboard winner) | |
| - 4-stem model (drums, bass, vocals, other) is good default | |
| - 6-stem model available (drums, bass, vocals, guitar, piano, other) | |
| - Open-source, MIT license | |
| - PyTorch model, runs on GPU | |
| **Alternatives Considered**: | |
| | Option | Pros | Cons | Why Not Chosen | | |
| |--------|------|------|----------------| | |
| | Spleeter | Faster, lighter | Lower quality, no longer actively developed | Quality matters more than speed | | |
| | X-UMX | Open-source, good quality | Slower than Demucs | Demucs quality worth extra time | | |
| | Commercial APIs | No GPU needed, better quality | Costly ($0.10+/song), privacy concerns | Local processing preferred for MVP | |
| **Decision**: Demucs offers best quality for a self-hosted solution. | |
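As a rough sketch of how a worker might drive Demucs, the helpers below build a CLI invocation and predict where the stems land. The model names `htdemucs` and `htdemucs_6s` and the `-n`/`-o` flags come from the Demucs CLI; the helper names are hypothetical, and the output layout (`<out>/<model>/<track>/<stem>.wav`) assumes the default filename template.

```python
from pathlib import Path

# Stem names produced by the 4-stem and 6-stem model variants.
STEMS = {
    "htdemucs": ["drums", "bass", "other", "vocals"],
    "htdemucs_6s": ["drums", "bass", "other", "vocals", "guitar", "piano"],
}


def demucs_command(track: Path, out_dir: Path, model: str = "htdemucs") -> list[str]:
    """Build the argument list for a Demucs CLI run (suitable for subprocess.run)."""
    if model not in STEMS:
        raise ValueError(f"unknown model: {model}")
    return ["demucs", "-n", model, "-o", str(out_dir), str(track)]


def expected_stems(track: Path, out_dir: Path, model: str = "htdemucs") -> dict[str, Path]:
    """Map each stem name to the WAV path Demucs is expected to write."""
    base = out_dir / model / track.stem
    return {stem: base / f"{stem}.wav" for stem in STEMS[model]}
```

Keeping command construction separate from execution makes the path logic easy to test without a GPU or the model weights.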
| --- | |
| ### Transcription: YourMT3+ (Primary) + basic-pitch (Fallback) | |
| **Chosen**: YourMT3+ (KAIST) with automatic fallback to basic-pitch (Spotify) | |
| **Why YourMT3+**: | |
| - **80-85% accuracy** vs 70% for basic-pitch | |
| - State-of-the-art multi-instrument transcription model | |
| - Mixture of Experts architecture for better quality | |
| - Perceiver-TF encoder with RoPE position encoding | |
| - Trained on diverse datasets (30k+ songs, 13 instrument classes) | |
| - Open-source, actively maintained | |
| - Optimized for Apple Silicon (MPS) with float16 precision (14x speedup) | |
| **Why basic-pitch as Fallback**: | |
| - Polyphonic transcription (multiple notes at once) | |
| - Lighter weight, faster inference | |
| - Simple setup, no model download required | |
| - Good baseline quality (70% accuracy) | |
| - Automatically used if YourMT3+ unavailable | |
| **Alternatives Considered**: | |
| | Option | Pros | Cons | Why Not Chosen | | |
| |--------|------|------|----------------| | |
| | MT3 (Multi-Task Multitrack Music Transcription) | From Google Magenta, multi-instrument aware | Slower, larger model, harder to run | YourMT3+ more accurate | |
| | Omnizart | Multi-instrument, good documentation | Lower accuracy than YourMT3+, slower | Removed in favor of YourMT3+ | | |
| | Tony (pYIN) | Excellent for monophonic | Only monophonic | Need polyphonic support | | |
| | Commercial APIs | Better quality | Expensive, privacy concerns | Local processing preferred | |
| **Decision**: YourMT3+ offers the best accuracy for a self-hosted solution, with automatic fallback to basic-pitch for reliability. | |
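The primary-plus-fallback arrangement might look roughly like the following. It is a minimal sketch: the `Transcriber` signature (audio path in, MIDI path out) and the function name are assumptions, and the real YourMT3+ and basic-pitch calls would be wrapped in callables of that shape.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("transcription")

# Assumed interface: a transcriber takes an audio path and returns the
# path of the MIDI file it wrote.
Transcriber = Callable[[str], str]


def transcribe_with_fallback(
    audio_path: str,
    backends: Sequence[tuple[str, Transcriber]],
) -> tuple[str, str]:
    """Try each backend in order; return (backend_name, midi_path) of the first success."""
    errors: list[str] = []
    for name, transcribe in backends:
        try:
            return name, transcribe(audio_path)
        except Exception as exc:  # any failure (missing weights, OOM, ...) triggers fallback
            logger.warning("%s failed: %s", name, exc)
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all transcription backends failed: " + "; ".join(errors))
```

Returning the backend name alongside the result lets the caller record which model produced a given transcription, which is useful when comparing quality later.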
| --- | |
| ## File Formats | |
| ### Primary Format: MusicXML | |
| **Chosen**: MusicXML 4.0 | |
| **Why**: | |
| - Industry-standard interchange format | |
| - Supported by all major notation software (Finale, Sibelius, MuseScore, Dorico) | |
| - Preserves notation semantics (clefs, articulations, lyrics) | |
| - Human-readable XML (good for debugging) | |
| - Maps cleanly onto VexFlow render calls through a parsing layer (VexFlow itself has no built-in MusicXML importer) | |
| **Alternatives Considered**: | |
| | Option | Pros | Cons | Why Not Chosen | | |
| |--------|------|------|----------------| | |
| | MIDI | Universal, compact, great for playback | No notation info (clefs, staff layout) | Complementary, not replacement | | |
| | MEI (Music Encoding Initiative) | More expressive than MusicXML | Less tool support, steeper learning curve | MusicXML more widely adopted | | |
| | ABC Notation | Human-readable text | Limited notation features, less standard | Better for folk music than general use | | |
| | Proprietary (Finale .musx) | Native to notation software | Requires specific tools to read | MusicXML is open standard | | |
| **Decision**: MusicXML is the universal standard for notation exchange. | |
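To make the "notation semantics" point concrete, here is a small sketch that builds a MusicXML fragment with the standard library. The element names follow the MusicXML `score-partwise` schema; the function name and the choice of a single quarter note in one measure are illustrative only.

```python
import xml.etree.ElementTree as ET


def minimal_score(step: str = "C", octave: int = 4) -> ET.Element:
    """Build a one-measure MusicXML score containing a single quarter note."""
    score = ET.Element("score-partwise", version="4.0")

    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Piano"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")

    # Notation semantics: rhythmic resolution, key, time signature, clef.
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = "1"  # 1 division per quarter note
    key = ET.SubElement(attributes, "key")
    ET.SubElement(key, "fifths").text = "0"  # no sharps or flats
    time = ET.SubElement(attributes, "time")
    ET.SubElement(time, "beats").text = "4"
    ET.SubElement(time, "beat-type").text = "4"
    clef = ET.SubElement(attributes, "clef")
    ET.SubElement(clef, "sign").text = "G"
    ET.SubElement(clef, "line").text = "2"

    note = ET.SubElement(measure, "note")
    pitch = ET.SubElement(note, "pitch")
    ET.SubElement(pitch, "step").text = step
    ET.SubElement(pitch, "octave").text = str(octave)
    ET.SubElement(note, "duration").text = "1"
    ET.SubElement(note, "type").text = "quarter"
    return score


if __name__ == "__main__":
    root = minimal_score()
    ET.indent(root)  # pretty-print (Python 3.9+)
    print(ET.tostring(root, encoding="unicode"))
```

The clef, key, and note type are explicit elements here, which is the information a notation editor needs and a plain note list does not carry.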
| --- | |
| ### Intermediate Format: MIDI | |
| **Chosen**: MIDI 1.0 (SMF Type 1) | |
| **Why**: | |
| - Universal output format from transcription models | |
| - Easy to convert to MusicXML | |
| - Useful for export option | |
| - Tone.js can play the note events once the file is parsed (for example with `@tonejs/midi`) | |
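As a small illustration of the SMF Type 1 choice, the header chunk of a Standard MIDI File can be inspected with the standard library alone. The function name is hypothetical; the byte layout (`MThd`, a 32-bit length, then three big-endian 16-bit fields) follows the SMF 1.0 format.

```python
import struct


def read_smf_header(data: bytes) -> tuple[int, int, int]:
    """Return (format_type, track_count, division) from a Standard MIDI File header."""
    if len(data) < 14 or data[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    # 32-bit chunk length, then format, track count, and division (all big-endian).
    length, fmt, ntracks, division = struct.unpack(">IHHH", data[4:14])
    if length < 6:
        raise ValueError(f"header chunk too short: {length}")
    return fmt, ntracks, division


# Header of a Type 1 file with 2 tracks and 480 ticks per quarter note.
header = b"MThd" + struct.pack(">IHHH", 6, 1, 2, 480)
print(read_smf_header(header))  # (1, 2, 480)
```

A pipeline could use a check like this to reject or normalize Type 0 files before conversion to MusicXML.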
| **Why Not Sufficient Alone**: | |
| - Lacks notation semantics such as clefs, voices, and note spelling (C-sharp vs. D-flat); key and time signatures are only optional meta events | |
| - No staff layout information | |
| - Ambiguous rhythmic notation | |
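The rhythmic ambiguity can be shown with a few lines of arithmetic. In this sketch (the onset values and the `quantize` helper are made up for illustration), the same performed onsets snap to different written rhythms depending on which grid the converter assumes.

```python
from fractions import Fraction


def quantize(beats: float, grid: Fraction) -> Fraction:
    """Snap a position in quarter-note beats to the nearest multiple of grid."""
    return round(Fraction(beats) / grid) * grid


# Onsets in beats, as they might come out of a transcription model.
onsets = [0.02, 0.34, 0.65, 1.01]

# Sixteenth-note grid vs. eighth-note-triplet grid.
sixteenths = [quantize(t, Fraction(1, 4)) for t in onsets]
triplets = [quantize(t, Fraction(1, 3)) for t in onsets]

print(sixteenths)  # positions 0, 1/4, 3/4, 1
print(triplets)    # positions 0, 1/3, 2/3, 1
```

Both readings are plausible for the same MIDI data: a syncopated sixteenth figure or an even triplet. MusicXML records the intended one explicitly, which is why MIDI serves only as an intermediate format.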
| --- | |
| ## Development Tools | |
| ### Python Package Manager: uv or Poetry | |
| **Chosen**: uv (recommended) or Poetry | |
| **Why**: | |
| - Reproducible builds with lock files | |
| - Virtual environment management | |
| - Faster than plain pip for large dependencies (PyTorch, etc.), especially with uv | |
| --- | |
| ### Frontend Build Tool: Vite | |
| **Chosen**: Vite | |
| **Why**: | |
| - Fast dev server with HMR | |
| - Modern, best-in-class DX | |
| - Great for React apps | |
| - Optimized production builds (tree-shaking, code splitting) with little configuration | |
| --- | |
| ### Containerization: Docker | |
| **Chosen**: Docker + Docker Compose | |
| **Why**: | |
| - Consistent dev environment across machines | |
| - GPU passthrough for Demucs on NVIDIA hosts (via the NVIDIA Container Toolkit) | |
| - Simplifies Redis, API, worker orchestration | |
| --- | |
| ## Infrastructure (Future) | |
| ### Frontend Hosting: Vercel | |
| **Recommended**: Vercel | |
| **Why**: | |
| - Excellent React/Vite support | |
| - Global CDN | |
| - Preview deployments for PRs | |
| - Free tier is generous | |
| **Alternative**: Netlify, Cloudflare Pages, AWS S3 + CloudFront | |
| --- | |
| ### Backend Hosting: Cloud Run or Modal | |
| **Recommended**: Modal (for GPU workers) | |
| **Why**: | |
| - Serverless GPU containers | |
| - Pay-per-use (no idle GPU cost) | |
| - Fast cold starts | |
| - Good Python support | |
| **Alternative**: AWS ECS with GPU instances, GCP Cloud Run (CPU only, need separate GPU service) | |
| --- | |
| ### Database: PostgreSQL (future) | |
| **Not needed for MVP** (using Redis for job state) | |
| **When to add**: | |
| - User accounts and auth | |
| - Persistent job history | |
| - Sharing features | |
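Until a database is added, per-job state kept in Redis could be as simple as one hash per job. The sketch below is an assumption about what that might look like: the field names, status values, and the `job:<id>` key pattern are all hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class JobState:
    job_id: str
    status: str = "queued"  # hypothetical values: queued, processing, done, failed
    progress: int = 0       # percent complete
    stage: str = ""         # e.g. "separation" or "transcription"

    @property
    def key(self) -> str:
        """Redis key under which this job's hash would be stored."""
        return f"job:{self.job_id}"

    def to_hash(self) -> dict[str, str]:
        """Flatten to string fields, the form a Redis hash stores."""
        return {name: str(value) for name, value in asdict(self).items()}

    @classmethod
    def from_hash(cls, fields: dict[str, str]) -> "JobState":
        """Rebuild a JobState from the string fields read back from Redis."""
        return cls(
            job_id=fields["job_id"],
            status=fields["status"],
            progress=int(fields["progress"]),
            stage=fields["stage"],
        )
```

With redis-py, a worker could write the state with `r.hset(state.key, mapping=state.to_hash())` and the API could read it back with `r.hgetall(state.key)`, assuming the client was created with `decode_responses=True` so that fields come back as strings.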
| --- | |
| ## Decision Criteria Summary | |
| When evaluating technologies, we prioritized: | |
| 1. **Quality Over Speed**: Better transcription/rendering > faster processing | |
| 2. **Open Source First**: Avoid vendor lock-in, control costs | |
| 3. **Python for ML**: Ecosystem too strong to ignore | |
| 4. **Standard Formats**: MusicXML/MIDI over proprietary | |
| 5. **Proven Tech**: Prefer mature libraries over bleeding edge | |
| 6. **Developer Experience**: Good docs and tooling matter | |
| ## Trade-off Examples | |
| ### Demucs vs. Spleeter | |
| - **Chose Demucs**: Better quality worth 2x processing time | |
| - **Rationale**: Users wait minutes anyway, quality is paramount | |
| ### VexFlow vs. OSMD | |
| - **Chose VexFlow**: Editing capability > slightly better rendering | |
| - **Rationale**: Users will edit output, need programmatic access | |
| ### FastAPI vs. Django | |
| - **Chose FastAPI**: Async WebSocket support > admin panel | |
| - **Rationale**: Real-time updates critical, don't need admin UI | |
| ## Next Steps | |
| See [Deployment Strategy](deployment.md) for how these technologies deploy. | |