Technology Stack & Decisions
Overview
This document details the technology choices for Rescored, including alternatives considered and trade-offs that informed each decision.
Frontend Technologies
UI Framework: React
Chosen: React 18+
Why:
- Largest ecosystem for music-related JavaScript libraries
- VexFlow and Tone.js are framework-agnostic and integrate cleanly with React through refs and effects
- Component model fits notation editing (each measure/staff as component)
- Excellent dev tooling (React DevTools, Fast Refresh)
- Familiarity and hiring pool
Alternatives Considered:
| Option | Pros | Cons | Why Not Chosen |
|---|---|---|---|
| Vue 3 | Simpler API, lighter weight | Smaller ecosystem for music libraries | Less community support for music notation |
| Svelte | Excellent performance, less boilerplate | Immature ecosystem | Risk for complex audio/notation needs |
| Vanilla JS | Full control, no framework overhead | Much more code to manage state | Notation editing is complex, need good state management |
Decision: React's ecosystem and component model outweigh its learning curve.
Notation Rendering: VexFlow
Chosen: VexFlow 4.x
Why:
- Pure JavaScript, runs entirely in browser
- Programmatic API for rendering notation (good for editing)
- Generates clean SVG that we can attach event listeners to
- Active maintenance, good documentation
- Used in production by Flat.io, Soundslice
Alternatives Considered:
| Option | Pros | Cons | Why Not Chosen |
|---|---|---|---|
| OpenSheetMusicDisplay (OSMD) | Better MusicXML support, prettier output | Harder to build editing on top, heavier bundle | Optimized for display, not editing |
| music21.js | Pythonic API, good theory support | Limited rendering, not designed for web | Better as backend tool |
| abcjs | Lightweight, simple syntax | ABC notation less standard than MusicXML | MusicXML is industry standard |
| Custom renderer | Full control | Months of work to match VexFlow quality | Not worth reinventing wheel |
Decision: VexFlow strikes the best balance between rendering quality and editability.
Audio Playback: Tone.js
Chosen: Tone.js 14+
Why:
- High-level abstractions over Web Audio API
- Built-in scheduling for precise timing
- Multiple synthesis methods (samples, FM, AM)
- Transport controls (play, pause, seek, loop)
- Sample-based MIDI note playback via Tone.Sampler
Alternatives Considered:
| Option | Pros | Cons | Why Not Chosen |
|---|---|---|---|
| Web Audio API (raw) | Maximum control, no dependencies | Requires lots of boilerplate | Too low-level for quick MVP |
| Howler.js | Simple API, good for sound effects | Not designed for music, no MIDI | No timing control for notation sync |
| MIDI.js | Simple MIDI playback | Limited synthesis, GM soundfonts | Lower quality sound than Tone.js samplers |
| SoundFont2.js | Authentic GM sounds | Large file sizes, older API | Tone.Sampler can play samples extracted from SoundFonts if needed |
Decision: Tone.js provides the right abstraction level for MIDI playback with good sound quality.
State Management: Zustand
Chosen: Zustand (tentative)
Why:
- Minimal boilerplate compared to Redux
- Works well with React hooks
- Good for global state (notation data, playback state)
- Small bundle size (~1KB)
Alternatives Considered:
| Option | Pros | Cons | Why Not Chosen |
|---|---|---|---|
| Redux Toolkit | Battle-tested, great DevTools | More boilerplate, steeper learning curve | Overkill for MVP |
| React Context | Built-in, no deps | Performance issues with frequent updates | Notation editing has lots of updates |
| Jotai/Recoil | Atomic state, very modern | Newer, smaller ecosystem | Zustand more proven |
| Local state only | Simplest | Hard to share state across components | Need global notation state |
Decision: Zustand for MVP, can migrate to Redux if needed later.
Backend Technologies
API Framework: FastAPI
Chosen: FastAPI (Python 3.11+)
Why:
- Async Python (critical for WebSocket connections)
- Auto-generated OpenAPI docs (Swagger UI)
- Native WebSocket support
- Type hints for better code quality
- Integrates well with Python ML libraries (Demucs, basic-pitch)
- Excellent performance (on par with Node.js)
Alternatives Considered:
| Option | Pros | Cons | Why Not Chosen |
|---|---|---|---|
| Node.js (Express) | Async by default, JavaScript everywhere | Worse ML library support | ML models are Python-first |
| Flask | Simple, well-known | Limited async (async views since 2.0, but still WSGI), no native WebSocket support | FastAPI offers Flask-like simplicity with native async |
| Django | Full-featured, admin panel | Heavy, slower, less async support | Overkill for API-only service |
| Go (Gin/Fiber) | Excellent performance | Weaker ML ecosystem, FFI overhead | Python has better audio/ML tools |
Decision: FastAPI combines async support with Python's ML ecosystem.
Task Queue: Celery + Redis
Chosen: Celery 5.x with Redis as broker
Why:
- Industry standard for async Python tasks
- Reliable, battle-tested in production
- Priority queues (transcription vs. export jobs)
- Automatic retries and error handling
- Redis is fast, simple, good for both queue and caching
Alternatives Considered:
| Option | Pros | Cons | Why Not Chosen |
|---|---|---|---|
| RQ (Redis Queue) | Simpler API than Celery | Fewer features, less ecosystem | Need advanced features (priorities, chaining) |
| Dramatiq | Modern, better API than Celery | Smaller community, less mature | Celery's ecosystem worth the complexity |
| BullMQ (Node) | Excellent, modern | Requires Node backend | Using Python for ML libraries |
| Cloud tasks (GCP/AWS) | Managed service, no infrastructure | Vendor lock-in, cold starts | Local dev first |
Decision: Celery's maturity and feature set justify the learning curve.
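A minimal sketch of how the transcription/export split could be expressed as Celery settings, assuming a `celeryconfig.py` module loaded with `app.config_from_object("celeryconfig")`. The setting names are standard Celery options; the task names, queue names, Redis URLs, and timeout value are assumptions for illustration.

```python
# celeryconfig.py (sketch): load with app.config_from_object("celeryconfig").
# Task names, queue names, URLs, and the timeout below are hypothetical.

broker_url = "redis://localhost:6379/0"      # Redis as the message broker
result_backend = "redis://localhost:6379/1"  # task results in a separate Redis DB

# Separate queues are the usual way to prioritise work on a Redis broker:
# dedicated workers per queue keep long transcription jobs from starving
# quick export jobs.
task_routes = {
    "rescored.tasks.transcribe": {"queue": "transcription"},
    "rescored.tasks.export_score": {"queue": "export"},
}
task_default_queue = "default"

# Acknowledge a task only after it finishes, so a job whose worker crashed
# is redelivered, and let each worker reserve one task at a time.
task_acks_late = True
worker_prefetch_multiplier = 1

# With Redis, an unacknowledged task is redelivered after visibility_timeout
# (default one hour), so with acks_late it must exceed the longest job.
broker_transport_options = {"visibility_timeout": 2 * 60 * 60}  # seconds
```

Each queue then gets its own workers, e.g. `celery -A <app module> worker -Q transcription --concurrency=1` for GPU jobs and a separate worker started with `-Q export`. Automatic retries are configured per task rather than in this module, through `@app.task` options such as `autoretry_for`, `retry_backoff`, and `max_retries`.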
ML/Audio Technologies
Source Separation: Demucs
Chosen: Demucs v4 (Meta Research)
Why:
- State-of-the-art audio separation quality (MDX leaderboard winner)
- 4-stem model (drums, bass, vocals, other) is a good default
- 6-stem model available (drums, bass, vocals, guitar, piano, other)
- Open-source, MIT license
- PyTorch model; runs on GPU (CPU inference works but is much slower)
Alternatives Considered:
| Option | Pros | Cons | Why Not Chosen |
|---|---|---|---|
| Spleeter | Faster, lighter | Lower quality, no longer actively developed | Quality matters more than speed |
| X-UMX | Open-source, good quality | Slower than Demucs | Demucs quality worth extra time |
| Commercial APIs | No GPU needed, better quality | Costly ($0.10+/song), privacy concerns | Local processing preferred for MVP |
Decision: Demucs offers best quality for a self-hosted solution.
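As a sketch of how a worker might drive Demucs, the helpers below build a command line for the `demucs` package's CLI and predict where the stems land. This assumes the CLI flags `-n` (model), `-d` (device), and `-o` (output directory), the v4 pretrained model names `htdemucs` (4 stems) and `htdemucs_6s` (6 stems), and Demucs's default `<out>/<model>/<track>/<stem>.wav` output layout; the function names are hypothetical.

```python
import sys
from pathlib import Path

# Stem names of the Demucs v4 pretrained hybrid-transformer models (assumed).
STEMS = {
    "htdemucs": ["drums", "bass", "other", "vocals"],
    "htdemucs_6s": ["drums", "bass", "other", "vocals", "guitar", "piano"],
}


def demucs_command(audio: Path, model: str = "htdemucs", device: str = "cuda",
                   out_dir: Path = Path("separated")) -> list[str]:
    """Build a `python -m demucs` command line for one input file."""
    if model not in STEMS:
        raise ValueError(f"unsupported model: {model}")
    return [sys.executable, "-m", "demucs",
            "-n", model,          # pretrained model name
            "-d", device,         # e.g. "cuda" or "cpu"
            "-o", str(out_dir),   # root output directory
            str(audio)]


def expected_stems(audio: Path, model: str = "htdemucs",
                   out_dir: Path = Path("separated")) -> dict[str, Path]:
    """Map each stem name to the WAV path Demucs's default layout produces."""
    track_dir = out_dir / model / audio.stem
    return {stem: track_dir / f"{stem}.wav" for stem in STEMS[model]}
```

A worker would run it with `subprocess.run(demucs_command(path), check=True)` and pass a stem such as `expected_stems(path, "htdemucs_6s")["piano"]` on to transcription. Running Demucs as a subprocess releases the model's memory after each job, at the cost of reloading the model every time.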
Transcription: YourMT3+ (Primary) + basic-pitch (Fallback)
Chosen: YourMT3+ (KAIST) with automatic fallback to basic-pitch (Spotify)
Why YourMT3+:
- 80-85% accuracy vs 70% for basic-pitch
- State-of-the-art multi-instrument transcription model
- Mixture of Experts architecture for better quality
- Perceiver-TF encoder with RoPE position encoding
- Trained on diverse datasets (30k+ songs, 13 instrument classes)
- Open-source, actively maintained
- Optimized for Apple Silicon (MPS) with float16 precision (14x speedup)
Why basic-pitch as Fallback:
- Polyphonic transcription (multiple notes at once)
- Lighter weight, faster inference
- Simple setup, no model download required
- Good baseline quality (70% accuracy)
- Automatically used if YourMT3+ unavailable
Alternatives Considered:
| Option | Pros | Cons | Why Not Chosen |
|---|---|---|---|
| MT3 (Multi-Task Multitrack Music Transcription) | Google's multi-instrument transcription model | Slower, larger model, harder to run | YourMT3+ more accurate |
| Omnizart | Multi-instrument, good documentation | Lower accuracy than YourMT3+, slower | Removed in favor of YourMT3+ |
| Tony (pYIN) | Excellent for monophonic | Only monophonic | Need polyphonic support |
| Commercial APIs | Better quality | Expensive, privacy concerns | Local processing preferred |
Decision: YourMT3+ offers the best accuracy for a self-hosted solution, with automatic fallback to basic-pitch for reliability.
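The fallback behaviour can be sketched as a small wrapper. The `primary` and `fallback` callables stand in for hypothetical wrappers around YourMT3+ and basic-pitch that take an audio path and return a MIDI path; the exception types caught are an assumption about how "unavailable" would surface (missing package, missing checkpoint file, or GPU out-of-memory, which PyTorch raises as a `RuntimeError` subclass).

```python
import logging
from typing import Callable

logger = logging.getLogger("rescored.transcription")

# Hypothetical interface: audio stem path in, MIDI file path out.
Transcriber = Callable[[str], str]


def transcribe_with_fallback(audio_path: str, primary: Transcriber,
                             fallback: Transcriber) -> tuple[str, str]:
    """Try the primary model, fall back on failure; return (midi_path, model_used)."""
    try:
        return primary(audio_path), "yourmt3+"
    except (ImportError, RuntimeError, OSError) as exc:
        # Package not installed, checkpoint missing, out of memory, bad device...
        logger.warning("YourMT3+ unavailable (%s); falling back to basic-pitch", exc)
        return fallback(audio_path), "basic-pitch"
```

Catching only these exception types is a deliberate choice: a genuine bug in the primary path (say, a `ValueError`) still surfaces instead of silently degrading every job to basic-pitch. Returning the model name lets the job record and UI report which transcriber produced the score.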
File Formats
Primary Format: MusicXML
Chosen: MusicXML 4.0
Why:
- Industry-standard interchange format
- Supported by all major notation software (Finale, Sibelius, MuseScore, Dorico)
- Preserves notation semantics (clefs, articulations, lyrics)
- Human-readable XML (good for debugging)
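- Can be mapped onto VexFlow's rendering API in the browser (VexFlow has no built-in MusicXML parser, so a conversion layer is needed; OSMD, for example, renders MusicXML using VexFlow)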
Alternatives Considered:
| Option | Pros | Cons | Why Not Chosen |
|---|---|---|---|
| MIDI | Universal, compact, great for playback | No notation info (clefs, staff layout) | Complementary, not replacement |
| MEI (Music Encoding Initiative) | More expressive than MusicXML | Less tool support, steeper learning curve | MusicXML more widely adopted |
| ABC Notation | Human-readable text | Limited notation features, less standard | Better for folk music than general use |
| Proprietary (Finale .musx) | Native to notation software | Requires specific tools to read | MusicXML is open standard |
Decision: MusicXML is the universal standard for notation exchange.
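To illustrate the notation semantics MusicXML carries and MIDI does not, here is a minimal sketch that builds a one-measure MusicXML 4.0 score with Python's standard `xml.etree.ElementTree`: divisions, key, time signature, clef, and an explicit note type. The part name and note are arbitrary; a file meant for Finale, Sibelius, or MuseScore would normally also include the XML declaration and the MusicXML `DOCTYPE`, which are omitted here.

```python
import xml.etree.ElementTree as ET


def minimal_score(step: str = "C", octave: int = 4) -> str:
    """Return a one-measure MusicXML 4.0 score holding a single whole note."""
    score = ET.Element("score-partwise", version="4.0")
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Piano"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")

    # Notation context that a MIDI file does not encode.
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = "1"  # 1 division per quarter note
    key = ET.SubElement(attributes, "key")
    ET.SubElement(key, "fifths").text = "0"  # no sharps or flats
    time_sig = ET.SubElement(attributes, "time")
    ET.SubElement(time_sig, "beats").text = "4"
    ET.SubElement(time_sig, "beat-type").text = "4"
    clef = ET.SubElement(attributes, "clef")
    ET.SubElement(clef, "sign").text = "G"  # treble clef
    ET.SubElement(clef, "line").text = "2"

    note = ET.SubElement(measure, "note")
    pitch = ET.SubElement(note, "pitch")
    ET.SubElement(pitch, "step").text = step
    ET.SubElement(pitch, "octave").text = str(octave)
    ET.SubElement(note, "duration").text = "4"  # 4 divisions = one 4/4 measure
    ET.SubElement(note, "type").text = "whole"  # explicit notated value

    return ET.tostring(score, encoding="unicode")
```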
Intermediate Format: MIDI
Chosen: MIDI 1.0 (SMF Type 1)
Why:
- Universal output format from transcription models
- Straightforward to convert to MusicXML once note timings are quantized to a rhythmic grid
- Useful for export option
- Plays back in Tone.js once parsed into note events (e.g. with the @tonejs/midi package)
Why Not Sufficient Alone:
- Lacks notation semantics (clefs, key signatures, measure boundaries)
- No staff layout information
- Ambiguous rhythmic notation
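The rhythmic ambiguity is easiest to see in numbers. Transcription models report note times in seconds, so before MIDI becomes MusicXML each onset and duration has to be snapped to a rhythmic grid. A minimal sketch, assuming a known constant tempo and a sixteenth-note grid (both are assumptions; real recordings drift and may need triplet grids or ties across bar lines):

```python
from fractions import Fraction


def quantize(seconds: float, bpm: float, steps_per_beat: int = 4) -> Fraction:
    """Snap a time in seconds to the nearest grid step, returned in beats.

    steps_per_beat=4 is a sixteenth-note grid when the beat is a quarter note.
    """
    beats = seconds * bpm / 60.0
    steps = round(beats * steps_per_beat)
    return Fraction(steps, steps_per_beat)


def quantize_note(onset_s: float, offset_s: float, bpm: float,
                  steps_per_beat: int = 4) -> tuple[Fraction, Fraction]:
    """Return (onset, duration) in beats; duration is at least one grid step."""
    start = quantize(onset_s, bpm, steps_per_beat)
    end = quantize(offset_s, bpm, steps_per_beat)
    return start, max(end - start, Fraction(1, steps_per_beat))
```

At 120 BPM a beat lasts 0.5 s, so a note sounding from 0.49 s to 0.77 s quantizes to an onset one beat into the piece and a duration of half a beat (an eighth note). A different grid or tempo estimate would yield different notation for the same MIDI data, which is the information MIDI alone cannot settle.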
Development Tools
Python Package Manager: uv or Poetry
Chosen: uv (recommended) or Poetry
Why:
- Reproducible builds with lock files
- Virtual environment management
- Fast installs of large dependencies (PyTorch, etc.); uv in particular is much faster than pip
Frontend Build Tool: Vite
Chosen: Vite
Why:
- Fast dev server with HMR
- Modern, best-in-class DX
- Great for React apps
- Smaller bundles than Webpack
Containerization: Docker
Chosen: Docker + Docker Compose
Why:
- Consistent dev environment across machines
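- GPU passthrough for Demucs on NVIDIA hosts (via the NVIDIA Container Toolkit); Apple Silicon's MPS backend is not available inside Docker containers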
- Simplifies Redis, API, worker orchestration
Infrastructure (Future)
Frontend Hosting: Vercel
Recommended: Vercel
Why:
- Excellent React/Vite support
- Global CDN
- Preview deployments for PRs
- Free tier is generous
Alternative: Netlify, Cloudflare Pages, AWS S3 + CloudFront
Backend Hosting: Cloud Run or Modal
Recommended: Modal (for GPU workers)
Why:
- Serverless GPU containers
- Pay-per-use (no idle GPU cost)
- Fast cold starts
- Good Python support
Alternative: AWS ECS with GPU instances, GCP Cloud Run (GPU support is newer and limited to select regions and GPU types; otherwise a separate GPU service is needed)
Database: PostgreSQL (future)
Not needed for MVP (using Redis for job state)
When to add:
- User accounts and auth
- Persistent job history
- Sharing features
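A minimal sketch of what "Redis for job state" might look like, and why it stops being enough: each job is a Redis hash with an expiry, so history disappears after the TTL. It is written against a small protocol covering only the two redis-py methods used (`hset` with `mapping=`, and `expire`), so a real client or a test fake can be passed in. The `job:<id>` key scheme, field names, and 24-hour TTL are assumptions.

```python
import json
import time
from typing import Any, Protocol


class HashStore(Protocol):
    """The subset of a redis-py client this sketch relies on."""

    def hset(self, name: str, mapping: dict[str, str]) -> Any: ...

    def expire(self, name: str, seconds: int) -> Any: ...


JOB_TTL_SECONDS = 24 * 60 * 60  # hypothetical retention: job state expires after a day


def save_job_state(store: HashStore, job_id: str, status: str, **fields: Any) -> str:
    """Write a job's status plus JSON-encoded extra fields; return the Redis key."""
    key = f"job:{job_id}"  # hypothetical key scheme
    record = {"status": status, "updated_at": str(time.time())}
    record.update({name: json.dumps(value) for name, value in fields.items()})
    store.hset(key, mapping=record)
    store.expire(key, JOB_TTL_SECONDS)  # state is ephemeral by design
    return key
```

Once jobs must outlive that TTL (persistent history) or be tied to user accounts and shared links, the data becomes relational, which is the point at which PostgreSQL is worth adding.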
Decision Criteria Summary
When evaluating technologies, we prioritized:
- Quality Over Speed: Better transcription/rendering > faster processing
- Open Source First: Avoid vendor lock-in, control costs
- Python for ML: Ecosystem too strong to ignore
- Standard Formats: MusicXML/MIDI over proprietary
- Proven Tech: Prefer mature libraries over bleeding edge
- Developer Experience: Good docs and tooling matter
Trade-off Examples
Demucs vs. Spleeter
- Chose Demucs: Better quality worth 2x processing time
- Rationale: Users wait minutes anyway, quality is paramount
VexFlow vs. OSMD
- Chose VexFlow: Editing capability > slightly better rendering
- Rationale: Users will edit output, need programmatic access
FastAPI vs. Django
- Chose FastAPI: Async WebSocket support > admin panel
- Rationale: Real-time updates critical, don't need admin UI
Next Steps
See Deployment Strategy for how these technologies are deployed.