	# Technology Stack & Decisions

	## Overview

	This document details the technology choices for Rescored, including alternatives considered and trade-offs that informed each decision.

	## Frontend Technologies

	### UI Framework: React

	Chosen: React 18+

	Why:
	- Largest ecosystem for music-related JavaScript libraries
	- VexFlow and Tone.js have good React integration patterns
	- Component model fits notation editing (each measure/staff as component)
	- Excellent dev tooling (React DevTools, Fast Refresh)
	- Familiarity and hiring pool

	Alternatives Considered:

	| Option | Pros | Cons | Why Not Chosen |
	|--------|------|------|----------------|
	| Vue 3 | Simpler API, lighter weight | Smaller ecosystem for music libraries | Less community support for music notation |
	| Svelte | Excellent performance, less boilerplate | Immature ecosystem | Risk for complex audio/notation needs |
	| Vanilla JS | Full control, no framework overhead | Much more code to manage state | Notation editing is complex, need good state management |

	Decision: React's ecosystem and component model outweigh its learning curve.

	---

	### Notation Rendering: VexFlow

	Chosen: VexFlow 4.x

	Why:
	- Pure JavaScript, runs entirely in browser
	- Programmatic API for rendering notation (good for editing)
	- Generates clean SVG that we can attach event listeners to
	- Active maintenance, good documentation
	- Used in production by Flat.io, Soundslice

	Alternatives Considered:

	| Option | Pros | Cons | Why Not Chosen |
	|--------|------|------|----------------|
	| OpenSheetMusicDisplay (OSMD) | Better MusicXML support, prettier output | Harder to build editing on top, heavier bundle | Optimized for display, not editing |
	| music21.js | Pythonic API, good theory support | Limited rendering, not designed for web | Better as backend tool |
	| abcjs | Lightweight, simple syntax | ABC notation less standard than MusicXML | MusicXML is industry standard |
	| Custom renderer | Full control | Months of work to match VexFlow quality | Not worth reinventing wheel |

	Decision: VexFlow strikes the best balance between rendering quality and editability.

	---

	### Audio Playback: Tone.js

	Chosen: Tone.js 14+

	Why:
	- High-level abstractions over Web Audio API
	- Built-in scheduling for precise timing
	- Multiple synthesis methods (samples, FM, AM)
	- Transport controls (play, pause, seek, loop)
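	- Playback of MIDI note data through sampled instruments via `Tone.Sampler` (MIDI files themselves are parsed separately, e.g. with `@tonejs/midi`)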

	Alternatives Considered:

	| Option | Pros | Cons | Why Not Chosen |
	|--------|------|------|----------------|
	| Web Audio API (raw) | Maximum control, no dependencies | Requires lots of boilerplate | Too low-level for quick MVP |
	| Howler.js | Simple API, good for sound effects | Not designed for music, no MIDI | No timing control for notation sync |
	| MIDIjs | Simple MIDI playback | Limited synthesis, GM soundfonts | Lower quality sound than Tone.js samplers |
	| SoundFont2.js | Authentic GM sounds | Large file sizes, older API | `Tone.Sampler` can load per-note audio samples if needed |

	Decision: Tone.js provides the right abstraction level for MIDI playback with good sound quality.
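
Whatever library does the scheduling, note positions from a MIDI file have to be converted from ticks to seconds before they can be placed on a playback timeline. The arithmetic is language-agnostic; the sketch below uses Python (the backend language) and assumes a constant tempo. The function name and default values are illustrative, not taken from the Rescored codebase.

```python
def ticks_to_seconds(ticks: int, bpm: float = 120.0, ticks_per_quarter: int = 480) -> float:
    """Convert a MIDI tick offset to seconds, assuming a single constant tempo."""
    quarter_notes = ticks / ticks_per_quarter   # how many beats have elapsed
    seconds_per_quarter = 60.0 / bpm            # duration of one beat
    return quarter_notes * seconds_per_quarter
```

A file with tempo changes would need the same calculation applied piecewise, one tempo segment at a time.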

	---

	### State Management: Zustand

	Chosen: Zustand (tentative)

	Why:
	- Minimal boilerplate compared to Redux
	- Works well with React hooks
	- Good for global state (notation data, playback state)
	- Small bundle size (~1KB)

	Alternatives Considered:

	| Option | Pros | Cons | Why Not Chosen |
	|--------|------|------|----------------|
	| Redux Toolkit | Battle-tested, great DevTools | More boilerplate, steeper learning curve | Overkill for MVP |
	| React Context | Built-in, no deps | Performance issues with frequent updates | Notation editing has lots of updates |
	| Jotai/Recoil | Atomic state, very modern | Newer, smaller ecosystem | Zustand more proven |
	| Local state only | Simplest | Hard to share state across components | Need global notation state |

	Decision: Zustand for MVP, can migrate to Redux if needed later.

	---

	## Backend Technologies

	### API Framework: FastAPI

	Chosen: FastAPI (Python 3.11+)

	Why:
	- Async Python (critical for WebSocket connections)
	- Auto-generated OpenAPI docs (Swagger UI)
	- Native WebSocket support
	- Type hints for better code quality
	- Integrates well with Python ML libraries (Demucs, basic-pitch)
	- Excellent performance (on par with Node.js)

	Alternatives Considered:

	| Option | Pros | Cons | Why Not Chosen |
	|--------|------|------|----------------|
	| Node.js (Express) | Async by default, JavaScript everywhere | Worse ML library support | ML models are Python-first |
	| Flask | Simple, well-known | Limited async support (WSGI-based), manual WebSocket setup | FastAPI offers similar simplicity with async built in |
	| Django | Full-featured, admin panel | Heavy, slower, less async support | Overkill for API-only service |
	| Go (Gin/Fiber) | Excellent performance | Weaker ML ecosystem, FFI overhead | Python has better audio/ML tools |

	Decision: FastAPI combines async support with Python's ML ecosystem.
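
To illustrate why async matters for pushing progress updates to many clients, here is a minimal sketch that uses only `asyncio` from the standard library as a stand-in for WebSocket connections. `ProgressHub`, `subscribe`, and `publish` are hypothetical names; they are not part of FastAPI or of the Rescored codebase.

```python
import asyncio


class ProgressHub:
    """Fan out job progress events to every listener of a job (hypothetical helper)."""

    def __init__(self) -> None:
        # Maps a job id to the queues of all clients watching that job.
        self._listeners = {}

    def subscribe(self, job_id: str) -> asyncio.Queue:
        queue = asyncio.Queue()
        self._listeners.setdefault(job_id, []).append(queue)
        return queue

    async def publish(self, job_id: str, percent: int) -> None:
        # Each listener gets its own copy of the event; nobody blocks a thread.
        for queue in self._listeners.get(job_id, []):
            await queue.put({"job_id": job_id, "percent": percent})


async def demo() -> list:
    hub = ProgressHub()
    inbox = hub.subscribe("job-1")
    await hub.publish("job-1", 50)
    await hub.publish("job-1", 100)
    return [await inbox.get(), await inbox.get()]
```

In a real service each queue would be drained by a WebSocket handler that forwards the events to the browser.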

	---

	### Task Queue: Celery + Redis

	Chosen: Celery 5.x with Redis as broker

	Why:
	- Industry standard for async Python tasks
	- Reliable, battle-tested in production
	- Priority queues (transcription vs. export jobs)
	- Automatic retries and error handling
	- Redis is fast, simple, good for both queue and caching

	Alternatives Considered:

	| Option | Pros | Cons | Why Not Chosen |
	|--------|------|------|----------------|
	| RQ (Redis Queue) | Simpler API than Celery | Fewer features, less ecosystem | Need advanced features (priorities, chaining) |
	| Dramatiq | Modern, better API than Celery | Smaller community, less mature | Celery's ecosystem worth the complexity |
	| BullMQ (Node) | Excellent, modern | Requires Node backend | Using Python for ML libraries |
	| Cloud tasks (GCP/AWS) | Managed service, no infrastructure | Vendor lock-in, cold starts | Local dev first |

	Decision: Celery's maturity and feature set justify the learning curve.
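
The idea behind prioritizing one job type over another can be sketched with the standard library alone. This is not Celery's API: it is a `heapq`-based stand-in, and the class names and priority values are assumptions chosen for illustration (lower number means served first).

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                     # lower value is served first
    sequence: int                     # tie-breaker that preserves submission order
    name: str = field(compare=False)  # payload, ignored when ordering


class JobQueue:
    """Tiny in-memory priority queue (illustrative stand-in for a broker)."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()

    def push(self, name: str, priority: int) -> None:
        heapq.heappush(self._heap, Job(priority, next(self._counter), name))

    def pop(self) -> str:
        return heapq.heappop(self._heap).name
```

With Celery the same effect is usually achieved by routing task types to separate named queues and assigning workers to them.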

	---

	## ML/Audio Technologies

	### Source Separation: Demucs

	Chosen: Demucs v4 (Meta Research)

	Why:
	- State-of-the-art audio separation quality (MDX leaderboard winner)
	- 4-stem model (drums, bass, vocals, other) is good default
	- 6-stem model available (drums, bass, vocals, guitar, piano, other)
	- Open-source, MIT license
	- PyTorch model; runs on GPU (CPU also works, but more slowly)

	Alternatives Considered:

	| Option | Pros | Cons | Why Not Chosen |
	|--------|------|------|----------------|
	| Spleeter | Faster, lighter | Lower quality, no longer actively developed | Quality matters more than speed |
	| X-UMX | Open-source, good quality | Slower than Demucs | Demucs quality worth extra time |
	| Commercial APIs | No GPU needed, better quality | Costly ($0.10+/song), privacy concerns | Local processing preferred for MVP |

	Decision: Demucs offers best quality for a self-hosted solution.
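
Downstream steps need to know which stem files a separation run produces. The sketch below computes the expected paths for the 4-stem and 6-stem models. It assumes the model names `htdemucs` and `htdemucs_6s` and Demucs' default output layout of `separated/<model>/<track>/<stem>.wav`; check both against the installed Demucs version. The function name is hypothetical.

```python
from pathlib import Path

STEMS_4 = ("drums", "bass", "vocals", "other")
STEMS_6 = ("drums", "bass", "vocals", "guitar", "piano", "other")


def expected_stem_paths(track: Path, model: str = "htdemucs",
                        out_dir: Path = Path("separated")) -> dict:
    """Return {stem name: expected output file} for one input track."""
    # Assumption: the 6-stem model is identified by a "_6s" suffix.
    stems = STEMS_6 if model.endswith("_6s") else STEMS_4
    return {stem: out_dir / model / track.stem / f"{stem}.wav" for stem in stems}
```

For example, `expected_stem_paths(Path("song.mp3"))["vocals"]` points at `separated/htdemucs/song/vocals.wav`.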

	---

	### Transcription: YourMT3+ (Primary) + basic-pitch (Fallback)

	Chosen: YourMT3+ (Queen Mary University of London) with automatic fallback to basic-pitch (Spotify)

	Why YourMT3+:
	- 80-85% accuracy vs 70% for basic-pitch
	- State-of-the-art multi-instrument transcription model
	- Mixture of Experts architecture for better quality
	- Perceiver-TF encoder with RoPE position encoding
	- Trained on diverse datasets (30k+ songs, 13 instrument classes)
	- Open-source, actively maintained
	- Optimized for Apple Silicon (MPS) with float16 precision (14x speedup)

	Why basic-pitch as Fallback:
	- Polyphonic transcription (multiple notes at once)
	- Lighter weight, faster inference
	- Simple setup, no model download required
	- Good baseline quality (70% accuracy)
	- Automatically used if YourMT3+ unavailable

	Alternatives Considered:

	| Option | Pros | Cons | Why Not Chosen |
	|--------|------|------|----------------|
	| MT3 (Multi-Task Multitrack Music Transcription) | Google Magenta's multi-instrument model | Slower, larger model, harder to run | YourMT3+ more accurate |
	| Omnizart | Multi-instrument, good documentation | Lower accuracy than YourMT3+, slower | Removed in favor of YourMT3+ |
	| Tony (pYIN) | Excellent for monophonic | Only monophonic | Need polyphonic support |
	| Commercial APIs | Better quality | Expensive, privacy concerns | Local processing preferred |

	Decision: YourMT3+ offers the best accuracy for a self-hosted solution, with automatic fallback to basic-pitch for reliability.
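
The primary-plus-fallback arrangement can be sketched as a loop over backends in preference order. This is a minimal illustration, not the project's implementation: `transcribe_with_fallback` is a hypothetical name, and each backend is assumed to be a callable that takes an audio path and returns MIDI bytes.

```python
from typing import Callable, Sequence, Tuple

# Assumed backend signature: audio file path in, MIDI bytes out.
Transcriber = Callable[[str], bytes]


def transcribe_with_fallback(
    audio_path: str,
    backends: Sequence[Tuple[str, Transcriber]],
) -> Tuple[str, bytes]:
    """Try each backend in order and return (backend name, MIDI bytes)."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(audio_path)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all transcription backends failed: " + "; ".join(errors))
```

Returning the backend name alongside the result lets the caller record which model actually produced the transcription.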

	---

	## File Formats

	### Primary Format: MusicXML

	Chosen: MusicXML 4.0

	Why:
	- Industry-standard interchange format
	- Supported by all major notation software (Finale, Sibelius, MuseScore, Dorico)
	- Preserves notation semantics (clefs, articulations, lyrics)
	- Human-readable XML (good for debugging)
	- Maps onto VexFlow rendering calls through a parsing layer (VexFlow itself has no built-in MusicXML parser)

	Alternatives Considered:

	| Option | Pros | Cons | Why Not Chosen |
	|--------|------|------|----------------|
	| MIDI | Universal, compact, great for playback | No notation info (clefs, staff layout) | Complementary, not replacement |
	| MEI (Music Encoding Initiative) | More expressive than MusicXML | Less tool support, steeper learning curve | MusicXML more widely adopted |
	| ABC Notation | Human-readable text | Limited notation features, less standard | Better for folk music than general use |
	| Proprietary (Finale .musx) | Native to notation software | Requires specific tools to read | MusicXML is open standard |

	Decision: MusicXML is the universal standard for notation exchange.
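
Because MusicXML is plain XML, it can be inspected with a standard XML parser while debugging. The sketch below lists the pitches in a small hand-written `score-partwise` fragment using Python's `xml.etree.ElementTree`. The sample score and the `list_pitches` helper are made up for illustration, and only sharps and flats of one semitone are handled.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<score-partwise version="4.0">
  <part-list><score-part id="P1"><part-name>Piano</part-name></score-part></part-list>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>1</duration></note>
      <note><rest/><duration>1</duration></note>
      <note><pitch><step>F</step><alter>1</alter><octave>4</octave></pitch><duration>1</duration></note>
    </measure>
  </part>
</score-partwise>
"""


def list_pitches(musicxml: str) -> list:
    """Return pitch names such as 'C4' or 'F#4' in document order."""
    root = ET.fromstring(musicxml)
    pitches = []
    for note in root.iter("note"):
        pitch = note.find("pitch")
        if pitch is None:  # rests carry no <pitch> element
            continue
        accidental = {"1": "#", "-1": "b"}.get(pitch.findtext("alter") or "", "")
        pitches.append(f"{pitch.findtext('step')}{accidental}{pitch.findtext('octave')}")
    return pitches
```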

	---

	### Intermediate Format: MIDI

	Chosen: MIDI 1.0 (SMF Type 1)

	Why:
	- Universal output format from transcription models
	- Easy to convert to MusicXML
	- Useful for export option
	- Tone.js can play MIDI note data once the file is parsed (e.g. with `@tonejs/midi`)

	Why Not Sufficient Alone:
	- Lacks notation semantics (clefs, key signatures, measure boundaries)
	- No staff layout information
	- Ambiguous rhythmic notation
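
The rhythmic ambiguity is easy to see in a small worked example. MIDI stores durations in ticks, so a converter has to guess which notated value was intended. The sketch below snaps a duration to the nearest of five plain note values; the table of values, the function name, and the 480 ticks-per-quarter default are assumptions for illustration. Real converters also consider dotted values, tuplets, and meter.

```python
from fractions import Fraction

# Note lengths measured in quarter notes (illustrative subset, no dots or tuplets).
NOTE_VALUES = {
    "whole": Fraction(4),
    "half": Fraction(2),
    "quarter": Fraction(1),
    "eighth": Fraction(1, 2),
    "sixteenth": Fraction(1, 4),
}


def nearest_note_value(duration_ticks: int, ticks_per_quarter: int = 480) -> str:
    """Return the name of the note value closest to the given tick duration."""
    beats = Fraction(duration_ticks, ticks_per_quarter)
    return min(NOTE_VALUES, key=lambda name: abs(NOTE_VALUES[name] - beats))
```

A slightly short performance of 470 ticks snaps cleanly to a quarter note, but 360 ticks sits exactly halfway between an eighth and a quarter. It could equally be a dotted eighth, which this table cannot express, and the MIDI data alone does not say which reading is right.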

	---

	## Development Tools

	### Python Package Manager: uv or Poetry

	Chosen: uv (recommended) or Poetry

	Why:
	- Reproducible builds with lock files
	- Virtual environment management
	- Faster than pip for large dependencies (PyTorch, etc.)

	---

	### Frontend Build Tool: Vite

	Chosen: Vite

	Why:
	- Fast dev server with HMR
	- Modern, best-in-class DX
	- Great for React apps
	- Smaller bundles than Webpack

	---

	### Containerization: Docker

	Chosen: Docker + Docker Compose

	Why:
	- Consistent dev environment across machines
	- Easy GPU passthrough for Demucs
	- Simplifies Redis, API, worker orchestration

	---

	## Infrastructure (Future)

	### Frontend Hosting: Vercel

	Recommended: Vercel

	Why:
	- Excellent React/Vite support
	- Global CDN
	- Preview deployments for PRs
	- Free tier is generous

	Alternative: Netlify, Cloudflare Pages, AWS S3 + CloudFront

	---

	### Backend Hosting: Cloud Run or Modal

	Recommended: Modal (for GPU workers)

	Why:
	- Serverless GPU containers
	- Pay-per-use (no idle GPU cost)
	- Fast cold starts
	- Good Python support

	Alternative: AWS ECS with GPU instances, or GCP Cloud Run (originally CPU-only; check current GPU availability, otherwise pair it with a separate GPU service)

	---

	### Database: PostgreSQL (future)

	Not needed for MVP (using Redis for job state)
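
Keeping job state in a key-value store can be as simple as one JSON document per job. The sketch below uses a plain `dict` as a stand-in for Redis so it runs without a server. The `JobState` fields, the `job:<id>` key pattern, and the helper names are assumptions, not the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class JobState:
    job_id: str
    status: str        # e.g. "queued", "processing", "done", "failed"
    progress: int = 0  # percent complete


def save_job(store: dict, job: JobState) -> None:
    """Serialize the job to JSON under a namespaced key."""
    store[f"job:{job.job_id}"] = json.dumps(asdict(job))


def load_job(store: dict, job_id: str) -> Optional[JobState]:
    """Load a job back, or return None if the key does not exist."""
    raw = store.get(f"job:{job_id}")
    return JobState(**json.loads(raw)) if raw is not None else None
```

Because the state is a flat record, moving it into a PostgreSQL table later would mostly be a matter of mapping these fields to columns.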

	When to add:
	- User accounts and auth
	- Persistent job history
	- Sharing features

	---

	## Decision Criteria Summary

	When evaluating technologies, we prioritized:

	1. Quality Over Speed: Better transcription/rendering > faster processing
	2. Open Source First: Avoid vendor lock-in, control costs
	3. Python for ML: Ecosystem too strong to ignore
	4. Standard Formats: MusicXML/MIDI over proprietary
	5. Proven Tech: Prefer mature libraries over bleeding edge
	6. Developer Experience: Good docs and tooling matter

	## Trade-off Examples

	### Demucs vs. Spleeter
	- Chose Demucs: Better quality worth 2x processing time
	- Rationale: Users wait minutes anyway, quality is paramount

	### VexFlow vs. OSMD
	- Chose VexFlow: Editing capability > slightly better rendering
	- Rationale: Users will edit output, need programmatic access

	### FastAPI vs. Django
	- Chose FastAPI: Async WebSocket support > admin panel
	- Rationale: Real-time updates critical, don't need admin UI

	## Next Steps

	See [Deployment Strategy](deployment.md) for how these technologies are deployed.