Option	Pros	Cons	Why Not Chosen
Vue 3	Simpler API, lighter weight	Smaller ecosystem for music libraries	Less community support for music notation
Svelte	Excellent performance, less boilerplate	Immature ecosystem	Risk for complex audio/notation needs
Vanilla JS	Full control, no framework overhead	Much more code to manage state	Notation editing is complex, need good state management

Decision: React's ecosystem and component model outweigh its learning curve.

Notation Rendering: VexFlow

Chosen: VexFlow 4.x

Why:

Pure JavaScript, runs entirely in browser
Programmatic API for rendering notation (good for editing)
Generates clean SVG that we can attach event listeners to
Active maintenance, good documentation
Used in production by Flat.io, Soundslice

Alternatives Considered:

Option	Pros	Cons	Why Not Chosen
OpenSheetMusicDisplay (OSMD)	Better MusicXML support, prettier output	Harder to build editing on top, heavier bundle	Optimized for display, not editing
music21.js	Pythonic API, good theory support	Limited rendering, not designed for web	Better as backend tool
abcjs	Lightweight, simple syntax	ABC notation less standard than MusicXML	MusicXML is industry standard
Custom renderer	Full control	Months of work to match VexFlow quality	Not worth reinventing wheel

Decision: VexFlow strikes the best balance between rendering quality and editability.

Audio Playback: Tone.js

Chosen: Tone.js 14+

Why:

High-level abstractions over Web Audio API
Built-in scheduling for precise timing
Multiple synthesis methods (samples, FM, AM)
Transport controls (play, pause, seek, loop)
Sample-based instrument playback via Tone.Sampler (MIDI files need a separate parser such as @tonejs/midi)

Alternatives Considered:

Option	Pros	Cons	Why Not Chosen
Web Audio API (raw)	Maximum control, no dependencies	Requires lots of boilerplate	Too low-level for quick MVP
Howler.js	Simple API, good for sound effects	Not designed for music, no MIDI	No timing control for notation sync
MIDIjs	Simple MIDI playback	Limited synthesis, GM soundfonts	Lower quality sound than Tone.js samplers
SoundFont2.js	Authentic GM sounds	Large file sizes, older API	Tone.Sampler can play per-note samples if GM-style sounds are needed

Decision: Tone.js provides the right abstraction level for MIDI playback with good sound quality.

State Management: Zustand

Chosen: Zustand (tentative)

Why:

Minimal boilerplate compared to Redux
Works well with React hooks
Good for global state (notation data, playback state)
Small bundle size (~1KB)

Alternatives Considered:

Option	Pros	Cons	Why Not Chosen
Redux Toolkit	Battle-tested, great DevTools	More boilerplate, steeper learning curve	Overkill for MVP
React Context	Built-in, no deps	Performance issues with frequent updates	Notation editing has lots of updates
Jotai/Recoil	Atomic state, very modern	Newer, smaller ecosystem	Zustand more proven
Local state only	Simplest	Hard to share state across components	Need global notation state

Decision: Zustand for MVP, can migrate to Redux if needed later.

Backend Technologies

API Framework: FastAPI

Chosen: FastAPI (Python 3.11+)

Why:

Async Python (critical for WebSocket connections)
Auto-generated OpenAPI docs (Swagger UI)
Native WebSocket support
Type hints for better code quality
Integrates well with Python ML libraries (Demucs, basic-pitch)
Strong performance for a Python framework (often cited as comparable to Node.js for I/O-bound workloads)

Alternatives Considered:

Option	Pros	Cons	Why Not Chosen
Node.js (Express)	Async by default, JavaScript everywhere	Worse ML library support	ML models are Python-first
Flask	Simple, well-known	WSGI-based with limited async support, manual WebSocket setup	FastAPI offers a similar developer experience with async built in
Django	Full-featured, admin panel	Heavy, slower, less async support	Overkill for API-only service
Go (Gin/Fiber)	Excellent performance	Weaker ML ecosystem, FFI overhead	Python has better audio/ML tools

Decision: FastAPI combines async support with Python's ML ecosystem.

Task Queue: Celery + Redis

Chosen: Celery 5.x with Redis as broker

Why:

Industry standard for async Python tasks
Reliable, battle-tested in production
Priority queues (transcription vs. export jobs)
Automatic retries and error handling
Redis is fast, simple, good for both queue and caching

Alternatives Considered:

Option	Pros	Cons	Why Not Chosen
RQ (Redis Queue)	Simpler API than Celery	Fewer features, less ecosystem	Need advanced features (priorities, chaining)
Dramatiq	Modern, better API than Celery	Smaller community, less mature	Celery's ecosystem worth the complexity
BullMQ (Node)	Excellent, modern	Requires Node backend	Using Python for ML libraries
Cloud tasks (GCP/AWS)	Managed service, no infrastructure	Vendor lock-in, cold starts	Local dev first

Decision: Celery's maturity and feature set justify the learning curve.
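
One common way to keep transcription and export jobs from competing is to route them to separate queues. The sketch below is plain Python with no Celery import. The task names (`tasks.transcribe`, `tasks.export_*`) and queue names are hypothetical, and the dictionary only mirrors the shape Celery accepts for its `task_routes` setting. `resolve_queue` is an illustrative helper, not a Celery API.

```python
from fnmatch import fnmatch

# Hypothetical routing table, shaped like Celery's `task_routes` setting.
# Task and queue names are assumptions for illustration only.
TASK_ROUTES = {
    "tasks.transcribe": {"queue": "transcription"},
    "tasks.export_*": {"queue": "export"},
}
DEFAULT_QUEUE = "default"


def resolve_queue(task_name: str) -> str:
    """Return the queue for a task name, using glob matching on route keys."""
    for pattern, route in TASK_ROUTES.items():
        if fnmatch(task_name, pattern):
            return route["queue"]
    return DEFAULT_QUEUE
```

With a table like this, GPU-backed workers could consume only the `transcription` queue while lighter workers handle `export`, so a burst of exports does not delay transcription jobs.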

ML/Audio Technologies

Source Separation: Demucs

Chosen: Demucs v4 (Meta Research)

Why:

State-of-the-art audio separation quality (MDX leaderboard winner)
4-stem model (drums, bass, vocals, other) is a good default
6-stem model available (drums, bass, vocals, guitar, piano, other)
Open-source, MIT license
PyTorch model, runs on GPU

Alternatives Considered:

Option	Pros	Cons	Why Not Chosen
Spleeter	Faster, lighter	Lower quality, no longer actively developed	Quality matters more than speed
X-UMX	Open-source, good quality	Slower than Demucs	Demucs quality worth extra time
Commercial APIs	No GPU needed, better quality	Costly ($0.10+/song), privacy concerns	Local processing preferred for MVP

Decision: Demucs offers the best quality for a self-hosted solution.

Transcription: YourMT3+ (Primary) + basic-pitch (Fallback)

Chosen: YourMT3+ (Queen Mary University of London) with automatic fallback to basic-pitch (Spotify)

Why YourMT3+:

80-85% accuracy vs 70% for basic-pitch
State-of-the-art multi-instrument transcription model
Mixture of Experts architecture for better quality
Perceiver-TF encoder with RoPE position encoding
Trained on diverse datasets (30k+ songs, 13 instrument classes)
Open-source, actively maintained
Optimized for Apple Silicon (MPS) with float16 precision (14x speedup)

Why basic-pitch as Fallback:

Polyphonic transcription (multiple notes at once)
Lighter weight, faster inference
Simple setup, no model download required
Good baseline quality (70% accuracy)
Automatically used if YourMT3+ unavailable

Alternatives Considered:

Option	Pros	Cons	Why Not Chosen
MT3 (Multi-Task Multitrack Music Transcription)	Google's latest, multi-instrument aware	Slower, larger model, harder to run	YourMT3+ more accurate
Omnizart	Multi-instrument, good documentation	Lower accuracy than YourMT3+, slower	Removed in favor of YourMT3+
Tony (pYIN)	Excellent for monophonic	Only monophonic	Need polyphonic support
Commercial APIs	Better quality	Expensive, privacy concerns	Local processing preferred

Decision: YourMT3+ offers the best accuracy for a self-hosted solution, with automatic fallback to basic-pitch for reliability.
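
The primary-plus-fallback arrangement can be sketched as an ordered list of backends tried in turn. This is a minimal illustration, not the project's actual wiring. The `Transcriber` signature and the two stub functions are hypothetical stand-ins for the real YourMT3+ and basic-pitch wrappers.

```python
from typing import Callable, Sequence, Tuple

# Assumed interface: a transcriber takes an audio path and returns MIDI bytes.
Transcriber = Callable[[str], bytes]


def transcribe_with_fallback(
    audio_path: str,
    backends: Sequence[Tuple[str, Transcriber]],
) -> Tuple[str, bytes]:
    """Try each backend in order; return (backend_name, midi_bytes) from the first success."""
    errors = []
    for name, transcribe in backends:
        try:
            return name, transcribe(audio_path)
        except Exception as exc:  # e.g. missing checkpoint or out-of-memory
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all transcription backends failed: " + "; ".join(errors))


# Hypothetical stubs standing in for the real model wrappers.
def yourmt3_stub(path: str) -> bytes:
    raise RuntimeError("YourMT3+ checkpoint not available")


def basic_pitch_stub(path: str) -> bytes:
    return b"MThd"  # placeholder bytes, not a valid MIDI file
```

Returning the backend name alongside the result lets the caller record which model produced a transcription, which is useful when the two differ in expected accuracy.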

File Formats

Primary Format: MusicXML

Chosen: MusicXML 4.0

Why:

Industry-standard interchange format
Supported by all major notation software (Finale, Sibelius, MuseScore, Dorico)
Preserves notation semantics (clefs, articulations, lyrics)
Human-readable XML (good for debugging)
Can be rendered with VexFlow after a parsing step (VexFlow has no built-in MusicXML parser)

Alternatives Considered:

Option	Pros	Cons	Why Not Chosen
MIDI	Universal, compact, great for playback	No notation info (clefs, staff layout)	Complementary, not replacement
MEI (Music Encoding Initiative)	More expressive than MusicXML	Less tool support, steeper learning curve	MusicXML more widely adopted
ABC Notation	Human-readable text	Limited notation features, less standard	Better for folk music than general use
Proprietary (Finale .musx)	Native to notation software	Requires specific tools to read	MusicXML is open standard

Decision: MusicXML is the universal standard for notation exchange.
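
To show what "preserves notation semantics" means in practice, here is a minimal sketch that builds a one-measure MusicXML score using only Python's standard library. The function name and the part name "Piano" are illustrative choices. The element names follow the MusicXML `score-partwise` structure.

```python
import xml.etree.ElementTree as ET


def build_minimal_score() -> ET.Element:
    """Build a one-measure score: treble clef, 4/4, C major, one whole-note C4."""
    score = ET.Element("score-partwise", version="4.0")

    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Piano"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")

    attrs = ET.SubElement(measure, "attributes")
    ET.SubElement(attrs, "divisions").text = "1"  # 1 division per quarter note
    key = ET.SubElement(attrs, "key")
    ET.SubElement(key, "fifths").text = "0"  # no sharps or flats
    time = ET.SubElement(attrs, "time")
    ET.SubElement(time, "beats").text = "4"
    ET.SubElement(time, "beat-type").text = "4"
    clef = ET.SubElement(attrs, "clef")
    ET.SubElement(clef, "sign").text = "G"
    ET.SubElement(clef, "line").text = "2"

    note = ET.SubElement(measure, "note")
    pitch = ET.SubElement(note, "pitch")
    ET.SubElement(pitch, "step").text = "C"
    ET.SubElement(pitch, "octave").text = "4"
    ET.SubElement(note, "duration").text = "4"  # 4 divisions = 4 quarter notes
    ET.SubElement(note, "type").text = "whole"
    return score


xml_text = ET.tostring(build_minimal_score(), encoding="unicode")
```

The clef, key, time signature, and notated note type are all explicit elements here, which is exactly the information a MIDI file does not carry.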

Intermediate Format: MIDI

Chosen: MIDI 1.0 (SMF Type 1)

Why:

Universal output format from transcription models
Easy to convert to MusicXML
Useful for export option
Tone.js can schedule playback from parsed MIDI note events (e.g. via @tonejs/midi)

Why Not Sufficient Alone:

Lacks notation semantics (clefs, key signatures, measure boundaries)
No staff layout information
Ambiguous rhythmic notation
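
The rhythmic ambiguity comes from MIDI storing durations as raw tick counts rather than note values. A minimal sketch of the quantization step this forces is shown below. The 480 ticks-per-quarter resolution is an assumed default, and the candidate set is deliberately limited to undotted note values.

```python
from fractions import Fraction

# Candidate notated durations, measured in quarter-note beats.
NOTE_VALUES = {
    "whole": Fraction(4),
    "half": Fraction(2),
    "quarter": Fraction(1),
    "eighth": Fraction(1, 2),
    "16th": Fraction(1, 4),
}


def nearest_note_value(duration_ticks: int, ticks_per_quarter: int = 480) -> str:
    """Map a raw MIDI duration in ticks to the closest simple note value."""
    beats = Fraction(duration_ticks, ticks_per_quarter)
    return min(NOTE_VALUES, key=lambda name: abs(NOTE_VALUES[name] - beats))
```

A 455-tick note is clearly a slightly short quarter, but a 360-tick note sits exactly halfway between an eighth and a quarter and could just as well be a dotted eighth. Resolving such cases needs context (meter, neighboring notes) that MIDI alone does not provide.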

Development Tools

Python Package Manager: uv or Poetry

Chosen: uv (recommended) or Poetry

Why:

Reproducible builds with lock files
Virtual environment management
Faster than pip for large dependencies (PyTorch, etc.)

Frontend Build Tool: Vite

Chosen: Vite

Why:

Fast dev server with HMR
Modern, best-in-class DX
Great for React apps
Smaller bundles than Webpack

Containerization: Docker

Chosen: Docker + Docker Compose

Why:

Consistent dev environment across machines
Easy GPU passthrough for Demucs
Simplifies Redis, API, worker orchestration

Infrastructure (Future)

Frontend Hosting: Vercel

Recommended: Vercel

Why:

Excellent React/Vite support
Global CDN
Preview deployments for PRs
Free tier is generous

Alternative: Netlify, Cloudflare Pages, AWS S3 + CloudFront

Backend Hosting: Cloud Run or Modal

Recommended: Modal (for GPU workers)

Why:

Serverless GPU containers
Pay-per-use (no idle GPU cost)
Fast cold starts
Good Python support

Alternative: AWS ECS with GPU instances, GCP Cloud Run (CPU only, need separate GPU service)

Database: PostgreSQL (future)

Not needed for MVP (using Redis for job state)

When to add:

User accounts and auth
Persistent job history
Sharing features

Decision Criteria Summary

When evaluating technologies, we prioritized:

Quality Over Speed: Better transcription/rendering > faster processing
Open Source First: Avoid vendor lock-in, control costs
Python for ML: Ecosystem too strong to ignore
Standard Formats: MusicXML/MIDI over proprietary
Proven Tech: Prefer mature libraries over bleeding edge
Developer Experience: Good docs and tooling matter

Trade-off Examples

Demucs vs. Spleeter

Chose Demucs: Better quality worth 2x processing time
Rationale: Users wait minutes anyway, quality is paramount

VexFlow vs. OSMD

Chose VexFlow: Editing capability > slightly better rendering
Rationale: Users will edit output, need programmatic access

FastAPI vs. Django

Chose FastAPI: Async WebSocket support > admin panel
Rationale: Real-time updates critical, don't need admin UI

Next Steps

See Deployment Strategy for how these technologies are deployed.