File size: 8,798 Bytes
c27ae8d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
75d3906
c27ae8d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
75d3906
 
c27ae8d
75d3906
c27ae8d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
75d3906
c27ae8d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
e7bf1e6
c27ae8d
 
 
 
 
 
e7bf1e6
 
 
 
 
 
 
 
 
 
 
 
 
c27ae8d
 
 
e7bf1e6
 
 
 
 
 
c27ae8d
 
e7bf1e6
 
 
 
 
 
 
c27ae8d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
75d3906
c27ae8d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
# Getting Started with Rescored Documentation

## Welcome

This documentation serves as the complete technical blueprint for building Rescored, an AI-powered music transcription and notation editor. Whether you're implementing the system, evaluating the architecture, or understanding design decisions, you'll find detailed information here.

---

## What is Rescored?

Rescored converts YouTube videos into editable sheet music:

1. User pastes YouTube URL
2. Backend downloads audio, separates instruments, transcribes to MIDI
3. MIDI converted to MusicXML (standard notation format)
4. Frontend renders notation, user can edit, play back, and export

**Target Users**: Musicians who want to learn songs, create arrangements, or transcribe performances.

---

## Documentation Philosophy

This documentation focuses on **high-level architecture and design decisions**, not implementation details like "how to install Python." It assumes:

- You're a developer comfortable with web development (React, Python/FastAPI, or both)
- You understand basic music concepts (notes, measures, clefs)
- You want to understand **why** choices were made, not just **what** to build

---

## How to Navigate This Documentation

### If You're New: Start Here

1. **Read the [README](README.md)** - Get an overview of doc structure
2. **[Architecture Overview](architecture/overview.md)** - Understand the system design
3. **[MVP Scope](features/mvp.md)** - See what to build first
4. **[Technology Stack](architecture/tech-stack.md)** - Understand tech choices

### By Role

#### Backend Engineer

**Your Mission**: Implement the audio processing pipeline (YouTube → MusicXML)

**Reading Order**:
1. [Architecture Overview](architecture/overview.md) - System context
2. [Audio Processing Pipeline](backend/pipeline.md) - Detailed workflow
3. [Background Workers](backend/workers.md) - Celery setup
4. [API Design](backend/api.md) - REST + WebSocket endpoints
5. [ML Model Selection](research/ml-models.md) - Demucs, YourMT3+, basic-pitch
6. [Challenges](research/challenges.md) - Known limitations

**Key Files to Create**:
- `backend/pipeline.py` - Main transcription logic
- `backend/tasks.py` - Celery workers
- `backend/api.py` - FastAPI endpoints
- `docker-compose.yml` - Local dev environment

---

#### Frontend Engineer

**Your Mission**: Build the notation editor and playback interface

**Reading Order**:
1. [Architecture Overview](architecture/overview.md) - System context
2. [Notation Rendering](frontend/notation-rendering.md) - VexFlow integration
3. [Interactive Editor](frontend/editor.md) - Editing operations
4. [Playback System](frontend/playback.md) - Tone.js audio
5. [Data Flow](frontend/data-flow.md) - State management
6. [WebSocket Protocol](integration/websocket-protocol.md) - Real-time updates

**Key Files to Create**:
- `frontend/src/components/NotationCanvas.tsx` - VexFlow rendering
- `frontend/src/components/PlaybackControls.tsx` - Audio playback
- `frontend/src/store/notation.ts` - Zustand store
- `frontend/src/api/client.ts` - API/WebSocket client

---

#### Full-Stack Engineer

**Your Mission**: Build end-to-end from URL to editable notation

**Reading Order**:
1. [Architecture Overview](architecture/overview.md)
2. [MVP Scope](features/mvp.md) - What to build first
3. [Deployment Strategy](architecture/deployment.md) - Local dev setup
4. Backend docs: [Pipeline](backend/pipeline.md), [API](backend/api.md), [Workers](backend/workers.md)
5. Frontend docs: [Rendering](frontend/notation-rendering.md), [Editor](frontend/editor.md), [Playback](frontend/playback.md)
6. Integration: [File Formats](integration/file-formats.md), [WebSocket](integration/websocket-protocol.md)

**Start Here**: Set up local environment with Docker Compose (see [Deployment](architecture/deployment.md))

---

#### Product/Design

**Your Mission**: Understand capabilities, limitations, and user flow

**Reading Order**:
1. [Architecture Overview](architecture/overview.md) - User flow diagram
2. [MVP Scope](features/mvp.md) - Phase 1 features
3. [Challenges](research/challenges.md) - Known limitations to design around
4. [ML Model Selection](research/ml-models.md) - Accuracy expectations

**Key Insights**:
- Transcription is ~80-85% accurate with YourMT3+, ~70% with basic-pitch fallback
- Users **must** edit output - editor is **critical**
- Processing takes 1-2 minutes (GPU) or 10-15 minutes (CPU)
- YourMT3+ optimized for Apple Silicon (MPS) with 14x speedup via float16
- MVP focuses on piano only, multi-instrument in Phase 2

---

## Key Concepts to Understand

### Music Notation Basics

If you're not familiar with music notation:
- **Staff**: 5 horizontal lines where notes are placed
- **Clef**: Symbol indicating pitch range (treble, bass)
- **Measure**: Group of beats separated by bar lines
- **Note Duration**: whole (4 beats), half (2 beats), quarter (1 beat), eighth (0.5 beats)

See [Glossary](glossary.md) for more terms.

---

### Tech Stack Overview

**Frontend**: React + VexFlow (notation) + Tone.js (playback)
**Backend**: Python/FastAPI + Celery (workers) + Redis (queue)
**ML**: Demucs (source separation) + YourMT3+ (primary transcription, 80-85% accuracy) + basic-pitch (fallback, 70% accuracy)
**Formats**: MusicXML (primary), MIDI (intermediate)

---

### Data Flow

```mermaid
graph TB
    URL["YouTube URL"]
    Submit["Submit to API"]
    Backend["Backend creates job,<br/>queues Celery task"]
    Worker["Worker processes:<br/>Download audio → Separate stems<br/>→ Transcribe → MusicXML"]
    Progress["WebSocket updates:<br/>Frontend shows progress bar"]
    Completion["On completion:<br/>Frontend fetches MusicXML,<br/>renders with VexFlow"]
    Edit["User edits:<br/>State updates → Re-render notation"]
    Export["Export:<br/>Download MusicXML or MIDI"]

    URL --> Submit
    Submit --> Backend
    Backend --> Worker
    Worker --> Progress
    Progress --> Completion
    Completion --> Edit
    Edit --> Export
```

---

## Setting Up Local Development

See the [main README](../README.md) for detailed setup instructions. Quick start:

```bash
# Clone repo
git clone https://github.com/yourusername/rescored.git
cd rescored

# Setup backend (Python 3.10)
cd backend
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Setup frontend
cd ../frontend
npm install

# Start all services (from project root)
cd ..
./start.sh

# Services:
# - Frontend: http://localhost:5173
# - Backend API: http://localhost:8000
# - API Docs: http://localhost:8000/docs
# - Redis: localhost:6379 (must be running: brew services start redis)

# Stop all services
./stop.sh
```

**Requirements:**
- Python 3.10 (for madmom compatibility)
- Node.js 18+
- Redis 7+
- FFmpeg
- YouTube cookies (see README for setup)

---

## Common Questions

### Q: Why separate docs instead of code comments?

**A**: Architecture and design decisions are best documented separately. Code comments explain "how," docs explain "why" and "what alternatives were considered."

### Q: Do I need to read everything?

**A**: No. Start with your role's reading path above, then dive deeper as needed.

### Q: What if I want to change a tech choice?

**A**: See [Technology Stack](architecture/tech-stack.md) for trade-offs. Each decision documents alternatives and why they weren't chosen.

### Q: How accurate is transcription?

**A**: 80-85% for simple piano with YourMT3+ (70-75% for complex music). Falls back to basic-pitch (70% simple, 60-70% complex) if YourMT3+ unavailable. See [ML Models](research/ml-models.md) and [Challenges](research/challenges.md).

### Q: Can I deploy this to production?

**A**: MVP is designed for local dev. See [Deployment Strategy](architecture/deployment.md) Phase 2 for production deployment (Vercel + Modal + Redis).

### Q: What's the MVP scope?

**A**: Piano-only transcription + basic editing + playback + export. See [MVP Scope](features/mvp.md).

---

## Next Steps

1. Choose your role's reading path above
2. Set up local dev environment (see [Deployment](architecture/deployment.md))
3. Start implementing! Backend pipeline or frontend rendering
4. Test with sample YouTube videos (piano performances)
5. Iterate based on accuracy and UX

---

## Contributing to Docs

As you implement Rescored, please update these docs with:
- Actual code examples (replace placeholders)
- Performance benchmarks (processing time, accuracy metrics)
- Lessons learned and gotchas
- Configuration details

**Keep it concise** - docs should be scannable, not novels.

---

## Need Help?

- **Check [Glossary](glossary.md)** for terminology
- **Review [Challenges](research/challenges.md)** for known issues
- **See [Tech Stack](architecture/tech-stack.md)** for decision context

Good luck building Rescored!