calebhan committed
Commit 75d3906 · 1 Parent(s): ac5c764

yourmt3 integration and refactor

.gitignore CHANGED
@@ -224,6 +224,11 @@ backend/*.musicxml
 backend/*.mid
 backend/*.wav
 
+# YourMT3+ temporary files
+backend/ymt/model_output/
+backend/ymt/*.mid
+backend/ymt/*.log
+
 # Frontend
 frontend/node_modules/
 frontend/dist/
@@ -242,6 +247,14 @@ storage/temp/*
 # Temp files
 /tmp/
 *.tmp
+*.temp
+
+# Logs
+logs/
+*.log
+
+# macOS
+.DS_Store
 
 # Docker volumes
 docker-compose.override.yml
README.md CHANGED
@@ -13,20 +13,20 @@ Rescored transcribes YouTube videos to professional-quality music notation:
 **Tech Stack**:
 - **Backend**: Python/FastAPI + Celery + Redis
 - **Frontend**: React + VexFlow (notation) + Tone.js (playback)
-- **ML**: Demucs (source separation) + basic-pitch (transcription)
+- **ML**: Demucs (source separation) + YourMT3+ (transcription, 80-85% accuracy) + basic-pitch (fallback)
 
 ## Quick Start
 
 ### Prerequisites
 
-- **Docker Desktop** (recommended) OR:
-  - Python 3.11+
-  - Node.js 18+
-  - Redis 7+
-  - FFmpeg
-  - (Optional) NVIDIA GPU with CUDA for faster processing
+- **macOS** (Apple Silicon recommended for MPS GPU acceleration) OR **Linux** (with NVIDIA GPU)
+- **Python 3.10** (required for madmom compatibility)
+- **Node.js 18+**
+- **Redis 7+**
+- **FFmpeg**
+- **Homebrew** (macOS only, for Redis installation)
 
-### Option 1: Docker Compose (Recommended)
+### Installation
 
 ```bash
 # Clone repository
@@ -34,7 +34,49 @@ git clone https://github.com/yourusername/rescored.git
 cd rescored
 ```
 
-#### ⚠️ REQUIRED: YouTube Cookies Setup
+### Setup Redis (macOS)
+
+```bash
+# Install Redis via Homebrew
+brew install redis
+
+# Start Redis service
+brew services start redis
+
+# Verify Redis is running
+redis-cli ping  # Should return PONG
+```
+
+### Setup Backend (Python 3.10 + MPS GPU Acceleration)
+
+```bash
+cd backend
+
+# Activate Python 3.10 virtual environment (already configured)
+source .venv/bin/activate
+
+# Verify Python version
+python --version  # Should show Python 3.10.x
+
+# Backend dependencies are already installed in .venv
+# If you need to reinstall:
+# pip install -r requirements.txt
+
+# Copy environment file and configure
+cp .env.example .env
+# Edit .env - ensure YOURMT3_DEVICE=mps for Apple Silicon GPU acceleration
+```
+
+### Setup Frontend
+
+```bash
+cd frontend
+
+# Install dependencies
+npm install
+```
+
+### ⚠️ REQUIRED: YouTube Cookies Setup
 
 YouTube requires authentication for video downloads (as of December 2024). You **MUST** export your YouTube cookies before the application will work.
 
@@ -53,29 +95,65 @@ YouTube requires authentication for video downloads (as of December 2024). You *
 
 3. **Place Cookie File**
 ```bash
-# Create storage directory
+# Create storage directory if it doesn't exist
 mkdir -p storage
 
 # Move the exported file (adjust path if needed)
 mv ~/Downloads/youtube.com_cookies.txt ./storage/youtube_cookies.txt
-
-# OR on Windows:
-# move %USERPROFILE%\Downloads\youtube.com_cookies.txt storage\youtube_cookies.txt
 ```
 
 4. **Start Services**
+
+**Option A: Single Command (Recommended)**
+```bash
+./start.sh
+```
+This starts all services in the background. Logs are written to `logs/` directory.
+
+To stop all services:
+```bash
+./stop.sh
+# Or press Ctrl+C in the terminal running start.sh
+```
+
+To view logs while running:
+```bash
+tail -f logs/api.log       # Backend API logs
+tail -f logs/worker.log    # Celery worker logs
+tail -f logs/frontend.log  # Frontend logs
+```
+
+**Option B: Manual (3 separate terminals)**
+
+**Terminal 1 - Backend API:**
+```bash
+cd backend
+source .venv/bin/activate
+uvicorn main:app --host 0.0.0.0 --port 8000 --reload
+```
+
+**Terminal 2 - Celery Worker:**
+```bash
+cd backend
+source .venv/bin/activate
+# Use --pool=solo on macOS to avoid fork() crashes with ML libraries
+celery -A tasks worker --loglevel=info --pool=solo
+```
+
+**Terminal 3 - Frontend:**
 ```bash
-docker-compose up
-
-# Services will be available at:
-# - Frontend: http://localhost:5173
-# - Backend API: http://localhost:8000
-# - API Docs: http://localhost:8000/docs
+cd frontend
+npm run dev
 ```
 
+**Services will be available at:**
+- Frontend: http://localhost:5173
+- Backend API: http://localhost:8000
+- API Docs: http://localhost:8000/docs
+
 **Verification:**
 ```bash
-docker-compose exec worker ls -lh /app/storage/youtube_cookies.txt
+ls -lh storage/youtube_cookies.txt
 ```
 You should see the file listed.
 
@@ -91,50 +169,52 @@ You should see the file listed.
 
 **Why Is This Required?** YouTube implemented bot detection in late 2024 that blocks unauthenticated downloads. Even though our tool is for legitimate transcription purposes, YouTube's systems can't distinguish it from scrapers. By providing your cookies, you're proving you're a real user who has agreed to YouTube's terms of service.
 
-### Option 2: Manual Setup
-
-**Backend**:
-```bash
-cd backend
-
-# Create virtual environment
-python3 -m venv venv
-source venv/bin/activate  # On Windows: venv\Scripts\activate
-
-# Install dependencies
-pip install -r requirements.txt
+### YourMT3+ Setup
 
-# Copy environment file
-cp .env.example .env
+The backend uses **YourMT3+** as the primary transcription model (80-85% accuracy) with automatic fallback to basic-pitch (70% accuracy) if YourMT3+ is unavailable.
 
-# Start Redis (in separate terminal)
-redis-server
+**YourMT3+ model files and source code are already included in the repository.** The model checkpoint (~536MB) is stored via Git LFS in `backend/ymt/yourmt3_core/`.
 
-# Start Celery worker (in separate terminal)
-celery -A tasks worker --loglevel=info
+**Verify YourMT3+ is working:**
+```bash
+# Start backend (if not already running)
+cd backend
+source .venv/bin/activate
+uvicorn main:app --host 0.0.0.0 --port 8000 --reload
 
-# Start API server
-python main.py
+# In another terminal, test YourMT3+ loading
+cd backend
+source .venv/bin/activate
+python -c "from yourmt3_wrapper import YourMT3Transcriber; t = YourMT3Transcriber(device='mps'); print('✓ YourMT3+ loaded successfully!')"
 ```
 
-**Frontend**:
-```bash
-cd frontend
+You should see:
+- `Model loaded successfully on mps`
+- `GPU available: True (mps), used: True`
+- `✓ YourMT3+ loaded successfully!`
 
-# Install dependencies
-npm install
+**GPU Acceleration:**
+- **Apple Silicon (M1/M2/M3/M4):** Uses MPS (Metal Performance Shaders) with 16-bit mixed precision for optimal performance. Default is `YOURMT3_DEVICE=mps` in `.env`.
+- **NVIDIA GPU:** Change `YOURMT3_DEVICE=cuda` in `.env`
+- **CPU Only:** Change `YOURMT3_DEVICE=cpu` in `.env` (will be much slower)
 
-# Start dev server
-npm run dev
-```
+**Important:** The symlink at `backend/ymt/yourmt3_core/amt/src/amt/logs` must point to `../../logs` for checkpoint loading to work. This is already configured in the repository.
 
 ## Usage
 
-1. Open [http://localhost:5173](http://localhost:5173)
-2. Paste a YouTube URL (piano music recommended for best results)
-3. Wait 1-2 minutes for transcription (with GPU) or 10-15 minutes (CPU)
-4. Edit the notation in the interactive editor
-5. Export as MusicXML or MIDI
+1. **Ensure all services are running:**
+   - Redis: `brew services list | grep redis` (should show "started")
+   - Backend API: Terminal 1 should show "Uvicorn running on http://0.0.0.0:8000"
+   - Celery Worker: Terminal 2 should show "celery@hostname ready"
+   - Frontend: Terminal 3 should show "Local: http://localhost:5173"
+
+2. Open [http://localhost:5173](http://localhost:5173)
+3. Paste a YouTube URL (piano music recommended for best results)
+4. Wait for transcription:
+   - **With MPS/GPU**: ~1-2 minutes
+   - **With CPU**: ~10-15 minutes
+5. Edit the notation in the interactive editor
+6. Export as MusicXML or MIDI
 
 ## MVP Features
 
@@ -186,7 +266,13 @@ Comprehensive documentation is available in the [`docs/`](docs/) directory:
 
 ## Performance
 
-**With GPU (RTX 3080)**:
+**With Apple Silicon MPS (M1/M2/M3/M4)**:
+- Download: ~10 seconds
+- Source separation (Demucs): ~30-60 seconds
+- Transcription (YourMT3+): ~20-30 seconds
+- **Total: ~1-2 minutes**
+
+**With NVIDIA GPU (RTX 3080)**:
 - Download: ~10 seconds
 - Source separation: ~45 seconds
 - Transcription: ~5 seconds
@@ -200,7 +286,15 @@ Comprehensive documentation is available in the [`docs/`](docs/) directory:
 
 ## Accuracy Expectations
 
-Transcription is **70-80% accurate** for simple piano music, **60-70%** for complex pieces. The interactive editor is designed to make fixing errors easy.
+**With YourMT3+ (recommended):**
+- Simple piano: **80-85% accurate**
+- Complex pieces: **70-75% accurate**
+
+**With basic-pitch (fallback):**
+- Simple piano: **70-75% accurate**
+- Complex pieces: **60-70% accurate**
+
+The interactive editor is designed to make fixing errors easy regardless of which transcription model is used.
 
 ## Development
 
@@ -226,16 +320,28 @@ Once the backend is running, visit:
 
 **Worker not processing jobs?**
 - Check Redis is running: `redis-cli ping` (should return PONG)
-- Check worker logs: `docker-compose logs worker`
+- If Redis isn't running: `brew services start redis`
+- Check worker logs in Terminal 2
 
-**GPU not detected?**
-- Install NVIDIA Docker runtime
-- Uncomment GPU section in `docker-compose.yml`
-- Set `GPU_ENABLED=true` in `.env`
+**MPS/GPU not being used?**
+- Verify MPS is available: `python -c "import torch; print(torch.backends.mps.is_available())"`
+- Check `.env` has `YOURMT3_DEVICE=mps`
+- For NVIDIA GPU: Set `YOURMT3_DEVICE=cuda`
+
+**YourMT3+ fails to load?**
+- Ensure Python 3.10 is being used: `python --version`
+- Check symlink exists: `ls -la backend/ymt/yourmt3_core/amt/src/amt/logs`
+- Verify checkpoint file exists: `ls -lh backend/ymt/yourmt3_core/logs/2024/*/checkpoints/last.ckpt`
 
 **YouTube download fails?**
+- Ensure `storage/youtube_cookies.txt` exists and is recent
+- Export fresh cookies from a NEW incognito window
 - Video may be age-restricted or private
-- Check yt-dlp is up to date: `pip install -U yt-dlp`
+- Update yt-dlp: `source .venv/bin/activate && pip install -U yt-dlp`
+
+**Module import errors?**
+- Make sure you're in the virtual environment: `source backend/.venv/bin/activate`
+- Reinstall requirements: `pip install -r requirements.txt`
 
 ## Contributing
 
@@ -247,8 +353,9 @@ MIT License - see [LICENSE](LICENSE) for details.
 
 ## Acknowledgments
 
+- **YourMT3+** (KAIST) - State-of-the-art music transcription ([Paper](https://arxiv.org/abs/2407.04822))
 - **Demucs** (Meta AI Research) - Source separation
-- **basic-pitch** (Spotify) - Audio transcription
+- **basic-pitch** (Spotify) - Fallback audio transcription
 - **VexFlow** - Music notation rendering
 - **Tone.js** - Web audio synthesis
 
backend/.dockerignore ADDED
@@ -0,0 +1,49 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+*.egg-info/
+dist/
+build/
+*.egg
+
+# Virtual environments
+.venv/
+venv/
+ENV/
+env/
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# Testing
+.pytest_cache/
+.coverage
+htmlcov/
+*.log
+
+# Storage and temp files
+storage/
+*.wav
+*.mid
+*.musicxml
+*.tmp
+*.temp
+
+# YourMT3+ large files (4.5GB model repo - too large for Docker image)
+ymt/yourmt3_core/model_repo/
+ymt/yourmt3_core/.git/
+ymt/model_output/
+ymt/*.mid
+ymt/*.log
+
+# Environment files
+.env
+.env.local
+.env.production
backend/.env.example CHANGED
@@ -2,7 +2,7 @@
 REDIS_URL=redis://localhost:6379/0
 
 # Storage Configuration
-STORAGE_PATH=/tmp/rescored
+STORAGE_PATH=../storage
 
 # API Configuration
 API_HOST=0.0.0.0
@@ -14,3 +14,7 @@ MAX_VIDEO_DURATION=900  # 15 minutes in seconds
 
 # CORS Origins (comma-separated)
 CORS_ORIGINS=http://localhost:5173,http://localhost:3000
+
+# YourMT3+ Use
+USE_YOURMT3_TRANSCRIPTION=true
+YOURMT3_DEVICE=mps
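
The new `.env` variables above are read into application settings at startup. A minimal sketch of that parsing (the `load_transcription_env` helper is hypothetical, not the project's actual pydantic `Settings` class; variable names and defaults mirror `.env.example`):

```python
def load_transcription_env(environ):
    """Parse the YourMT3+/CORS variables from an environment mapping (illustrative)."""
    use_yourmt3 = environ.get("USE_YOURMT3_TRANSCRIPTION", "true").lower() == "true"
    device = environ.get("YOURMT3_DEVICE", "mps")
    # CORS_ORIGINS is comma-separated; strip whitespace around each origin
    cors_origins = [o.strip() for o in environ.get(
        "CORS_ORIGINS", "http://localhost:5173,http://localhost:3000").split(",")]
    return {"use_yourmt3": use_yourmt3, "device": device, "cors_origins": cors_origins}

cfg = load_transcription_env({"YOURMT3_DEVICE": "cpu",
                              "CORS_ORIGINS": "http://localhost:5173, http://localhost:3000"})
print(cfg["device"])        # cpu
print(cfg["cors_origins"])  # ['http://localhost:5173', 'http://localhost:3000']
```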
backend/Dockerfile CHANGED
@@ -4,6 +4,8 @@ FROM python:3.11-slim
 RUN apt-get update && apt-get install -y \
     ffmpeg \
     git \
+    gcc \
+    build-essential \
     && rm -rf /var/lib/apt/lists/*
 
 # Set working directory
@@ -12,6 +14,9 @@ WORKDIR /app
 # Copy requirements
 COPY requirements.txt .
 
+# Install build dependencies for madmom
+RUN pip install --no-cache-dir Cython 'numpy<2.0.0'
+
 # Install Python dependencies
 RUN pip install --no-cache-dir -r requirements.txt
 
backend/Dockerfile.worker CHANGED
@@ -8,6 +8,8 @@ RUN apt-get update && apt-get install -y \
     python3-pip \
     ffmpeg \
     git \
+    gcc \
+    build-essential \
     && rm -rf /var/lib/apt/lists/*
 
 # Set working directory
@@ -16,6 +18,9 @@ WORKDIR /app
 # Copy requirements
 COPY requirements.txt .
 
+# Install build dependencies for madmom
+RUN pip3 install --no-cache-dir Cython 'numpy<2.0.0'
+
 # Install Python dependencies
 RUN pip3 install --no-cache-dir -r requirements.txt
 
backend/{config.py → app_config.py} RENAMED
@@ -21,9 +21,9 @@ class Settings(BaseSettings):
     max_video_duration: int = 900  # 15 minutes
 
     # Transcription Configuration (basic-pitch)
-    onset_threshold: float = 0.5  # Note onset confidence (0-1). Increased to reduce false positives
-    frame_threshold: float = 0.45  # Frame activation threshold (0-1)
-    minimum_note_length: int = 127  # Minimum note samples (~58ms at 44.1kHz)
+    onset_threshold: float = 0.3  # Note onset confidence (0-1). Lower = more notes detected
+    frame_threshold: float = 0.3  # Frame activation threshold (0-1). Basic-pitch default
+    minimum_note_length: int = 58  # Minimum note samples (~58ms at 44.1kHz). Basic-pitch default
     minimum_frequency_hz: float = 65.0  # C2 (65 Hz) - filter low-frequency noise like F1
     maximum_frequency_hz: float | None = None  # No upper limit for piano range
 
@@ -61,7 +61,12 @@ class Settings(BaseSettings):
     # Python compatibility: madmom runtime patch enables Python 3.10+ support
     use_madmom_tempo_detection: bool = True  # Multi-scale tempo (eliminates octave errors)
     use_beat_synchronous_quantization: bool = True  # Beat-aligned quantization (eliminates double quantization)
-    use_omnizart_transcription: bool = False  # Better onset/offset detection (requires model download)
+
+    # Transcription Service Configuration
+    use_yourmt3_transcription: bool = True  # YourMT3+ for 80-85% accuracy (default, falls back to basic-pitch)
+    transcription_service_url: str = "http://localhost:8000"  # Main API URL (YourMT3+ integrated)
+    transcription_service_timeout: int = 300  # Timeout for transcription requests (seconds)
+    yourmt3_device: str = "mps"  # Device for YourMT3+: 'mps' (Apple Silicon), 'cuda' (NVIDIA), or 'cpu'
 
     # Grand Staff Configuration
     enable_grand_staff: bool = True  # Split piano into treble + bass clefs
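
The settings fields added above drive device selection for the transcriber. A framework-free sketch of that shape (the dataclass is an illustrative stand-in for the pydantic `Settings`, and `resolve_device` is an assumed fallback policy, not code from the repository):

```python
from dataclasses import dataclass

@dataclass
class TranscriptionSettings:
    # Field names and defaults mirror the diff above (illustrative stand-in)
    use_yourmt3_transcription: bool = True
    transcription_service_url: str = "http://localhost:8000"
    transcription_service_timeout: int = 300
    yourmt3_device: str = "mps"

def resolve_device(requested: str, mps_available: bool, cuda_available: bool) -> str:
    """Fall back to CPU when the requested accelerator is absent (assumed policy)."""
    if requested == "mps" and mps_available:
        return "mps"
    if requested == "cuda" and cuda_available:
        return "cuda"
    return "cpu"

settings = TranscriptionSettings()
print(resolve_device(settings.yourmt3_device, mps_available=False, cuda_available=False))  # cpu
```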
backend/{utils.py → app_utils.py} RENAMED
File without changes
backend/celery_app.py CHANGED
@@ -1,7 +1,7 @@
 """Celery application configuration."""
 from celery import Celery
 from kombu import Exchange, Queue
-from config import settings
+from app_config import settings
 
 # Initialize Celery
 celery_app = Celery(
backend/main.py CHANGED
@@ -1,5 +1,5 @@
 """FastAPI application for Rescored backend."""
-from fastapi import FastAPI, HTTPException, WebSocket, WebSocketDisconnect, Request
+from fastapi import FastAPI, HTTPException, WebSocket, WebSocketDisconnect, Request, File, UploadFile
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import FileResponse
 from pydantic import BaseModel, HttpUrl
@@ -11,10 +11,21 @@ from starlette.responses import JSONResponse
 import redis
 import json
 import asyncio
-from config import settings
-from utils import validate_youtube_url, check_video_availability
+import tempfile
+import shutil
+from typing import Optional
+from app_config import settings
+from app_utils import validate_youtube_url, check_video_availability
 from tasks import process_transcription_task
 
+# YourMT3+ transcription service
+try:
+    from yourmt3_wrapper import YourMT3Transcriber
+    YOURMT3_AVAILABLE = True
+except ImportError as e:
+    YOURMT3_AVAILABLE = False
+    print(f"WARNING: YourMT3+ not available: {e}")
+
 # Initialize FastAPI
 app = FastAPI(
     title="Rescored API",
@@ -25,6 +36,10 @@ app = FastAPI(
 # Redis client (initialized before middleware)
 redis_client = redis.Redis.from_url(settings.redis_url, decode_responses=True)
 
+# YourMT3+ transcriber (loaded on startup)
+yourmt3_transcriber: Optional[YourMT3Transcriber] = None
+YOURMT3_TEMP_DIR = Path(tempfile.gettempdir()) / "yourmt3_service"
+
 
 # === Rate Limiting Middleware ===
 
@@ -81,6 +96,38 @@ app.add_middleware(
 app.add_middleware(RateLimitMiddleware)
 
 
+# === Application Lifecycle Events ===
+
+@app.on_event("startup")
+async def startup_event():
+    """Initialize YourMT3+ model on startup."""
+    global yourmt3_transcriber
+
+    if not YOURMT3_AVAILABLE or not settings.use_yourmt3_transcription:
+        print("YourMT3+ transcription disabled or unavailable")
+        return
+
+    try:
+        YOURMT3_TEMP_DIR.mkdir(parents=True, exist_ok=True)
+        print(f"Loading YourMT3+ model (device: {settings.yourmt3_device})...")
+        yourmt3_transcriber = YourMT3Transcriber(
+            model_name="YPTF.MoE+Multi (noPS)",
+            device=settings.yourmt3_device
+        )
+        print("✓ YourMT3+ model loaded successfully")
+    except Exception as e:
+        print(f"⚠ Failed to load YourMT3+ model: {e}")
+        print("  Service will fall back to basic-pitch for transcription")
+        yourmt3_transcriber = None
+
+
+@app.on_event("shutdown")
+async def shutdown_event():
+    """Clean up temporary files on shutdown."""
+    if YOURMT3_TEMP_DIR.exists():
+        shutil.rmtree(YOURMT3_TEMP_DIR, ignore_errors=True)
+
+
 # === Request/Response Models ===
 
 class TranscribeRequest(BaseModel):
@@ -402,6 +449,85 @@ async def health_check():
     }
 
 
+# === YourMT3+ Transcription Endpoints ===
+
+@app.get("/api/v1/yourmt3/health")
+async def yourmt3_health():
+    """
+    Check YourMT3+ transcription service health.
+
+    Returns model status, device, and availability.
+    """
+    if not YOURMT3_AVAILABLE:
+        return {
+            "status": "unavailable",
+            "model_loaded": False,
+            "reason": "YourMT3+ dependencies not installed"
+        }
+
+    model_loaded = yourmt3_transcriber is not None
+
+    return {
+        "status": "healthy" if model_loaded else "degraded",
+        "model_loaded": model_loaded,
+        "model_name": "YPTF.MoE+Multi (noPS)" if model_loaded else "not loaded",
+        "device": yourmt3_transcriber.device if model_loaded else "unknown"
+    }
+
+
+@app.post("/api/v1/yourmt3/transcribe")
+async def yourmt3_transcribe(file: UploadFile = File(...)):
+    """
+    Transcribe audio file to MIDI using YourMT3+.
+
+    This endpoint is used by the pipeline for direct transcription.
+    """
+    if yourmt3_transcriber is None:
+        raise HTTPException(status_code=503, detail="YourMT3+ model not loaded")
+
+    # Save uploaded file
+    input_file = YOURMT3_TEMP_DIR / f"input_{uuid4().hex}_{file.filename}"
+    try:
+        with open(input_file, "wb") as f:
+            content = await file.read()
+            f.write(content)
+
+        # Transcribe
+        output_dir = YOURMT3_TEMP_DIR / f"output_{uuid4().hex}"
+        output_dir.mkdir(parents=True, exist_ok=True)
+
+        midi_path = yourmt3_transcriber.transcribe_audio(input_file, output_dir)
+
+        # Return MIDI file
+        return FileResponse(
+            path=str(midi_path),
+            media_type="audio/midi",
+            filename=midi_path.name
+        )
+
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Transcription failed: {str(e)}")
+    finally:
+        # Clean up input file
+        if input_file.exists():
+            input_file.unlink()
+
+
+@app.get("/api/v1/yourmt3/models")
+async def yourmt3_models():
+    """List available YourMT3+ model variants."""
+    return {
+        "models": [
+            {
+                "name": "YPTF.MoE+Multi (noPS)",
+                "description": "Mixture of Experts multi-instrument transcription (default)",
+                "loaded": yourmt3_transcriber is not None
+            }
+        ],
+        "default": "YPTF.MoE+Multi (noPS)"
+    }
+
+
 if __name__ == "__main__":
     import uvicorn
     uvicorn.run(
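
The health endpoint's three-state response (unavailable / degraded / healthy) can be exercised without FastAPI or the model. A minimal stand-in reproducing that logic (`FakeTranscriber` is a hypothetical substitute for `YourMT3Transcriber`; the dict keys mirror the endpoint in the diff):

```python
from typing import Optional

class FakeTranscriber:
    # Stand-in for a loaded YourMT3Transcriber exposing only the device attribute
    device = "mps"

def yourmt3_health(available: bool, transcriber: Optional[FakeTranscriber]) -> dict:
    """Mirror of the /api/v1/yourmt3/health response logic (illustrative)."""
    if not available:
        return {"status": "unavailable", "model_loaded": False,
                "reason": "YourMT3+ dependencies not installed"}
    loaded = transcriber is not None
    return {"status": "healthy" if loaded else "degraded",
            "model_loaded": loaded,
            "device": transcriber.device if loaded else "unknown"}

print(yourmt3_health(True, FakeTranscriber())["status"])  # healthy
print(yourmt3_health(True, None)["status"])               # degraded
print(yourmt3_health(False, None)["status"])              # unavailable
```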
backend/pipeline.py CHANGED
@@ -34,12 +34,6 @@ except ImportError as e:
34
  print(f"WARNING: madmom not available. Falling back to librosa for tempo/beat detection.")
35
  print(f" Error: {e}")
36
 
37
- try:
38
- import omnizart
39
- OMNIZART_AVAILABLE = True
40
- except ImportError:
41
- OMNIZART_AVAILABLE = False
42
- print("WARNING: omnizart not installed. Install with: pip install omnizart")
43
 
44
 
45
  class TranscriptionPipeline:
@@ -55,7 +49,7 @@ class TranscriptionPipeline:
55
 
56
  # Load configuration
57
  if config is None:
58
- from config import settings
59
  self.config = settings
60
  else:
61
  self.config = config
@@ -87,7 +81,13 @@ class TranscriptionPipeline:
87
  midi_path = self.transcribe_to_midi(stems['other'])
88
 
89
  self.progress(90, "musicxml", "Generating MusicXML")
90
- musicxml_path = self.generate_musicxml(midi_path)
 
 
 
 
 
 
91
 
92
  self.progress(100, "complete", "Transcription complete")
93
  return musicxml_path
@@ -184,101 +184,161 @@ class TranscriptionPipeline:
184
  minimum_note_length = self.config.minimum_note_length
185
 
186
  output_dir = self.temp_dir
187
- midi_path = output_dir / "piano.mid"
188
-
189
- print(f" Transcribing with basic-pitch (onset={onset_threshold}, frame={frame_threshold})...")
190
-
191
- # Run basic-pitch inference
192
- # predict_and_save creates output files in the output directory
193
- predict_and_save(
194
- audio_path_list=[str(audio_path)],
195
- output_directory=str(output_dir),
196
- save_midi=True,
197
- sonify_midi=False, # Don't create audio
198
- save_model_outputs=False, # Don't save raw outputs
199
- save_notes=False, # Don't save CSV
200
- model_or_model_path=ICASSP_2022_MODEL_PATH,
201
- onset_threshold=onset_threshold,
202
- frame_threshold=frame_threshold,
203
- minimum_note_length=minimum_note_length,
204
- minimum_frequency=self.config.minimum_frequency_hz, # Filter low-frequency noise (F1)
205
- maximum_frequency=self.config.maximum_frequency_hz, # No upper limit
206
- multiple_pitch_bends=False,
207
- melodia_trick=True, # Improves monophonic melody
208
- debug_file=None
209
- )
210
-
211
- # basic-pitch saves as {audio_stem}_basic_pitch.mid
212
- generated_midi = output_dir / f"{audio_path.stem}_basic_pitch.mid"
213
-
214
- if not generated_midi.exists():
215
-        raise RuntimeError("basic-pitch did not create MIDI file")
-
-        # Rename to expected path
-        generated_midi.rename(midi_path)
-
-        # Detect tempo from source audio for accurate post-processing
-        source_audio = self.temp_dir / "audio.wav"
-        if source_audio.exists():
-            detected_tempo, _ = self.detect_tempo_from_audio(source_audio)
-        else:
-            detected_tempo = 120.0  # Fallback
 
-        # Post-process MIDI (adaptive pipeline based on music type)
-        # 1. Detect if music is polyphonic (wide range) or monophonic (narrow range)
-        range_semitones = self._get_midi_range(midi_path)
 
-        if range_semitones > 24:
-            # Wide range (>2 octaves) = likely polyphonic piano music
-            # Preserve all notes (bass + treble)
-            print(f"  Detected wide range ({range_semitones} semitones), preserving all notes")
-            mono_midi = midi_path
-        else:
-            # Narrow range (≤2 octaves) = likely monophonic melody
-            # Remove octave duplicates using pitch class deduplication
-            print(f"  Narrow range ({range_semitones} semitones), removing octave duplicates")
-            mono_midi = self.extract_monophonic_melody(midi_path)
-
-        # 2. Clean (filter invalid notes, light quantization)
-        cleaned_midi = self.clean_midi(mono_midi, detected_tempo=detected_tempo)
-
-        # 2.3. PHASE 2: Beat-synchronous quantization (ZERO-TRADEOFF)
-        # If enabled and madmom available, quantize to detected beats instead of fixed grid
-        # This eliminates double quantization and ensures perfect musical alignment
-        if self.config.use_beat_synchronous_quantization and source_audio.exists():
-            beat_synced_midi = self.beat_synchronous_quantize(
-                cleaned_midi,
-                source_audio,
-                tempo_bpm=detected_tempo
             )
         else:
-            beat_synced_midi = cleaned_midi
-
-        # 2.5. CRITICAL FIX: Merge consecutive notes at MIDI level
-        # This fixes sustained notes appearing as "note → rest → note"
-        # The quantization creates gaps (125ms at 120 BPM for 16th grid, or from beat alignment)
-        # Merging with 150ms threshold catches these quantization artifacts
-        print(f"  Merging consecutive notes (gap threshold: 150ms)...")
-        merged_midi = self.merge_consecutive_notes(
-            beat_synced_midi,  # Use beat-synced MIDI if available
-            gap_threshold_ms=150,  # Generous to catch quantization gaps
-            tempo_bpm=detected_tempo
-        )
-
-        # 2.6. Optional: Merge sustain artifacts using envelope analysis
-        if self.config.enable_envelope_analysis:
-            print(f"  Analyzing note envelopes for sustain artifacts...")
-            final_midi = self.analyze_note_envelope_and_merge_sustains(
-                merged_midi,
-                tempo_bpm=detected_tempo
             )
-        else:
-            final_midi = merged_midi
 
-        # 3. Detect repeated patterns (validation)
-        self.detect_repeated_note_patterns(final_midi)
 
-        return final_midi
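The removed branch decides between polyphonic and monophonic handling from the pitch range alone. The heuristic can be shown in isolation; a minimal sketch with hypothetical helper names, using plain `(pitch, onset, duration)` tuples instead of a parsed MIDI file:

```python
def pitch_range_semitones(notes):
    """Span between the lowest and highest MIDI pitch in a note list."""
    pitches = [pitch for pitch, _onset, _duration in notes]
    return max(pitches) - min(pitches) if pitches else 0

def is_likely_polyphonic(notes, threshold=24):
    """More than two octaves of range suggests a bass + treble piano texture."""
    return pitch_range_semitones(notes) > threshold

melody = [(60, 0.0, 0.5), (62, 0.5, 0.5), (64, 1.0, 0.5)]  # narrow range
piano = [(36, 0.0, 1.0), (60, 0.0, 0.5), (72, 0.5, 0.5)]   # bass + treble
print(is_likely_polyphonic(melody))  # False
print(is_likely_polyphonic(piano))   # True
```

The same `> 24` threshold appears in both the removed and the replacement code, so the heuristic itself is unchanged by this refactor.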
 
     def _get_midi_range(self, midi_path: Path) -> int:
         """
@@ -626,10 +686,11 @@ class TranscriptionPipeline:
         mid = mido.MidiFile(midi_path)
 
         # 3. Convert beat times (seconds) to MIDI ticks
         seconds_per_beat = 60.0 / tempo_bpm
         beat_ticks = []
         for beat_time in beats:
-            ticks = int(beat_time / seconds_per_beat * mid.ticks_per_beat)
             beat_ticks.append(ticks)
 
         # 4. Quantize note onsets to nearest beat (preserve durations)
@@ -640,16 +701,23 @@
         for msg in track:
             abs_time += msg.time
             messages_with_abs_time.append((abs_time, msg))
 
         # Quantize note_on events to nearest beat
-        note_on_times = {}  # Track quantized onset times
 
         for i, (abs_time, msg) in enumerate(messages_with_abs_time):
             if msg.type == 'note_on' and msg.velocity > 0:
                 # Find nearest beat
                 nearest_beat = min(beat_ticks, key=lambda b: abs(b - abs_time))
 
                 # Update absolute time to nearest beat
                 messages_with_abs_time[i] = (nearest_beat, msg)
 
@@ -658,27 +726,44 @@
             elif msg.type == 'note_off' or (msg.type == 'note_on' and msg.velocity == 0):
                 # Preserve duration by keeping offset relative to quantized onset
-                if (msg.channel, msg.note) in note_on_times:
-                    onset_time = note_on_times[(msg.channel, msg.note)]
-                    original_duration = abs_time - [t for t, m in messages_with_abs_time
-                                                    if m.type == 'note_on' and m.note == msg.note
-                                                    and m.channel == msg.channel][-1]
 
                     # Keep same duration from quantized onset
                     new_offset = onset_time + original_duration
                     messages_with_abs_time[i] = (new_offset, msg)
 
-                    del note_on_times[(msg.channel, msg.note)]
 
         # Rebuild track with new timings
         track.clear()
         previous_time = 0
 
         for abs_time, msg in sorted(messages_with_abs_time, key=lambda x: x[0]):
             msg.time = max(0, abs_time - previous_time)
             previous_time = abs_time
             track.append(msg)
 
         # 5. Save beat-quantized MIDI
         beat_sync_path = midi_path.with_stem(f"{midi_path.stem}_beat_sync")
         mid.save(beat_sync_path)
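Beat-synchronous quantization, stripped to its core, snaps each onset (already converted to ticks) to the nearest detected beat. A self-contained sketch of that one step, mirroring the `min(beat_ticks, key=...)` expression in the hunk above:

```python
def quantize_onsets(onset_ticks, beat_ticks):
    """Snap each onset to the nearest detected beat, both in MIDI ticks."""
    return [min(beat_ticks, key=lambda b: abs(b - t)) for t in onset_ticks]

beats = [0, 480, 960, 1440]     # one beat = 480 ticks at this resolution
onsets = [15, 470, 1005, 1430]  # slightly off-grid note onsets
print(quantize_onsets(onsets, beats))  # [0, 480, 960, 1440]
```

Unlike grid quantization at a fixed tempo, the targets here come from the audio's detected beats, so notes stay aligned even when the performance drifts.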
@@ -1145,6 +1230,108 @@ class TranscriptionPipeline:
 
         return output_path
 
     def _deduplicate_overlapping_notes(self, score) -> stream.Score:
         """
         Deduplicate overlapping notes from basic-pitch to prevent MusicXML corruption.
@@ -1829,20 +2016,22 @@ class TranscriptionPipeline:
         """
         print(f"  Using madmom multi-scale tempo detection (eliminates octave errors)...")
 
-        # Multi-scale tempo processor
         tempo_processor = madmom.features.tempo.TempoEstimationProcessor(fps=100)
 
         # Get tempo candidates from multi-scale analysis
-        tempo_result = tempo_processor(str(audio_path))
 
-        # tempo_result is array of [tempo1, strength1, tempo2, strength2, ...]
-        # Extract top candidates
         tempos = []
         strengths = []
-        for i in range(0, len(tempo_result), 2):
-            if i + 1 < len(tempo_result):
-                tempos.append(float(tempo_result[i]))
-                strengths.append(float(tempo_result[i + 1]))
 
         if not tempos:
             print(f"  WARNING: Madmom returned no tempo candidates, using default 120 BPM")
@@ -1939,21 +2128,20 @@
             print(f"  WARNING: madmom not available, falling back to librosa beat tracking")
             return self._detect_beats_librosa(audio_path)
 
-        print(f"  Detecting beats and downbeats with madmom...")
 
-        # Beat tracking processor
-        beat_processor = madmom.features.beats.BeatTrackingProcessor(fps=100)
-        beats = beat_processor(str(audio_path))
 
-        # Downbeat tracking processor
-        downbeat_processor = madmom.features.downbeats.DBNDownBeatTrackingProcessor(beats_per_bar=[3, 4], fps=100)
-        downbeats_result = downbeat_processor(str(audio_path))
 
-        # downbeats_result is array of [time, beat_position]
-        # Extract only downbeats (beat_position == 1)
-        downbeats = downbeats_result[downbeats_result[:, 1] == 1, 0] if len(downbeats_result) > 0 else np.array([])
 
-        print(f"  Detected {len(beats)} beats, {len(downbeats)} downbeats")
 
         return beats, downbeats
 
     print(f"WARNING: madmom not available. Falling back to librosa for tempo/beat detection.")
     print(f"  Error: {e}")
 
 
 class TranscriptionPipeline:
 
         # Load configuration
         if config is None:
+            from app_config import settings
             self.config = settings
         else:
             self.config = config
 
         midi_path = self.transcribe_to_midi(stems['other'])
 
         self.progress(90, "musicxml", "Generating MusicXML")
+        # Use minimal generator for YourMT3+, full generator for basic-pitch
+        if self.config.use_yourmt3_transcription:
+            print(f"  Using minimal MusicXML generation (YourMT3+)")
+            musicxml_path = self.generate_musicxml_minimal(midi_path, stems['other'])
+        else:
+            print(f"  Using full MusicXML generation (basic-pitch)")
+            musicxml_path = self.generate_musicxml(midi_path)
 
         self.progress(100, "complete", "Transcription complete")
         return musicxml_path
 
     minimum_note_length = self.config.minimum_note_length
 
     output_dir = self.temp_dir
 
+        # === STEP 1: Try YourMT3+ first (primary transcriber) ===
+        use_yourmt3 = self.config.use_yourmt3_transcription
+        midi_path = None
+
+        if use_yourmt3:
+            try:
+                print(f"  Transcribing with YourMT3+ (primary transcriber)...")
+                midi_path = self.transcribe_with_yourmt3(audio_path)
+                print(f"  ✓ YourMT3+ transcription complete")
+            except Exception as e:
+                import traceback
+                print(f"  ⚠ YourMT3+ failed: {e}")
+                print(f"  Full error: {traceback.format_exc()}")
+                print(f"  → Falling back to basic-pitch")
+                midi_path = None
+
+        # === STEP 2: Fallback to basic-pitch if YourMT3+ failed or disabled ===
+        if midi_path is None:
+            print(f"  Transcribing with basic-pitch (onset={onset_threshold}, frame={frame_threshold})...")
+
+            # Run basic-pitch inference
+            # predict_and_save creates output files in the output directory
+            predict_and_save(
+                audio_path_list=[str(audio_path)],
+                output_directory=str(output_dir),
+                save_midi=True,
+                sonify_midi=False,         # Don't create audio
+                save_model_outputs=False,  # Don't save raw outputs
+                save_notes=False,          # Don't save CSV
+                model_or_model_path=ICASSP_2022_MODEL_PATH,
+                onset_threshold=onset_threshold,
+                frame_threshold=frame_threshold,
+                minimum_note_length=minimum_note_length,
+                minimum_frequency=self.config.minimum_frequency_hz,  # Filter low-frequency noise (F1)
+                maximum_frequency=self.config.maximum_frequency_hz,  # No upper limit
+                multiple_pitch_bends=False,
+                melodia_trick=True,  # Improves monophonic melody
+                debug_file=None
             )
+
+            # basic-pitch saves as {audio_stem}_basic_pitch.mid
+            generated_bp_midi = output_dir / f"{audio_path.stem}_basic_pitch.mid"
+
+            if not generated_bp_midi.exists():
+                raise RuntimeError("basic-pitch did not create MIDI file")
+
+            midi_path = generated_bp_midi
+            print(f"  ✓ basic-pitch transcription complete")
+
+        # Rename final MIDI to standard name for post-processing
+        final_midi_path = output_dir / "piano.mid"
+        if midi_path != final_midi_path:
+            midi_path.rename(final_midi_path)
+            midi_path = final_midi_path
+
+        # Conditional post-processing based on transcriber
+        if self.config.use_yourmt3_transcription:
+            # YourMT3+ produces clean MIDI - use as-is
+            print(f"  Using YourMT3+ output directly (no post-processing)")
+            return midi_path
         else:
+            # basic-pitch needs full post-processing pipeline
+            print(f"  Applying full post-processing for basic-pitch")
+
+            # Detect tempo from source audio for accurate post-processing
+            source_audio = self.temp_dir / "audio.wav"
+            if source_audio.exists():
+                detected_tempo, _ = self.detect_tempo_from_audio(source_audio)
+            else:
+                detected_tempo = 120.0
+
+            # 1. Polyphony detection
+            range_semitones = self._get_midi_range(midi_path)
+            if range_semitones > 24:
+                # Wide range (>2 octaves) = likely polyphonic piano music
+                print(f"  Detected wide range ({range_semitones} semitones), preserving all notes")
+                mono_midi = midi_path
+            else:
+                # Narrow range (≤2 octaves) = likely monophonic melody
+                print(f"  Narrow range ({range_semitones} semitones), removing octave duplicates")
+                mono_midi = self.extract_monophonic_melody(midi_path)
+
+            # 2. Clean (filter, quantize)
+            cleaned_midi = self.clean_midi(mono_midi, detected_tempo)
+
+            # 3. Beat-synchronous quantization
+            if self.config.use_beat_synchronous_quantization and source_audio.exists():
+                beat_synced_midi = self.beat_synchronous_quantize(cleaned_midi, source_audio, detected_tempo)
+            else:
+                beat_synced_midi = cleaned_midi
+
+            # 4. Merge consecutive notes
+            print(f"  Merging consecutive notes (gap threshold: 150ms)...")
+            merged_midi = self.merge_consecutive_notes(beat_synced_midi, gap_threshold_ms=150, tempo_bpm=detected_tempo)
+
+            # 5. Envelope analysis
+            if self.config.enable_envelope_analysis:
+                print(f"  Analyzing note envelopes for sustain artifacts...")
+                final_midi = self.analyze_note_envelope_and_merge_sustains(merged_midi, tempo_bpm=detected_tempo)
+            else:
+                final_midi = merged_midi
+
+            # 6. Validate (pattern detection)
+            self.detect_repeated_note_patterns(final_midi)
+
+            return final_midi
+
+    def transcribe_with_yourmt3(self, audio_path: Path) -> Path:
+        """
+        Transcribe audio to MIDI using YourMT3+ directly (in-process).
+
+        YourMT3+ is a state-of-the-art multi-instrument transcription model
+        that achieves 80-85% accuracy (vs 70% for basic-pitch).
+
+        Args:
+            audio_path: Path to audio file (should be 'other' stem for piano)
+
+        Returns:
+            Path to generated MIDI file
+
+        Raises:
+            RuntimeError: If transcription fails
+        """
+        try:
+            from yourmt3_wrapper import YourMT3Transcriber
+        except ImportError:
+            # Try adding backend directory to path
+            import sys
+            from pathlib import Path as PathLib
+            backend_dir = PathLib(__file__).parent
+            if str(backend_dir) not in sys.path:
+                sys.path.insert(0, str(backend_dir))
+            from yourmt3_wrapper import YourMT3Transcriber
+
+        print(f"  Transcribing with YourMT3+ (direct call, device: {self.config.yourmt3_device})...")
+
+        try:
+            # Initialize transcriber (reuses loaded model from API if available)
+            transcriber = YourMT3Transcriber(
+                model_name="YPTF.MoE+Multi (noPS)",
+                device=self.config.yourmt3_device
             )
+
+            # Transcribe audio
+            output_dir = self.temp_dir / "yourmt3_output"
+            output_dir.mkdir(exist_ok=True)
+            midi_path = transcriber.transcribe_audio(audio_path, output_dir)
+
+            print(f"  ✓ YourMT3+ transcription complete")
+            return midi_path
+
+        except Exception as e:
+            raise RuntimeError(f"YourMT3+ transcription failed: {e}")
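The try/except dispatch added above is a plain primary/fallback pattern; a reduced sketch with stand-in transcribers (all names here are illustrative, not the pipeline's API):

```python
def transcribe_with_fallback(audio, primary, fallback):
    """Run the primary transcriber; on any failure, use the fallback."""
    try:
        return primary(audio), "primary"
    except Exception as exc:
        print(f"primary failed ({exc}); falling back")
        return fallback(audio), "fallback"

def flaky(_audio):
    # Stand-in for YourMT3+ when the model fails to load
    raise RuntimeError("model not loaded")

def safe(_audio):
    # Stand-in for basic-pitch, which always produces a file
    return "notes.mid"

result, used = transcribe_with_fallback("clip.wav", flaky, safe)
print(result, used)  # notes.mid fallback
```

Catching a bare `Exception` is deliberate here: any failure mode of the primary model (missing weights, device errors, bad output) should degrade to the fallback rather than abort the job.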
 
     def _get_midi_range(self, midi_path: Path) -> int:
         """
 
         mid = mido.MidiFile(midi_path)
 
         # 3. Convert beat times (seconds) to MIDI ticks
+        # Formula: seconds * (ticks_per_beat / seconds_per_beat)
         seconds_per_beat = 60.0 / tempo_bpm
         beat_ticks = []
         for beat_time in beats:
+            ticks = int(beat_time * mid.ticks_per_beat / seconds_per_beat)
             beat_ticks.append(ticks)
 
         # 4. Quantize note onsets to nearest beat (preserve durations)
 
         for msg in track:
             abs_time += msg.time
+            # Skip pitchwheel messages (not needed for notation, can cause timing issues)
+            if msg.type == 'pitchwheel':
+                continue
             messages_with_abs_time.append((abs_time, msg))
 
         # Quantize note_on events to nearest beat
+        note_on_times = {}        # Track quantized onset times: (channel, note) -> quantized_time
+        note_original_times = {}  # Track original onset times: (channel, note) -> original_time
 
         for i, (abs_time, msg) in enumerate(messages_with_abs_time):
             if msg.type == 'note_on' and msg.velocity > 0:
                 # Find nearest beat
                 nearest_beat = min(beat_ticks, key=lambda b: abs(b - abs_time))
 
+                # Store original time BEFORE quantization
+                note_original_times[(msg.channel, msg.note)] = abs_time
+
                 # Update absolute time to nearest beat
                 messages_with_abs_time[i] = (nearest_beat, msg)
 
             elif msg.type == 'note_off' or (msg.type == 'note_on' and msg.velocity == 0):
                 # Preserve duration by keeping offset relative to quantized onset
+                key = (msg.channel, msg.note)
+                if key in note_on_times and key in note_original_times:
+                    onset_time = note_on_times[key]
+                    original_onset_time = note_original_times[key]
+
+                    # Calculate duration using original times
+                    original_duration = abs_time - original_onset_time
 
                     # Keep same duration from quantized onset
                     new_offset = onset_time + original_duration
                     messages_with_abs_time[i] = (new_offset, msg)
 
+                    del note_on_times[key]
+                    del note_original_times[key]
 
         # Rebuild track with new timings
         track.clear()
         previous_time = 0
+        last_note_time = 0
 
         for abs_time, msg in sorted(messages_with_abs_time, key=lambda x: x[0]):
+            # Skip end_of_track for now - we'll add it at the end
+            if msg.type == 'end_of_track':
+                continue
+
             msg.time = max(0, abs_time - previous_time)
             previous_time = abs_time
             track.append(msg)
 
+            # Track last note time
+            if msg.type in ('note_on', 'note_off'):
+                last_note_time = abs_time
+
+        # Add end_of_track after last note with small delta
+        from mido import MetaMessage
+        end_msg = MetaMessage('end_of_track', time=10)
+        track.append(end_msg)
+
         # 5. Save beat-quantized MIDI
         beat_sync_path = midi_path.with_stem(f"{midi_path.stem}_beat_sync")
         mid.save(beat_sync_path)
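The key fix in this hunk is recording each note's original onset before overwriting it, so the note_off can be placed at `quantized_onset + original_duration` instead of being derived from already-quantized times. A standalone sketch of that duration-preserving quantization (hypothetical helper, notes as `(onset_ticks, duration_ticks)` pairs):

```python
def quantize_preserving_duration(notes, beat_ticks):
    """Snap each onset to the nearest beat while keeping the note's duration.

    Durations are taken from the ORIGINAL onset/offset pair, which is why the
    real code stores note_original_times before quantizing note_on events.
    """
    quantized = []
    for onset, duration in notes:
        q_onset = min(beat_ticks, key=lambda b: abs(b - onset))
        quantized.append((q_onset, duration))  # offset becomes q_onset + duration
    return quantized

beats = [0, 480, 960]             # 480 ticks per beat
notes = [(490, 240), (950, 480)]  # slightly late / early onsets
print(quantize_preserving_duration(notes, beats))  # [(480, 240), (960, 480)]
```

Without the stored original onsets, duration would be measured against a quantized onset and notes could collapse to zero length or stretch across beats.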
 
         return output_path
 
+    def generate_musicxml_minimal(self, midi_path: Path, source_audio: Path) -> Path:
+        """
+        Generate MusicXML from clean MIDI (YourMT3+ output) with minimal post-processing.
+
+        This is a simplified pipeline for YourMT3+ which produces clean, well-quantized MIDI.
+        Skips all MIDI-level post-processing and only applies music21-level operations.
+
+        Steps:
+        1. Detect tempo, time signature, key from audio
+        2. Parse MIDI with music21
+        3. Create measures
+        4. Optional: Split into grand staff (treble + bass)
+        5. Export MusicXML
+
+        Args:
+            midi_path: Clean MIDI from YourMT3+ (no post-processing needed)
+            source_audio: Audio file for metadata detection
+
+        Returns:
+            Path to generated MusicXML file
+        """
+        from music21 import converter, tempo, meter, clef
+
+        self.progress(92, "musicxml", "Detecting metadata from audio")
+
+        # Step 1: Detect metadata from audio
+        if source_audio.exists():
+            # Detect tempo
+            detected_tempo, tempo_confidence = self.detect_tempo_from_audio(source_audio)
+            # Detect time signature
+            time_sig_num, time_sig_denom, ts_confidence = self.detect_time_signature(source_audio, detected_tempo)
+        else:
+            print("  WARNING: Audio file not found, using defaults")
+            detected_tempo, tempo_confidence = 120.0, 0.0
+            time_sig_num, time_sig_denom, ts_confidence = 4, 4, 0.0
+
+        print(f"  Detected: {detected_tempo} BPM (confidence: {tempo_confidence:.2f})")
+        print(f"  Detected: {time_sig_num}/{time_sig_denom} time (confidence: {ts_confidence:.2f})")
+
+        self.progress(93, "musicxml", "Parsing MIDI")
+
+        # Step 2: Parse MIDI
+        score = converter.parse(midi_path)
+
+        self.progress(94, "musicxml", "Detecting key signature")
+
+        # Step 3: Detect key signature
+        detected_key, key_confidence = self.detect_key_ensemble(score, source_audio)
+        print(f"  Detected key: {detected_key} (confidence: {key_confidence:.2f})")
+
+        self.progress(96, "musicxml", "Creating measures")
+
+        # Step 4: Create measures
+        score = score.makeMeasures()
+
+        # Step 5: Grand staff split (optional)
+        if self.config.enable_grand_staff:
+            print(f"  Splitting into grand staff (split at MIDI note {self.config.middle_c_split})...")
+            score = self._split_into_grand_staff(score)
+            print(f"  Created {len(score.parts)} staves (treble + bass)")
+
+            # Insert metadata into each part
+            for part in score.parts:
+                measures = part.getElementsByClass('Measure')
+                if measures:
+                    first_measure = measures[0]
+                    first_measure.insert(0, tempo.MetronomeMark(number=detected_tempo))
+                    first_measure.insert(0, detected_key)
+                    first_measure.insert(0, meter.TimeSignature(f'{time_sig_num}/{time_sig_denom}'))
+        else:
+            # Single staff: add treble clef and metadata
+            for part in score.parts:
+                part.insert(0, clef.TrebleClef())
+                part.insert(0, detected_key)
+                part.insert(0, meter.TimeSignature(f'{time_sig_num}/{time_sig_denom}'))
+                part.insert(0, tempo.MetronomeMark(number=detected_tempo))
+                part.partName = "Piano"
+
+        self.progress(97, "musicxml", "Normalizing durations")
+
+        # Step 5.5: Fix any impossible durations that music21 can't export
+        # YourMT3+ output is clean, but music21 has limitations on complex durations
+        score = self._remove_impossible_durations(score)
+
+        self.progress(98, "musicxml", "Exporting MusicXML")
+
+        # Step 6: Export MusicXML
+        output_path = self.temp_dir / f"{self.job_id}.musicxml"
+
+        print(f"  Writing MusicXML to {output_path}...")
+        try:
+            score.write('musicxml', fp=str(output_path), makeNotation=False)
+        except Exception as e:
+            # If export still fails due to complex durations, try with makeNotation=True
+            # This lets music21 handle the complex durations automatically
+            print(f"  WARNING: Export failed with makeNotation=False: {e}")
+            print(f"  Retrying with makeNotation=True (auto-notation)...")
+            score.write('musicxml', fp=str(output_path), makeNotation=True)
+
+        print(f"  ✓ MusicXML generation complete")
+        return output_path
+
     def _deduplicate_overlapping_notes(self, score) -> stream.Score:
         """
         Deduplicate overlapping notes from basic-pitch to prevent MusicXML corruption.
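`generate_musicxml_minimal` calls `_split_into_grand_staff`, which is not shown in this diff. Assuming a split at `config.middle_c_split` (middle C, MIDI note 60, per the log message), the routing is roughly:

```python
MIDDLE_C = 60  # assumed value of config.middle_c_split

def split_grand_staff(pitches):
    """Route pitches below the split point to the bass staff, the rest to treble."""
    treble = [p for p in pitches if p >= MIDDLE_C]
    bass = [p for p in pitches if p < MIDDLE_C]
    return treble, bass

print(split_grand_staff([48, 55, 60, 64, 72]))  # ([60, 64, 72], [48, 55])
```

A fixed split point is a common simplification for piano notation; real left/right-hand separation is harder, since hands routinely cross the middle-C boundary.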
 
         """
         print(f"  Using madmom multi-scale tempo detection (eliminates octave errors)...")
 
+        # Process audio to get beat activations
+        act = madmom.features.beats.RNNBeatProcessor()(str(audio_path))
+
+        # Multi-scale tempo processor (operates on activations, not raw audio)
         tempo_processor = madmom.features.tempo.TempoEstimationProcessor(fps=100)
 
         # Get tempo candidates from multi-scale analysis
+        tempo_result = tempo_processor(act)
 
+        # tempo_result is 2D array where each row is [tempo_bpm, strength]
+        # Extract candidates
         tempos = []
         strengths = []
+        for row in tempo_result:
+            tempos.append(float(row[0]))     # tempo in BPM
+            strengths.append(float(row[1]))  # strength/confidence
 
         if not tempos:
             print(f"  WARNING: Madmom returned no tempo candidates, using default 120 BPM")
 
             print(f"  WARNING: madmom not available, falling back to librosa beat tracking")
             return self._detect_beats_librosa(audio_path)
 
+        print(f"  Detecting beats with madmom...")
 
+        # Process audio to get beat activations
+        beat_act = madmom.features.beats.RNNBeatProcessor()(str(audio_path))
 
+        # Beat tracking processor (operates on activations)
+        beat_processor = madmom.features.beats.BeatTrackingProcessor(fps=100)
+        beats = beat_processor(beat_act)
 
+        # Estimate downbeats (every 4th beat for 4/4 time - simple heuristic)
+        # More sophisticated downbeat detection with madmom can be added later if needed
+        downbeats = beats[::4] if len(beats) > 0 else np.array([])
 
+        print(f"  Detected {len(beats)} beats, {len(downbeats)} estimated downbeats")
 
         return beats, downbeats
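The downbeat estimate replaces madmom's DBN downbeat tracker with a simple stride; as the comment in the hunk notes, it assumes 4/4 time and that the first detected beat is a downbeat. In isolation:

```python
def estimate_downbeats(beats, beats_per_bar=4):
    """Take every Nth beat as a downbeat (assumes 4/4, bar-aligned first beat)."""
    return beats[::beats_per_bar]

beats = [0.5 * i for i in range(9)]  # beat times at 120 BPM: 0.0, 0.5, ..., 4.0 s
print(estimate_downbeats(beats))     # [0.0, 2.0, 4.0]
```

This is a deliberate accuracy trade-off: it avoids the DBN processor's cost but will misplace downbeats in 3/4 music or when tracking starts mid-bar.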
 
backend/requirements.txt CHANGED
@@ -12,16 +12,23 @@ redis==5.2.1
 yt-dlp>=2025.12.8
 soundfile==0.12.1
 librosa>=0.11.0
+Cython  # Required by madmom
 madmom>=0.16.1  # Zero-tradeoff: Beat tracking and multi-scale tempo detection
 scipy
 torch>=2.0.0
 torchaudio>=2.9.1
-torchcodec>=0.9.1
 demucs>=3.0.6
 
 # Pitch detection (macOS default runtime is CoreML)
-basic-pitch==0.4.0  # Will be replaced by Omnizart for better accuracy
-omnizart>=0.5.0  # Zero-tradeoff: Better onset/offset detection than basic-pitch
+basic-pitch==0.4.0  # Fallback transcriber when YourMT3+ service unavailable
+
+# YourMT3+ Transcription (integrated into main service)
+lightning>=2.2.1
+transformers==4.45.1
+einops>=0.7.0
+deprecated
+wandb>=0.15.0
+gradio_log
 
 # Music Processing
 music21==9.3.0
backend/scripts/diagnose_pipeline.py CHANGED
@@ -17,7 +17,7 @@ import mido
 # Add parent directory to path for imports
 sys.path.insert(0, str(Path(__file__).parent.parent))
 
-from config import settings
+from app_config import settings
 
 
 def analyze_audio_file(audio_path: Path, label: str):
backend/scripts/test_accuracy.py CHANGED
@@ -9,7 +9,7 @@ from pathlib import Path
 sys.path.insert(0, str(Path(__file__).parent.parent))
 
 from pipeline import TranscriptionPipeline
-from config import settings
+from app_config import settings
 import json
 from datetime import datetime
backend/scripts/test_e2e.py CHANGED
@@ -15,7 +15,7 @@ from pathlib import Path
 sys.path.insert(0, str(Path(__file__).parent.parent))
 
 from pipeline import TranscriptionPipeline
-from config import settings
+from app_config import settings
 import time
backend/scripts/test_mps_performance.sh ADDED
@@ -0,0 +1,55 @@
+#!/bin/bash
+# Test MPS performance with optimizations
+
+echo "====================================="
+echo "YourMT3+ MPS Performance Test"
+echo "====================================="
+echo ""
+
+# Start service in background
+echo "Starting transcription service with MPS + float16..."
+cd /Users/calebhan/Documents/Coding/Personal/rescored/backend/transcription-service
+source ../backend/.venv/bin/activate 2>/dev/null || true
+python service.py > service.log 2>&1 &
+SERVICE_PID=$!
+
+echo "Service PID: $SERVICE_PID"
+echo "Waiting for service to initialize (30s)..."
+sleep 30
+
+# Check health
+echo ""
+echo "Checking service health..."
+curl -s http://localhost:8001/health | python -m json.tool
+
+# Run test transcription with timing
+echo ""
+echo "Running test transcription..."
+echo "Audio file: ../../audio.wav"
+echo ""
+
+START_TIME=$(date +%s)
+curl -X POST "http://localhost:8001/transcribe" \
+  -F "file=@../../audio.wav" \
+  --output test_mps_output.mid \
+  --max-time 600
+
+END_TIME=$(date +%s)
+ELAPSED=$((END_TIME - START_TIME))
+
+echo ""
+echo "====================================="
+echo "Results:"
+echo "====================================="
+echo "Processing time: ${ELAPSED}s"
+echo "MIDI output size: $(ls -lh test_mps_output.mid 2>/dev/null | awk '{print $5}')"
+echo ""
+echo "Service log (last 20 lines):"
+tail -20 service.log
+echo ""
+echo "====================================="
+
+# Cleanup
+echo "Stopping service (PID: $SERVICE_PID)..."
+kill $SERVICE_PID 2>/dev/null || true
+echo "Done!"
backend/scripts/test_quick_verify.py CHANGED
@@ -113,7 +113,7 @@ def main():
             print(f"  - {r['video_id']:20s} | {error_preview}")
 
     # Save results
-    from config import settings
+    from app_config import settings
     output_path = Path(settings.storage_path) / "quick_verify_results.json"
     output_path.parent.mkdir(parents=True, exist_ok=True)
backend/tasks.py CHANGED
@@ -6,7 +6,7 @@ import redis
 import json
 from datetime import datetime
 from pathlib import Path
-from config import settings
+from app_config import settings
 import shutil
 
 # Redis client
@@ -26,6 +26,7 @@ class TranscriptionTask(Task):
         stage: Current stage name
         message: Status message
     """
+    print(f"[PROGRESS] {progress}% - {stage} - {message}")
     job_key = f"job:{job_id}"
 
     # Update Redis hash
@@ -45,7 +46,8 @@
         "message": message,
         "timestamp": datetime.utcnow().isoformat(),
     }
-    redis_client.publish(f"job:{job_id}:updates", json.dumps(update))
+    num_subscribers = redis_client.publish(f"job:{job_id}:updates", json.dumps(update))
+    print(f"[PROGRESS] Published to {num_subscribers} subscribers")
 
 
 @celery_app.task(base=TranscriptionTask, bind=True)
@@ -109,8 +111,8 @@ def process_transcription_task(self, job_id: str):
     redis_client.hset(f"job:{job_id}", mapping={
        "status": "completed",
        "progress": 100,
-       "output_path": str(output_path),
-       "midi_path": str(midi_path) if temp_midi_path.exists() else "",
+       "output_path": str(output_path.absolute()),
+       "midi_path": str(midi_path.absolute()) if temp_midi_path.exists() else "",
        "completed_at": datetime.utcnow().isoformat(),
    })
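The update dict published above now also logs the subscriber count returned by `redis_client.publish()`. A sketch of the payload shape (field names taken from the diff; no Redis connection needed to build it):

```python
import json
from datetime import datetime

def progress_message(progress, stage, message):
    """Build the JSON payload published to the job's Redis channel.

    Mirrors the update dict in tasks.py; the real code passes this string to
    redis_client.publish(), whose return value is the subscriber count.
    """
    return json.dumps({
        "progress": progress,
        "stage": stage,
        "message": message,
        "timestamp": datetime.utcnow().isoformat(),
    })

payload = progress_message(90, "musicxml", "Generating MusicXML")
print(json.loads(payload)["stage"])  # musicxml
```

Logging the subscriber count is a useful diagnostic: a persistent `0` means the WebSocket/SSE side never subscribed to `job:{job_id}:updates`, which explains a silent progress bar.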
backend/tests/test_pipeline_fixes.py CHANGED
@@ -4,7 +4,7 @@ from pathlib import Path
 import mido
 from music21 import note, chord, stream, converter
 from pipeline import TranscriptionPipeline
-from config import Settings
+from app_config import Settings
 
 
 @pytest.fixture
backend/tests/test_pipeline_monophonic.py CHANGED
@@ -3,7 +3,7 @@ import pytest
 import mido
 from pathlib import Path
 from pipeline import TranscriptionPipeline
-from config import Settings
+from app_config import Settings
 
 
 @pytest.fixture
backend/tests/test_utils.py CHANGED
@@ -1,6 +1,6 @@
 """Unit tests for utility functions."""
 import pytest
-from utils import validate_youtube_url, check_video_availability
+from app_utils import validate_youtube_url, check_video_availability
 from unittest.mock import patch, MagicMock
 import yt_dlp
backend/tests/test_yourmt3_integration.py ADDED
@@ -0,0 +1,296 @@
+"""
+Tests for YourMT3+ transcription service integration.
+
+Tests cover:
+- YourMT3+ service health check
+- Successful transcription
+- Fallback to basic-pitch on service failure
+- Fallback to basic-pitch when service disabled
+"""
+import pytest
+from pathlib import Path
+from unittest.mock import Mock, patch, MagicMock
+import mido
+import tempfile
+import shutil
+
+from pipeline import TranscriptionPipeline
+from app_config import Settings
+
+
+@pytest.fixture
+def temp_storage():
+    """Create temporary storage directory for tests."""
+    temp_dir = Path(tempfile.mkdtemp())
+    yield temp_dir
+    shutil.rmtree(temp_dir)
+
+
+@pytest.fixture
+def test_audio_file(temp_storage):
+    """Create a minimal test audio file."""
+    import soundfile as sf
+    import numpy as np
+
+    audio_path = temp_storage / "test_audio.wav"
+    # Create 1 second of silence
+    sample_rate = 44100
+    audio_data = np.zeros(sample_rate)
+    sf.write(str(audio_path), audio_data, sample_rate)
+
+    return audio_path
+
+
+@pytest.fixture
+def mock_yourmt3_midi(temp_storage):
+    """Create a mock MIDI file that YourMT3+ would return."""
+    midi_path = temp_storage / "yourmt3_output.mid"
+
+    # Create a simple MIDI file with one note
+    mid = mido.MidiFile()
+    track = mido.MidiTrack()
+    mid.tracks.append(track)
+
+    track.append(mido.Message('note_on', note=60, velocity=80, time=0))
+    track.append(mido.Message('note_off', note=60, velocity=0, time=480))
+    track.append(mido.MetaMessage('end_of_track', time=0))
+
+    mid.save(str(midi_path))
+    return midi_path
+
+
+@pytest.fixture
+def mock_basic_pitch_midi(temp_storage):
+    """Create a mock MIDI file that basic-pitch would return."""
+    midi_path = temp_storage / "basic_pitch_output.mid"
+
+    # Create a simple MIDI file with one note
+    mid = mido.MidiFile()
+    track = mido.MidiTrack()
+    mid.tracks.append(track)
+
+    track.append(mido.Message('note_on', note=62, velocity=70, time=0))
+    track.append(mido.Message('note_off', note=62, velocity=0, time=480))
+    track.append(mido.MetaMessage('end_of_track', time=0))
+
+    mid.save(str(midi_path))
+    return midi_path
+
+
+class TestYourMT3Integration:
+    """Test suite for YourMT3+ transcription service integration."""
+
+    def test_yourmt3_enabled_by_default(self):
+        """Test that YourMT3+ is enabled by default in config."""
+        config = Settings()
+        assert config.use_yourmt3_transcription is True
+
+    def test_yourmt3_service_health_check(self, temp_storage):
+ def test_yourmt3_service_health_check(self, temp_storage):
89
+ """Test YourMT3+ service health check endpoint."""
90
+ config = Settings(use_yourmt3_transcription=True)
91
+ pipeline = TranscriptionPipeline(
92
+ job_id="test_health",
93
+ youtube_url="https://youtube.com/test",
94
+ storage_path=temp_storage,
95
+ config=config
96
+ )
97
+
98
+ with patch('requests.get') as mock_get:
99
+ # Mock successful health check
100
+ mock_response = Mock()
101
+ mock_response.json.return_value = {
102
+ "status": "healthy",
103
+ "model_loaded": True,
104
+ "device": "mps"
105
+ }
106
+ mock_response.raise_for_status = Mock()
107
+ mock_get.return_value = mock_response
108
+
109
+ # Call transcribe_with_yourmt3 (which includes health check)
110
+ with patch('requests.post') as mock_post:
111
+ mock_post_response = Mock()
112
+ mock_post_response.content = b"mock midi data"
113
+ mock_post.return_value = mock_post_response
114
+
115
+ with patch('builtins.open', create=True):
116
+ with patch('pathlib.Path.exists', return_value=True):
117
+ # This would fail in real scenario, but we're testing health check
118
+ try:
119
+ pipeline.transcribe_with_yourmt3(temp_storage / "test.wav")
120
+ except Exception:
121
+ pass  # Expected to fail; we only want to verify the health check was called
122
+
123
+ # Verify health check was called
124
+ assert mock_get.called
125
+ assert "/health" in str(mock_get.call_args)
126
+
127
+ def test_yourmt3_transcription_success(self, temp_storage, test_audio_file, mock_yourmt3_midi):
128
+ """Test successful YourMT3+ transcription."""
129
+ config = Settings(use_yourmt3_transcription=True)
130
+ pipeline = TranscriptionPipeline(
131
+ job_id="test_success",
132
+ youtube_url="https://youtube.com/test",
133
+ storage_path=temp_storage,
134
+ config=config
135
+ )
136
+
137
+ with patch('requests.get') as mock_get:
138
+ # Mock successful health check
139
+ mock_health = Mock()
140
+ mock_health.json.return_value = {"status": "healthy", "model_loaded": True}
141
+ mock_health.raise_for_status = Mock()
142
+ mock_get.return_value = mock_health
143
+
144
+ with patch('requests.post') as mock_post:
145
+ # Mock successful transcription
146
+ with open(mock_yourmt3_midi, 'rb') as f:
147
+ mock_midi_data = f.read()
148
+
149
+ mock_response = Mock()
150
+ mock_response.content = mock_midi_data
151
+ mock_post.return_value = mock_response
152
+
153
+ result = pipeline.transcribe_with_yourmt3(test_audio_file)
154
+
155
+ assert result.exists()
156
+ assert result.suffix == '.mid'
157
+
158
+ # Verify MIDI file is valid
159
+ mid = mido.MidiFile(result)
160
+ assert len(mid.tracks) > 0
161
+
162
+ def test_yourmt3_fallback_on_service_error(self, temp_storage, test_audio_file):
163
+ """Test fallback to basic-pitch when YourMT3+ service fails."""
164
+ config = Settings(use_yourmt3_transcription=True)
165
+ pipeline = TranscriptionPipeline(
166
+ job_id="test_fallback",
167
+ youtube_url="https://youtube.com/test",
168
+ storage_path=temp_storage,
169
+ config=config
170
+ )
171
+
172
+ with patch('requests.get') as mock_get:
173
+ # Mock health check failure
174
+ mock_get.side_effect = Exception("Service unavailable")
175
+
176
+ with patch('basic_pitch.inference.predict_and_save') as mock_bp:
177
+ # Mock basic-pitch creating a MIDI file
178
+ def create_basic_pitch_midi(*args, **kwargs):
179
+ output_dir = Path(kwargs['output_directory'])
180
+ audio_path = Path(kwargs['audio_path_list'][0])
181
+ midi_path = output_dir / f"{audio_path.stem}_basic_pitch.mid"
182
+
183
+ # Create simple MIDI
184
+ mid = mido.MidiFile()
185
+ track = mido.MidiTrack()
186
+ mid.tracks.append(track)
187
+ track.append(mido.Message('note_on', note=64, velocity=75, time=0))
188
+ track.append(mido.Message('note_off', note=64, velocity=0, time=480))
189
+ track.append(mido.MetaMessage('end_of_track', time=0))
190
+ mid.save(str(midi_path))
191
+
192
+ mock_bp.side_effect = create_basic_pitch_midi
193
+
194
+ # This should use basic-pitch as fallback
195
+ result = pipeline.transcribe_to_midi(
196
+ audio_path=test_audio_file
197
+ )
198
+
199
+ assert result.exists()
200
+ assert result.suffix == '.mid'
201
+
202
+ # Verify basic-pitch was called
203
+ assert mock_bp.called
204
+
205
+ def test_yourmt3_disabled_uses_basic_pitch(self, temp_storage, test_audio_file):
206
+ """Test that basic-pitch is used when YourMT3+ is disabled."""
207
+ config = Settings(use_yourmt3_transcription=False)
208
+ pipeline = TranscriptionPipeline(
209
+ job_id="test_disabled",
210
+ youtube_url="https://youtube.com/test",
211
+ storage_path=temp_storage,
212
+ config=config
213
+ )
214
+
215
+ with patch('basic_pitch.inference.predict_and_save') as mock_bp:
216
+ # Mock basic-pitch creating a MIDI file
217
+ def create_basic_pitch_midi(*args, **kwargs):
218
+ output_dir = Path(kwargs['output_directory'])
219
+ audio_path = Path(kwargs['audio_path_list'][0])
220
+ midi_path = output_dir / f"{audio_path.stem}_basic_pitch.mid"
221
+
222
+ # Create simple MIDI
223
+ mid = mido.MidiFile()
224
+ track = mido.MidiTrack()
225
+ mid.tracks.append(track)
226
+ track.append(mido.Message('note_on', note=65, velocity=78, time=0))
227
+ track.append(mido.Message('note_off', note=65, velocity=0, time=480))
228
+ track.append(mido.MetaMessage('end_of_track', time=0))
229
+ mid.save(str(midi_path))
230
+
231
+ mock_bp.side_effect = create_basic_pitch_midi
232
+
233
+ result = pipeline.transcribe_to_midi(
234
+ audio_path=test_audio_file
235
+ )
236
+
237
+ assert result.exists()
238
+ assert result.suffix == '.mid'
239
+
240
+ # Verify basic-pitch was called and YourMT3+ was not
241
+ assert mock_bp.called
242
+
243
+ def test_yourmt3_service_timeout(self, temp_storage, test_audio_file):
244
+ """Test that timeouts are handled gracefully with fallback."""
245
+ config = Settings(
246
+ use_yourmt3_transcription=True,
247
+ transcription_service_timeout=5
248
+ )
249
+ pipeline = TranscriptionPipeline(
250
+ job_id="test_timeout",
251
+ youtube_url="https://youtube.com/test",
252
+ storage_path=temp_storage,
253
+ config=config
254
+ )
255
+
256
+ import requests
257
+
258
+ with patch('requests.get') as mock_get:
259
+ # Mock health check success
260
+ mock_health = Mock()
261
+ mock_health.json.return_value = {"status": "healthy", "model_loaded": True}
262
+ mock_get.return_value = mock_health
263
+
264
+ with patch('requests.post') as mock_post:
265
+ # Mock timeout
266
+ mock_post.side_effect = requests.exceptions.Timeout()
267
+
268
+ with patch('basic_pitch.inference.predict_and_save') as mock_bp:
269
+ # Mock basic-pitch creating a MIDI file
270
+ def create_basic_pitch_midi(*args, **kwargs):
271
+ output_dir = Path(kwargs['output_directory'])
272
+ audio_path = Path(kwargs['audio_path_list'][0])
273
+ midi_path = output_dir / f"{audio_path.stem}_basic_pitch.mid"
274
+
275
+ # Create simple MIDI
276
+ mid = mido.MidiFile()
277
+ track = mido.MidiTrack()
278
+ mid.tracks.append(track)
279
+ track.append(mido.Message('note_on', note=66, velocity=80, time=0))
280
+ track.append(mido.Message('note_off', note=66, velocity=0, time=480))
281
+ track.append(mido.MetaMessage('end_of_track', time=0))
282
+ mid.save(str(midi_path))
283
+
284
+ mock_bp.side_effect = create_basic_pitch_midi
285
+
286
+ result = pipeline.transcribe_to_midi(
287
+ audio_path=test_audio_file
288
+ )
289
+
290
+ assert result.exists()
291
+ # Verify fallback to basic-pitch
292
+ assert mock_bp.called
293
+
294
+
295
+ if __name__ == "__main__":
296
+ pytest.main([__file__, "-v"])
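The primary/fallback dispatch these tests exercise can be sketched independently of the pipeline. The names below (`transcribe`, `primary`, `fallback`, `failing_primary`) are illustrative, not the actual pipeline API:

```python
def transcribe(audio_path, primary, fallback, use_primary=True):
    """Try the primary transcriber; fall back to the secondary on any failure.

    Minimal sketch of the behavior under test -- `primary`/`fallback` are
    hypothetical callables, not the real YourMT3+/basic-pitch entry points.
    """
    if use_primary:
        try:
            return primary(audio_path)
        except Exception:
            pass  # service down, timeout, model not loaded -> fall through
    return fallback(audio_path)


def failing_primary(path):
    raise RuntimeError("YourMT3+ service unavailable")


# When the primary raises, the fallback result is returned.
print(transcribe("test.wav", failing_primary, lambda p: "basic_pitch.mid"))
```

Each test case above maps onto one branch of this function: health-check failure and timeout hit the `except` path, `use_yourmt3_transcription=False` skips the primary entirely.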
backend/ymt/amt ADDED
@@ -0,0 +1 @@
 
 
1
+ yourmt3_core/amt
backend/ymt/yourmt3_core ADDED
@@ -0,0 +1 @@
 
 
1
+ Subproject commit 8a4fbcabdf660f9f06bcb6f12bbf00d9b3139b98
backend/yourmt3_wrapper.py ADDED
@@ -0,0 +1,211 @@
1
+ """
2
+ YourMT3 Transcription Wrapper
3
+
4
+ This module provides a simplified interface to the YourMT3+ model for
5
+ music transcription. It wraps the HuggingFace Spaces implementation
6
+ to provide a clean API for transcription services.
7
+
8
+ Based on: https://huggingface.co/spaces/mimbres/YourMT3
9
+ """
10
+
11
+ import sys
12
+ import os
13
+ from pathlib import Path
14
+ from typing import Optional
15
+
16
+ # Add paths for imports
17
+ _base_dir = Path(__file__).parent
18
+ sys.path.insert(0, str(_base_dir / "ymt" / "yourmt3_core")) # For model_helper
19
+ sys.path.insert(0, str(_base_dir / "ymt" / "yourmt3_core" / "amt" / "src")) # For model/utils
20
+
21
+ import torch
22
+ import torchaudio
23
+
24
+ class YourMT3Transcriber:
25
+ """
26
+ Wrapper class for YourMT3+ music transcription model.
27
+
28
+ This class handles model loading and provides a simple transcribe() method
29
+ for converting audio files to MIDI.
30
+ """
31
+
32
+ def __init__(
33
+ self,
34
+ model_name: str = "YPTF.MoE+Multi (noPS)",
35
+ device: Optional[str] = None,
36
+ checkpoint_dir: Optional[Path] = None
37
+ ):
38
+ """
39
+ Initialize the YourMT3 transcriber.
40
+
41
+ Args:
42
+ model_name: Model variant to use. Options:
43
+ - "YMT3+"
44
+ - "YPTF+Single (noPS)"
45
+ - "YPTF+Multi (PS)"
46
+ - "YPTF.MoE+Multi (noPS)" (default, best quality)
47
+ - "YPTF.MoE+Multi (PS)"
48
+ device: Device to run on ('cuda', 'cpu', or None for auto-detect)
49
+ checkpoint_dir: Directory containing model checkpoints
50
+ """
51
+ self.model_name = model_name
52
+ self.device = device or ("cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu"))
53
+ self.checkpoint_dir = checkpoint_dir or Path(__file__).parent / "ymt" / "yourmt3_core" / "logs" / "2024"
54
+
55
+ print(f"Initializing YourMT3+ ({model_name}) on {self.device}")
56
+ print(f"Checkpoint dir: {self.checkpoint_dir}")
57
+
58
+ # Import after path setup
59
+ try:
60
+ from model_helper import load_model_checkpoint
61
+ self._load_model_checkpoint = load_model_checkpoint
62
+ except ImportError as e:
63
+ raise RuntimeError(
64
+ f"Failed to import YourMT3 model helpers: {e}\n"
65
+ f"Make sure the amt/src directory is properly set up in yourmt3_core/"
66
+ )
67
+
68
+ # Load model
69
+ self.model = self._load_model(model_name)
70
+
71
+ def _get_model_args(self, model_name: str) -> list:
72
+ """Get command-line arguments for model loading."""
73
+ project = '2024'
74
+ # Use float16 for GPU devices (CUDA/MPS) for better performance and lower memory
75
+ precision = '16' if self.device in ['cuda', 'mps'] else '32'
76
+
77
+ if model_name == "YMT3+":
78
+ checkpoint = "notask_all_cross_v6_xk2_amp0811_gm_ext_plus_nops_b72@model.ckpt"
79
+ args = [checkpoint, '-p', project, '-pr', precision]
80
+ elif model_name == "YPTF+Single (noPS)":
81
+ checkpoint = "ptf_all_cross_rebal5_mirst_xk2_edr005_attend_c_full_plus_b100@model.ckpt"
82
+ args = [checkpoint, '-p', project, '-enc', 'perceiver-tf', '-ac', 'spec',
83
+ '-hop', '300', '-atc', '1', '-pr', precision]
84
+ elif model_name == "YPTF+Multi (PS)":
85
+ checkpoint = "mc13_256_all_cross_v6_xk5_amp0811_edr005_attend_c_full_plus_2psn_nl26_sb_b26r_800k@model.ckpt"
86
+ args = [checkpoint, '-p', project, '-tk', 'mc13_full_plus_256',
87
+ '-dec', 'multi-t5', '-nl', '26', '-enc', 'perceiver-tf',
88
+ '-ac', 'spec', '-hop', '300', '-atc', '1', '-pr', precision]
89
+ elif model_name == "YPTF.MoE+Multi (noPS)":
90
+ checkpoint = "mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops@last.ckpt"
91
+ args = [checkpoint, '-p', project, '-tk', 'mc13_full_plus_256', '-dec', 'multi-t5',
92
+ '-nl', '26', '-enc', 'perceiver-tf', '-sqr', '1', '-ff', 'moe',
93
+ '-wf', '4', '-nmoe', '8', '-kmoe', '2', '-act', 'silu', '-epe', 'rope',
94
+ '-rp', '1', '-ac', 'spec', '-hop', '300', '-atc', '1', '-pr', precision]
95
+ elif model_name == "YPTF.MoE+Multi (PS)":
96
+ checkpoint = "mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b80_ps2@model.ckpt"
97
+ args = [checkpoint, '-p', project, '-tk', 'mc13_full_plus_256', '-dec', 'multi-t5',
98
+ '-nl', '26', '-enc', 'perceiver-tf', '-sqr', '1', '-ff', 'moe',
99
+ '-wf', '4', '-nmoe', '8', '-kmoe', '2', '-act', 'silu', '-epe', 'rope',
100
+ '-rp', '1', '-ac', 'spec', '-hop', '300', '-atc', '1', '-pr', precision]
101
+ else:
102
+ raise ValueError(f"Unknown model name: {model_name}")
103
+
104
+ return args
105
+
106
+ def _load_model(self, model_name: str):
107
+ """Load the YourMT3 model checkpoint."""
108
+ args = self._get_model_args(model_name)
109
+
110
+ print(f"Loading model with args: {' '.join(args)}")
111
+
112
+ # YourMT3 expects to be run from amt/src directory for checkpoint paths
113
+ # Save current directory and temporarily change to amt/src
114
+ original_cwd = os.getcwd()
115
+ amt_src_dir = _base_dir / "ymt" / "yourmt3_core" / "amt" / "src"
116
+
117
+ try:
118
+ os.chdir(str(amt_src_dir))
119
+
120
+ # Load on CPU first, then move to target device
121
+ model = self._load_model_checkpoint(args=args, device="cpu")
122
+ model.to(self.device)
123
+ model.eval()
124
+ finally:
125
+ # Always restore original directory
126
+ os.chdir(original_cwd)
127
+
128
+ # Enable optimizations for inference
129
+ if hasattr(torch, 'set_float32_matmul_precision'):
130
+ torch.set_float32_matmul_precision('high') # Use TF32 on Ampere GPUs
131
+
132
+ # Disable gradient computation for inference (reduces memory)
133
+ for param in model.parameters():
134
+ param.requires_grad = False
135
+
136
+ print(f"Model loaded successfully on {self.device}")
137
+ return model
138
+
139
+ def transcribe_audio(self, audio_path: Path, output_dir: Optional[Path] = None) -> Path:
140
+ """
141
+ Transcribe an audio file to MIDI.
142
+
143
+ Args:
144
+ audio_path: Path to input audio file (WAV, MP3, etc.)
145
+ output_dir: Directory to save MIDI output (default: current directory)
146
+
147
+ Returns:
148
+ Path to the generated MIDI file
149
+
150
+ Raises:
151
+ FileNotFoundError: If audio_path doesn't exist
152
+ RuntimeError: If transcription fails
153
+ """
154
+ audio_path = Path(audio_path)
155
+ if not audio_path.exists():
156
+ raise FileNotFoundError(f"Audio file not found: {audio_path}")
157
+
158
+ output_dir = Path(output_dir) if output_dir else Path("./")
159
+ output_dir.mkdir(parents=True, exist_ok=True)
160
+
161
+ print(f"Transcribing: {audio_path}")
162
+
163
+ try:
164
+ # Import transcribe function
165
+ from model_helper import transcribe
166
+
167
+ # Prepare audio info dict (as expected by transcribe function)
168
+ audio_info = {
169
+ 'filepath': str(audio_path),
170
+ 'track_name': audio_path.stem
171
+ }
172
+
173
+ # Run transcription
174
+ midi_path = transcribe(self.model, audio_info)
175
+ midi_path = Path(midi_path)
176
+
177
+ # Move to output directory if needed
178
+ if midi_path.parent != output_dir:
179
+ final_path = output_dir / midi_path.name
180
+ midi_path.rename(final_path)
181
+ midi_path = final_path
182
+
183
+ print(f"Transcription complete: {midi_path}")
184
+ return midi_path
185
+
186
+ except Exception as e:
187
+ raise RuntimeError(f"Transcription failed: {e}") from e
188
+
189
+
190
+ if __name__ == "__main__":
191
+ # Test the transcriber
192
+ import argparse
193
+
194
+ parser = argparse.ArgumentParser(description="Test YourMT3 Transcriber")
195
+ parser.add_argument("audio_file", type=str, help="Path to audio file")
196
+ parser.add_argument("--model", type=str, default="YPTF.MoE+Multi (noPS)",
197
+ help="Model variant to use")
198
+ parser.add_argument("--output", type=str, default="./output",
199
+ help="Output directory for MIDI files")
200
+ args = parser.parse_args()
201
+
202
+ # Initialize transcriber
203
+ transcriber = YourMT3Transcriber(model_name=args.model)
204
+
205
+ # Transcribe audio
206
+ midi_path = transcriber.transcribe_audio(
207
+ audio_path=Path(args.audio_file),
208
+ output_dir=Path(args.output)
209
+ )
210
+
211
+ print(f"MIDI saved to: {midi_path}")
docker-compose.yml CHANGED
@@ -27,6 +27,7 @@ services:
27
  - API_HOST=0.0.0.0
28
  - API_PORT=8000
29
  - CORS_ORIGINS=http://localhost:5173,http://localhost:3000
 
30
  volumes:
31
  - ./backend:/app
32
  - ./storage:/app/storage
 
27
  - API_HOST=0.0.0.0
28
  - API_PORT=8000
29
  - CORS_ORIGINS=http://localhost:5173,http://localhost:3000
30
+ - YOURMT3_DEVICE=cpu
31
  volumes:
32
  - ./backend:/app
33
  - ./storage:/app/storage
docs/architecture/tech-stack.md CHANGED
@@ -176,27 +176,36 @@ This document details the technology choices for Rescored, including alternative
176
 
177
  ---
178
 
179
- ### Transcription: basic-pitch
180
 
181
- **Chosen**: basic-pitch (Spotify)
182
 
183
- **Why**:
 
 
 
 
 
 
 
 
 
184
  - Polyphonic transcription (multiple notes at once)
185
- - Trained on large dataset (30k+ songs)
186
- - Open-source, Apache 2.0 license
187
- - Outputs MIDI with note velocities
188
- - Actively maintained by Spotify
189
 
190
  **Alternatives Considered**:
191
 
192
  | Option | Pros | Cons | Why Not Chosen |
193
  |--------|------|------|----------------|
194
- | MT3 (Music Transformer) | Google's latest, multi-instrument aware | Slower, larger model, harder to run | basic-pitch faster for MVP |
195
- | Omnizart | Multi-instrument, good documentation | More complex setup, slower | basic-pitch simpler |
196
  | Tony (pYIN) | Excellent for monophonic | Only monophonic | Need polyphonic support |
197
- | commercial APIs | Better quality | Expensive, privacy | Local processing preferred |
198
 
199
- **Decision**: basic-pitch is the best open-source polyphonic transcription model.
200
 
201
  ---
202
 
 
176
 
177
  ---
178
 
179
+ ### Transcription: YourMT3+ (Primary) + basic-pitch (Fallback)
180
 
181
+ **Chosen**: YourMT3+ (KAIST) with automatic fallback to basic-pitch (Spotify)
182
 
183
+ **Why YourMT3+**:
184
+ - **80-85% accuracy** vs 70% for basic-pitch
185
+ - State-of-the-art multi-instrument transcription model
186
+ - Mixture of Experts architecture for better quality
187
+ - Perceiver-TF encoder with RoPE position encoding
188
+ - Trained on diverse datasets (30k+ songs, 13 instrument classes)
189
+ - Open-source, actively maintained
190
+ - Optimized for Apple Silicon (MPS) with float16 precision (14x speedup)
191
+
192
+ **Why basic-pitch as Fallback**:
193
  - Polyphonic transcription (multiple notes at once)
194
+ - Lighter weight, faster inference
195
+ - Simple setup, no model download required
196
+ - Good baseline quality (70% accuracy)
197
+ - Automatically used if YourMT3+ unavailable
198
 
199
  **Alternatives Considered**:
200
 
201
  | Option | Pros | Cons | Why Not Chosen |
202
  |--------|------|------|----------------|
203
+ | MT3 (Music Transformer) | Google's latest, multi-instrument aware | Slower, larger model, harder to run | YourMT3+ more accurate |
204
+ | Omnizart | Multi-instrument, good documentation | Lower accuracy than YourMT3+, slower | Removed in favor of YourMT3+ |
205
  | Tony (pYIN) | Excellent for monophonic | Only monophonic | Need polyphonic support |
206
+ | commercial APIs | Better quality | Expensive, privacy concerns | Local processing preferred |
207
 
208
+ **Decision**: YourMT3+ offers the best accuracy for a self-hosted solution, with automatic fallback to basic-pitch for reliability.
209
 
210
  ---
211
 
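The "Mixture of Experts" bullet above corresponds to the `-nmoe 8 -kmoe 2` flags in the checkpoint configs (assuming these denote expert count and top-k, respectively): each token is routed to the top 2 of 8 experts. A toy sketch of that routing, with made-up gate scores rather than model output:

```python
def moe_route(gate_scores, k=2):
    """Pick the top-k experts by gate score and renormalize their weights."""
    top = sorted(range(len(gate_scores)), key=gate_scores.__getitem__, reverse=True)[:k]
    total = sum(gate_scores[i] for i in top)
    return [(i, gate_scores[i] / total) for i in top]


scores = [0.10, 0.05, 0.40, 0.02, 0.20, 0.08, 0.10, 0.05]  # 8 experts
print(moe_route(scores))  # experts 2 and 4 carry this token
```

Only the selected experts run for a given token, which is why MoE models get large-model quality at a fraction of the inference cost.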
docs/backend/pipeline.md CHANGED
@@ -31,12 +31,15 @@ graph TB
31
  end
32
 
33
  subgraph Stage3["Stage 3: Transcription (50-90%)"]
34
- ForEach["For Each<br/>Stem"]
35
- BasicPitch["basic-pitch<br/>Inference"]
 
36
  Quantize["Quantize<br/>& Clean<br/>MIDI"]
37
- MIDI["drums.mid, bass.mid,<br/>vocals.mid, other.mid"]
38
 
39
- ForEach --> BasicPitch
 
 
40
  BasicPitch --> Quantize
41
  Quantize --> MIDI
42
  end
@@ -298,83 +301,130 @@ class DemucsProcessor:
298
 
299
  ## Stage 3: Transcription (Audio → MIDI)
300
 
301
- ### 3.1 basic-pitch Inference
302
 
303
- **Purpose**: Convert each audio stem to MIDI notes (pitch, timing, duration).
304
 
305
- **Why Per-Stem Transcription?**
306
- - Isolated instruments are easier for the model to detect
307
- - Reduces polyphonic complexity (fewer simultaneous notes)
308
- - Better note onset detection
 
 
 
 
 
 
 
 
 
309
 
310
  **Implementation**:
311
 
312
  ```python
313
- from basic_pitch.inference import predict
314
- from basic_pitch import ICASSP_2022_MODEL_PATH
315
- import numpy as np
316
  from pathlib import Path
317
- from mido import MidiFile, MidiTrack, Message
318
 
319
- class BasicPitchTranscriber:
320
- def __init__(self):
321
- # Model is auto-loaded by basic-pitch
322
- pass
323
 
324
- def transcribe_stem(self, audio_path: Path, output_path: Path) -> Path:
325
  """
326
- Transcribe audio to MIDI using basic-pitch.
327
 
328
  Returns:
329
  Path to output MIDI file
330
  """
331
- # Run inference
332
- model_output, midi_data, note_events = predict(
333
- audio_path=str(audio_path),
334
- model_or_model_path=ICASSP_2022_MODEL_PATH,
335
- onset_threshold=0.5, # Note onset confidence threshold
336
- frame_threshold=0.3, # Frame activation threshold
337
- minimum_note_length=127, # ~58ms at 44.1kHz (filter very short notes)
338
- minimum_frequency=None, # No frequency limits
339
- maximum_frequency=None,
340
- multiple_pitch_bends=False, # Simpler MIDI output
341
- melodia_trick=True, # Improves melody extraction
  )
 
 
 
 
 
 
 
 
 
 
 
 
 
343
 
344
  # Save MIDI
345
- midi_data.write(str(output_path))
 
 
346
 
347
- # Post-process MIDI (quantization, cleanup)
348
- cleaned_midi = self.clean_midi(output_path)
349
 
350
- return cleaned_midi
 
 
 
351
 
352
- def clean_midi(self, midi_path: Path) -> Path:
353
- """
354
- Quantize notes to nearest 16th note, remove duplicates.
355
- """
356
- mid = MidiFile(midi_path)
357
-
358
- # Quantize to 16th note grid (480 ticks per quarter note)
359
- ticks_per_16th = mid.ticks_per_beat // 4
360
-
361
- for track in mid.tracks:
362
- time = 0
363
- for msg in track:
364
- time += msg.time
365
- if msg.type in ['note_on', 'note_off']:
366
- # Quantize timing to nearest 16th
367
- quantized_time = round(time / ticks_per_16th) * ticks_per_16th
368
- msg.time = quantized_time - time
369
- time = quantized_time
370
-
371
- # Save cleaned MIDI
372
- cleaned_path = midi_path.with_stem(f"{midi_path.stem}_clean")
373
- mid.save(cleaned_path)
374
-
375
- return cleaned_path
376
  ```
377
 
378
  **Parameters** (Tempo-Adaptive):
379
  - **onset_threshold**: Note onset confidence threshold
380
  - Fast tempo (>140 BPM): 0.50 (stricter - fewer false positives)
@@ -387,9 +437,11 @@ class BasicPitchTranscriber:
387
  - Slow: More permissive for soft dynamics
388
  - **melodia_trick** (True): Improves monophonic melody detection
389
 
390
- **Post-Processing Pipeline**:
 
 
391
 
392
- After basic-pitch generates raw MIDI, several post-processing steps clean up common artifacts:
393
 
394
  1. **clean_midi()** - Filters and quantizes notes
395
  - Removes notes outside piano range (A0-C8)
@@ -399,7 +451,7 @@ After basic-pitch generates raw MIDI, several post-processing steps clean up com
399
 
400
  2. **merge_consecutive_notes()** - Fixes choppy sustained phrases
401
  - Merges notes of same pitch with small gaps (<150ms default)
402
- - Addresses basic-pitch's tendency to split sustained notes
403
 
404
  3. **analyze_note_envelope_and_merge_sustains()** - **NEW: Removes ghost notes**
405
  - Detects false onsets from sustained note decay
 
31
  end
32
 
33
  subgraph Stage3["Stage 3: Transcription (50-90%)"]
34
+ Health["Check<br/>YourMT3+<br/>Health"]
35
+ YMT3["YourMT3+<br/>Inference<br/>(Primary)"]
36
+ BasicPitch["basic-pitch<br/>Inference<br/>(Fallback)"]
37
  Quantize["Quantize<br/>& Clean<br/>MIDI"]
38
+ MIDI["piano.mid"]
39
 
40
+ Health -->|Healthy| YMT3
41
+ Health -->|Unavailable| BasicPitch
42
+ YMT3 --> Quantize
43
  BasicPitch --> Quantize
44
  Quantize --> MIDI
45
  end
 
301
 
302
  ## Stage 3: Transcription (Audio → MIDI)
303
 
304
+ **Current System**: YourMT3+ (Primary) with automatic fallback to basic-pitch
305
 
306
+ ### 3.1 YourMT3+ Inference (Primary, 80-85% Accuracy)
307
 
308
+ **Purpose**: Convert audio stem to high-quality MIDI notes using state-of-the-art model.
309
+
310
+ **Why YourMT3+?**
311
+ - **80-85% note accuracy** (vs 70% for basic-pitch)
312
+ - Multi-instrument awareness (13 instrument classes)
313
+ - Better rhythm and onset detection
314
+ - Mixture of Experts architecture for quality
315
+ - Perceiver-TF encoder with RoPE position encoding
316
+
317
+ **Health Check Flow**:
318
+ 1. Check YourMT3+ service health at `/api/v1/yourmt3/health`
319
+ 2. If healthy and model loaded → Use YourMT3+
320
+ 3. If unavailable/unhealthy → Automatic fallback to basic-pitch
321
 
322
  **Implementation**:
323
 
324
  ```python
325
+ import requests
 
 
326
  from pathlib import Path
 
327
 
328
+ class TranscriptionPipeline:
329
+ def __init__(self, job_id, youtube_url, storage_path, config):
330
+ self.config = config # Has use_yourmt3_transcription flag
331
+ self.service_url = config.transcription_service_url # http://localhost:8000
332
 
333
+ def transcribe_to_midi(self, audio_path: Path) -> Path:
334
  """
335
+ Transcribe audio to MIDI using YourMT3+ with automatic fallback.
336
 
337
  Returns:
338
  Path to output MIDI file
339
  """
340
+ midi_path = None
341
+
342
+ # Try YourMT3+ first (if enabled)
343
+ if self.config.use_yourmt3_transcription:
344
+ try:
345
+ print("Transcribing with YourMT3+ (primary)...")
346
+ midi_path = self.transcribe_with_yourmt3(audio_path)
347
+ print("βœ“ YourMT3+ transcription complete")
348
+ except Exception as e:
349
+ print(f"⚠ YourMT3+ failed: {e}")
350
+ print("β†’ Falling back to basic-pitch")
351
+ midi_path = None
352
+
353
+ # Fallback to basic-pitch if YourMT3+ failed or disabled
354
+ if midi_path is None:
355
+ print("Transcribing with basic-pitch (fallback)...")
356
+ midi_path = self.transcribe_with_basic_pitch(audio_path)
357
+ print("βœ“ basic-pitch transcription complete")
358
+
359
+ return midi_path
360
+
361
+ def transcribe_with_yourmt3(self, audio_path: Path) -> Path:
362
+ """Call YourMT3+ service via HTTP."""
363
+ # Health check
364
+ health_response = requests.get(
365
+ f"{self.service_url}/api/v1/yourmt3/health",
366
+ timeout=5
367
  )
368
+ health_data = health_response.json()
369
+
370
+ if not health_data.get("model_loaded"):
371
+ raise RuntimeError("YourMT3+ model not loaded")
372
+
373
+ # Transcribe
374
+ with open(audio_path, 'rb') as f:
375
+ files = {'file': (audio_path.name, f, 'audio/wav')}
376
+ response = requests.post(
377
+ f"{self.service_url}/api/v1/yourmt3/transcribe",
378
+ files=files,
379
+ timeout=self.config.transcription_service_timeout
380
+ )
+ response.raise_for_status()
381
 
382
  # Save MIDI
383
+ midi_path = self.temp_dir / "piano_yourmt3.mid"
384
+ with open(midi_path, 'wb') as f:
385
+ f.write(response.content)
386
 
387
+ return midi_path
 
388
 
389
+ def transcribe_with_basic_pitch(self, audio_path: Path) -> Path:
390
+ """Fallback transcription using basic-pitch."""
391
+ from basic_pitch.inference import predict_and_save
392
+ from basic_pitch import ICASSP_2022_MODEL_PATH
393
 
394
+ predict_and_save(
395
+ audio_path_list=[str(audio_path)],
396
+ output_directory=str(self.temp_dir),
397
+ save_midi=True,
398
+ model_or_model_path=ICASSP_2022_MODEL_PATH,
399
+ onset_threshold=0.3,
400
+ frame_threshold=0.3,
401
+ )
402
+
403
+ generated_midi = self.temp_dir / f"{audio_path.stem}_basic_pitch.mid"
404
+ return generated_midi
 
 
 
 
 
 
 
 
 
 
 
 
 
405
  ```
406
 
407
+ **YourMT3+ Features**:
408
+ - Integrated into main backend (port 8000)
409
+ - Model loaded on startup (reduces per-request latency)
410
+ - Float16 precision for MPS (14x speedup on Apple Silicon)
411
+ - ~30-40s processing time for 3.5min audio
412
+ - Automatic health monitoring
413
+
414
+ ---
415
+
416
+ ### 3.2 basic-pitch Inference (Fallback, 70% Accuracy)
417
+
418
+ **Purpose**: Lightweight fallback transcription when YourMT3+ is unavailable.
419
+
420
+ **When Used**:
421
+ - YourMT3+ service health check fails
422
+ - YourMT3+ model not loaded
423
+ - YourMT3+ request times out
424
+ - `use_yourmt3_transcription=False` in config
425
+
426
+ **Implementation**: See `transcribe_with_basic_pitch()` above
427
+
428
  **Parameters** (Tempo-Adaptive):
429
  - **onset_threshold**: Note onset confidence threshold
430
  - Fast tempo (>140 BPM): 0.50 (stricter - fewer false positives)
 
437
  - Slow: More permissive for soft dynamics
438
  - **melodia_trick** (True): Improves monophonic melody detection
439
 
440
+ ---
441
+
442
+ ### 3.3 Post-Processing Pipeline
443
 
444
+ After either YourMT3+ or basic-pitch generates raw MIDI, several post-processing steps clean up common artifacts:
445
 
446
  1. **clean_midi()** - Filters and quantizes notes
447
  - Removes notes outside piano range (A0-C8)
 
451
 
452
  2. **merge_consecutive_notes()** - Fixes choppy sustained phrases
453
  - Merges notes of same pitch with small gaps (<150ms default)
454
+ - Addresses transcription models' tendency to split sustained notes
455
 
456
  3. **analyze_note_envelope_and_merge_sustains()** - **NEW: Removes ghost notes**
457
  - Detects false onsets from sustained note decay
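The 16th-note quantization inside `clean_midi()` can be sketched as a pure function (assuming 480 PPQ and a grid of one quarter of a beat, as in the original implementation):

```python
def quantize_ticks(time_ticks: int, ticks_per_beat: int = 480) -> int:
    """Snap a MIDI tick offset to the nearest 16th-note grid line."""
    grid = ticks_per_beat // 4  # 120 ticks per 16th note at 480 PPQ
    return round(time_ticks / grid) * grid


print(quantize_ticks(130))  # 120 (pulled back onto the grid)
print(quantize_ticks(185))  # 240 (pushed forward)
```

The same function applies regardless of whether the raw MIDI came from YourMT3+ or basic-pitch, which is why post-processing is shared across both paths.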
docs/getting-started.md CHANGED
@@ -49,7 +49,7 @@ This documentation focuses on **high-level architecture and design decisions**,
49
  2. [Audio Processing Pipeline](backend/pipeline.md) - Detailed workflow
50
  3. [Background Workers](backend/workers.md) - Celery setup
51
  4. [API Design](backend/api.md) - REST + WebSocket endpoints
52
- 5. [ML Model Selection](research/ml-models.md) - Demucs & basic-pitch
53
  6. [Challenges](research/challenges.md) - Known limitations
54
 
55
  **Key Files to Create**:
@@ -107,9 +107,10 @@ This documentation focuses on **high-level architecture and design decisions**,
107
  4. [ML Model Selection](research/ml-models.md) - Accuracy expectations
108
 
109
  **Key Insights**:
110
- - Transcription is ~70-80% accurate, users **must** edit output
 
111
  - Processing takes 1-2 minutes (GPU) or 10-15 minutes (CPU)
112
- - Editor is **critical** - make it fast and intuitive
113
  - MVP focuses on piano only, multi-instrument in Phase 2
114
 
115
  ---
@@ -132,7 +133,7 @@ See [Glossary](glossary.md) for more terms.
132
 
133
  **Frontend**: React + VexFlow (notation) + Tone.js (playback)
134
  **Backend**: Python/FastAPI + Celery (workers) + Redis (queue)
135
- **ML**: Demucs (source separation) + basic-pitch (transcription)
136
  **Formats**: MusicXML (primary), MIDI (intermediate)
137
 
138
  ---
@@ -198,7 +199,7 @@ docker-compose up
198
 
199
  ### Q: How accurate is transcription?
200
 
201
- **A**: 70-80% for simple piano, 60-70% for complex music. See [ML Models](research/ml-models.md) and [Challenges](research/challenges.md).
202
 
203
  ### Q: Can I deploy this to production?
204
 
 
49
  2. [Audio Processing Pipeline](backend/pipeline.md) - Detailed workflow
50
  3. [Background Workers](backend/workers.md) - Celery setup
51
  4. [API Design](backend/api.md) - REST + WebSocket endpoints
52
+ 5. [ML Model Selection](research/ml-models.md) - Demucs, YourMT3+, basic-pitch
53
  6. [Challenges](research/challenges.md) - Known limitations
54
 
55
  **Key Files to Create**:
 
107
  4. [ML Model Selection](research/ml-models.md) - Accuracy expectations
108
 
109
  **Key Insights**:
110
+ - Transcription is ~80-85% accurate with YourMT3+, ~70% with basic-pitch fallback
111
+ - Users **must** edit output - editor is **critical**
112
  - Processing takes 1-2 minutes (GPU) or 10-15 minutes (CPU)
113
+ - YourMT3+ optimized for Apple Silicon (MPS) with 14x speedup via float16
114
  - MVP focuses on piano only, multi-instrument in Phase 2
115
 
116
  ---
 
133
 
134
  **Frontend**: React + VexFlow (notation) + Tone.js (playback)
135
  **Backend**: Python/FastAPI + Celery (workers) + Redis (queue)
136
+ **ML**: Demucs (source separation) + YourMT3+ (primary transcription, 80-85% accuracy) + basic-pitch (fallback, 70% accuracy)
137
  **Formats**: MusicXML (primary), MIDI (intermediate)
138
 
139
  ---
 
199
 
200
  ### Q: How accurate is transcription?
201
 
202
+ **A**: 80-85% for simple piano with YourMT3+ (70-75% for complex music). Falls back to basic-pitch (70% simple, 60-70% complex) if YourMT3+ unavailable. See [ML Models](research/ml-models.md) and [Challenges](research/challenges.md).
203
 
204
  ### Q: Can I deploy this to production?
205
 
docs/research/ml-models.md CHANGED
@@ -84,12 +84,46 @@
84
 
85
  ## Transcription Models
86
 
87
- ### basic-pitch (Chosen)
88
 
89
- **Developer**: Spotify;
90
- **License**: Apache 2.0;
91
- **Model Size**: ~30MB;
92
- **Performance**: Good polyphonic transcription
 
93
 
94
  **Pros**:
95
  - Handles polyphonic music (multiple simultaneous notes)
@@ -97,55 +131,42 @@
97
  - Outputs MIDI with velocities
98
  - Fast (~5-10s per stem)
99
  - Active maintenance
 
100
 
101
  **Cons**:
102
- - Not perfect (~70-80% note accuracy)
103
  - Rhythm quantization can be off
104
  - Struggles with very dense polyphony
105
 
106
- **When to Use**: MVP and production (best open-source option)
107
 
108
  ---
109
 
110
- ### MT3 (Music Transformer) - Alternative
111
-
112
- **Developer**: Google Magenta;
113
- **License**: Apache 2.0;
114
- **Model Size**: ~500MB;
115
- **Performance**: Better than basic-pitch on benchmarks
116
 
117
- **Pros**:
118
- - Multi-instrument aware (trained on full mixes)
119
- - Handles multiple instruments simultaneously
120
- - Better rhythm accuracy
121
-
122
- **Cons**:
123
- - Much slower (~30-60s per song)
124
- - Larger model
125
- - More complex setup (Transformer architecture)
126
- - Higher computational requirements
127
 
128
- **When to Use**: Future enhancement if quality > speed
 
 
 
129
 
130
  ---
131
 
132
- ### Omnizart (Alternative)
133
 
134
- **Developer**: MCTLab (Taiwan);
135
- **License**: MIT;
136
- **Performance**: Specialized models per instrument
137
-
138
- **Pros**:
139
- - Separate models for piano, guitar, drums, vocals
140
- - Good single-instrument accuracy
141
- - Academic backing
142
 
143
- **Cons**:
144
- - Need to run different models for each instrument
145
- - Slower overall
146
  - Less active development
147
-
148
- **When to Use**: If targeting specific instruments only
149
 
150
  ---
151
 
@@ -169,58 +190,58 @@
169
 
170
  ### Comparison
171
 
172
- | Model | Polyphonic | Speed (GPU) | Accuracy | Use Case |
173
- |-------|-----------|-------------|----------|----------|
174
- | basic-pitch | Yes | 5-10s | 70-80% | General-purpose (chosen) |
175
- | MT3 | Yes | 30-60s | 80-85% | High-quality (future) |
176
- | Omnizart | Yes | 15-30s | 75-80% | Instrument-specific |
 
177
  | Tony | No | 2-5s | 90%+ | Vocals only |
178
 
179
- **Decision**: Use basic-pitch for MVP. Consider MT3 for Phase 3 if users demand better quality.
180
 
181
  ---
182
 
183
  ## Model Accuracy Expectations
184
 
185
- ### Realistic Transcription Accuracy
186
 
187
  **Simple Piano Melody** (Twinkle Twinkle):
188
- - Note accuracy: 90-95%
189
- - Rhythm accuracy: 80-85%
190
 
191
  **Classical Piano** (Chopin Nocturne):
192
- - Note accuracy: 70-80%
193
- - Rhythm accuracy: 60-70%
194
 
195
  **Jazz Piano** (Bill Evans):
196
- - Note accuracy: 60-70% (complex chords)
197
- - Rhythm accuracy: 50-60% (swing feel)
198
 
199
  **Rock/Pop with Band**:
200
- - Piano separation: 70-80% (depends on mix)
201
- - Note accuracy: 60-70%
202
 
203
- **Key Insight**: Transcription won't be perfect. Editor is **critical** for users to fix errors.
204
 
205
  ---
206
 
207
  ## Future Model Improvements
208
 
209
- ### Fine-Tuning
210
 
211
- Train basic-pitch on piano-specific dataset:
212
- - Collect 1000+ piano YouTube videos
213
- - Manually correct transcriptions
214
- - Fine-tune model
215
- - Expected improvement: +5-10% accuracy
216
 
217
- ### Ensemble Models
218
 
219
- Combine multiple models:
220
- - Run basic-pitch + MT3
221
- - Merge results using voting or confidence scores
222
- - Expected improvement: +3-5% accuracy
223
- - Cost: 2-3x processing time
224
 
225
  ### Post-Processing
226
 
 
84
 
85
  ## Transcription Models
86
 
87
+ ### YourMT3+ (Primary)
88
 
89
+ **Developer**: KAIST (Korea Advanced Institute of Science and Technology)
90
+ **License**: Apache 2.0
91
+ **Model Size**: ~536MB (YPTF.MoE+Multi checkpoint)
92
+ **Performance**: **State-of-the-art** multi-instrument transcription
93
+
94
+ **Architecture**:
95
+ - Perceiver-TF encoder with Rotary Position Embeddings (RoPE)
96
+ - Mixture of Experts (MoE) feedforward layers (8 experts, top-2)
97
+ - Multi-channel T5 decoder for 13 instrument classes
98
+ - Float16 precision for GPU optimization
99
+
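To make the MoE bullet concrete, here is a minimal top-2 routing sketch in NumPy. This illustrates the general technique (route each token to its k highest-scoring experts and mix their outputs by softmax weight); it is not YourMT3+'s actual implementation:

```python
import numpy as np

def moe_top2(x, gate_w, expert_ws, k=2):
    """x: (tokens, d); gate_w: (d, n_experts); expert_ws: list of (d, d)."""
    logits = x @ gate_w                        # gating scores per expert
    top = np.argsort(logits, axis=-1)[:, -k:]  # indices of top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        w = np.exp(sel - sel.max())
        w /= w.sum()                           # softmax over selected experts
        for weight, e in zip(w, top[t]):
            out[t] += weight * (x[t] @ expert_ws[e])
    return out
```

With 8 experts and top-2 routing, only a quarter of the expert parameters are active per token, which is how MoE layers add capacity without a matching compute cost.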
100
+ **Pros**:
101
+ - **80-85% note accuracy** (vs 70% for basic-pitch)
102
+ - Multi-instrument aware (13 instrument classes)
103
+ - Handles complex polyphony
104
+ - Active development (2024)
105
+ - Open-source, well-documented
106
+ - Optimized for Apple Silicon MPS (14x speedup with float16)
107
+ - Good rhythm and onset detection
108
+
109
+ **Cons**:
110
+ - Large model size (~536MB download)
111
+ - Requires additional setup (model checkpoint download)
112
+ - Slower than basic-pitch (~30-40s per song on GPU)
113
+ - Higher memory requirements (~1.1GB VRAM)
114
+
115
+ **When to Use**: **Production (primary transcriber)** - Best quality for self-hosted solution
116
+
117
+ **Current Status**: Integrated into main backend, enabled by default with automatic fallback
118
+
119
+ ---
120
+
121
+ ### basic-pitch (Fallback)
122
+
123
+ **Developer**: Spotify
124
+ **License**: Apache 2.0
125
+ **Model Size**: ~30MB
126
+ **Performance**: Good polyphonic transcription (70% accuracy)
127
 
128
  **Pros**:
129
  - Handles polyphonic music (multiple simultaneous notes)
 
131
  - Outputs MIDI with velocities
132
  - Fast (~5-10s per stem)
133
  - Active maintenance
134
+ - Lightweight, no setup required
135
 
136
  **Cons**:
137
+ - Lower accuracy than YourMT3+ (~70% vs 80-85%)
138
  - Rhythm quantization can be off
139
  - Struggles with very dense polyphony
140
 
141
+ **When to Use**: **Automatic fallback** when YourMT3+ unavailable or disabled
142
 
143
  ---
144
 
145
+ ### MT3 (Music Transformer) - Not Used
 
 
 
 
 
146
 
147
+ **Developer**: Google Magenta
148
+ **License**: Apache 2.0
149
+ **Model Size**: ~500MB
150
+ **Performance**: Good, but surpassed by YourMT3+
 
 
 
 
 
 
151
 
152
+ **Why Not Chosen**:
153
+ - YourMT3+ offers better accuracy
154
+ - Similar computational requirements
155
+ - YourMT3+ has better documentation and setup
156
 
157
  ---
158
 
159
+ ### Omnizart - Removed
160
 
161
+ **Developer**: MCTLab (Taiwan)
162
+ **License**: MIT
163
+ **Status**: **Removed from codebase** (replaced by YourMT3+)
 
 
 
 
 
164
 
165
+ **Why Removed**:
166
+ - Lower accuracy than YourMT3+ (75-80% vs 80-85%)
167
+ - More complex setup with multiple models
168
  - Less active development
169
+ - Dual-transcription merging added complexity without accuracy gains
 
170
 
171
  ---
172
 
 
190
 
191
  ### Comparison
192
 
193
+ | Model | Polyphonic | Speed (GPU) | Accuracy | Status |
194
+ |-------|-----------|-------------|----------|--------|
195
+ | **YourMT3+** | Yes | 30-40s | **80-85%** | **Primary (Production)** |
196
+ | basic-pitch | Yes | 5-10s | 70% | Fallback |
197
+ | MT3 | Yes | 30-60s | 75-80% | Not used |
198
+ | Omnizart | Yes | 15-30s | 75-80% | Removed |
199
  | Tony | No | 2-5s | 90%+ | Vocals only |
200
 
201
+ **Decision**: YourMT3+ as primary transcriber with automatic fallback to basic-pitch for reliability.
202
 
203
  ---
204
 
205
  ## Model Accuracy Expectations
206
 
207
+ ### Realistic Transcription Accuracy (with YourMT3+)
208
 
209
  **Simple Piano Melody** (Twinkle Twinkle):
210
+ - Note accuracy: **90-95%** (YourMT3+) / 85-90% (basic-pitch)
211
+ - Rhythm accuracy: **85-90%** (YourMT3+) / 75-80% (basic-pitch)
212
 
213
  **Classical Piano** (Chopin Nocturne):
214
+ - Note accuracy: **75-85%** (YourMT3+) / 65-75% (basic-pitch)
215
+ - Rhythm accuracy: **70-75%** (YourMT3+) / 55-65% (basic-pitch)
216
 
217
  **Jazz Piano** (Bill Evans):
218
+ - Note accuracy: **70-75%** (YourMT3+) / 55-65% (basic-pitch)
219
+ - Rhythm accuracy: **60-70%** (YourMT3+) / 45-55% (basic-pitch)
220
 
221
  **Rock/Pop with Band**:
222
+ - Piano separation: 70-80% (depends on Demucs quality)
223
+ - Note accuracy: **70-75%** (YourMT3+) / 55-65% (basic-pitch)
224
 
225
+ **Key Insight**: YourMT3+ provides 10-15% better accuracy than basic-pitch, but transcription still won't be perfect. Editor is **critical** for users to fix errors.
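The accuracy figures above assume a note-matching metric along these lines. A simplified sketch of mir_eval-style onset matching (greedy rather than optimal matching, pitch-exact, no offset check):

```python
def note_accuracy(reference, predicted, onset_tol=0.05):
    """Fraction of reference (pitch, onset_s) notes matched by a
    predicted note of the same pitch within onset_tol seconds."""
    used = set()
    hits = 0
    for pitch, onset in reference:
        for i, (p, o) in enumerate(predicted):
            if i not in used and p == pitch and abs(o - onset) <= onset_tol:
                used.add(i)
                hits += 1
                break
    return hits / len(reference) if reference else 1.0
```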
226
 
227
  ---
228
 
229
  ## Future Model Improvements
230
 
231
+ ### Fine-Tuning YourMT3+
232
 
233
+ Train on piano-specific dataset:
234
+ - Collect 1000+ piano YouTube videos with ground truth
235
+ - Fine-tune YourMT3+ checkpoint on piano-only data
236
+ - Expected improvement: +3-5% accuracy for piano
237
+ - Cost: GPU compute for training
238
 
239
+ ### Ensemble Models (Not Currently Used)
240
 
241
+ Previously attempted basic-pitch + omnizart merging:
242
+ - **Result**: Removed due to complexity without significant accuracy gain
243
+ - **Learning**: YourMT3+ alone provides better results than merged basic-pitch + omnizart
244
+ - **Future**: Could revisit with YourMT3+ + MT3 ensemble if needed
 
245
 
246
  ### Post-Processing
247
 
docs/testing/backend-testing.md DELETED
@@ -1,520 +0,0 @@
1
- # Backend Testing Guide
2
-
3
- Comprehensive guide for testing the Rescored backend.
4
-
5
- ## Table of Contents
6
-
7
- - [Setup](#setup)
8
- - [Running Tests](#running-tests)
9
- - [Test Structure](#test-structure)
10
- - [Writing Tests](#writing-tests)
11
- - [Testing Patterns](#testing-patterns)
12
- - [Troubleshooting](#troubleshooting)
13
-
14
- ## Setup
15
-
16
- ### Install Test Dependencies
17
-
18
- ```bash
19
- cd backend
20
- pip install -r requirements-test.txt
21
- ```
22
-
23
- This installs:
24
- - `pytest`: Test framework
25
- - `pytest-asyncio`: Async test support
26
- - `pytest-cov`: Coverage reporting
27
- - `pytest-mock`: Enhanced mocking
28
- - `httpx`: HTTP testing client
29
-
30
- ### Configuration
31
-
32
- Test configuration is in `pytest.ini`:
33
-
34
- ```ini
35
- [pytest]
36
- testpaths = tests
37
- markers =
38
- unit: Unit tests
39
- integration: Integration tests
40
- slow: Slow-running tests
41
- gpu: Tests requiring GPU
42
- network: Tests requiring network
43
- ```
44
-
45
- ## Running Tests
46
-
47
- ### Basic Commands
48
-
49
- ```bash
50
- # Run all tests
51
- pytest
52
-
53
- # Run with coverage
54
- pytest --cov
55
-
56
- # Run specific file
57
- pytest tests/test_utils.py
58
-
59
- # Run specific test
60
- pytest tests/test_utils.py::TestValidateYouTubeURL::test_valid_watch_url
61
-
62
- # Run by marker
63
- pytest -m unit
64
- pytest -m "unit and not slow"
65
- ```
66
-
67
- ### Watch Mode
68
-
69
- Use `pytest-watch` for continuous testing:
70
-
71
- ```bash
72
- pip install pytest-watch
73
- ptw # Runs tests on file changes
74
- ```
75
-
76
- ### Coverage Reports
77
-
78
- ```bash
79
- # Terminal report
80
- pytest --cov --cov-report=term-missing
81
-
82
- # HTML report
83
- pytest --cov --cov-report=html
84
- open htmlcov/index.html
85
-
86
- # Both
87
- pytest --cov --cov-report=term-missing --cov-report=html
88
- ```
89
-
90
- ## Test Structure
91
-
92
- ### Test Files
93
-
94
- Each module has a corresponding test file:
95
-
96
- - `utils.py` β†’ `tests/test_utils.py`
97
- - `pipeline.py` β†’ `tests/test_pipeline.py`
98
- - `main.py` β†’ `tests/test_api.py`
99
- - `tasks.py` β†’ `tests/test_tasks.py`
100
-
101
- ### Test Organization
102
-
103
- Group related tests in classes:
104
-
105
- ```python
106
- class TestValidateYouTubeURL:
107
- """Test YouTube URL validation."""
108
-
109
- def test_valid_watch_url(self):
110
- """Test standard youtube.com/watch URL."""
111
- is_valid, video_id = validate_youtube_url("https://www.youtube.com/watch?v=...")
112
- assert is_valid is True
113
- assert video_id == "..."
114
-
115
- def test_invalid_domain(self):
116
- """Test URL from wrong domain."""
117
- is_valid, error = validate_youtube_url("https://vimeo.com/12345")
118
- assert is_valid is False
119
- ```
120
-
121
- ## Writing Tests
122
-
123
- ### Basic Test Template
124
-
125
- ```python
126
- import pytest
127
- from module_name import function_to_test
128
-
129
- class TestFunctionName:
130
- """Test suite for function_name."""
131
-
132
- def test_happy_path(self):
133
- """Test normal successful execution."""
134
- result = function_to_test(valid_input)
135
- assert result == expected_output
136
-
137
- def test_edge_case(self):
138
- """Test boundary condition."""
139
- result = function_to_test(edge_case_input)
140
- assert result == expected_edge_output
141
-
142
- def test_error_handling(self):
143
- """Test error is raised for invalid input."""
144
- with pytest.raises(ValueError) as exc_info:
145
- function_to_test(invalid_input)
146
- assert "expected error message" in str(exc_info.value)
147
- ```
148
-
149
- ### Using Fixtures
150
-
151
- Fixtures provide reusable test data:
152
-
153
- ```python
154
- @pytest.fixture
155
- def sample_audio_file(temp_storage_dir):
156
- """Create a sample WAV file for testing."""
157
- import numpy as np
158
- import soundfile as sf
159
-
160
- sample_rate = 44100
161
- duration = 1.0
162
- samples = np.zeros(int(sample_rate * duration), dtype=np.float32)
163
-
164
- audio_path = temp_storage_dir / "test_audio.wav"
165
- sf.write(str(audio_path), samples, sample_rate)
166
-
167
- return audio_path
168
-
169
- def test_using_fixture(sample_audio_file):
170
- """Test that uses the fixture."""
171
- assert sample_audio_file.exists()
172
- assert sample_audio_file.suffix == ".wav"
173
- ```
174
-
175
- ### Mocking External Dependencies
176
-
177
- #### Mock yt-dlp
178
-
179
- ```python
180
- from unittest.mock import patch, MagicMock
181
-
182
- @patch('pipeline.yt_dlp.YoutubeDL')
183
- def test_download_audio(mock_ydl_class, temp_storage_dir):
184
- """Test audio download with mocked yt-dlp."""
185
- mock_ydl = MagicMock()
186
- mock_ydl_class.return_value.__enter__.return_value = mock_ydl
187
-
188
- result = download_audio("https://youtube.com/watch?v=...", temp_storage_dir)
189
-
190
- assert result.exists()
191
- mock_ydl.download.assert_called_once()
192
- ```
193
-
194
- #### Mock Redis
195
-
196
- ```python
197
- @pytest.fixture
198
- def mock_redis():
199
- """Mock Redis client."""
200
- redis_mock = MagicMock(spec=Redis)
201
- redis_mock.ping.return_value = True
202
- redis_mock.hgetall.return_value = {}
203
- return redis_mock
204
-
205
- def test_with_redis(mock_redis):
206
- """Test function that uses Redis."""
207
- # Redis is mocked, no real connection needed
208
- mock_redis.hset("key", "field", "value")
209
- assert mock_redis.hset.called
210
- ```
211
-
212
- #### Mock ML Models
213
-
214
- ```python
215
- @patch('pipeline.basic_pitch.inference.predict')
216
- def test_transcribe_audio(mock_predict, sample_audio_file, temp_storage_dir):
217
- """Test transcription with mocked ML model."""
218
- # Mock model output
219
- mock_predict.return_value = (
220
- np.zeros((100, 88)), # note activations
221
- np.zeros((100, 88)), # onsets
222
- np.zeros((100, 1)) # contours
223
- )
224
-
225
- result = transcribe_audio(sample_audio_file, temp_storage_dir)
226
-
227
- assert result.exists()
228
- assert result.suffix == ".mid"
229
- ```
230
-
231
- ## Testing Patterns
232
-
233
- ### Testing API Endpoints
234
-
235
- ```python
236
- from fastapi.testclient import TestClient
237
-
238
- def test_submit_transcription(test_client, mock_redis):
239
- """Test transcription submission endpoint."""
240
- response = test_client.post(
241
- "/api/v1/transcribe",
242
- json={"youtube_url": "https://www.youtube.com/watch?v=..."}
243
- )
244
-
245
- assert response.status_code == 201
246
- data = response.json()
247
- assert "job_id" in data
248
- assert data["status"] == "queued"
249
- ```
250
-
251
- ### Testing Async Functions
252
-
253
- ```python
254
- import pytest
255
-
256
- @pytest.mark.asyncio
257
- async def test_async_function():
258
- """Test async function."""
259
- result = await async_operation()
260
- assert result == expected_value
261
- ```
262
-
263
- ### Testing WebSocket Connections
264
-
265
- ```python
266
- def test_websocket(test_client, sample_job_id):
267
- """Test WebSocket connection."""
268
- with test_client.websocket_connect(f"/api/v1/jobs/{sample_job_id}/stream") as websocket:
269
- data = websocket.receive_json()
270
- assert data["type"] == "progress"
271
- assert "job_id" in data
272
- ```
273
-
274
- ### Testing Error Scenarios
275
-
276
- ```python
277
- def test_video_too_long(test_client):
278
- """Test error handling for videos exceeding duration limit."""
279
- with patch('utils.check_video_availability') as mock_check:
280
- mock_check.return_value = {
281
- 'available': False,
282
- 'reason': 'Video too long (max 15 minutes)'
283
- }
284
-
285
- response = test_client.post(
286
- "/api/v1/transcribe",
287
- json={"youtube_url": "https://www.youtube.com/watch?v=long"}
288
- )
289
-
290
- assert response.status_code == 422
291
- assert "too long" in response.json()["detail"]
292
- ```
293
-
294
- ### Testing Retries
295
-
296
- ```python
297
- def test_retry_on_network_error():
298
- """Test that function retries on network error."""
299
- mock_func = MagicMock()
300
- mock_func.side_effect = [
301
- ConnectionError("Network timeout"), # First call fails
302
- ConnectionError("Network timeout"), # Second call fails
303
- {"success": True} # Third call succeeds
304
- ]
305
-
306
- # Function should retry and eventually succeed
307
- result = function_with_retry(mock_func)
308
- assert result == {"success": True}
309
- assert mock_func.call_count == 3
310
- ```
311
-
312
- ### Parametrized Tests
313
-
314
- Test multiple inputs efficiently:
315
-
316
- ```python
317
- @pytest.mark.parametrize("url,expected_valid,expected_id", [
318
- ("https://www.youtube.com/watch?v=dQw4w9WgXcQ", True, "dQw4w9WgXcQ"),
319
- ("https://youtu.be/dQw4w9WgXcQ", True, "dQw4w9WgXcQ"),
320
- ("https://vimeo.com/12345", False, None),
321
- ("not-a-url", False, None),
322
- ])
323
- def test_url_validation(url, expected_valid, expected_id):
324
- """Test URL validation with multiple inputs."""
325
- is_valid, result = validate_youtube_url(url)
326
- assert is_valid == expected_valid
327
- if expected_valid:
328
- assert result == expected_id
329
- ```
330
-
331
- ## Testing Pipeline Stages
332
-
333
- ### Audio Download
334
-
335
- ```python
336
- @patch('pipeline.yt_dlp.YoutubeDL')
337
- def test_download_audio_success(mock_ydl_class, temp_storage_dir):
338
- """Test successful audio download."""
339
- mock_ydl = MagicMock()
340
- mock_ydl_class.return_value.__enter__.return_value = mock_ydl
341
-
342
- result = download_audio("https://youtube.com/watch?v=...", temp_storage_dir)
343
-
344
- assert result.exists()
345
- assert result.suffix == ".wav"
346
- ```
347
-
348
- ### Source Separation
349
-
350
- ```python
351
- @patch('pipeline.demucs.separate.main')
352
- def test_separate_sources(mock_demucs, sample_audio_file, temp_storage_dir):
353
- """Test source separation."""
354
- # Create mock output files
355
- stems_dir = temp_storage_dir / "htdemucs" / "test_audio"
356
- stems_dir.mkdir(parents=True)
357
- for stem in ["drums", "bass", "vocals", "other"]:
358
- (stems_dir / f"{stem}.wav").touch()
359
-
360
- result = separate_sources(sample_audio_file, temp_storage_dir)
361
-
362
- assert all(stem in result for stem in ["drums", "bass", "vocals", "other"])
363
- assert all(path.exists() for path in result.values())
364
- ```
365
-
366
- ### Transcription
367
-
368
- ```python
369
- @patch('pipeline.basic_pitch.inference.predict')
370
- def test_transcribe_audio(mock_predict, sample_audio_file, temp_storage_dir):
371
- """Test audio transcription."""
372
- mock_predict.return_value = (
373
- np.random.rand(100, 88),
374
- np.random.rand(100, 88),
375
- np.random.rand(100, 1)
376
- )
377
-
378
- result = transcribe_audio(sample_audio_file, temp_storage_dir)
379
-
380
- assert result.exists()
381
- assert result.suffix == ".mid"
382
- ```
383
-
384
- ### MusicXML Generation
385
-
386
- ```python
387
- @patch('pipeline.music21.converter.parse')
388
- def test_generate_musicxml(mock_parse, sample_midi_file, temp_storage_dir):
389
- """Test MusicXML generation."""
390
- mock_score = MagicMock()
391
- mock_parse.return_value = mock_score
392
-
393
- result = generate_musicxml(sample_midi_file, temp_storage_dir)
394
-
395
- assert result.exists()
396
- assert result.suffix == ".musicxml"
397
- mock_score.write.assert_called_once()
398
- ```
399
-
400
- ## Troubleshooting
401
-
402
- ### Common Issues
403
-
404
- **Import Errors**
405
-
406
- ```bash
407
- # Ensure backend directory is in PYTHONPATH
408
- export PYTHONPATH="${PYTHONPATH}:$(pwd)"
409
- pytest
410
- ```
411
-
412
- **Redis Connection Errors**
413
-
414
- ```python
415
- # Always mock Redis in tests unless testing Redis specifically
416
- @pytest.fixture(autouse=True)
417
- def mock_redis():
418
- with patch('main.redis_client') as mock:
419
- yield mock
420
- ```
421
-
422
- **File Permission Errors**
423
-
424
- ```python
425
- # Always use temp directories
426
- @pytest.fixture
427
- def temp_storage_dir():
428
- temp_dir = tempfile.mkdtemp()
429
- yield Path(temp_dir)
430
- shutil.rmtree(temp_dir, ignore_errors=True)
431
- ```
432
-
433
- **GPU Not Available**
434
-
435
- ```python
436
- # Mark GPU tests and skip if unavailable
437
- @pytest.mark.gpu
438
- @pytest.mark.skipif(not torch.cuda.is_available(), reason="GPU not available")
439
- def test_gpu_processing():
440
- ...
441
- ```
442
-
443
- ### Debugging Failed Tests
444
-
445
- ```bash
446
- # Show print statements
447
- pytest -s
448
-
449
- # Verbose output
450
- pytest -vv
451
-
452
- # Drop into debugger on failure
453
- pytest --pdb
454
-
455
- # Run only failed tests
456
- pytest --lf
457
- ```
458
-
459
- ### Performance Issues
460
-
461
- ```bash
462
- # Identify slow tests
463
- pytest --durations=10
464
-
465
- # Run tests in parallel
466
- pytest -n auto # Requires pytest-xdist
467
- ```
468
-
469
- ## Best Practices
470
-
471
- 1. **Mock external dependencies**: Don't make real API calls, network requests, or ML inferences
472
- 2. **Use fixtures**: Share common setup code across tests
473
- 3. **Test edge cases**: Empty inputs, None values, boundary conditions
474
- 4. **Clean up resources**: Always clean up temp files, connections
475
- 5. **Keep tests independent**: Tests should not depend on each other
476
- 6. **Write descriptive names**: Test names should explain what they verify
477
- 7. **Test one thing**: Each test should verify one specific behavior
478
- 8. **Use markers**: Tag tests by type (unit, integration, slow, gpu)
479
-
480
- ## Example Test File
481
-
482
- Complete example showing best practices:
483
-
484
- ```python
485
- """Tests for audio processing pipeline."""
486
- import pytest
487
- from pathlib import Path
488
- from unittest.mock import patch, MagicMock
489
- import numpy as np
490
- from pipeline import download_audio, separate_sources, transcribe_audio
491
-
492
-
493
- class TestAudioDownload:
494
- """Test audio download stage."""
495
-
496
- @patch('pipeline.yt_dlp.YoutubeDL')
497
- def test_success(self, mock_ydl_class, temp_storage_dir):
498
- """Test successful audio download."""
499
- mock_ydl = MagicMock()
500
- mock_ydl_class.return_value.__enter__.return_value = mock_ydl
501
-
502
- result = download_audio("https://youtube.com/watch?v=test", temp_storage_dir)
503
-
504
- assert result.exists()
505
- assert result.suffix == ".wav"
506
- mock_ydl.download.assert_called_once()
507
-
508
- @patch('pipeline.yt_dlp.YoutubeDL')
509
- def test_network_error(self, mock_ydl_class, temp_storage_dir):
510
- """Test handling of network error."""
511
- import yt_dlp
512
- mock_ydl = MagicMock()
513
- mock_ydl.download.side_effect = yt_dlp.utils.DownloadError("Network error")
514
- mock_ydl_class.return_value.__enter__.return_value = mock_ydl
515
-
516
- with pytest.raises(Exception) as exc_info:
517
- download_audio("https://youtube.com/watch?v=test", temp_storage_dir)
518
-
519
- assert "Network error" in str(exc_info.value)
520
- ```
 
docs/testing/baseline-accuracy.md DELETED
@@ -1,178 +0,0 @@
1
- # Baseline Accuracy Report
2
-
3
- **Date**: 2024-12-24
4
- **Pipeline Version**: Phase 1 Complete (MusicXML corruption fixes, MIDI export, rate limiting)
5
- **Test Suite**: 10 diverse piano videos
6
-
7
- ## Executive Summary
8
-
9
- This report establishes the baseline transcription accuracy for the Rescored MVP pipeline after Phase 1 improvements.
10
-
11
- **Initial Test Results** (Before Bug Fixes):
12
- - Overall Success Rate: **10%** (1/10 videos)
13
- - Videos Blocked: 3 (YouTube copyright/availability)
14
- - Code Bugs Found: 6 (all fixed βœ…)
15
- - Successful Test: simple_melody (2,588 notes, 122 measures)
16
-
17
- **Expected After Fixes**:
18
- - Success Rate: **70-80%** (7-8/10 videos, excluding blocked ones)
19
- - All code bugs resolved
20
- - Need to replace 3 blocked videos with alternatives
21
-
22
- **Key Finding**: Measure timing accuracy is imperfect (78% of measures show duration warnings), but this is expected for ML-based transcription. MusicXML files load successfully in notation software.
23
-
24
- ## Test Videos
25
-
26
- | ID | Description | Difficulty | Expected Accuracy | URL |
27
- |----|-------------|------------|-------------------|-----|
28
- | simple_melody | C major scale practice | Easy | >80% | [Link](https://www.youtube.com/watch?v=TK1Ij_-mank) |
29
- | twinkle_twinkle | Twinkle Twinkle Little Star | Easy | >75% | [Link](https://www.youtube.com/watch?v=YCZ_d_4ZEqk) |
30
- | fur_elise | Beethoven - FΓΌr Elise (simplified) | Medium | 60-70% | [Link](https://www.youtube.com/watch?v=_mVW8tgGY_w) |
31
- | chopin_nocturne | Chopin - Nocturne Op. 9 No. 2 | Hard | 50-60% | [Link](https://www.youtube.com/watch?v=9E6b3swbnWg) |
32
- | canon_in_d | Pachelbel - Canon in D | Medium | 60-70% | [Link](https://www.youtube.com/watch?v=NlprozGcs80) |
33
- | river_flows | Yiruma - River Flows in You | Medium | 60-70% | [Link](https://www.youtube.com/watch?v=7maJOI3QMu0) |
34
- | moonlight_sonata | Beethoven - Moonlight Sonata | Medium | 60-70% | [Link](https://www.youtube.com/watch?v=4Tr0otuiQuU) |
35
- | jazz_blues | Simple jazz blues piano | Medium | 55-65% | [Link](https://www.youtube.com/watch?v=F3W_alUuFkA) |
- | claire_de_lune | Debussy - Clair de Lune | Hard | 50-60% | [Link](https://www.youtube.com/watch?v=WNcsUNKlAKw) |
- | la_campanella | Liszt - La Campanella | Very Hard | 40-50% | [Link](https://www.youtube.com/watch?v=MD6xMyuZls0) |
-
- ## Results
-
- ### Overall Statistics
-
- (To be filled after test completion)
-
- - **Total Tests**: 10
- - **Successful**: TBD
- - **Failed**: TBD
- - **Success Rate**: TBD%
-
- ### Per-Video Results
-
- #### Easy Difficulty (2 videos)
-
- **simple_melody** ✅:
- - Status: **SUCCESS**
- - MIDI Notes: 2,588
- - Measures: 122
- - Duration: 245.2 seconds
- - Separation Quality: 99.3% energy in 'other' stem (excellent)
- - Measure Warnings: 95/122 (78%) - typical for ML transcription
- - Issues: None - clean transcription
-
- **twinkle_twinkle** ❌:
- - Status: **BLOCKED**
- - Error: "Video unavailable"
- - Action: Replace with alternative video
-
- #### Medium Difficulty (5 videos)
-
- **fur_elise** ❌:
- - Status: **BLOCKED**
- - Error: "Video unavailable"
- - Action: Replace with alternative video
-
- **canon_in_d** ❌ → ✅:
- - Status: **FIXED**
- - Error: NoneType velocity comparison (Bug #2a)
- - Fix Applied: Safe velocity handling in deduplication
- - Expected: Success on re-run
-
- **river_flows** ❌ → ✅:
- - Status: **FIXED**
- - Error: NoneType velocity comparison (Bug #2a)
- - Fix Applied: Safe velocity handling
- - Expected: Success on re-run
-
- **moonlight_sonata** ❌ → ✅:
- - Status: **FIXED**
- - Error: NoneType velocity comparison (Bug #2a)
- - Fix Applied: Safe velocity handling
- - Expected: Success on re-run
-
- **jazz_blues** ❌:
- - Status: **BLOCKED**
- - Error: "Blocked on copyright grounds"
- - Action: Replace with public domain jazz piano
-
- #### Hard Difficulty (2 videos)
-
- **chopin_nocturne** ❌ → ✅:
- - Status: **FIXED**
- - Error: 2048th note duration in measure 129 (Bug #2b)
- - Fix Applied: Increased minimum duration threshold to 128th note
- - Expected: Success on re-run
-
- **claire_de_lune** ❌ → ✅:
- - Status: **FIXED**
- - Error: 2048th note duration in measure 30 (Bug #2b)
- - Fix Applied: Increased minimum duration threshold
- - Expected: Success on re-run
-
- #### Very Hard Difficulty (1 video)
-
- **la_campanella** ❌ → ✅:
- - Status: **FIXED**
- - Error: NoneType velocity comparison (Bug #2a)
- - Fix Applied: Safe velocity handling
- - Expected: Success on re-run (may have low accuracy due to extreme difficulty)
-
- ## Common Failure Modes
-
- Detailed analysis in [failure-modes.md](failure-modes.md)
-
- ### 1. Video Availability (30% of failures)
- - YouTube blocking, copyright claims, unavailable videos
- - **Solution**: Replace with Creative Commons alternatives
-
- ### 2. Code Bugs - All Fixed ✅ (60% of failures)
- - **Bug 2a**: NoneType velocity comparison (4 videos)
-   - Fixed in [pipeline.py:403-409](../../backend/pipeline.py#L403-L409)
- - **Bug 2b**: 2048th note duration errors (2 videos)
-   - Fixed in [pipeline.py:465-502](../../backend/pipeline.py#L465-L502)
-
- ### 3. Measure Timing Accuracy (78% imperfect)
- - Most measures deviate from exact 4.0 beats
- - Range: 0.0 to 7.83 beats (should be 4.0)
- - **Root causes**: basic-pitch timing, duration snapping, polyphony
- - **Impact**: MusicXML loads but rhythms need manual correction
- - **Status**: Expected limitation for ML transcription - Phase 3 will improve
-
- ## Accuracy by Difficulty
-
- | Difficulty | Avg Success Rate | Avg Notes | Avg Measures | Notes |
- |------------|------------------|-----------|--------------|-------|
- | Easy | TBD | TBD | TBD | TBD |
- | Medium | TBD | TBD | TBD | TBD |
- | Hard | TBD | TBD | TBD | TBD |
- | Very Hard | TBD | TBD | TBD | TBD |
-
- ## Known Limitations
-
- Based on Phase 1 implementation:
-
- 1. **Measure Timing**: Many measures show duration warnings (3.5-6.5 beats instead of exactly 4.0). This is expected due to:
-    - basic-pitch not perfectly aligned to beats
-    - Duration snapping to nearest valid note values
-    - Imperfect tempo detection
-
- 2. **MusicXML Warnings**: music21 reports some "overfull measures" when parsing. These are handled gracefully but indicate timing imperfections.
-
- 3. **Single Staff Only**: Grand staff (treble + bass) disabled in Phase 1 due to polyphony issues.
-
- 4. **Piano Only**: Currently only transcribes "other" stem from Demucs, assuming piano/keyboard content.
-
- ## Recommendations for Phase 3
-
- (To be filled based on failure analysis)
-
- 1. **Parameter Tuning**: TBD
- 2. **Model Improvements**: TBD
- 3. **Post-Processing**: TBD
- 4. **Source Separation**: TBD
-
- ## Appendix: Raw Test Data
-
- Full test results JSON: `/tmp/rescored/accuracy_test_results.json`
-
- Individual test outputs in: `/tmp/rescored/temp/accuracy_test_*/`
docs/testing/failure-modes.md DELETED
@@ -1,216 +0,0 @@
- # Failure Mode Analysis
-
- **Date**: 2024-12-24
- **Test Suite**: Phase 2 Accuracy Baseline (10 videos)
- **Pipeline Version**: Phase 1 Complete + Bug Fixes
-
- ## Executive Summary
-
- Initial accuracy testing revealed **3 major failure categories** affecting 9 out of 10 test videos:
-
- 1. **Video Availability** (30% of failures) - YouTube blocking/copyright
- 2. **Code Bugs** (60% of failures) - NoneType errors and 2048th note duration issues
- 3. **MusicXML Export** (20% of failures) - Impossible duration errors
-
- **All code bugs have been fixed.** Success rate expected to improve significantly with re-run.
-
- ## Failure Categories
-
- ### 1. Video Availability Issues (3 videos - 30%)
-
- **Videos Affected:**
- - `twinkle_twinkle` - "Video unavailable"
- - `fur_elise` - "Video unavailable"
- - `jazz_blues` - "Blocked in your country on copyright grounds"
-
- **Root Cause:** YouTube access restrictions, not pipeline issues
-
- **Mitigation:**
- - Replace with alternative videos for same difficulty level
- - Use Creative Commons licensed videos
- - Host test videos on alternative platforms
-
- **Impact:** Not a pipeline issue - will replace test videos
-
- ---
-
- ### 2. Code Bugs - Fixed ✅ (6 videos - 60%)
-
- #### Bug 2a: NoneType Velocity Comparison (4 videos)
-
- **Error:** `'<' not supported between instances of 'int' and 'NoneType'`
-
- **Videos Affected:**
- - `canon_in_d`
- - `river_flows`
- - `moonlight_sonata`
- - `la_campanella`
-
- **Root Cause:** In `_deduplicate_overlapping_notes()` at [pipeline.py:403-407](../backend/pipeline.py#L403-L407), the code tried to sort notes by velocity, but `note.volume.velocity` can return `None`.
-
- **Fix Applied:**
- ```python
- def get_velocity(note):
-     if hasattr(note, 'volume') and hasattr(note.volume, 'velocity'):
-         vel = note.volume.velocity
-         return vel if vel is not None else 64
-     return 64
-
- pitch_notes.sort(key=lambda x: (x.quarterLength, get_velocity(x)), reverse=True)
- ```
-
- **Status:** ✅ Fixed in [pipeline.py:403-409](../backend/pipeline.py#L403-L409)
-
- ---
-
- #### Bug 2b: 2048th Note Duration (2 videos)
-
- **Error:** `In part (Piano), measure (X): Cannot convert "2048th" duration to MusicXML (too short).`
-
- **Videos Affected:**
- - `chopin_nocturne` (measure 129)
- - `claire_de_lune` (measure 30)
-
- **Root Cause:** `music21.makeMeasures()` creates extremely short rests (2048th notes) when filling gaps between notes. MusicXML export fails because these durations are too short to represent.
-
- **Previous Attempts:**
- 1. ❌ Filtered notes < 64th note (0.0625) before `makeMeasures()` - didn't work
- 2. ❌ Removed notes < 64th note after `makeMeasures()` - still had issues
-
- **Final Fix:**
- - Increased minimum duration threshold to **128th note** (0.03125)
- - Added logging to show how many notes/rests were removed
- - Applied in `_remove_impossible_durations()` at [pipeline.py:465-502](../backend/pipeline.py#L465-L502)
-
- **Status:** ✅ Fixed - more aggressive filtering
-
- ---
-
- ### 3. Successful Test Analysis
-
- **Video:** `simple_melody` (C major scale practice, Easy difficulty)
-
- **Results:**
- - ✅ Successfully generated MusicXML
- - **2,588 notes** detected
- - **122 measures** created
- - **245 seconds** duration
- - **99.3% energy** preserved in 'other' stem (excellent separation)
-
- **Key Metrics:**
-
- | Metric | Value | Assessment |
- |--------|-------|------------|
- | Note density | 5.36 notes/sec | Reasonable for piano |
- | Pitch range | G1 to A6 (62 semitones) | Full piano range |
- | Polyphony | ~1.6 avg, ~6 max | Modest polyphony |
- | Short notes | 271 (21%) under 200ms | Acceptable |
- | Measure warnings | 95/122 (78%) | **High** - timing imperfect |
-
- **Measure Timing Issues:**
-
- 78% of measures showed duration warnings (range 0.0 - 7.83 beats instead of exactly 4.0). Examples:
- - Measure 1: 0.00 beats (empty)
- - Measure 30: 6.41 beats (overfull)
- - Measure 69: 7.33 beats (very overfull)
- - Measure 77: 7.83 beats (worst case)
-
- **Root Causes:**
- 1. **basic-pitch timing** not aligned to musical beats
- 2. **Duration snapping** to nearest valid note value loses precision
- 3. **Tempo detection** may be inaccurate
- 4. **Polyphonic overlaps** creating extra duration
-
- **Impact:** MusicXML loads in notation software but rhythms are imperfect. This is expected with ML-based transcription.
-
- ---
-
- ## Common Patterns
-
- ### Pattern 1: Quiet Audio Detection
- - Diagnostic shows RMS energy of 0.0432 (very quiet)
- - 20% silence in audio
- - basic-pitch may struggle with quiet inputs
-
- ### Pattern 2: Separation Quality
- - For `simple_melody`: 99.3% energy in 'other' stem ✅
- - Only 0.2% in 'no_other' stem (excellent isolation)
- - Demucs successfully isolated piano
-
- ### Pattern 3: Measure Duration Accuracy
- - **Only 22%** of measures have exactly 4.0 beats
- - **78%** show timing deviations
- - Range: -4.0 to +3.83 beats deviation
- - Largest errors in complex sections (likely polyphony)
-
- ---
-
- ## Recommendations
-
- ### Immediate Actions (Phase 2 completion)
-
- 1. **Replace unavailable videos** with Creative Commons alternatives
- 2. **Re-run accuracy suite** with bug fixes
- 3. **Document actual baseline** with successful tests
-
- ### Phase 3 Improvements (Accuracy Tuning)
-
- 1. **Tempo Detection:**
-    - Implement better tempo detection (analyze beat patterns)
-    - Consider fixed tempo option for practice scales
-
- 2. **Quantization:**
-    - Improve rhythmic quantization to align with detected beats
-    - Consider time signature detection
-
- 3. **Post-Processing:**
-    - Add measure duration normalization
-    - Stretch/compress note timings to fit exact 4.0 beats
-
- 4. **Parameter Tuning:**
-    - Test different `onset-threshold` values (current: 0.5)
-    - Test different `frame-threshold` values (current: 0.4)
-    - Experiment with `minimum-note-length`
-
- ### Alternative Models (Phase 3 - Optional)
-
- Consider testing:
- - **MT3** (Google's Music Transformer) - better rhythm accuracy
- - **htdemucs_6s** - 6-stem model with dedicated piano stem
- - **Omnizart** - specialized for classical music
-
- ---
-
- ## Success Criteria
-
- After fixes and re-run, we expect:
-
- - ✅ **Video availability**: 7-8 working videos (replacing blocked ones)
- - ✅ **Code bugs**: 0% failure rate (all fixed)
- - ✅ **MusicXML export**: 100% success for available videos
- - 🎯 **Overall success rate**: 70-80% (from 10%)
-
- Measure timing accuracy will remain imperfect (~78% with warnings) but this is expected for MVP. Phase 3 will focus on improving timing accuracy.
-
- ---
-
- ## Appendix: Error Details
-
- ### NoneType Error Stack Trace
- ```
- File "pipeline.py", line 403
-   pitch_notes.sort(key=lambda x: (x.quarterLength, x.volume.velocity if ...))
- TypeError: '<' not supported between instances of 'int' and 'NoneType'
- ```
-
- ### 2048th Note Error Stack Trace
- ```
- File "music21/musicxml/m21ToXml.py", line 4702
-   mxNormalType.text = typeToMusicXMLType(tup.durationNormal.type)
- MusicXMLExportException: In part (Piano), measure (129): Cannot convert "2048th" duration to MusicXML (too short).
- ```
-
- ---
-
- **Last Updated**: 2024-12-24
- **Next Review**: After accuracy suite re-run
docs/testing/frontend-testing.md DELETED
@@ -1,653 +0,0 @@
- # Frontend Testing Guide
-
- Comprehensive guide for testing the Rescored frontend.
-
- ## Table of Contents
-
- - [Setup](#setup)
- - [Running Tests](#running-tests)
- - [Test Structure](#test-structure)
- - [Writing Tests](#writing-tests)
- - [Testing Patterns](#testing-patterns)
- - [Troubleshooting](#troubleshooting)
-
- ## Setup
-
- ### Install Test Dependencies
-
- ```bash
- cd frontend
- npm install
- ```
-
- Test dependencies (already in `package.json`):
- - `vitest`: Test framework
- - `@testing-library/react`: React testing utilities
- - `@testing-library/user-event`: User interaction simulation
- - `@testing-library/jest-dom`: DOM matchers
- - `jsdom`: DOM implementation for Node.js
- - `@vitest/ui`: Interactive test UI
- - `@vitest/coverage-v8`: Coverage reporting
-
- ### Configuration
-
- Test configuration is in `vitest.config.ts`:
-
- ```typescript
- export default defineConfig({
-   test: {
-     globals: true,
-     environment: 'jsdom',
-     setupFiles: ['./src/tests/setup.ts'],
-     coverage: {
-       provider: 'v8',
-       reporter: ['text', 'html', 'lcov'],
-     },
-   },
- });
- ```
-
- ## Running Tests
-
- ### Basic Commands
-
- ```bash
- # Run all tests
- npm test
-
- # Run in watch mode
- npm test -- --watch
-
- # Run with UI
- npm run test:ui
-
- # Run with coverage
- npm run test:coverage
-
- # Run specific file
- npm test -- src/tests/api/client.test.ts
-
- # Run tests matching pattern
- npm test -- --grep "JobSubmission"
- ```
-
- ### Watch Mode
-
- Watch mode automatically re-runs tests when files change:
-
- ```bash
- npm test -- --watch
-
- # Watch specific file
- npm test -- --watch src/tests/components/NotationCanvas.test.tsx
- ```
-
- ### Coverage Reports
-
- ```bash
- # Generate coverage report
- npm run test:coverage
-
- # Open HTML report
- open coverage/index.html
- ```
-
- ## Test Structure
-
- ### Test Files
-
- Component tests live alongside components or in `src/tests/`:
-
- ```
- frontend/src/
- ├── components/
- │   ├── JobSubmission.tsx
- │   └── JobSubmission.test.tsx      # Option 1: Co-located
- ├── tests/
- │   ├── setup.ts                    # Test configuration
- │   ├── fixtures.ts                 # Shared test data
- │   ├── components/
- │   │   └── JobSubmission.test.tsx  # Option 2: Separate directory
- │   └── api/
- │       └── client.test.ts
- ```
-
- ### Test Organization
-
- ```typescript
- import { describe, it, expect, vi, beforeEach } from 'vitest';
- import { render, screen } from '@testing-library/react';
- import Component from './Component';
-
- describe('Component', () => {
-   beforeEach(() => {
-     // Setup before each test
-   });
-
-   describe('Rendering', () => {
-     it('should render correctly', () => {
-       // Test rendering
-     });
-   });
-
-   describe('Interactions', () => {
-     it('should handle user input', async () => {
-       // Test interactions
-     });
-   });
-
-   describe('Edge Cases', () => {
-     it('should handle empty state', () => {
-       // Test edge cases
-     });
-   });
- });
- ```
-
- ## Writing Tests
-
- ### Basic Component Test
-
- ```typescript
- import { describe, it, expect } from 'vitest';
- import { render, screen } from '@testing-library/react';
- import MyComponent from './MyComponent';
-
- describe('MyComponent', () => {
-   it('should render text', () => {
-     render(<MyComponent text="Hello" />);
-     expect(screen.getByText('Hello')).toBeInTheDocument();
-   });
-
-   it('should handle button click', async () => {
-     const user = userEvent.setup();
-     const handleClick = vi.fn();
-
-     render(<MyComponent onClick={handleClick} />);
-
-     const button = screen.getByRole('button');
-     await user.click(button);
-
-     expect(handleClick).toHaveBeenCalledTimes(1);
-   });
- });
- ```
-
- ### Testing with User Interactions
-
- Use `@testing-library/user-event` for realistic interactions:
-
- ```typescript
- import userEvent from '@testing-library/user-event';
-
- it('should accept user input', async () => {
-   const user = userEvent.setup();
-   render(<JobSubmission />);
-
-   const input = screen.getByPlaceholderText(/youtube url/i);
-
-   // Type into input
-   await user.type(input, 'https://www.youtube.com/watch?v=...');
-   expect(input).toHaveValue('https://www.youtube.com/watch?v=...');
-
-   // Click button
-   const button = screen.getByRole('button', { name: /submit/i });
-   await user.click(button);
-
-   // Verify action
-   await waitFor(() => {
-     expect(mockSubmit).toHaveBeenCalled();
-   });
- });
- ```
-
- ### Testing Async Operations
-
- ```typescript
- import { waitFor } from '@testing-library/react';
-
- it('should load data', async () => {
-   const mockFetch = vi.fn().mockResolvedValue({
-     ok: true,
-     json: async () => ({ data: 'test' }),
-   });
-   global.fetch = mockFetch;
-
-   render(<DataComponent />);
-
-   await waitFor(() => {
-     expect(screen.getByText('test')).toBeInTheDocument();
-   });
- });
- ```
-
- ### Mocking Dependencies
-
- #### Mock API Client
-
- ```typescript
- vi.mock('../../api/client', () => ({
-   submitTranscription: vi.fn(),
-   getJobStatus: vi.fn(),
-   downloadScore: vi.fn(),
- }));
-
- import { submitTranscription } from '../../api/client';
-
- it('should call API', async () => {
-   const mockSubmit = vi.mocked(submitTranscription);
-   mockSubmit.mockResolvedValue({ job_id: '123', status: 'queued' });
-
-   // Test component that uses submitTranscription
-   // ...
-
-   expect(mockSubmit).toHaveBeenCalledWith('https://youtube.com/...');
- });
- ```
-
- #### Mock Zustand Store
-
- ```typescript
- import { renderHook, act } from '@testing-library/react';
- import { useScoreStore } from '../../store/scoreStore';
-
- it('should update store', () => {
-   const { result } = renderHook(() => useScoreStore());
-
-   act(() => {
-     result.current.setMusicXML('<musicxml>...</musicxml>');
-   });
-
-   expect(result.current.musicXML).toBe('<musicxml>...</musicxml>');
- });
- ```
-
- #### Mock VexFlow
-
- ```typescript
- // In setup.ts
- vi.mock('vexflow', () => ({
-   Flow: {
-     Renderer: vi.fn(() => ({
-       resize: vi.fn(),
-       getContext: vi.fn(() => ({
-         clear: vi.fn(),
-         setFont: vi.fn(),
-       })),
-     })),
-     Stave: vi.fn(() => ({
-       addClef: vi.fn().mockReturnThis(),
-       addTimeSignature: vi.fn().mockReturnThis(),
-       setContext: vi.fn().mockReturnThis(),
-       draw: vi.fn(),
-     })),
-   },
- }));
- ```
-
- ## Testing Patterns
-
- ### Testing Form Submission
-
- ```typescript
- it('should submit form with valid data', async () => {
-   const user = userEvent.setup();
-   const onSubmit = vi.fn();
-
-   render(<Form onSubmit={onSubmit} />);
-
-   // Fill out form
-   await user.type(screen.getByLabelText(/url/i), 'https://youtube.com/...');
-
-   // Submit
-   await user.click(screen.getByRole('button', { name: /submit/i }));
-
-   // Verify
-   await waitFor(() => {
-     expect(onSubmit).toHaveBeenCalledWith({
-       url: 'https://youtube.com/...',
-     });
-   });
- });
- ```
-
- ### Testing Error States
-
- ```typescript
- it('should show error message', async () => {
-   const mockFetch = vi.fn().mockRejectedValue(new Error('Network error'));
-   global.fetch = mockFetch;
-
-   render(<Component />);
-
-   await waitFor(() => {
-     expect(screen.getByText(/network error/i)).toBeInTheDocument();
-   });
- });
- ```
-
- ### Testing Loading States
-
- ```typescript
- it('should show loading indicator', async () => {
-   const mockFetch = vi.fn(() =>
-     new Promise(resolve => setTimeout(() => resolve({ ok: true }), 100))
-   );
-   global.fetch = mockFetch;
-
-   render(<Component />);
-
-   // Should show loading
-   expect(screen.getByText(/loading/i)).toBeInTheDocument();
-
-   // Should hide loading after data loads
-   await waitFor(() => {
-     expect(screen.queryByText(/loading/i)).not.toBeInTheDocument();
-   });
- });
- ```
-
- ### Testing WebSocket Connections
-
- ```typescript
- it('should handle WebSocket messages', () => {
-   const mockWS = {
-     addEventListener: vi.fn(),
-     send: vi.fn(),
-     close: vi.fn(),
-   };
-
-   global.WebSocket = vi.fn(() => mockWS) as any;
-
-   render(<WebSocketComponent />);
-
-   // Get message handler
-   const messageHandler = mockWS.addEventListener.mock.calls.find(
-     call => call[0] === 'message'
-   )?.[1];
-
-   // Simulate message
-   messageHandler?.({ data: JSON.stringify({ type: 'progress', progress: 50 }) });
-
-   // Verify UI updated
-   expect(screen.getByText(/50%/)).toBeInTheDocument();
- });
- ```
-
- ### Testing Conditional Rendering
-
- ```typescript
- it('should render different states', () => {
-   const { rerender } = render(<StatusIndicator status="loading" />);
-   expect(screen.getByText(/loading/i)).toBeInTheDocument();
-
-   rerender(<StatusIndicator status="success" />);
-   expect(screen.getByText(/success/i)).toBeInTheDocument();
-
-   rerender(<StatusIndicator status="error" />);
-   expect(screen.getByText(/error/i)).toBeInTheDocument();
- });
- ```
-
- ### Testing Canvas/VexFlow Components
-
- ```typescript
- it('should render notation', () => {
-   // Mock canvas context
-   const mockContext = {
-     fillRect: vi.fn(),
-     clearRect: vi.fn(),
-     beginPath: vi.fn(),
-     stroke: vi.fn(),
-   };
-
-   HTMLCanvasElement.prototype.getContext = vi.fn(() => mockContext) as any;
-
-   const { container } = render(<NotationCanvas musicXML={sampleXML} />);
-
-   // Verify canvas or SVG exists
-   const canvas = container.querySelector('canvas');
-   expect(canvas).toBeInTheDocument();
- });
- ```
-
- ### Snapshot Testing
-
- Use snapshots for stable UI components:
-
- ```typescript
- it('should match snapshot', () => {
-   const { container } = render(<StaticComponent />);
-   expect(container).toMatchSnapshot();
- });
- ```
-
- **Update snapshots:**
- ```bash
- npm test -- -u
- ```
-
- ## Testing Custom Hooks
-
- ```typescript
- import { renderHook, act } from '@testing-library/react';
- import { useCustomHook } from './useCustomHook';
-
- it('should handle state changes', () => {
-   const { result } = renderHook(() => useCustomHook());
-
-   expect(result.current.count).toBe(0);
-
-   act(() => {
-     result.current.increment();
-   });
-
-   expect(result.current.count).toBe(1);
- });
- ```
-
- ## Accessibility Testing
-
- ```typescript
- it('should be accessible', () => {
-   render(<Component />);
-
-   // Check for proper labels
-   expect(screen.getByLabelText(/input field/i)).toBeInTheDocument();
-
-   // Check for ARIA attributes
-   expect(screen.getByRole('button')).toHaveAttribute('aria-label', 'Submit');
-
-   // Check keyboard navigation
-   const button = screen.getByRole('button');
-   button.focus();
-   expect(button).toHaveFocus();
- });
- ```
-
- ## Troubleshooting
-
- ### Common Issues
-
- **Canvas/VexFlow Errors**
-
- ```typescript
- // Mock canvas in setup.ts
- beforeEach(() => {
-   HTMLCanvasElement.prototype.getContext = vi.fn(() => ({
-     fillRect: vi.fn(),
-     // ... other canvas methods
-   })) as any;
- });
- ```
-
- **WebSocket Errors**
-
- ```typescript
- // Mock WebSocket in setup.ts
- global.WebSocket = vi.fn(() => ({
-   addEventListener: vi.fn(),
-   send: vi.fn(),
-   close: vi.fn(),
-   readyState: WebSocket.OPEN,
- })) as any;
- ```
-
- **Module Import Errors**
-
- ```typescript
- // Use vi.mock at top of test file
- vi.mock('external-module', () => ({
-   default: vi.fn(),
-   namedExport: vi.fn(),
- }));
- ```
-
- **Async Test Timeouts**
-
- ```typescript
- // Increase timeout for slow tests
- it('slow test', async () => {
-   // ...
- }, { timeout: 10000 });
- ```
-
- ### Debugging Tests
-
- ```bash
- # Run with UI for interactive debugging
- npm run test:ui
-
- # Run specific test in watch mode
- npm test -- --watch --grep "test name"
-
- # Debug in VS Code
- # Add breakpoint and use "Debug Test" code lens
- ```
-
- ### Performance Issues
-
- ```bash
- # Identify slow tests
- npm test -- --reporter=verbose
-
- # Run tests in parallel (default)
- npm test
-
- # Run sequentially if needed
- npm test -- --no-threads
- ```
-
- ## Best Practices
-
- 1. **Test user behavior, not implementation**: Focus on what users see and do
- 2. **Use accessible queries**: Prefer `getByRole`, `getByLabelText` over `getByTestId`
- 3. **Avoid testing implementation details**: Don't test internal state or methods
- 4. **Keep tests simple**: Each test should verify one thing
- 5. **Use realistic data**: Test with data similar to production
- 6. **Clean up**: Always clean up side effects (timers, listeners)
- 7. **Mock external dependencies**: Don't make real API calls or WebSocket connections
- 8. **Test edge cases**: Empty states, errors, loading states
-
- ## Query Priority
-
- Use queries in this order (most preferred first):
-
- 1. **Accessible Queries**:
-    - `getByRole`
-    - `getByLabelText`
-    - `getByPlaceholderText`
-    - `getByText`
-
- 2. **Semantic Queries**:
-    - `getByAltText`
-    - `getByTitle`
-
- 3. **Test IDs** (last resort):
-    - `getByTestId`
-
- Example:
-
- ```typescript
- // Good
- const button = screen.getByRole('button', { name: /submit/i });
- const input = screen.getByLabelText(/email/i);
-
- // Acceptable
- const image = screen.getByAltText('Logo');
-
- // Last resort
- const element = screen.getByTestId('custom-element');
- ```
-
- ## Example Test File
-
- Complete example showing best practices:
-
- ```typescript
- import { describe, it, expect, vi, beforeEach } from 'vitest';
- import { render, screen, waitFor } from '@testing-library/react';
- import userEvent from '@testing-library/user-event';
- import JobSubmission from './JobSubmission';
-
- vi.mock('../../api/client', () => ({
-   submitTranscription: vi.fn(),
- }));
-
- import { submitTranscription } from '../../api/client';
-
- describe('JobSubmission', () => {
-   beforeEach(() => {
-     vi.clearAllMocks();
-   });
-
-   describe('Rendering', () => {
-     it('should render input and button', () => {
-       render(<JobSubmission />);
-
-       expect(screen.getByPlaceholderText(/youtube url/i)).toBeInTheDocument();
-       expect(screen.getByRole('button', { name: /transcribe/i })).toBeInTheDocument();
-     });
-   });
-
-   describe('User Interactions', () => {
-     it('should accept and submit valid URL', async () => {
-       const user = userEvent.setup();
-       const mockSubmit = vi.mocked(submitTranscription);
-       mockSubmit.mockResolvedValue({ job_id: '123', status: 'queued' });
-
-       render(<JobSubmission />);
-
-       const input = screen.getByPlaceholderText(/youtube url/i);
-       const button = screen.getByRole('button', { name: /transcribe/i });
-
-       await user.type(input, 'https://www.youtube.com/watch?v=...');
-       await user.click(button);
-
-       await waitFor(() => {
-         expect(mockSubmit).toHaveBeenCalledWith(
-           'https://www.youtube.com/watch?v=...',
-           expect.any(Object)
-         );
-       });
-     });
-   });
-
-   describe('Error Handling', () => {
-     it('should show error for invalid URL', async () => {
-       const user = userEvent.setup();
-       render(<JobSubmission />);
-
-       const input = screen.getByPlaceholderText(/youtube url/i);
-       const button = screen.getByRole('button', { name: /transcribe/i });
-
-       await user.type(input, 'invalid-url');
-       await user.click(button);
-
-       await waitFor(() => {
-         expect(screen.getByText(/invalid/i)).toBeInTheDocument();
-       });
-     });
-   });
- });
- ```
docs/testing/overview.md DELETED
@@ -1,315 +0,0 @@
- # Testing Guide
-
- Complete testing guide for the Rescored project.
-
- ## Quick Start
-
- ### Backend Tests
-
- ```bash
- cd backend
- pip install -r requirements-test.txt
- pytest --cov
- ```
-
- ### Frontend Tests
-
- ```bash
- cd frontend
- npm install
- npm test
- ```
-
- ## Testing Philosophy
-
- Rescored follows these testing principles:
-
- 1. **Test behavior, not implementation** - Verify what the code does, not how
- 2. **Write tests that give confidence** - Focus on high-value tests that catch real bugs
- 3. **Keep tests maintainable** - Tests should be easy to understand and modify
- 4. **Test at the right level** - Unit tests for logic, integration tests for workflows
- 5. **Fast feedback loops** - Tests should run quickly to enable rapid development
-
- ## Test Suites
-
- ### Backend Test Suite (`backend/tests/`)
-
- - **Unit Tests** (`test_utils.py`) - URL validation, video availability checks
- - **API Tests** (`test_api.py`) - FastAPI endpoints, WebSocket connections
- - **Pipeline Tests** (`test_pipeline.py`) - Audio processing, transcription, MusicXML generation
- - **Task Tests** (`test_tasks.py`) - Celery workers, job processing, progress updates
-
- **Features**: Mocked external dependencies (yt-dlp, Redis, ML models), temporary file handling, parametrized tests, coverage reporting
-
- ### Frontend Test Suite (`frontend/src/tests/`)
-
- - **API Client Tests** (`api/client.test.ts`) - HTTP requests, WebSocket connections
- - **Component Tests** (`components/`) - JobSubmission, NotationCanvas, PlaybackControls
- - **Store Tests** (`store/useScoreStore.test.ts`) - Zustand state management
-
- **Features**: React Testing Library, user event simulation, mocked VexFlow and Tone.js, coverage reporting
-
- ## Coverage Goals
-
- | Component | Target | Priority |
- |-----------|--------|----------|
- | Backend Utils | 90%+ | High |
- | Backend Pipeline | 85%+ | Critical |
- | Backend API | 80%+ | High |
- | Frontend API Client | 85%+ | Critical |
- | Frontend Components | 75%+ | High |
- | Frontend Store | 80%+ | High |
-
- ## Running Tests
-
- ### Backend
-
- ```bash
- # Run all tests
- pytest
-
- # With coverage
- pytest --cov --cov-report=html
-
- # Specific tests
- pytest tests/test_utils.py
- pytest tests/test_utils.py::TestValidateYouTubeURL::test_valid_watch_url
-
- # By category
- pytest -m unit          # Only unit tests
- pytest -m integration   # Only integration tests
- pytest -m "not slow"    # Exclude slow tests
- pytest -m "not gpu"     # Exclude GPU tests
-
- # Debugging
- pytest -vv    # Verbose output
- pytest -s     # Show print statements
- pytest --pdb  # Drop into debugger on failure
- pytest --lf   # Run last failed tests
- ```
-
- ### Frontend
-
- ```bash
- # Run all tests
- npm test
-
- # Watch mode
- npm test -- --watch
-
- # With UI
- npm run test:ui
102
-
103
- # With coverage
104
- npm run test:coverage
105
-
106
- # Specific tests
107
- npm test -- src/tests/api/client.test.ts
108
- npm test -- --grep "JobSubmission"
109
- ```
110
-
111
- ## Test Structure
112
-
113
- ### Backend
114
-
115
- ```
116
- backend/tests/
117
- β”œβ”€β”€ conftest.py # Shared fixtures (temp dirs, mock Redis, sample files)
118
- β”œβ”€β”€ test_utils.py # Utility function tests
119
- β”œβ”€β”€ test_api.py # API endpoint tests
120
- β”œβ”€β”€ test_pipeline.py # Audio processing tests
121
- └── test_tasks.py # Celery task tests
122
- ```
123
-
124
- ### Frontend
125
-
126
- ```
127
- frontend/src/tests/
128
- β”œβ”€β”€ setup.ts # Test configuration (mocks for VexFlow, Tone.js, WebSocket)
129
- β”œβ”€β”€ fixtures.ts # Shared test data (MusicXML, job responses, etc.)
130
- β”œβ”€β”€ api/client.test.ts
131
- β”œβ”€β”€ components/
132
- β”‚ β”œβ”€β”€ JobSubmission.test.tsx
133
- β”‚ β”œβ”€β”€ NotationCanvas.test.tsx
134
- β”‚ └── PlaybackControls.test.tsx
135
- └── store/useScoreStore.test.ts
136
- ```
137
-
138
- ## Common Patterns
139
-
140
- ### Backend Testing
141
-
142
- ```python
143
- # Mock external services
144
- @patch('pipeline.yt_dlp.YoutubeDL')
145
- def test_download_audio(mock_ydl_class, temp_storage_dir):
146
- mock_ydl = MagicMock()
147
- mock_ydl_class.return_value.__enter__.return_value = mock_ydl
148
-
149
- result = download_audio("https://youtube.com/...", temp_storage_dir)
150
-
151
- assert result.exists()
152
- assert result.suffix == ".wav"
153
-
154
- # Test API endpoints
155
- def test_submit_transcription(test_client):
156
- response = test_client.post(
157
- "/api/v1/transcribe",
158
- json={"youtube_url": "https://www.youtube.com/watch?v=..."}
159
- )
160
-
161
- assert response.status_code == 201
162
- assert "job_id" in response.json()
163
-
164
- # Parametrized tests
165
- @pytest.mark.parametrize("url,expected_valid", [
166
- ("https://www.youtube.com/watch?v=dQw4w9WgXcQ", True),
167
- ("https://vimeo.com/12345", False),
168
- ])
169
- def test_url_validation(url, expected_valid):
170
- is_valid, _ = validate_youtube_url(url)
171
- assert is_valid == expected_valid
172
- ```
173
-
174
- ### Frontend Testing
175
-
176
- ```typescript
177
- // Test components with user interaction
178
- it('should submit form', async () => {
179
- const user = userEvent.setup();
180
- const onSubmit = vi.fn();
181
-
182
- render(<JobSubmission onSubmit={onSubmit} />);
183
-
184
- const input = screen.getByPlaceholderText(/youtube url/i);
185
- await user.type(input, 'https://www.youtube.com/watch?v=...');
186
-
187
- const button = screen.getByRole('button', { name: /submit/i });
188
- await user.click(button);
189
-
190
- await waitFor(() => {
191
- expect(onSubmit).toHaveBeenCalled();
192
- });
193
- });
194
-
195
- // Mock API calls
196
- vi.mock('../../api/client', () => ({
197
- submitTranscription: vi.fn(),
198
- }));
199
-
200
- it('should call API', async () => {
201
- const mockSubmit = vi.mocked(submitTranscription);
202
- mockSubmit.mockResolvedValue({ job_id: '123' });
203
-
204
- // Test component that uses submitTranscription
205
- // ...
206
- });
207
-
208
- // Test store
209
- it('should update store', () => {
210
- const { result } = renderHook(() => useScoreStore());
211
-
212
- act(() => {
213
- result.current.setMusicXML('<musicxml>...</musicxml>');
214
- });
215
-
216
- expect(result.current.musicXML).toBe('<musicxml>...</musicxml>');
217
- });
218
- ```
219
-
220
- ## Mocking Strategy
221
-
222
- ### Backend
223
- - **External Services**: Mock yt-dlp, Redis, Celery
224
- - **ML Models**: Mock Demucs and basic-pitch for fast tests
225
- - **File System**: Use temporary directories
226
-
227
- ### Frontend
228
- - **API Calls**: Mock fetch with vitest
229
- - **WebSockets**: Mock WebSocket connections
230
- - **Browser APIs**: Mock Canvas, Audio, localStorage
231
- - **Libraries**: Mock VexFlow, Tone.js
232
-
233
- ## Best Practices
234
-
235
- ### General
236
- 1. βœ… Write descriptive test names that explain the scenario
237
- 2. βœ… Keep tests simple and focused (one thing per test)
238
- 3. βœ… Use Arrange-Act-Assert structure
239
- 4. βœ… Make tests independent (no shared state)
240
- 5. βœ… Clean up resources (files, connections, timers)
241
- 6. βœ… Mock external dependencies
242
- 7. βœ… Add tests when fixing bugs
243
- 8. βœ… Keep test code as clean as production code
244
-
245
- ### Backend-Specific
246
- - Use pytest fixtures for shared setup
247
- - Mock yt-dlp, Redis, Celery, ML models
248
- - Use temporary directories for file operations
249
- - Mark slow/GPU tests with `@pytest.mark.slow` and `@pytest.mark.gpu`
250
- - Test both success and error paths
251
-
252
- ### Frontend-Specific
253
- - Test user behavior, not implementation details
254
- - Use accessible queries: `getByRole`, `getByLabelText` (not `getByTestId`)
255
- - Mock API calls and WebSocket connections
256
- - Test loading states and error handling
257
- - Clean up side effects (timers, event listeners)
258
-
259
- ## Troubleshooting
260
-
261
- ### Backend
262
-
263
- **Import errors**
264
- ```bash
265
- export PYTHONPATH="${PYTHONPATH}:$(pwd)"
266
- ```
267
-
268
- **Redis connection errors** - Always mock Redis unless testing Redis specifically
269
-
270
- **GPU tests failing** - Mark with `@pytest.mark.gpu` and skip if unavailable
271
-
272
- ### Frontend
273
-
274
- **Canvas errors** - Mock canvas context in `setup.ts`
275
-
276
- **WebSocket errors** - Mock WebSocket in `setup.ts`
277
-
278
- **Module import errors** - Use `vi.mock()` at top of test file
279
-
280
- **Async timeouts** - Increase timeout: `it('test', async () => { ... }, { timeout: 10000 })`
281
-
282
- ## Test Performance
283
-
284
- **Benchmarks:**
285
- - Unit tests: < 100ms each
286
- - Full backend suite: < 30 seconds
287
- - Full frontend suite: < 20 seconds
288
-
289
- **Optimization:**
290
- - Mock expensive operations (ML inference, network calls)
291
- - Use test markers to skip slow tests during development
292
- - Parallelize tests (pytest-xdist for backend, vitest default)
293
- - Cache expensive fixtures
294
-
295
- ## CI/CD Integration
296
-
297
- Tests run automatically on:
298
- - **Pull Requests** - All tests must pass
299
- - **Main Branch** - Full suite including slow tests
300
- - **Nightly** - Extended test suite with real YouTube videos
301
- - **Pre-release** - E2E tests, performance benchmarks
302
-
303
- ## Detailed Guides
304
-
305
- For detailed information, see:
306
- - **[Backend Testing Guide](./backend-testing.md)** - In-depth backend testing patterns and examples
307
- - **[Frontend Testing Guide](./frontend-testing.md)** - In-depth frontend testing patterns and examples
308
- - **[Test Video Collection](./test-videos.md)** - Curated YouTube videos for testing transcription quality
309
-
310
- ## Resources
311
-
312
- - [pytest Documentation](https://docs.pytest.org/)
313
- - [Vitest Documentation](https://vitest.dev/)
314
- - [React Testing Library](https://testing-library.com/react)
315
- - [FastAPI Testing](https://fastapi.tiangolo.com/tutorial/testing/)
docs/testing/test-videos.md DELETED
@@ -1,371 +0,0 @@
1
- # Test Video Collection
2
-
3
- Curated collection of YouTube videos for testing transcription quality and edge cases.
4
-
5
- ## Table of Contents
6
-
7
- - [Simple Piano Tests](#simple-piano-tests)
8
- - [Classical Piano](#classical-piano)
9
- - [Pop Piano Covers](#pop-piano-covers)
10
- - [Jazz Piano](#jazz-piano)
11
- - [Complex/Challenging](#complexchallenging)
12
- - [Edge Cases](#edge-cases)
13
- - [Testing Criteria](#testing-criteria)
14
-
15
- ## Simple Piano Tests
16
-
17
- Use these for basic functionality and quick iteration.
18
-
19
- ### 1. Twinkle Twinkle Little Star (Beginner Piano)
20
- - **Duration**: ~1 minute
21
- - **Tempo**: Slow (60-80 BPM)
22
- - **Complexity**: Very simple melody, single notes
23
- - **Expected Accuracy**: 95%+
24
- - **Use For**: Smoke tests, basic functionality
25
-
26
- ### 2. Mary Had a Little Lamb
27
- - **Duration**: ~1 minute
28
- - **Tempo**: Moderate (100 BPM)
29
- - **Complexity**: Simple melody with consistent rhythm
30
- - **Expected Accuracy**: 90%+
31
- - **Use For**: Key signature detection, basic transcription
32
-
33
- ### 3. Happy Birthday (Piano Solo)
34
- - **Duration**: ~1 minute
35
- - **Tempo**: Moderate (120 BPM)
36
- - **Complexity**: Simple melody with occasional harmony
37
- - **Expected Accuracy**: 85%+
38
- - **Use For**: Time signature detection (3/4 time)
39
-
40
- ## Classical Piano
41
-
42
- Test with well-known classical pieces to verify quality.
43
-
44
- ### 4. Chopin - Nocturne Op. 9 No. 2
45
- - **Duration**: 4-5 minutes
46
- - **Tempo**: Andante (60-70 BPM)
47
- - **Complexity**: Expressive melody with arpeggiated accompaniment
48
- - **Expected Accuracy**: 75-80%
49
- - **Use For**:
50
- - Pedal sustain handling
51
- - Rubato tempo changes
52
- - Expressive timing
53
-
54
- **Challenges**:
55
- - Overlapping notes from pedal
56
- - Tempo fluctuations
57
- - Decorative grace notes
58
-
59
- ### 5. Beethoven - FΓΌr Elise
60
- - **Duration**: 3 minutes
61
- - **Tempo**: Poco moto (120-130 BPM)
62
- - **Complexity**: Famous melody with consistent rhythm
63
- - **Expected Accuracy**: 80-85%
64
- - **Use For**:
65
- - A minor key signature
66
- - Repeated patterns
67
- - Multiple sections
68
-
69
- **Challenges**:
70
- - Fast 16th note passages
71
- - Dynamic contrasts
72
-
73
- ### 6. Mozart - Piano Sonata K. 545 (1st Movement)
74
- - **Duration**: 3-4 minutes
75
- - **Tempo**: Allegro (120-140 BPM)
76
- - **Complexity**: Clear melody with Alberti bass
77
- - **Expected Accuracy**: 75-80%
78
- - **Use For**:
79
- - C major scale passages
80
- - Alberti bass pattern recognition
81
- - Classical form
82
-
83
- **Challenges**:
84
- - Fast running passages
85
- - Hand coordination
86
-
87
- ## Pop Piano Covers
88
-
89
- Test with contemporary music to verify modern styles.
90
-
91
- ### 7. Let It Be (Piano Cover)
92
- - **Duration**: 3-4 minutes
93
- - **Tempo**: Moderate (76 BPM)
94
- - **Complexity**: Block chords with melody
95
- - **Expected Accuracy**: 70-75%
96
- - **Use For**:
97
- - Chord detection
98
- - Popular music transcription
99
- - Mixed rhythm patterns
100
-
101
- **Challenges**:
102
- - Dense chords
103
- - Vocal line vs accompaniment
104
-
105
- ### 8. Someone Like You (Piano Cover)
106
- - **Duration**: 4-5 minutes
107
- - **Tempo**: Slow (67 BPM)
108
- - **Complexity**: Arpeggiated chords with melody
109
- - **Expected Accuracy**: 70-75%
110
- - **Use For**:
111
- - Sustained notes
112
- - Emotional expression
113
- - Modern pop harmony
114
-
115
- **Challenges**:
116
- - Overlapping arpeggios
117
- - Pedal sustain
118
-
119
- ### 9. River Flows in You (Original Piano)
120
- - **Duration**: 3-4 minutes
121
- - **Tempo**: Moderato (110 BPM)
122
- - **Complexity**: Flowing arpeggios with melody
123
- - **Expected Accuracy**: 75-80%
124
- - **Use For**:
125
- - Continuous motion
126
- - Pattern recognition
127
- - Popular instrumental
128
-
129
- **Challenges**:
130
- - Rapid note sequences
131
- - Consistent texture
132
-
133
- ## Jazz Piano
134
-
135
- Test improvisation and complex harmony.
136
-
137
- ### 10. Bill Evans - Waltz for Debby
138
- - **Duration**: 5-7 minutes
139
- - **Tempo**: Moderate waltz (140-160 BPM)
140
- - **Complexity**: Jazz voicings, walking bass, improvisation
141
- - **Expected Accuracy**: 60-70%
142
- - **Use For**:
143
- - Jazz harmony
144
- - 3/4 time signature
145
- - Complex chord voicings
146
-
147
- **Challenges**:
148
- - Extended chords (7ths, 9ths, 11ths)
149
- - Improvised passages
150
- - Swing feel
151
-
152
- ### 11. Oscar Peterson - C Jam Blues
153
- - **Duration**: 3-4 minutes
154
- - **Tempo**: Fast (200+ BPM)
155
- - **Complexity**: Blues progression with virtuosic runs
156
- - **Expected Accuracy**: 55-65%
157
- - **Use For**:
158
- - Fast tempo handling
159
- - Blues scale
160
- - Virtuosic passages
161
-
162
- **Challenges**:
163
- - Extremely fast notes
164
- - Grace notes and ornaments
165
- - Complex rhythm
166
-
167
- ## Complex/Challenging
168
-
169
- Stress tests for the transcription system.
170
-
171
- ### 12. Flight of the Bumblebee (Piano)
172
- - **Duration**: 1-2 minutes
173
- - **Tempo**: Presto (170-200 BPM)
174
- - **Complexity**: Extremely fast chromatic runs
175
- - **Expected Accuracy**: 50-60%
176
- - **Use For**:
177
- - Stress testing
178
- - Fast passage detection
179
- - Chromatic scales
180
-
181
- **Challenges**:
182
- - Very fast notes (32nd notes)
183
- - Chromatic passages
184
- - Continuous motion
185
-
186
- ### 13. Liszt - La Campanella
187
- - **Duration**: 4-5 minutes
188
- - **Tempo**: Allegretto (120 BPM)
189
- - **Complexity**: Virtuosic with wide leaps and rapid passages
190
- - **Expected Accuracy**: 55-65%
191
- - **Use For**:
192
- - Wide register jumps
193
- - Repeated notes
194
- - Virtuosic technique
195
-
196
- **Challenges**:
197
- - Octave leaps
198
- - Repeated staccato notes
199
- - Ornamentation
200
-
201
- ### 14. Rachmaninoff - Prelude in C# Minor
202
- - **Duration**: 3-4 minutes
203
- - **Tempo**: Lento (60 BPM) to Agitato
204
- - **Complexity**: Dense chords, dramatic dynamics
205
- - **Expected Accuracy**: 60-70%
206
- - **Use For**:
207
- - Heavy chords
208
- - Dramatic contrasts
209
- - Multiple voices
210
-
211
- **Challenges**:
212
- - 6+ note chords
213
- - Extreme dynamics
214
- - Multiple simultaneous voices
215
-
216
- ## Edge Cases
217
-
218
- Special cases to test error handling and boundaries.
219
-
220
- ### 15. Prepared Piano / Extended Techniques
221
- - **Use For**: Testing unusual timbres
222
- - **Expected Accuracy**: 30-50%
223
- - **Expected Behavior**: Should handle gracefully
224
-
225
- ### 16. Piano with Background Noise
226
- - **Use For**: Testing source separation quality
227
- - **Expected Accuracy**: Variable
228
- - **Expected Behavior**: Should isolate piano reasonably
229
-
230
- ### 17. Poor Audio Quality
231
- - **Use For**: Testing robustness
232
- - **Expected Accuracy**: Reduced
233
- - **Expected Behavior**: Should not crash
234
-
235
- ### 18. Non-Piano Video (Should Fail Gracefully)
236
- - **Examples**:
237
- - Drum solo
238
- - A cappella singing
239
- - Electronic music
240
- - **Expected Behavior**: Should complete but with poor results
241
-
242
- ## Testing Criteria
243
-
244
- ### Accuracy Metrics
245
-
246
- **High Priority (Must Work Well)**:
247
- - Note pitch accuracy: 85%+ for simple pieces
248
- - Note onset timing: 80%+ within 50ms
249
- - Note duration: 70%+ within one quantization unit
250
-
251
- **Medium Priority (Should Work)**:
252
- - Key signature detection: 80%+ accuracy
253
- - Time signature detection: 75%+ accuracy
254
- - Tempo detection: 70%+ within 10 BPM
255
-
256
- **Low Priority (Nice to Have)**:
257
- - Dynamic markings: Not implemented in MVP
258
- - Articulations: Not implemented in MVP
259
- - Pedal markings: Not implemented in MVP
260
-
261
- ### Performance Benchmarks
262
-
263
- | Video Duration | Target Processing Time (GPU) | Max Processing Time (CPU) |
264
- |---------------|------------------------------|---------------------------|
265
- | 1 minute | < 30 seconds | < 5 minutes |
266
- | 3 minutes | < 2 minutes | < 10 minutes |
267
- | 5 minutes | < 3 minutes | < 15 minutes |
268
-
269
- ### Success Criteria
270
-
271
- A transcription is considered successful if:
272
-
273
- 1. **Job completes without error**: 95%+ success rate
274
- 2. **Basic pitch accuracy**: 70%+ correct notes for simple pieces, 60%+ for complex
275
- 3. **Playback sounds recognizable**: User can identify the piece
276
- 4. **Usable for editing**: Notation is clean enough to edit and correct
277
-
278
- ### Quality Grades
279
-
280
- **A (90%+ accuracy)**:
281
- - Simple melodies
282
- - Clear recordings
283
- - Slow to moderate tempo
284
- - Minimal harmony
285
-
286
- **B (75-89% accuracy)**:
287
- - Standard classical pieces
288
- - Good recordings
289
- - Moderate tempo
290
- - Some harmony
291
-
292
- **C (60-74% accuracy)**:
293
- - Complex pieces
294
- - Standard recordings
295
- - Fast tempo or complex harmony
296
- - Multiple voices
297
-
298
- **D (50-59% accuracy)**:
299
- - Virtuosic pieces
300
- - Poor recordings
301
- - Very fast or complex
302
- - Jazz/improvisation
303
-
304
- **F (< 50% accuracy)**:
305
- - Extended techniques
306
- - Very poor quality
307
- - Non-piano instruments
308
- - Extreme complexity
309
-
310
- ## Using Test Videos
311
-
312
- ### Manual Testing
313
-
314
- 1. Submit each video URL through the UI
315
- 2. Wait for processing to complete
316
- 3. Check for errors in each pipeline stage
317
- 4. Download and inspect MusicXML output
318
- 5. Load in MuseScore or similar to verify quality
319
- 6. Note accuracy, timing issues, and artifacts
320
-
321
- ### Automated Testing
322
-
323
- ```python
324
- # In tests/test_integration.py
325
- @pytest.mark.parametrize("video_id,expected_grade", [
326
- ("simple_melody", "A"),
327
- ("fur_elise", "B"),
328
- ("jazz_piece", "C"),
329
- ])
330
- def test_transcription_quality(video_id, expected_grade):
331
- """Test transcription quality meets expectations."""
332
- result = transcribe_video(video_id)
333
-
334
- assert result['status'] == 'success'
335
- accuracy = calculate_accuracy(result['musicxml'])
336
- assert accuracy >= grade_threshold(expected_grade)
337
- ```
338
-
339
- ### Regression Testing
340
-
341
- Maintain a suite of test videos and track accuracy over time:
342
-
343
- ```bash
344
- # Run regression test suite
345
- python scripts/run_regression_tests.py
346
-
347
- # Compare with baseline
348
- python scripts/compare_results.py --baseline v1.0.0 --current HEAD
349
- ```
350
-
351
- ## Maintaining Test Collection
352
-
353
- 1. **Add new test cases** when bugs are found
354
- 2. **Update expected accuracy** as system improves
355
- 3. **Remove broken links** and replace with alternatives
356
- 4. **Document edge cases** that reveal system limitations
357
- 5. **Share results** with team to track progress
358
-
359
- ## Test Video Sources
360
-
361
- When selecting test videos:
362
-
363
- - βœ… Use videos with clear audio
364
- - βœ… Prefer solo piano recordings
365
- - βœ… Choose varied difficulty levels
366
- - βœ… Include different musical styles
367
- - βœ… Ensure videos are publicly accessible
368
- - βœ… Respect copyright and fair use
369
- - ❌ Avoid videos with talking/commentary
370
- - ❌ Avoid poor audio quality unless testing robustness
371
- - ❌ Don't use videos over 15 minutes (MVP limit)
frontend/Dockerfile CHANGED
@@ -7,7 +7,7 @@ WORKDIR /app
7
  COPY package*.json ./
8
 
9
  # Install dependencies
10
- RUN npm install
11
 
12
  # Copy application code
13
  COPY . .
 
7
  COPY package*.json ./
8
 
9
  # Install dependencies
10
+ RUN npm install --legacy-peer-deps
11
 
12
  # Copy application code
13
  COPY . .
frontend/package.json CHANGED
@@ -23,7 +23,7 @@
23
  "devDependencies": {
24
  "@eslint/js": "^9.39.1",
25
  "@testing-library/jest-dom": "^6.1.5",
26
- "@testing-library/react": "^14.1.2",
27
  "@testing-library/user-event": "^14.5.1",
28
  "@types/node": "^24.10.1",
29
  "@types/react": "^19.2.5",
 
23
  "devDependencies": {
24
  "@eslint/js": "^9.39.1",
25
  "@testing-library/jest-dom": "^6.1.5",
26
+ "@testing-library/react": "^15.0.0",
27
  "@testing-library/user-event": "^14.5.1",
28
  "@types/node": "^24.10.1",
29
  "@types/react": "^19.2.5",
frontend/src/components/JobSubmission.css CHANGED
@@ -43,21 +43,42 @@ button:hover {
43
 
44
  .progress-container {
45
  text-align: center;
 
46
  }
47
 
48
- .progress-bar {
49
  width: 100%;
50
  height: 30px;
51
  background-color: #f0f0f0;
52
  border-radius: 15px;
53
  overflow: hidden;
54
- margin: 1rem 0;
 
55
  }
56
 
57
- .progress-fill {
58
  height: 100%;
59
- background-color: #28a745;
60
  transition: width 0.3s ease;
  }
62
 
63
  .progress-text {
 
43
 
44
  .progress-container {
45
  text-align: center;
46
+ padding: 2rem;
47
  }
48
 
49
+ .progress-container h2 {
50
+ margin-bottom: 1rem;
51
+ color: #333;
52
+ }
53
+
54
+ .progress-bar-container {
55
  width: 100%;
56
  height: 30px;
57
  background-color: #f0f0f0;
58
  border-radius: 15px;
59
  overflow: hidden;
60
+ margin: 1.5rem 0;
61
+ border: 1px solid #ddd;
62
  }
63
 
64
+ .progress-bar {
65
  height: 100%;
66
+ background: linear-gradient(90deg, #007bff, #0056b3);
67
  transition: width 0.3s ease;
68
+ box-shadow: inset 0 2px 4px rgba(0, 0, 0, 0.1);
69
+ }
70
+
71
+ .progress-message {
72
+ color: #555;
73
+ font-size: 1rem;
74
+ margin: 0.5rem 0;
75
+ font-weight: 500;
76
+ }
77
+
78
+ .progress-info {
79
+ color: #888;
80
+ font-size: 0.9rem;
81
+ margin-top: 1rem;
82
  }
83
 
84
  .progress-text {
frontend/src/components/JobSubmission.tsx CHANGED
@@ -1,8 +1,9 @@
1
  /**
2
  * Job submission form with progress tracking.
3
  */
4
- import { useState } from 'react';
5
- import { submitTranscription } from '../api/client';
 
6
  import './JobSubmission.css';
7
 
8
  interface JobSubmissionProps {
@@ -12,8 +13,20 @@ interface JobSubmissionProps {
12
 
13
  export function JobSubmission({ onComplete, onJobSubmitted }: JobSubmissionProps) {
14
  const [youtubeUrl, setYoutubeUrl] = useState('');
15
- const [status, setStatus] = useState<'idle' | 'submitting' | 'failed'>('idle');
16
  const [error, setError] = useState<string | null>(null);
17
 
18
  const validateUrl = (value: string): string | null => {
19
  try {
@@ -38,13 +51,91 @@ export function JobSubmission({ onComplete, onJobSubmitted }: JobSubmissionProps
38
  setStatus('submitting');
39
 
40
  try {
41
- const response = await submitTranscription(youtubeUrl, { instruments: ['piano'] });
42
  setYoutubeUrl('');
43
  if (onJobSubmitted) onJobSubmitted(response);
44
- if (onComplete) onComplete(response.job_id);
45
 
46
- // Reset to idle so the form stays usable after submissions in tests.
47
- setStatus('idle');
  } catch (err) {
49
  setStatus('failed');
50
  setError(err instanceof Error ? err.message : 'Failed to submit job');
@@ -66,7 +157,7 @@ export function JobSubmission({ onComplete, onJobSubmitted }: JobSubmissionProps
66
  type="text"
67
  value={youtubeUrl}
68
  onChange={(e) => setYoutubeUrl(e.target.value)}
69
- placeholder="YouTube URL"
70
  required
71
  onBlur={() => {
72
  const validation = validateUrl(youtubeUrl);
@@ -80,6 +171,19 @@ export function JobSubmission({ onComplete, onJobSubmitted }: JobSubmissionProps
80
  </form>
81
  )}
82
 
83
  {status === 'failed' && (
84
  <div className="error-message">
85
  <h2>βœ— Transcription Failed</h2>
 
1
  /**
2
  * Job submission form with progress tracking.
3
  */
4
+ import { useState, useRef, useEffect } from 'react';
5
+ import { api } from '../api/client';
6
+ import type { ProgressUpdate } from '../api/client';
7
  import './JobSubmission.css';
8
 
9
  interface JobSubmissionProps {
 
13
 
14
  export function JobSubmission({ onComplete, onJobSubmitted }: JobSubmissionProps) {
15
  const [youtubeUrl, setYoutubeUrl] = useState('');
16
+ const [status, setStatus] = useState<'idle' | 'submitting' | 'processing' | 'failed'>('idle');
17
  const [error, setError] = useState<string | null>(null);
18
+ const [progress, setProgress] = useState(0);
19
+ const [progressMessage, setProgressMessage] = useState('');
20
+ const wsRef = useRef<WebSocket | null>(null);
21
+
22
+ // Cleanup WebSocket on unmount
23
+ useEffect(() => {
24
+ return () => {
25
+ if (wsRef.current) {
26
+ wsRef.current.close();
27
+ }
28
+ };
29
+ }, []);
30
 
31
  const validateUrl = (value: string): string | null => {
32
  try {
 
51
  setStatus('submitting');
52
 
53
  try {
54
+ const response = await api.submitJob(youtubeUrl, { instruments: ['piano'] });
55
  setYoutubeUrl('');
56
  if (onJobSubmitted) onJobSubmitted(response);
 
57
 
58
+ // Switch to processing status and connect WebSocket
59
+ setStatus('processing');
60
+ setProgress(0);
61
+ setProgressMessage('Starting transcription...');
62
+
63
+ // Connect WebSocket for progress updates
64
+ wsRef.current = api.connectWebSocket(
65
+ response.job_id,
66
+ (update: ProgressUpdate) => {
67
+ if (update.type === 'progress') {
68
+ setProgress(update.progress || 0);
69
+ setProgressMessage(update.message || `Processing: ${update.stage}`);
70
+ } else if (update.type === 'completed') {
71
+ setProgress(100);
72
+ setProgressMessage('Transcription complete!');
73
+ if (wsRef.current) {
74
+ wsRef.current.close();
75
+ wsRef.current = null;
76
+ }
77
+ // Wait a moment to show completion, then switch to editor
78
+ setTimeout(() => {
79
+ if (onComplete) onComplete(response.job_id);
80
+ setStatus('idle');
81
+ }, 500);
82
+ } else if (update.type === 'error') {
83
+ setStatus('failed');
84
+ setError(update.error?.message || 'Transcription failed');
85
+ if (wsRef.current) {
86
+ wsRef.current.close();
87
+ wsRef.current = null;
88
+ }
89
+ }
90
+ },
91
+ (error) => {
92
+ console.error('WebSocket error:', error);
93
+ setStatus('failed');
94
+ setError('Connection error. Please try again.');
95
+ }
96
+ );
97
+
98
+ // Poll for progress updates as fallback (in case WebSocket misses early updates)
99
+ const pollInterval = setInterval(async () => {
100
+ try {
101
+ const jobStatus = await api.getJobStatus(response.job_id);
102
+ setProgress(jobStatus.progress);
103
+ setProgressMessage(jobStatus.status_message || 'Processing...');
104
+
105
+ if (jobStatus.status === 'completed') {
106
+ clearInterval(pollInterval);
107
+ setProgress(100);
108
+ setProgressMessage('Transcription complete!');
109
+ if (wsRef.current) {
110
+ wsRef.current.close();
111
+ wsRef.current = null;
112
+ }
113
+ setTimeout(() => {
114
+ if (onComplete) onComplete(response.job_id);
115
+ setStatus('idle');
116
+ }, 500);
117
+ } else if (jobStatus.status === 'failed') {
118
+ clearInterval(pollInterval);
119
+ setStatus('failed');
120
+ setError(jobStatus.error?.message || 'Transcription failed');
121
+ if (wsRef.current) {
122
+ wsRef.current.close();
123
+ wsRef.current = null;
124
+ }
125
+ }
126
+ } catch (err) {
127
+ console.error('Polling error:', err);
128
+ }
129
+ }, 1000); // Poll every second
130
+
131
+ // Store interval ID for cleanup
132
+ const currentInterval = pollInterval;
133
+ return () => {
134
+ clearInterval(currentInterval);
135
+ if (wsRef.current) {
136
+ wsRef.current.close();
137
+ }
138
+ };
139
  } catch (err) {
140
  setStatus('failed');
141
  setError(err instanceof Error ? err.message : 'Failed to submit job');
 
157
  type="text"
158
  value={youtubeUrl}
159
  onChange={(e) => setYoutubeUrl(e.target.value)}
160
+ placeholder="https://www.youtube.com/watch?v=..."
161
  required
162
  onBlur={() => {
163
  const validation = validateUrl(youtubeUrl);
 
171
  </form>
172
  )}
173
 
174
+ {status === 'processing' && (
175
+ <div className="progress-container">
176
+ <h2>Transcribing...</h2>
177
+ <div className="progress-bar-container">
178
+ <div className="progress-bar" style={{ width: `${progress}%` }} />
179
+ </div>
180
+ <p className="progress-message">{progress}% - {progressMessage}</p>
181
+ <p className="progress-info">
182
+ This may take 1-2 minutes. Please don't close this window.
183
+ </p>
184
+ </div>
185
+ )}
186
+
187
  {status === 'failed' && (
188
  <div className="error-message">
189
  <h2>βœ— Transcription Failed</h2>
start.sh ADDED
@@ -0,0 +1,139 @@
+ #!/bin/bash
+
+ # Rescored Startup Script
+ # Starts all services: Redis, Backend API, Celery Worker, and Frontend
+
+ set -e  # Exit on error
+
+ # Colors for output
+ GREEN='\033[0;32m'
+ BLUE='\033[0;34m'
+ YELLOW='\033[1;33m'
+ RED='\033[0;31m'
+ NC='\033[0m'  # No Color
+
+ echo -e "${BLUE}======================================${NC}"
+ echo -e "${BLUE}  Rescored - Starting All Services${NC}"
+ echo -e "${BLUE}======================================${NC}"
+ echo ""
+
+ # Check if Redis is running
+ echo -e "${YELLOW}Checking Redis...${NC}"
+ if ! redis-cli ping > /dev/null 2>&1; then
+     echo -e "${YELLOW}Starting Redis service...${NC}"
+     brew services start redis
+     sleep 2
+     if ! redis-cli ping > /dev/null 2>&1; then
+         echo -e "${RED}βœ— Failed to start Redis${NC}"
+         exit 1
+     fi
+ fi
+ echo -e "${GREEN}βœ“ Redis is running${NC}"
+ echo ""
+
+ # Check virtual environment exists
+ if [ ! -d "backend/.venv" ]; then
+     echo -e "${RED}βœ— Backend virtual environment not found at backend/.venv${NC}"
+     echo -e "${YELLOW}Please set up the backend first (see README.md)${NC}"
+     exit 1
+ fi
+
+ # Check frontend dependencies
+ if [ ! -d "frontend/node_modules" ]; then
+     echo -e "${YELLOW}Installing frontend dependencies...${NC}"
+     cd frontend
+     npm install
+     cd ..
+     echo -e "${GREEN}βœ“ Frontend dependencies installed${NC}"
+     echo ""
+ fi
+
+ # Check storage directory
+ if [ ! -d "storage" ]; then
+     echo -e "${YELLOW}Creating storage directory...${NC}"
+     mkdir -p storage
+ fi
+
+ # Check for YouTube cookies
+ if [ ! -f "storage/youtube_cookies.txt" ]; then
+     echo -e "${YELLOW}⚠️  Warning: YouTube cookies not found at storage/youtube_cookies.txt${NC}"
+     echo -e "${YELLOW}   You will need to set this up for video downloads to work${NC}"
+     echo -e "${YELLOW}   See README.md for instructions${NC}"
+     echo ""
+ fi
+
+ echo -e "${BLUE}Starting services...${NC}"
+ echo -e "${YELLOW}Press Ctrl+C to stop all services${NC}"
+ echo ""
+
+ # Function to cleanup on exit
+ cleanup() {
+     echo ""
+     echo -e "${YELLOW}Stopping all services...${NC}"
+     jobs -p | xargs -r kill 2>/dev/null
+     echo -e "${GREEN}βœ“ All services stopped${NC}"
+     exit 0
+ }
+
+ trap cleanup SIGINT SIGTERM
+
+ # Create logs directory and files
+ mkdir -p logs
+ rm -f logs/api.log logs/worker.log logs/frontend.log
+ touch logs/api.log logs/worker.log logs/frontend.log
+
+ # Start Backend API
+ echo -e "${BLUE}[1/3] Starting Backend API...${NC}"
+ cd backend
+ source .venv/bin/activate
+ uvicorn main:app --host 0.0.0.0 --port 8000 --reload > ../logs/api.log 2>&1 &
+ API_PID=$!
+ cd ..
+ echo -e "${GREEN}βœ“ Backend API started (PID: $API_PID)${NC}"
+ echo -e "   Logs: logs/api.log"
+ echo ""
+
+ # Start Celery Worker
+ echo -e "${BLUE}[2/3] Starting Celery Worker...${NC}"
+ cd backend
+ source .venv/bin/activate
+ # Use --pool=solo for macOS to avoid fork() issues with ML libraries
+ celery -A tasks worker --loglevel=info --pool=solo > ../logs/worker.log 2>&1 &
+ WORKER_PID=$!
+ cd ..
+ echo -e "${GREEN}βœ“ Celery Worker started (PID: $WORKER_PID)${NC}"
+ echo -e "   Logs: logs/worker.log"
+ echo ""
+
+ # Start Frontend
+ echo -e "${BLUE}[3/3] Starting Frontend...${NC}"
+ cd frontend
+ npm run dev > ../logs/frontend.log 2>&1 &
+ FRONTEND_PID=$!
+ cd ..
+ echo -e "${GREEN}βœ“ Frontend started (PID: $FRONTEND_PID)${NC}"
+ echo -e "   Logs: logs/frontend.log"
+ echo ""
+
+ # Wait a moment for services to start
+ sleep 3
+
+ echo -e "${GREEN}======================================${NC}"
+ echo -e "${GREEN}  All Services Running!${NC}"
+ echo -e "${GREEN}======================================${NC}"
+ echo ""
+ echo -e "${BLUE}Services:${NC}"
+ echo -e "  Frontend: ${GREEN}http://localhost:5173${NC}"
+ echo -e "  Backend:  ${GREEN}http://localhost:8000${NC}"
+ echo -e "  API Docs: ${GREEN}http://localhost:8000/docs${NC}"
+ echo ""
+ echo -e "${BLUE}Logs:${NC}"
+ echo -e "  API:      tail -f logs/api.log"
+ echo -e "  Worker:   tail -f logs/worker.log"
+ echo -e "  Frontend: tail -f logs/frontend.log"
+ echo ""
+ echo -e "${YELLOW}Press Ctrl+C to stop all services${NC}"
+ echo ""
+
+ # Wait for all background processes
+ wait
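
The lifecycle pattern in start.sh (background each service with `&`, record `$!`, trap SIGINT/SIGTERM for cleanup, then `wait`) can be sketched in isolation. This is a minimal stand-alone illustration, not part of the repo; the `sleep` commands are hypothetical stand-ins for the real services:

```shell
#!/bin/bash
# Minimal sketch of the start.sh lifecycle: background jobs, a cleanup
# trap, and a final `wait` so Ctrl+C tears everything down together.

cleanup() {
    echo "stopping all services..."
    # Kill every job this shell started; ignore already-dead PIDs.
    kill $(jobs -p) 2>/dev/null
    exit 0
}
trap cleanup SIGINT SIGTERM

sleep 1 &   # stand-in for `uvicorn main:app ... &`
API_PID=$!
sleep 1 &   # stand-in for `celery -A tasks worker ... &`
WORKER_PID=$!

echo "started PIDs: $API_PID $WORKER_PID"
wait        # block until the jobs exit (or a signal fires the trap)
echo "all jobs finished"
```

Because `wait` keeps the parent shell alive, the trap fires for the whole group on Ctrl+C, which is why start.sh can stop all three services with a single interrupt.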
stop.sh ADDED
@@ -0,0 +1,18 @@
+ #!/bin/bash
+
+ # Rescored Stop Script
+ # Stops all running Rescored services
+
+ # Colors for output
+ GREEN='\033[0;32m'
+ YELLOW='\033[1;33m'
+ NC='\033[0m'  # No Color
+
+ echo -e "${YELLOW}Stopping Rescored services...${NC}"
+
+ # Kill processes by name
+ pkill -f "uvicorn main:app" && echo -e "${GREEN}βœ“ Stopped Backend API${NC}"
+ pkill -f "celery -A tasks worker" && echo -e "${GREEN}βœ“ Stopped Celery Worker${NC}"
+ pkill -f "vite" && echo -e "${GREEN}βœ“ Stopped Frontend${NC}"
+
+ echo -e "${GREEN}All services stopped${NC}"
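
stop.sh relies on `pkill -f`, which matches against the full command line rather than just the executable name; that is why patterns like `"uvicorn main:app"` work. A small self-contained demonstration (the `sleep 31.7` is a hypothetical stand-in chosen for its distinctive command line):

```shell
#!/bin/bash
# Demonstrate `pkill -f`: -f matches the full command line, so any
# distinctive substring of how a process was launched is enough.
sleep 31.7 &          # stand-in service with a recognizable cmdline
TARGET=$!

sleep 1               # let the child register before signalling it
if pkill -f "sleep 31.7"; then
    MATCHED=yes       # pkill found and signalled at least one process
else
    MATCHED=no
fi
echo "matched: $MATCHED (pid $TARGET)"
```

The trade-off is precision: a broad pattern such as `"vite"` in stop.sh will signal any process whose command line contains that substring, so keep the patterns as specific as the launch commands allow.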