# Plan: Standalone Stem Separator App for HuggingFace Spaces

## Context

The user wants a standalone web app (separate from Studio13-v3) deployed on HuggingFace Spaces for audio stem separation. Users upload audio, choose stems, run BS-RoFormer model inference, then play/download individual stems with SoundCloud-style waveform visualization. The app must be mobile-responsive.

## Architecture

```
HuggingFace Docker Space (port 7860)
├── FastAPI backend (Python)
│   ├── File upload + temp storage
│   ├── audio-separator inference (BS-RoFormer 6-stem)
│   ├── SSE progress streaming
│   └── Serves built React frontend as static files
└── React frontend (Vite build → static)
    ├── Upload zone (drag-drop)
    ├── WaveSurfer.js v7 waveforms (SoundCloud-style bars)
    ├── Stem selection checkboxes
    ├── Progress bar (SSE-driven)
    └── Stem result rows (waveform + play + download)
```

**Model**: `jarredou/BS-ROFO-SW-Fixed` (699MB .ckpt) - BS-RoFormer, 6 stems: Vocals, Drums, Bass, Guitar, Piano, Other

## Deliverable

A self-contained prompt (below) the user can paste into another Claude Code window to build the entire app from scratch.

---

## Prompt to use in another window

The prompt is designed to be comprehensive and self-contained. Copy everything between the `---START PROMPT---` and `---END PROMPT---` markers.

---START PROMPT---

Build a standalone web app for HuggingFace Spaces that does audio stem separation. The app should be production-ready, responsive, and polished.

## What the app does

1. User uploads an audio file (drag-drop or file picker)
2. Original track appears with a SoundCloud-style scrolling peak waveform + play button
3. User selects which stems to separate via checkboxes (Vocals, Drums, Bass, Guitar, Piano, Other)
4. User clicks "Separate" - progress bar shows real-time progress via SSE
5. Once done, each stem appears in its own row: colored label + waveform (flex-grow) + play + download
6. "Download All" button creates a ZIP of all stems

## Tech Stack

- **Frontend**: React 19, TypeScript, Tailwind CSS v4, Vite 5
- **Backend**: Python 3.11, FastAPI, uvicorn
- **Waveform**: WaveSurfer.js v7 (`wavesurfer.js` npm package)
- **Model**: `jarredou/BS-ROFO-SW-Fixed` from HuggingFace (BS-RoFormer, 699MB .ckpt, 6 stems)
- **Inference**: `audio-separator` Python package
- **Progress**: SSE (Server-Sent Events) via `sse-starlette`
- **Deploy**: HuggingFace Docker Space, port 7860

## Directory Structure

```
stem-separator/
├── Dockerfile
├── README.md                      # HF Spaces YAML front matter
├── .dockerignore
├── backend/
│   ├── main.py                    # FastAPI: routes, SSE, static serving
│   ├── separator.py               # audio-separator wrapper with progress callback
│   ├── file_manager.py            # Temp file lifecycle, cleanup
│   ├── task_queue.py              # asyncio queue (1 concurrent separation)
│   └── requirements.txt
├── frontend/
│   ├── index.html
│   ├── package.json
│   ├── vite.config.ts
│   ├── tsconfig.json
│   └── src/
│       ├── main.tsx
│       ├── App.tsx
│       ├── index.css              # Tailwind theme (dark, music-oriented)
│       ├── api.ts                 # fetch wrappers + SSE EventSource
│       ├── types.ts               # Shared interfaces
│       ├── hooks/
│       │   ├── useWaveSurfer.ts   # WaveSurfer.js v7 hook
│       │   └── useSeparation.ts   # Upload->Separate->Results state machine
│       └── components/
│           ├── UploadZone.tsx
│           ├── OriginalTrack.tsx
│           ├── StemCheckboxes.tsx
│           ├── SeparateButton.tsx
│           ├── ProgressBar.tsx
│           ├── WaveformPlayer.tsx # Reusable: [play] [waveform===] [time] [download?]
│           ├── StemRow.tsx
│           ├── StemResults.tsx
│           └── Footer.tsx
```

## Backend Details

### API Endpoints

```
POST   /api/upload            - Multipart file upload (max 100MB), returns { job_id, filename }
POST   /api/separate          - Body: { job_id, stems: string[] }, enqueues task
GET    /api/progress/{id}     - SSE stream: { state, progress, message, stems? }
GET    /api/audio/{id}/{f}    - Serve audio for WaveSurfer playback
GET    /api/download/{id}/{f} - Download stem with Content-Disposition: attachment
GET    /api/download/{id}/all - ZIP of all stems, streamed
DELETE /api/job/{id}          - Manual cleanup
```

### `backend/separator.py` - Separation Logic

```python
# Singleton pattern - keep model loaded between requests
# Adapted from this working pattern:
from audio_separator.separator import Separator


class StemSeparatorService:
    _instance = None
    _model_loaded = False

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def load_model(self):
        if self._model_loaded:
            return
        self.separator = Separator(
            output_dir="/tmp/output",
            output_format="WAV",
            output_single_stem=None,
        )
        self.separator.load_model(model_filename="BS-Rofo-SW-Fixed.ckpt")
        self._model_loaded = True

    def separate(self, input_path, output_dir, stems, progress_callback):
        # Run separation
        self.separator.output_dir = output_dir
        output_files = self.separator.separate(input_path)

        # Map output files to stem names using aliases:
        #   vocals: [vocals, vocal, voice, singing]
        #   drums:  [drums, drum, percussion]
        #   bass:   [bass]
        #   guitar: [guitar, guitars]
        #   piano:  [piano, keys, keyboard]
        #   other:  [other, instrumental, residual, remainder, no_]
        # Rename files from "input_(Vocals).wav" -> "Vocals.wav"
        # Return dict of stem_name -> file_path
```

**tqdm monkey-patching for progress**: Before importing audio-separator, patch `tqdm.std.tqdm` with a subclass that calls `progress_callback("analyzing", fraction)` in its `update()` method. Map tqdm progress 0-1 to overall progress 0.2-0.9.

### `backend/task_queue.py` - Concurrency

- `asyncio.Queue(maxsize=5)` - max 5 pending jobs, return 429 if full
- Single worker consuming tasks sequentially (BS-RoFormer needs ~4-6GB RAM)
- Job progress stored in a dict, consumed by SSE endpoints

### `backend/file_manager.py` - File Lifecycle

- Base dir: `/tmp/stem-sep/`
- Each job: `/tmp/stem-sep/{uuid}/` with `input.{ext}` and stem outputs
- Auto-cleanup: background task every 5 minutes, deletes dirs older than 30 minutes

### `backend/main.py` - FastAPI App

- Register API routes BEFORE the static file mount
- Register `GET /api/download/{id}/all` before `GET /api/download/{id}/{f}`; FastAPI matches routes in declaration order, so otherwise `all` is captured as a filename
- Mount `frontend/dist/` at `/` with `html=True` so that `/` serves `index.html`
- On startup: launch queue worker + cleanup loop via `asyncio.create_task`
- SSE via `sse-starlette`'s `EventSourceResponse`

### `backend/requirements.txt`

```
fastapi>=0.104.0
uvicorn[standard]>=0.24.0
python-multipart>=0.0.6
sse-starlette>=1.8.0
audio-separator[cpu]>=0.17.0
pydub>=0.25.1
aiofiles>=23.2.1
```

## Frontend Details

### State Machine (`useSeparation` hook)

```typescript
type AppState =
  | { phase: "idle" }
  | { phase: "uploading"; progress: number }
  | { phase: "uploaded"; jobId: string; filename: string }
  | { phase: "separating"; jobId: string; state: string; progress: number; message: string }
  | { phase: "done"; jobId: string; stems: StemResult[] }
  | { phase: "error"; message: string }
```

Use `useReducer` for clean state transitions. Open the SSE subscription inside the `separate()` action.

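
The transitions implied by `AppState` can be sketched as a plain reducer function. This is a minimal sketch under assumptions, not a required design: the action names (`UPLOAD_PROGRESS`, `UPLOAD_DONE`, `SSE_UPDATE`, `SSE_DONE`, `FAIL`, `RESET`) and the fields of `StemResult` are hypothetical, since the spec only names the states.

```typescript
// Hypothetical shape; the spec references StemResult without defining it.
interface StemResult {
  name: string;
  url: string;
}

type AppState =
  | { phase: "idle" }
  | { phase: "uploading"; progress: number }
  | { phase: "uploaded"; jobId: string; filename: string }
  | { phase: "separating"; jobId: string; state: string; progress: number; message: string }
  | { phase: "done"; jobId: string; stems: StemResult[] }
  | { phase: "error"; message: string };

// Hypothetical action names, chosen for illustration only.
type Action =
  | { type: "UPLOAD_PROGRESS"; progress: number }
  | { type: "UPLOAD_DONE"; jobId: string; filename: string }
  | { type: "SSE_UPDATE"; state: string; progress: number; message: string }
  | { type: "SSE_DONE"; stems: StemResult[] }
  | { type: "FAIL"; message: string }
  | { type: "RESET" };

function reducer(state: AppState, action: Action): AppState {
  switch (action.type) {
    case "UPLOAD_PROGRESS":
      // Valid from idle (first progress event) or while already uploading.
      if (state.phase !== "idle" && state.phase !== "uploading") return state;
      return { phase: "uploading", progress: action.progress };
    case "UPLOAD_DONE":
      if (state.phase !== "uploading") return state;
      return { phase: "uploaded", jobId: action.jobId, filename: action.filename };
    case "SSE_UPDATE":
      // The first SSE event moves uploaded -> separating; later ones update in place.
      if (state.phase !== "uploaded" && state.phase !== "separating") return state;
      return {
        phase: "separating",
        jobId: state.jobId,
        state: action.state,
        progress: action.progress,
        message: action.message,
      };
    case "SSE_DONE":
      if (state.phase !== "separating") return state;
      return { phase: "done", jobId: state.jobId, stems: action.stems };
    case "FAIL":
      // Any phase may fail (upload error, SSE error, server error).
      return { phase: "error", message: action.message };
    case "RESET":
      return { phase: "idle" };
  }
}
```

Inside the hook this would be wired up as `useReducer(reducer, { phase: "idle" })`. Ignoring actions that arrive in the wrong phase (returning `state` unchanged) is one possible choice; it keeps a late SSE event from overwriting a state the user has already reset.
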
### WaveSurfer.js v7 Configuration (SoundCloud-style)

```typescript
WaveSurfer.create({
  container: containerRef.current,
  url: audioUrl,
  waveColor: color + "66",    // 40% opacity
  progressColor: color,       // full opacity for played portion
  height: 64,                 // 48 for stem rows
  barWidth: 2,
  barGap: 1,
  barRadius: 2,
  cursorWidth: 1,
  cursorColor: "#ffffff40",
  normalize: true,
  interact: true,             // click to seek
});
```

Import: `import WaveSurfer from 'wavesurfer.js'`

### `WaveformPlayer.tsx` - Reusable Component

Layout: `[play/pause circle] [waveform div (flex-grow)] [M:SS / M:SS] [download icon?]`

- Play button: circle with play/pause icon
- Waveform container: `flex-grow` div, WaveSurfer renders into it
- Time: `currentTime / duration` in `M:SS` format
- Download: optional, shown only when the `onDownload` prop is provided

**Exclusive playback**: When one player starts, dispatch `window.dispatchEvent(new CustomEvent("stem-play", { detail: instanceId }))`. All other players listen for the event and pause.

### Stem Colors

```typescript
const STEM_CONFIG = {
  Vocals: { color: "#ec4899", icon: "mic" },      // pink
  Drums:  { color: "#f97316", icon: "drum" },     // orange
  Bass:   { color: "#3b82f6", icon: "music" },    // blue
  Guitar: { color: "#a855f7", icon: "guitar" },   // purple
  Piano:  { color: "#06b6d4", icon: "piano" },    // cyan
  Other:  { color: "#22c55e", icon: "waveform" }, // green
};
```

### `UploadZone.tsx`

Drag-and-drop zone with dashed border. Accepts: wav, mp3, flac, ogg, m4a, aac (max 100MB). Shows file icon + "Drop audio file here or click to browse" + supported formats. Drag-over state: border color changes to accent. Hidden `<input type="file">`.

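
The accepted formats and the size limit listed for `UploadZone.tsx` can be checked on the client before any bytes are sent. The following is a minimal sketch, assuming "100MB" means 100 × 1024 × 1024 bytes and that the check is by file extension only; `validateAudioFile`, `FileLike`, `MAX_UPLOAD_BYTES`, and `ACCEPTED_EXTENSIONS` are hypothetical names, not part of the spec.

```typescript
// Assumption: the 100MB limit is interpreted as 100 MiB.
const MAX_UPLOAD_BYTES = 100 * 1024 * 1024;

// Extensions listed in the UploadZone description.
const ACCEPTED_EXTENSIONS = ["wav", "mp3", "flac", "ogg", "m4a", "aac"];

// Structural subset of the browser File object, so the helper
// can be exercised without a DOM.
interface FileLike {
  name: string;
  size: number;
}

// Returns an error message for the UI, or null when the file is acceptable.
function validateAudioFile(file: FileLike): string | null {
  const dot = file.name.lastIndexOf(".");
  const ext = dot >= 0 ? file.name.slice(dot + 1).toLowerCase() : "";
  if (ACCEPTED_EXTENSIONS.indexOf(ext) === -1) {
    return "Unsupported format. Accepted: " + ACCEPTED_EXTENSIONS.join(", ");
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return "File is larger than 100MB";
  }
  return null;
}
```

A browser `File` satisfies `FileLike`, so the drop handler and the file input's change handler can both pass their `File` straight in. The backend should still enforce the same limit, since a client-side check is only a convenience.
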
### `StemRow.tsx`

Desktop layout: `[colored dot + label (w-24)] [WaveformPlayer (flex-grow)]`

Mobile layout: label on top row, waveform on bottom row (`flex-col sm:flex-row`)

### Mobile Responsive Strategy

- Main container: `max-w-3xl mx-auto px-4`
- `StemCheckboxes`: `grid-cols-2 md:grid-cols-3`
- `StemRow`: `flex-col sm:flex-row` (label stacks above waveform on mobile)
- Waveform height: `h-12 md:h-16`
- Touch targets: minimum 44px
- Font sizes: `text-sm md:text-base`

### Theme (index.css)

```css
@import "tailwindcss";

@theme {
  --color-bg-primary: #0a0a0f;
  --color-bg-secondary: #13131a;
  --color-bg-card: #1a1a24;
  --color-bg-hover: #252530;
  --color-text-primary: #e8e8ef;
  --color-text-secondary: #8888a0;
  --color-accent: #7c3aed;
  --color-accent-hover: #6d28d9;
  --color-border: #2a2a38;
}

body {
  background-color: var(--color-bg-primary);
  color: var(--color-text-primary);
}
```

## Docker Setup

### Dockerfile

```dockerfile
FROM python:3.11-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg curl && rm -rf /var/lib/apt/lists/*

RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY backend/requirements.txt backend/requirements.txt
RUN pip install --no-cache-dir -r backend/requirements.txt

COPY frontend/ frontend/
RUN cd frontend && npm ci && npm run build

COPY backend/ backend/

EXPOSE 7860

RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user PATH=/home/user/.local/bin:$PATH

CMD ["python", "-m", "uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "7860"]
```

### README.md (HF Spaces metadata)

```yaml
---
title: Stem Separator
emoji: 🎵
colorFrom: purple
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
license: mit
---
```

### .dockerignore

```
frontend/node_modules
frontend/dist
**/__pycache__
*.pyc
.git
```

## Implementation Order

1. Scaffold project structure (all dirs + config files)
2. `backend/requirements.txt` + `backend/file_manager.py` + `backend/separator.py`
3. `backend/task_queue.py` + `backend/main.py` (all API endpoints + SSE)
4. Frontend scaffold: `package.json`, `vite.config.ts`, `tsconfig.json`, `index.html`, `index.css`
5. `types.ts` + `api.ts` (API client + SSE subscription)
6. `useWaveSurfer.ts` hook
7. `useSeparation.ts` hook (state machine)
8. Components: `UploadZone` -> `WaveformPlayer` -> `OriginalTrack` -> `StemCheckboxes` -> `SeparateButton` -> `ProgressBar` -> `StemRow` -> `StemResults` -> `Footer` -> `App.tsx`
9. `Dockerfile` + `README.md` + `.dockerignore`

## Verification

1. Local dev: `cd frontend && npm run dev` (with Vite proxy to backend)
2. Local backend, run from the project root so the module path matches the Docker `CMD`: `uvicorn backend.main:app --port 7860`
3. Docker build: `docker build -t stem-sep .`
4. Docker run: `docker run -p 7860:7860 stem-sep`
5. Test: upload a song, select all 6 stems, verify progress + waveforms + play + download

---END PROMPT---