# Plan: Standalone Stem Separator App for HuggingFace Spaces
## Context
The user wants a standalone web app (separate from Studio13-v3) deployed on HuggingFace Spaces for audio stem separation. Users upload audio, choose stems, run BS-RoFormer model inference, then play/download individual stems with SoundCloud-style waveform visualization. The app must be mobile-responsive.
## Architecture
```
HuggingFace Docker Space (port 7860)
├── FastAPI backend (Python)
│   ├── File upload + temp storage
│   ├── audio-separator inference (BS-RoFormer 6-stem)
│   ├── SSE progress streaming
│   └── Serves built React frontend as static files
└── React frontend (Vite build → static)
    ├── Upload zone (drag-drop)
    ├── WaveSurfer.js v7 waveforms (SoundCloud-style bars)
    ├── Stem selection checkboxes
    ├── Progress bar (SSE-driven)
    └── Stem result rows (waveform + play + download)
```
**Model**: `jarredou/BS-ROFO-SW-Fixed` (699MB .ckpt) - BS-RoFormer, 6 stems: Vocals, Drums, Bass, Guitar, Piano, Other
## Deliverable
A self-contained prompt (below) the user can paste into another Claude Code window to build the entire app from scratch.
---
## Prompt to use in another window
The prompt is designed to be comprehensive and self-contained. Copy everything between the `---START PROMPT---` and `---END PROMPT---` markers.
---START PROMPT---
Build a standalone web app for HuggingFace Spaces that does audio stem separation. The app should be production-ready, responsive, and polished.
## What the app does
1. User uploads an audio file (drag-drop or file picker)
2. Original track appears with a SoundCloud-style scrolling peak waveform + play button
3. User selects which stems to separate via checkboxes (Vocals, Drums, Bass, Guitar, Piano, Other)
4. User clicks "Separate" - progress bar shows real-time progress via SSE
5. Once done, each stem appears in its own row: colored label + waveform (flex-grow) + play + download
6. "Download All" button creates a ZIP of all stems
## Tech Stack
- **Frontend**: React 19, TypeScript, Tailwind CSS v4, Vite 5
- **Backend**: Python 3.11, FastAPI, uvicorn
- **Waveform**: WaveSurfer.js v7 (`wavesurfer.js` npm package)
- **Model**: `jarredou/BS-ROFO-SW-Fixed` from HuggingFace (BS-RoFormer, 699MB .ckpt, 6 stems)
- **Inference**: `audio-separator` Python package
- **Progress**: SSE (Server-Sent Events) via `sse-starlette`
- **Deploy**: HuggingFace Docker Space, port 7860
## Directory Structure
```
stem-separator/
├── Dockerfile
├── README.md                # HF Spaces YAML front matter
├── .dockerignore
├── backend/
│   ├── main.py              # FastAPI: routes, SSE, static serving
│   ├── separator.py         # audio-separator wrapper with progress callback
│   ├── file_manager.py      # Temp file lifecycle, cleanup
│   ├── task_queue.py        # asyncio queue (1 concurrent separation)
│   └── requirements.txt
└── frontend/
    ├── index.html
    ├── package.json
    ├── vite.config.ts
    ├── tsconfig.json
    └── src/
        ├── main.tsx
        ├── App.tsx
        ├── index.css        # Tailwind theme (dark, music-oriented)
        ├── api.ts           # fetch wrappers + SSE EventSource
        ├── types.ts         # Shared interfaces
        ├── hooks/
        │   ├── useWaveSurfer.ts   # WaveSurfer.js v7 hook
        │   └── useSeparation.ts   # Upload->Separate->Results state machine
        └── components/
            ├── UploadZone.tsx
            ├── OriginalTrack.tsx
            ├── StemCheckboxes.tsx
            ├── SeparateButton.tsx
            ├── ProgressBar.tsx
            ├── WaveformPlayer.tsx   # Reusable: [play] [waveform===] [time] [download?]
            ├── StemRow.tsx
            ├── StemResults.tsx
            └── Footer.tsx
```
## Backend Details
### API Endpoints
```
POST   /api/upload              - Multipart file upload (max 100MB), returns { job_id, filename }
POST   /api/separate            - Body: { job_id, stems: string[] }, enqueues task
GET    /api/progress/{id}       - SSE stream: { state, progress, message, stems? }
GET    /api/audio/{id}/{f}      - Serve audio for WaveSurfer playback
GET    /api/download/{id}/all   - ZIP of all stems, streamed (register before the {f} route so "all" is not matched as a filename)
GET    /api/download/{id}/{f}   - Download stem with Content-Disposition: attachment
DELETE /api/job/{id}            - Manual cleanup
```
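The 100MB limit on `/api/upload` and the formats accepted by the upload zone can be enforced server-side with a small check. The following is a minimal sketch using only the standard library; the helper name `validate_upload` and the use of `ValueError` are illustrative assumptions, not part of any framework.

```python
import os

# Limits taken from this plan; the helper name and error type are illustrative.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024
ALLOWED_EXTENSIONS = {"wav", "mp3", "flac", "ogg", "m4a", "aac"}


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the lower-cased extension, or raise ValueError if rejected."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or 'none'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 100MB limit")
    return ext
```

The upload route could call this before writing `input.{ext}` and translate a `ValueError` into a 4xx response.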
### `backend/separator.py` - Separation Logic
```python
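# Singleton pattern - keep the model loaded between requests.
# Adapted from a working pattern; check the Separator arguments against
# the installed audio-separator version.
import os
import re

from audio_separator.separator import Separator

# Substrings that identify each stem in an output file name. "Other" is
# checked first so a label such as "no_vocals" is not mistaken for Vocals.
STEM_ALIASES = {
    "Other": ["other", "instrumental", "residual", "remainder", "no_"],
    "Vocals": ["vocals", "vocal", "voice", "singing"],
    "Drums": ["drums", "drum", "percussion"],
    "Bass": ["bass"],
    "Guitar": ["guitar", "guitars"],
    "Piano": ["piano", "keys", "keyboard"],
}


class StemSeparatorService:
    _instance = None
    _model_loaded = False

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def load_model(self):
        if self._model_loaded:
            return
        self.separator = Separator(
            output_dir="/tmp/output",
            output_format="WAV",
            output_single_stem=None,
        )
        self.separator.load_model(model_filename="BS-Rofo-SW-Fixed.ckpt")
        self._model_loaded = True

    def separate(self, input_path, output_dir, stems, progress_callback):
        # progress_callback is driven by the tqdm patch, not called here.
        self.load_model()
        # Assumption: the loaded model honours this attribute; verify that
        # files actually land in output_dir with your library version.
        self.separator.output_dir = output_dir
        output_files = self.separator.separate(input_path)

        wanted = {s.lower() for s in stems}
        results = {}
        for name in output_files:
            # Assumption: returned names may be relative to output_dir.
            src = name if os.path.isabs(name) else os.path.join(output_dir, name)
            # Prefer the "(Label)" part of names like "input_(Vocals).wav".
            match = re.search(r"\(([^)]+)\)", os.path.basename(src))
            label = (match.group(1) if match else os.path.basename(src)).lower()
            stem = next((s for s, aliases in STEM_ALIASES.items()
                         if any(a in label for a in aliases)), None)
            if stem is None or stem.lower() not in wanted or stem in results:
                continue
            # Rename "input_(Vocals).wav" -> "Vocals.wav".
            dst = os.path.join(output_dir, f"{stem}.wav")
            os.replace(src, dst)
            results[stem] = dst
        return results  # stem_name -> file_path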
```
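**tqdm monkey-patching for progress**: Before importing audio-separator, replace both `tqdm.tqdm` and `tqdm.std.tqdm` with a subclass that calls `progress_callback("analyzing", fraction)` in its `update()` method. Patching only `tqdm.std.tqdm` is not enough, because `from tqdm import tqdm` resolves the name already bound on the `tqdm` package. Map tqdm progress 0-1 to overall progress 0.2-0.9.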
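The subclass and the 0-1 to 0.2-0.9 mapping could look roughly like the sketch below. It is written against any tqdm-like base class that exposes `n`, `total`, and `update()` (real tqdm does); the factory name `make_progress_tqdm` and the `FakeBar` stand-in are hypothetical and exist only so the example runs without third-party packages.

```python
def make_progress_tqdm(base_cls, progress_callback, start=0.2, end=0.9):
    """Build a subclass of a tqdm-like class that reports scaled progress."""

    class ProgressTqdm(base_cls):
        def update(self, n=1):
            result = super().update(n)
            total = getattr(self, "total", None)
            if total:  # skip bars with unknown or zero length
                fraction = min(self.n / total, 1.0)
                progress_callback("analyzing", start + (end - start) * fraction)
            return result

    return ProgressTqdm


class FakeBar:
    """Stand-in for tqdm so the sketch runs without installing it."""

    def __init__(self, total):
        self.total = total
        self.n = 0

    def update(self, n=1):
        self.n += n


# Demo: four equal steps are reported as values between 0.2 and 0.9.
reported = []
Bar = make_progress_tqdm(FakeBar, lambda state, p: reported.append((state, round(p, 3))))
bar = Bar(total=4)
for _ in range(4):
    bar.update(1)
```

In the real backend the factory would presumably be applied to `tqdm.tqdm`, with the result assigned to both `tqdm.tqdm` and `tqdm.std.tqdm` before `audio_separator` is imported.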
### `backend/task_queue.py` - Concurrency
- `asyncio.Queue(maxsize=5)` - max 5 pending jobs, return 429 if full
- Single worker consuming tasks sequentially (BS-RoFormer needs ~4-6GB RAM)
- Job progress stored in a dict, consumed by SSE endpoints
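The queue behaviour listed above can be sketched with the standard library alone. The class name `TaskQueue`, the `run_job` callable, and the state strings `queued`, `done`, and `error` are assumptions made for illustration.

```python
import asyncio

MAX_PENDING = 5  # from the plan: at most 5 pending jobs


class TaskQueue:
    def __init__(self):
        self.queue = asyncio.Queue(maxsize=MAX_PENDING)
        self.progress = {}  # job_id -> latest progress snapshot

    def enqueue(self, job_id, payload):
        """Return False when the queue is full (the route would answer 429)."""
        try:
            self.queue.put_nowait((job_id, payload))
        except asyncio.QueueFull:
            return False
        self.progress[job_id] = {"state": "queued", "progress": 0.0}
        return True

    async def worker(self, run_job):
        """Single consumer: jobs run strictly one at a time."""
        while True:
            job_id, payload = await self.queue.get()
            try:
                # run_job blocks (model inference), so keep it off the event loop.
                stems = await asyncio.to_thread(run_job, job_id, payload)
                self.progress[job_id] = {"state": "done", "progress": 1.0, "stems": stems}
            except Exception as exc:
                self.progress[job_id] = {"state": "error", "progress": 0.0, "message": str(exc)}
            finally:
                self.queue.task_done()
```

Running inference through `asyncio.to_thread` is one way to keep SSE responses flowing while a separation is in progress; a process pool would be an alternative if memory isolation matters.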
### `backend/file_manager.py` - File Lifecycle
- Base dir: `/tmp/stem-sep/`
- Each job: `/tmp/stem-sep/{uuid}/` with `input.{ext}` and stem outputs
- Auto-cleanup: background task every 5 minutes, deletes dirs older than 30 minutes
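A minimal sketch of the age-based cleanup follows, assuming directory modification time is an acceptable proxy for job age. The function name `cleanup_old_jobs` and its `now` parameter (added to make the function easy to test) are illustrative.

```python
import shutil
import time
from pathlib import Path

BASE_DIR = Path("/tmp/stem-sep")
MAX_AGE_SECONDS = 30 * 60  # from the plan: delete dirs older than 30 minutes


def cleanup_old_jobs(base_dir=BASE_DIR, max_age=MAX_AGE_SECONDS, now=None):
    """Delete job directories older than max_age; return the removed names."""
    now = time.time() if now is None else now
    removed = []
    if not base_dir.exists():
        return removed
    for job_dir in base_dir.iterdir():
        if job_dir.is_dir() and now - job_dir.stat().st_mtime > max_age:
            shutil.rmtree(job_dir, ignore_errors=True)
            removed.append(job_dir.name)
    return removed
```

The background task would call this every 5 minutes. Note that a directory's mtime changes when entries are added or removed inside it, so a job that is still producing stems keeps looking recent.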
### `backend/main.py` - FastAPI App
- Register API routes BEFORE the static file mount
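- Mount `frontend/dist/` at `/` with `html=True` so that `/` serves `index.html` (the app has no client-side routes, so no further SPA fallback is needed)
- On startup: launch the queue worker and the cleanup loop with `asyncio.create_task`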
- SSE via `sse-starlette`'s `EventSourceResponse`
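One way to feed the SSE endpoint from the shared progress dict is an async generator that emits an event whenever the job's snapshot changes. This is a sketch under assumptions: the generator name `progress_events`, the polling interval, and the terminal states `done` and `error` are illustrative, and it relies on `EventSourceResponse` accepting dicts with a `data` key.

```python
import asyncio
import json


async def progress_events(jobs, job_id, poll_interval=0.5):
    """Yield one event per change in jobs[job_id] until the job finishes."""
    last = None
    while True:
        snapshot = jobs.get(job_id)
        if snapshot is None:
            # Unknown or already cleaned-up job: report once and stop.
            yield {"data": json.dumps({"state": "error", "message": "unknown job"})}
            return
        if snapshot != last:
            last = dict(snapshot)
            yield {"data": json.dumps(last)}
            if last.get("state") in ("done", "error"):
                return
        await asyncio.sleep(poll_interval)
```

The progress route could then return `EventSourceResponse(progress_events(jobs, job_id))`, where `jobs` is the dict maintained by the queue worker.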
### `backend/requirements.txt`
```
fastapi>=0.104.0
uvicorn[standard]>=0.24.0
python-multipart>=0.0.6
sse-starlette>=1.8.0
audio-separator[cpu]>=0.17.0
pydub>=0.25.1
aiofiles>=23.2.1
```
## Frontend Details
### State Machine (`useSeparation` hook)
```typescript
type AppState =
  | { phase: "idle" }
  | { phase: "uploading"; progress: number }
  | { phase: "uploaded"; jobId: string; filename: string }
  | { phase: "separating"; jobId: string; state: string; progress: number; message: string }
  | { phase: "done"; jobId: string; stems: StemResult[] }
  | { phase: "error"; message: string }
```
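Use `useReducer` for clean state transitions. Open the SSE subscription inside the `separate()` action, and close the `EventSource` once the job reaches `done` or `error` so the browser does not reconnect automatically.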
### WaveSurfer.js v7 Configuration (SoundCloud-style)
```typescript
WaveSurfer.create({
  container: containerRef.current,
  url: audioUrl,
  waveColor: color + "66",   // 40% opacity
  progressColor: color,      // full opacity for played portion
  height: 64,                // 48 for stem rows
  barWidth: 2,
  barGap: 1,
  barRadius: 2,
  cursorWidth: 1,
  cursorColor: "#ffffff40",
  normalize: true,
  interact: true,            // click to seek
});
```
Import: `import WaveSurfer from 'wavesurfer.js'`
### `WaveformPlayer.tsx` - Reusable Component
Layout: `[play/pause circle] [waveform div (flex-grow)] [M:SS / M:SS] [download icon?]`
- Play button: circle with play/pause icon
- Waveform container: `flex-grow` div, WaveSurfer renders into it
- Time: `currentTime / duration` in `M:SS` format
- Download: optional, shown via `onDownload` prop
**Exclusive playback**: When a player starts, it calls `window.dispatchEvent(new CustomEvent("stem-play", { detail: instanceId }))`. Every player listens for `stem-play` and pauses itself when `detail` is not its own `instanceId`.
### Stem Colors
```typescript
const STEM_CONFIG = {
  Vocals: { color: "#ec4899", icon: "mic" },      // pink
  Drums:  { color: "#f97316", icon: "drum" },     // orange
  Bass:   { color: "#3b82f6", icon: "music" },    // blue
  Guitar: { color: "#a855f7", icon: "guitar" },   // purple
  Piano:  { color: "#06b6d4", icon: "piano" },    // cyan
  Other:  { color: "#22c55e", icon: "waveform" }, // green
};
```
### `UploadZone.tsx`
Drag-and-drop zone with dashed border. Accepts: wav, mp3, flac, ogg, m4a, aac (max 100MB).
Shows file icon + "Drop audio file here or click to browse" + supported formats.
Drag-over state: border color changes to accent. Hidden `<input type="file" accept="audio/*">`.
### `StemRow.tsx`
Desktop layout: `[colored dot + label (w-24)] [WaveformPlayer (flex-grow)]`
Mobile layout: label on top row, waveform on bottom row (`flex-col sm:flex-row`)
### Mobile Responsive Strategy
- Main container: `max-w-3xl mx-auto px-4`
- `StemCheckboxes`: `grid-cols-2 md:grid-cols-3`
- `StemRow`: `flex-col sm:flex-row` (label stacks above waveform on mobile)
- Waveform height: `h-12 md:h-16`
- Touch targets: minimum 44px
- Font sizes: `text-sm md:text-base`
### Theme (index.css)
```css
@import "tailwindcss";
@theme {
  --color-bg-primary: #0a0a0f;
  --color-bg-secondary: #13131a;
  --color-bg-card: #1a1a24;
  --color-bg-hover: #252530;
  --color-text-primary: #e8e8ef;
  --color-text-secondary: #8888a0;
  --color-accent: #7c3aed;
  --color-accent-hover: #6d28d9;
  --color-border: #2a2a38;
}
body {
  background-color: var(--color-bg-primary);
  color: var(--color-text-primary);
}
```
## Docker Setup
### Dockerfile
```dockerfile
FROM python:3.11-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg curl && rm -rf /var/lib/apt/lists/*
RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY backend/requirements.txt backend/requirements.txt
RUN pip install --no-cache-dir -r backend/requirements.txt
COPY frontend/ frontend/
RUN cd frontend && npm ci && npm run build
COPY backend/ backend/
EXPOSE 7860
RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user PATH=/home/user/.local/bin:$PATH
CMD ["python", "-m", "uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "7860"]
```
### README.md (HF Spaces metadata)
```yaml
---
title: Stem Separator
emoji: 🎵
colorFrom: purple
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
license: mit
---
```
### .dockerignore
```
frontend/node_modules
frontend/dist
**/__pycache__
*.pyc
.git
```
## Implementation Order
1. Scaffold project structure (all dirs + config files)
2. `backend/requirements.txt` + `backend/file_manager.py` + `backend/separator.py`
3. `backend/task_queue.py` + `backend/main.py` (all API endpoints + SSE)
4. Frontend scaffold: `package.json`, `vite.config.ts`, `tsconfig.json`, `index.html`, `index.css`
5. `types.ts` + `api.ts` (API client + SSE subscription)
6. `useWaveSurfer.ts` hook
7. `useSeparation.ts` hook (state machine)
8. Components: `UploadZone` -> `WaveformPlayer` -> `OriginalTrack` -> `StemCheckboxes` -> `SeparateButton` -> `ProgressBar` -> `StemRow` -> `StemResults` -> `Footer` -> `App.tsx`
9. `Dockerfile` + `README.md` + `.dockerignore`
## Verification
1. Local dev: `cd frontend && npm run dev` (with Vite proxy to backend)
2. Local backend (from the project root, so the module path matches the Docker `CMD`): `uvicorn backend.main:app --port 7860`
3. Docker build: `docker build -t stem-sep .`
4. Docker run: `docker run -p 7860:7860 stem-sep`
5. Test: upload a song, select all 6 stems, verify progress + waveforms + play + download
---END PROMPT---