stem-separator / plan.md

sourav-das

Upload folder using huggingface_hub

7dfae77 verified 3 months ago

	# Plan: Standalone Stem Separator App for HuggingFace Spaces

	## Context

	The user wants a standalone web app (separate from Studio13-v3) deployed on HuggingFace Spaces for audio stem separation. Users upload audio, choose stems, run BS-RoFormer model inference, then play/download individual stems with SoundCloud-style waveform visualization. The app must be mobile-responsive.

	## Architecture

```
HuggingFace Docker Space (port 7860)
├── FastAPI backend (Python)
│   ├── File upload + temp storage
│   ├── audio-separator inference (BS-RoFormer 6-stem)
│   ├── SSE progress streaming
│   └── Serves built React frontend as static files
└── React frontend (Vite build → static)
    ├── Upload zone (drag-drop)
    ├── WaveSurfer.js v7 waveforms (SoundCloud-style bars)
    ├── Stem selection checkboxes
    ├── Progress bar (SSE-driven)
    └── Stem result rows (waveform + play + download)
```

	Model: `jarredou/BS-ROFO-SW-Fixed` (699MB .ckpt) - BS-RoFormer, 6 stems: Vocals, Drums, Bass, Guitar, Piano, Other

	## Deliverable

	A self-contained prompt (below) the user can paste into another Claude Code window to build the entire app from scratch.

	---

	## Prompt to use in another window

	The prompt is designed to be comprehensive and self-contained. Copy everything between the `---START PROMPT---` and `---END PROMPT---` markers.

	---START PROMPT---

	Build a standalone web app for HuggingFace Spaces that does audio stem separation. The app should be production-ready, responsive, and polished.

	## What the app does

	1. User uploads an audio file (drag-drop or file picker)
	2. Original track appears with a SoundCloud-style scrolling peak waveform + play button
	3. User selects which stems to separate via checkboxes (Vocals, Drums, Bass, Guitar, Piano, Other)
	4. User clicks "Separate" - progress bar shows real-time progress via SSE
	5. Once done, each stem appears in its own row: colored label + waveform (flex-grow) + play + download
	6. "Download All" button creates a ZIP of all stems

	## Tech Stack

	- Frontend: React 19, TypeScript, Tailwind CSS v4, Vite 5
	- Backend: Python 3.11, FastAPI, uvicorn
	- Waveform: WaveSurfer.js v7 (`wavesurfer.js` npm package)
	- Model: `jarredou/BS-ROFO-SW-Fixed` from HuggingFace (BS-RoFormer, 699MB .ckpt, 6 stems)
	- Inference: `audio-separator` Python package
	- Progress: SSE (Server-Sent Events) via `sse-starlette`
	- Deploy: HuggingFace Docker Space, port 7860

	## Directory Structure

```
stem-separator/
├── Dockerfile
├── README.md                     # HF Spaces YAML front matter
├── .dockerignore
├── backend/
│   ├── main.py                   # FastAPI: routes, SSE, static serving
│   ├── separator.py              # audio-separator wrapper with progress callback
│   ├── file_manager.py           # Temp file lifecycle, cleanup
│   ├── task_queue.py             # asyncio queue (1 concurrent separation)
│   └── requirements.txt
└── frontend/
    ├── index.html
    ├── package.json
    ├── vite.config.ts
    ├── tsconfig.json
    └── src/
        ├── main.tsx
        ├── App.tsx
        ├── index.css             # Tailwind theme (dark, music-oriented)
        ├── api.ts                # fetch wrappers + SSE EventSource
        ├── types.ts              # Shared interfaces
        ├── hooks/
        │   ├── useWaveSurfer.ts  # WaveSurfer.js v7 hook
        │   └── useSeparation.ts  # Upload->Separate->Results state machine
        └── components/
            ├── UploadZone.tsx
            ├── OriginalTrack.tsx
            ├── StemCheckboxes.tsx
            ├── SeparateButton.tsx
            ├── ProgressBar.tsx
            ├── WaveformPlayer.tsx # Reusable: [play] [waveform===] [time] [download?]
            ├── StemRow.tsx
            ├── StemResults.tsx
            └── Footer.tsx
```

	## Backend Details

	### API Endpoints

```
POST   /api/upload             - Multipart file upload (max 100MB), returns { job_id, filename }
POST   /api/separate           - Body: { job_id, stems: string[] }, enqueues task (429 if the queue is full)
GET    /api/progress/{id}      - SSE stream: { state, progress, message, stems? }
GET    /api/audio/{id}/{f}     - Serve audio for WaveSurfer playback
GET    /api/download/{id}/all  - ZIP of all stems, streamed (register before the {f} route so "all" is not captured as a filename)
GET    /api/download/{id}/{f}  - Download stem with Content-Disposition: attachment
DELETE /api/job/{id}           - Manual cleanup
```
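
The audio and download routes above take `{id}` and `{f}` straight from the URL, so the backend probably needs to confine them to the job directory before serving anything. The following is a minimal sketch of that check plus the ZIP step, not a definitive implementation: the helper names `resolve_job_file` and `zip_stems` are hypothetical, and the archive is built in memory for brevity rather than truly streamed.

```python
import io
import zipfile
from pathlib import Path


def resolve_job_file(base_dir, job_id, filename):
    """Return the file path for a job, or None if it escapes base_dir/job_id."""
    base = Path(base_dir).resolve()
    candidate = (base / job_id / filename).resolve()
    # Accept only existing files that sit directly inside base_dir/<job_id>/
    if candidate.parent.parent != base or not candidate.is_file():
        return None
    return candidate


def zip_stems(job_dir):
    """Pack every stem WAV in job_dir into an uncompressed ZIP and return its bytes."""
    buffer = io.BytesIO()
    # WAV data compresses poorly, so ZIP_STORED avoids wasted CPU time
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for wav in sorted(Path(job_dir).glob("*.wav")):
            if wav.stem != "input":  # skip the uploaded original
                archive.write(wav, arcname=wav.name)
    return buffer.getvalue()
```

A route handler would return 404 when `resolve_job_file` gives `None`, which covers both missing files and traversal attempts such as `../secret.txt`.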

	### `backend/separator.py` - Separation Logic

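```python
# Singleton pattern: keep the model loaded between requests.
# Import this module only after tqdm has been patched for progress reporting.
import os
import re
import shutil

from audio_separator.separator import Separator

# Canonical stem name -> substrings that may appear in an output file label
STEM_ALIASES = {
    "Vocals": ["vocals", "vocal", "voice", "singing"],
    "Drums": ["drums", "drum", "percussion"],
    "Bass": ["bass"],
    "Guitar": ["guitar", "guitars"],
    "Piano": ["piano", "keys", "keyboard"],
    "Other": ["other", "instrumental", "residual", "remainder", "no_"],
}


def match_stem(filename):
    """Map an output file such as "input_(Vocals).wav" to a canonical stem name."""
    name = os.path.basename(filename)
    found = re.search(r"\(([^)]+)\)", name)
    label = (found.group(1) if found else name).lower()
    if label.startswith("no_"):  # e.g. "No_Vocals" is a residual, not vocals
        return "Other"
    for stem, aliases in STEM_ALIASES.items():
        if any(alias in label for alias in aliases):
            return stem
    return None


class StemSeparatorService:
    _instance = None
    _model_loaded = False

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def load_model(self):
        if self._model_loaded:
            return
        self.separator = Separator(
            output_dir="/tmp/output",
            output_format="WAV",
            output_single_stem=None,
        )
        self.separator.load_model(model_filename="BS-Rofo-SW-Fixed.ckpt")
        self._model_loaded = True

    def separate(self, input_path, output_dir, stems, progress_callback):
        # progress_callback is driven by the patched tqdm while separation runs
        self.load_model()
        # Assumption: the loaded model honors this attribute. Some versions may
        # keep the output_dir given at construction, so verify where files land.
        self.separator.output_dir = output_dir
        output_files = self.separator.separate(input_path)

        results = {}
        for path in output_files:
            stem = match_stem(path)
            if stem is None or stem not in stems:
                continue
            if not os.path.isabs(path):
                path = os.path.join(output_dir, path)
            # Rename files from "input_(Vocals).wav" -> "Vocals.wav"
            target = os.path.join(output_dir, f"{stem}.wav")
            shutil.move(path, target)
            results[stem] = target
        return results  # dict of stem_name -> file_path
```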

tqdm monkey-patching for progress: Before importing audio-separator, replace `tqdm.std.tqdm` (and the re-exported `tqdm.tqdm`, which is what `from tqdm import tqdm` resolves to) with a subclass whose `update()` method calls `progress_callback("analyzing", fraction)`. Map tqdm progress 0-1 to overall progress 0.2-0.9.
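
The patching idea above can be sketched as a small factory that subclasses whatever progress-bar class it is given. This is an illustration under assumptions: the name `make_progress_tqdm` is hypothetical, and the callback is assumed to receive the already-mapped overall value (0.2-0.9) rather than the raw tqdm fraction.

```python
def make_progress_tqdm(base_cls, callback, lo=0.2, hi=0.9):
    """Build a subclass of a tqdm-like class that reports progress on update()."""

    class ProgressTqdm(base_cls):
        def update(self, n=1):
            result = super().update(n)
            total = getattr(self, "total", None)
            if total:  # bars without a known total cannot report a fraction
                fraction = min(self.n / total, 1.0)
                # Map the bar's 0-1 range onto the overall lo-hi range
                callback("analyzing", lo + (hi - lo) * fraction)
            return result

    return ProgressTqdm
```

In the app this might be applied as `tqdm.std.tqdm = make_progress_tqdm(tqdm.std.tqdm, cb)`, with the same class also assigned to `tqdm.tqdm`, before `audio_separator` is first imported. Whether audio-separator drives a single bar per file or several is worth checking, since several bars would make the reported value jump back to 0.2.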

	### `backend/task_queue.py` - Concurrency

- `asyncio.Queue(maxsize=5)` - at most 5 pending jobs; enqueue with `put_nowait()` and return HTTP 429 when it raises `asyncio.QueueFull`
	- Single worker consuming tasks sequentially (BS-RoFormer needs ~4-6GB RAM)
	- Job progress stored in a dict, consumed by SSE endpoints
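
The three points above could fit together roughly as follows. This is a sketch, not the required design: the class name `TaskQueue`, the state strings (`queued`, `running`, `done`, `error`), and the idea of passing each job as a plain callable are all assumptions made for illustration.

```python
import asyncio


class TaskQueue:
    def __init__(self, maxsize=5):
        self.queue = asyncio.Queue(maxsize=maxsize)
        self.progress = {}  # job_id -> latest progress snapshot, read by SSE

    def enqueue(self, job_id, job):
        """Queue a blocking callable; return False when full (respond with 429)."""
        try:
            self.queue.put_nowait((job_id, job))
        except asyncio.QueueFull:
            return False
        self.progress[job_id] = {"state": "queued", "progress": 0.0, "message": "Waiting"}
        return True

    async def worker(self):
        """Single consumer: jobs run one at a time to keep memory use bounded."""
        while True:
            job_id, job = await self.queue.get()
            try:
                self.progress[job_id] = {"state": "running", "progress": 0.1, "message": "Started"}
                # Inference is blocking, so run it in a thread off the event loop
                stems = await asyncio.to_thread(job)
                self.progress[job_id] = {
                    "state": "done", "progress": 1.0, "message": "Finished", "stems": stems,
                }
            except Exception as exc:
                self.progress[job_id] = {"state": "error", "progress": 0.0, "message": str(exc)}
            finally:
                self.queue.task_done()
```

Because only one `worker()` task is started, the queue size limits waiting jobs while the worker itself limits concurrent inference to one.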

	### `backend/file_manager.py` - File Lifecycle

	- Base dir: `/tmp/stem-sep/`
	- Each job: `/tmp/stem-sep/{uuid}/` with `input.{ext}` and stem outputs
	- Auto-cleanup: background task every 5 minutes, deletes dirs older than 30 minutes
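
A minimal sketch of the auto-cleanup rule above, assuming directory modification time is an acceptable proxy for job age. The function names and constants are hypothetical.

```python
import asyncio
import shutil
import time
from pathlib import Path

MAX_AGE_SECONDS = 30 * 60          # delete job dirs older than 30 minutes
CLEANUP_INTERVAL_SECONDS = 5 * 60  # sweep every 5 minutes


def cleanup_old_jobs(base_dir, max_age_seconds=MAX_AGE_SECONDS, now=None):
    """Delete job directories whose mtime is older than max_age_seconds."""
    now = time.time() if now is None else now
    base = Path(base_dir)
    removed = []
    if not base.is_dir():
        return removed
    for job_dir in base.iterdir():
        if job_dir.is_dir() and now - job_dir.stat().st_mtime > max_age_seconds:
            shutil.rmtree(job_dir, ignore_errors=True)
            removed.append(job_dir.name)
    return removed


async def cleanup_loop(base_dir, interval_seconds=CLEANUP_INTERVAL_SECONDS):
    """Background task: sweep expired job directories until cancelled."""
    while True:
        # rmtree can block on large directories, so run it off the event loop
        await asyncio.to_thread(cleanup_old_jobs, base_dir)
        await asyncio.sleep(interval_seconds)
```

One design caveat: a job that has been waiting in the queue for more than 30 minutes would be swept before it runs, so the real implementation may want to skip directories that belong to queued or running jobs.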

	### `backend/main.py` - FastAPI App

	- Register API routes BEFORE the static file mount
- Mount `frontend/dist/` at `/` with `html=True` so that `/` serves `index.html` (the app has a single page, so no client-side route fallback is needed)
	- On startup: launch queue worker + cleanup loop as `asyncio.create_task`
	- SSE via `sse-starlette`'s `EventSourceResponse`
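
For the SSE endpoint, one possible shape is an async generator that polls the shared progress dict and yields a message whenever the snapshot changes. This is a sketch under assumptions: the generator name, the polling interval, and the terminal state names `done` and `error` are illustrative and not taken from the plan.

```python
import asyncio
import json

TERMINAL_STATES = {"done", "error"}  # assumed state names


async def progress_events(progress, job_id, poll_seconds=0.5):
    """Yield SSE payloads for one job until it reaches a terminal state."""
    last = None
    while True:
        snapshot = progress.get(job_id)
        if snapshot is None:
            unknown = {"state": "error", "progress": 0.0, "message": "Unknown job"}
            yield {"data": json.dumps(unknown)}
            return
        if snapshot != last:
            last = dict(snapshot)  # copy so later in-place updates are detected
            yield {"data": json.dumps(snapshot)}
            if snapshot["state"] in TERMINAL_STATES:
                return
        await asyncio.sleep(poll_seconds)
```

In `main.py` the generator could be handed to `EventSourceResponse(progress_events(progress, job_id))`, since `sse-starlette` accepts an async iterator of dicts with a `data` key.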

	### `backend/requirements.txt`

	```
	fastapi>=0.104.0
	uvicorn[standard]>=0.24.0
	python-multipart>=0.0.6
	sse-starlette>=1.8.0
	audio-separator[cpu]>=0.17.0
	pydub>=0.25.1
	aiofiles>=23.2.1
	```

	## Frontend Details

	### State Machine (`useSeparation` hook)

```typescript
type AppState =
  | { phase: "idle" }
  | { phase: "uploading"; progress: number }
  | { phase: "uploaded"; jobId: string; filename: string }
  | { phase: "separating"; jobId: string; state: string; progress: number; message: string }
  | { phase: "done"; jobId: string; stems: StemResult[] }
  | { phase: "error"; message: string }
```

	Use `useReducer` for clean state transitions. SSE subscription in `separate()` action.

	### WaveSurfer.js v7 Configuration (SoundCloud-style)

```typescript
WaveSurfer.create({
  container: containerRef.current,
  url: audioUrl,
  waveColor: color + "66", // 40% opacity
  progressColor: color, // full opacity for played portion
  height: 64, // 48 for stem rows
  barWidth: 2,
  barGap: 1,
  barRadius: 2,
  cursorWidth: 1,
  cursorColor: "#ffffff40",
  normalize: true,
  interact: true, // click to seek
});
```

	Import: `import WaveSurfer from 'wavesurfer.js'`

	### `WaveformPlayer.tsx` - Reusable Component

Layout: `[play/pause circle] [waveform div (flex-grow)] [M:SS / M:SS] [download icon?]`

	- Play button: circle with play/pause icon
	- Waveform container: `flex-grow` div, WaveSurfer renders into it
	- Time: `currentTime / duration` in `M:SS` format
	- Download: optional, shown via `onDownload` prop

	Exclusive playback: When one player starts, dispatch `window.dispatchEvent(new CustomEvent("stem-play", { detail: instanceId }))`. All other players listen and pause.

	### Stem Colors

```typescript
const STEM_CONFIG = {
  Vocals: { color: "#ec4899", icon: "mic" }, // pink
  Drums: { color: "#f97316", icon: "drum" }, // orange
  Bass: { color: "#3b82f6", icon: "music" }, // blue
  Guitar: { color: "#a855f7", icon: "guitar" }, // purple
  Piano: { color: "#06b6d4", icon: "piano" }, // cyan
  Other: { color: "#22c55e", icon: "waveform" }, // green
};
```

	### `UploadZone.tsx`

	Drag-and-drop zone with dashed border. Accepts: wav, mp3, flac, ogg, m4a, aac (max 100MB).
	Shows file icon + "Drop audio file here or click to browse" + supported formats.
	Drag-over state: border color changes to accent. Hidden `<input type="file" accept="audio/*">`.

	### `StemRow.tsx`

	Desktop layout: `[colored dot + label (w-24)] [WaveformPlayer (flex-grow)]`
	Mobile layout: label on top row, waveform on bottom row (`flex-col sm:flex-row`)

	### Mobile Responsive Strategy

	- Main container: `max-w-3xl mx-auto px-4`
	- `StemCheckboxes`: `grid-cols-2 md:grid-cols-3`
	- `StemRow`: `flex-col sm:flex-row` (label stacks above waveform on mobile)
	- Waveform height: `h-12 md:h-16`
	- Touch targets: minimum 44px
	- Font sizes: `text-sm md:text-base`

	### Theme (index.css)

```css
@import "tailwindcss";

@theme {
  --color-bg-primary: #0a0a0f;
  --color-bg-secondary: #13131a;
  --color-bg-card: #1a1a24;
  --color-bg-hover: #252530;
  --color-text-primary: #e8e8ef;
  --color-text-secondary: #8888a0;
  --color-accent: #7c3aed;
  --color-accent-hover: #6d28d9;
  --color-border: #2a2a38;
}

body {
  background-color: var(--color-bg-primary);
  color: var(--color-text-primary);
}
```

	## Docker Setup

	### Dockerfile

```dockerfile
FROM python:3.11-slim

# ffmpeg for audio decoding, curl for the Node.js setup script
RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg curl && rm -rf /var/lib/apt/lists/*

RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install Python dependencies first so this layer is cached across code changes
COPY backend/requirements.txt backend/requirements.txt
RUN pip install --no-cache-dir -r backend/requirements.txt

# npm ci requires frontend/package-lock.json to be committed
COPY frontend/ frontend/
RUN cd frontend && npm ci && npm run build

COPY backend/ backend/

EXPOSE 7860

# HF Spaces runs containers as UID 1000
RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user PATH=/home/user/.local/bin:$PATH

CMD ["python", "-m", "uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "7860"]
```

	### README.md (HF Spaces metadata)

	```yaml
	---
	title: Stem Separator
	emoji: 🎵
	colorFrom: purple
	colorTo: pink
	sdk: docker
	app_port: 7860
	pinned: false
	license: mit
	---
	```

	### .dockerignore

	```
	frontend/node_modules
	frontend/dist
	**/__pycache__
	*.pyc
	.git
	```

	## Implementation Order

	1. Scaffold project structure (all dirs + config files)
	2. `backend/requirements.txt` + `backend/file_manager.py` + `backend/separator.py`
	3. `backend/task_queue.py` + `backend/main.py` (all API endpoints + SSE)
	4. Frontend scaffold: `package.json`, `vite.config.ts`, `tsconfig.json`, `index.html`, `index.css`
	5. `types.ts` + `api.ts` (API client + SSE subscription)
	6. `useWaveSurfer.ts` hook
	7. `useSeparation.ts` hook (state machine)
	8. Components: `UploadZone` -> `WaveformPlayer` -> `OriginalTrack` -> `StemCheckboxes` -> `SeparateButton` -> `ProgressBar` -> `StemRow` -> `StemResults` -> `Footer` -> `App.tsx`
	9. `Dockerfile` + `README.md` + `.dockerignore`

	## Verification

	1. Local dev: `cd frontend && npm run dev` (with Vite proxy to backend)
2. Local backend (from the project root, matching the Docker `CMD`): `uvicorn backend.main:app --port 7860`
	3. Docker build: `docker build -t stem-sep .`
	4. Docker run: `docker run -p 7860:7860 stem-sep`
	5. Test: upload a song, select all 6 stems, verify progress + waveforms + play + download

	---END PROMPT---