stem-separator / plan.md

sourav-das

Upload folder using huggingface_hub

7dfae77 verified 29 days ago

Plan: Standalone Stem Separator App for HuggingFace Spaces

Context

The user wants a standalone web app (separate from Studio13-v3) deployed on HuggingFace Spaces for audio stem separation. Users upload audio, choose stems, run BS-RoFormer model inference, then play/download individual stems with SoundCloud-style waveform visualization. The app must be mobile-responsive.

Architecture

HuggingFace Docker Space (port 7860)
├── FastAPI backend (Python)
│   ├── File upload + temp storage
│   ├── audio-separator inference (BS-RoFormer 6-stem)
│   ├── SSE progress streaming
│   └── Serves built React frontend as static files
└── React frontend (Vite build → static)
    ├── Upload zone (drag-drop)
    ├── WaveSurfer.js v7 waveforms (SoundCloud-style bars)
    ├── Stem selection checkboxes
    ├── Progress bar (SSE-driven)
    └── Stem result rows (waveform + play + download)

Model: jarredou/BS-ROFO-SW-Fixed (699MB .ckpt) - BS-RoFormer, 6 stems: Vocals, Drums, Bass, Guitar, Piano, Other

Deliverable

A self-contained prompt (below) the user can paste into another Claude Code window to build the entire app from scratch.

Prompt to use in another window

The prompt is designed to be comprehensive and self-contained. Copy everything between the ---START PROMPT--- and ---END PROMPT--- markers.

---START PROMPT---

Build a standalone web app for HuggingFace Spaces that does audio stem separation. The app should be production-ready, responsive, and polished.

What the app does

User uploads an audio file (drag-drop or file picker)
Original track appears with a SoundCloud-style scrolling peak waveform + play button
User selects which stems to separate via checkboxes (Vocals, Drums, Bass, Guitar, Piano, Other)
User clicks "Separate" - progress bar shows real-time progress via SSE
Once done, each stem appears in its own row: colored label + waveform (flex-grow) + play + download
"Download All" button creates a ZIP of all stems

Tech Stack

Frontend: React 19, TypeScript, Tailwind CSS v4, Vite 5
Backend: Python 3.11, FastAPI, uvicorn
Waveform: WaveSurfer.js v7 (wavesurfer.js npm package)
Model: jarredou/BS-ROFO-SW-Fixed from HuggingFace (BS-RoFormer, 699MB .ckpt, 6 stems)
Inference: audio-separator Python package
Progress: SSE (Server-Sent Events) via sse-starlette
Deploy: HuggingFace Docker Space, port 7860

Directory Structure

stem-separator/
├── Dockerfile
├── README.md                     # HF Spaces YAML front matter
├── .dockerignore
├── backend/
│   ├── main.py                   # FastAPI: routes, SSE, static serving
│   ├── separator.py              # audio-separator wrapper with progress callback
│   ├── file_manager.py           # Temp file lifecycle, cleanup
│   ├── task_queue.py             # asyncio queue (1 concurrent separation)
│   └── requirements.txt
├── frontend/
│   ├── index.html
│   ├── package.json
│   ├── package-lock.json         # required by npm ci in the Dockerfile
│   ├── vite.config.ts
│   ├── tsconfig.json
│   └── src/
│       ├── main.tsx
│       ├── App.tsx
│       ├── index.css             # Tailwind theme (dark, music-oriented)
│       ├── api.ts                # fetch wrappers + SSE EventSource
│       ├── types.ts              # Shared interfaces
│       ├── hooks/
│       │   ├── useWaveSurfer.ts  # WaveSurfer.js v7 hook
│       │   └── useSeparation.ts  # Upload->Separate->Results state machine
│       └── components/
│           ├── UploadZone.tsx
│           ├── OriginalTrack.tsx
│           ├── StemCheckboxes.tsx
│           ├── SeparateButton.tsx
│           ├── ProgressBar.tsx
│           ├── WaveformPlayer.tsx # Reusable: [play] [waveform===] [time] [download?]
│           ├── StemRow.tsx
│           ├── StemResults.tsx
│           └── Footer.tsx

Backend Details

API Endpoints

POST /api/upload          - Multipart file upload (max 100MB), returns { job_id, filename }
POST /api/separate        - Body: { job_id, stems: string[] }, enqueues task
GET  /api/progress/{id}   - SSE stream: { state, progress, message, stems? }
GET  /api/audio/{id}/{f}  - Serve audio for WaveSurfer playback
GET  /api/download/{id}/all - ZIP of all stems, streamed (register before the {f} route so "all" is not captured as a filename)
GET  /api/download/{id}/{f} - Download stem with Content-Disposition: attachment
DELETE /api/job/{id}      - Manual cleanup
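
The "Download All" archive can be built with the standard library. The following is a minimal sketch, assuming each job directory holds the stem outputs as .wav files next to an input.{ext} file; build_stems_zip is a hypothetical helper name, and it builds the archive in memory rather than streaming it chunk by chunk.

```python
import io
import zipfile
from pathlib import Path


def build_stems_zip(job_dir: Path) -> bytes:
    """Pack every stem .wav in job_dir into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for wav_path in sorted(job_dir.glob("*.wav")):
            # Skip the uploaded original (assumed to be stored as input.<ext>).
            if wav_path.stem.lower() == "input":
                continue
            # arcname keeps only the file name, so no server paths end up in the ZIP.
            archive.write(wav_path, arcname=wav_path.name)
    return buffer.getvalue()
```

The returned bytes can be sent as the response body with Content-Disposition: attachment. For very long tracks, writing the ZIP to a temp file in the job directory and serving that file avoids holding the whole archive in memory.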

`backend/separator.py` - Separation Logic

# Singleton pattern - keep model loaded between requests
# Adapted from this working pattern:

from audio_separator.separator import Separator

class StemSeparatorService:
    _instance = None
    _model_loaded = False

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def load_model(self):
        if self._model_loaded:
            return
        self.separator = Separator(
            output_dir="/tmp/output",
            output_format="WAV",
            output_single_stem=None,
        )
        self.separator.load_model(model_filename="BS-Rofo-SW-Fixed.ckpt")
        self._model_loaded = True

    def separate(self, input_path, output_dir, stems, progress_callback):
        # Run separation
        self.separator.output_dir = output_dir
        output_files = self.separator.separate(input_path)

        # Map output files to stem names using aliases:
        # vocals: [vocals, vocal, voice, singing]
        # drums: [drums, drum, percussion]
        # bass: [bass]
        # guitar: [guitar, guitars]
        # piano: [piano, keys, keyboard]
        # other: [other, instrumental, residual, remainder, no_]

        # Rename files from "input_(Vocals).wav" -> "Vocals.wav"
        # Return dict of stem_name -> file_path
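
The alias mapping described in the comments above can be sketched as a pure function. This is an illustration only: match_stem and map_outputs are hypothetical names, the output file naming ("input_(Vocals).wav") is taken from the comment above, and the special case for a leading "no_" is an added assumption so that a file such as "No_Vocals" is not mistaken for the vocals stem.

```python
from pathlib import Path
from typing import Dict, List, Optional

STEM_ALIASES = {
    "vocals": ["vocals", "vocal", "voice", "singing"],
    "drums": ["drums", "drum", "percussion"],
    "bass": ["bass"],
    "guitar": ["guitar", "guitars"],
    "piano": ["piano", "keys", "keyboard"],
    "other": ["other", "instrumental", "residual", "remainder", "no_"],
}


def match_stem(filename: str) -> Optional[str]:
    """Return the canonical stem name for an output file, or None if unknown."""
    name = Path(filename).stem.lower()
    # Prefer the label inside the trailing "(...)", e.g. "input_(Vocals)" -> "vocals",
    # so a song title containing "bass" or "keys" cannot cause a false match.
    if "(" in name and name.endswith(")"):
        name = name[name.rfind("(") + 1 : -1]
    # Assumption: a leading "no_" marks a residual and belongs to "other".
    if name.startswith("no_"):
        return "other"
    for stem, aliases in STEM_ALIASES.items():
        if any(alias in name for alias in aliases):
            return stem
    return None


def map_outputs(output_files: List[str], wanted: List[str]) -> Dict[str, str]:
    """Map separator output files to the requested stems (case-insensitive)."""
    wanted_lower = {stem.lower() for stem in wanted}
    result: Dict[str, str] = {}
    for path in output_files:
        stem = match_stem(path)
        if stem in wanted_lower and stem not in result:
            result[stem] = path
    return result
```

The caller can then rename each mapped file to "{Stem}.wav" inside the job directory and discard outputs for stems the user did not select.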

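tqdm monkey-patching for progress: Before importing audio-separator, replace both tqdm.tqdm and tqdm.std.tqdm with a subclass that calls progress_callback("analyzing", fraction) in its update() method. Patching tqdm.std.tqdm alone is not enough, because "from tqdm import tqdm" reads the name already bound in the tqdm package. Map tqdm progress 0-1 to overall progress 0.2-0.9.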
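
The progress patch can be sketched as a small class factory. This is a sketch under assumptions, not audio-separator's own API: make_progress_tqdm is a hypothetical helper, and it only relies on the base class exposing n, total, and update(), which tqdm's progress bar class does.

```python
from typing import Callable


def make_progress_tqdm(
    base_cls: type,
    progress_callback: Callable[[str, float], None],
    lo: float = 0.2,
    hi: float = 0.9,
) -> type:
    """Build a subclass of base_cls that reports scaled progress on update()."""

    class ProgressTqdm(base_cls):
        def update(self, n=1):
            result = super().update(n)
            total = getattr(self, "total", None)
            if total:  # skip bars with unknown or zero length
                fraction = min(self.n / total, 1.0)
                # Map the bar's 0-1 range onto the overall lo-hi range.
                progress_callback("analyzing", lo + (hi - lo) * fraction)
            return result

    return ProgressTqdm


# Hypothetical wiring, to run before audio_separator is imported:
#   import tqdm, tqdm.std
#   patched = make_progress_tqdm(tqdm.std.tqdm, my_callback)
#   tqdm.tqdm = tqdm.std.tqdm = patched
```

When a bar wraps an iterable, tqdm throttles its internal update() calls to its refresh interval, so callbacks may arrive less often than once per step. The callback also runs on whatever thread performs the separation, so it should only write to the shared progress dict and return quickly.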

`backend/task_queue.py` - Concurrency

asyncio.Queue(maxsize=5) - max 5 pending jobs, return 429 if full
Single worker consuming tasks sequentially (BS-RoFormer needs ~4-6GB RAM)
Job progress stored in a dict, consumed by SSE endpoints
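
The queue behavior above can be sketched with asyncio alone. This is a minimal sketch: TaskQueue and QueueFullError are hypothetical names, and the route handler is assumed to translate QueueFullError into an HTTP 429 response.

```python
import asyncio
from typing import Awaitable, Callable, Dict

MAX_PENDING = 5


class QueueFullError(Exception):
    """Raised when the pending-job limit is reached; the API maps it to HTTP 429."""


class TaskQueue:
    def __init__(self, maxsize: int = MAX_PENDING) -> None:
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        # job_id -> latest progress snapshot, read by the SSE endpoint.
        self.progress: Dict[str, dict] = {}

    def enqueue(self, job_id: str, run_job: Callable[[], Awaitable[None]]) -> None:
        try:
            self._queue.put_nowait((job_id, run_job))
        except asyncio.QueueFull:
            raise QueueFullError(job_id) from None
        self.progress[job_id] = {"state": "queued", "progress": 0.0}

    async def worker(self) -> None:
        """Single consumer: jobs run strictly one at a time."""
        while True:
            job_id, run_job = await self._queue.get()
            try:
                await run_job()
                self.progress[job_id] = {"state": "done", "progress": 1.0}
            except Exception as exc:
                self.progress[job_id] = {"state": "error", "message": str(exc)}
            finally:
                self._queue.task_done()
```

Because inference is blocking, run_job would typically wrap the separation call in asyncio.to_thread so the event loop stays free to serve SSE and static files. Create the TaskQueue inside the startup handler so the queue is bound to the running event loop.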

`backend/file_manager.py` - File Lifecycle

Base dir: /tmp/stem-sep/
Each job: /tmp/stem-sep/{uuid}/ with input.{ext} and stem outputs
Auto-cleanup: background task every 5 minutes, deletes dirs older than 30 minutes
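
The cleanup pass can be sketched as a plain function that the background task calls every 5 minutes. cleanup_old_jobs is a hypothetical name, and using the directory's modification time as the job's age is an assumption.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

BASE_DIR = Path("/tmp/stem-sep")
MAX_AGE_SECONDS = 30 * 60


def cleanup_old_jobs(
    base_dir: Path = BASE_DIR,
    max_age: float = MAX_AGE_SECONDS,
    now: Optional[float] = None,
) -> List[str]:
    """Delete job directories whose mtime is older than max_age; return their names."""
    now = time.time() if now is None else now
    removed: List[str] = []
    if not base_dir.is_dir():
        return removed
    for job_dir in base_dir.iterdir():
        if job_dir.is_dir() and now - job_dir.stat().st_mtime > max_age:
            # ignore_errors avoids crashing the loop if a file vanishes mid-delete.
            shutil.rmtree(job_dir, ignore_errors=True)
            removed.append(job_dir.name)
    return removed
```

The cleanup loop then reduces to sleeping 300 seconds and calling this function in a worker thread. A job that is still queued or running after 30 minutes would need to be excluded, for example by checking the progress dict before deleting.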

`backend/main.py` - FastAPI App

Register API routes BEFORE the static file mount
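Mount frontend/dist/ at / with html=True so that / serves index.html (StaticFiles does not rewrite unknown paths to index.html, so add a catch-all route if client-side routing is ever introduced)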
On startup: launch queue worker + cleanup loop as asyncio.create_task
SSE via sse-starlette's EventSourceResponse
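
For reference, each progress message on the SSE stream is a small text frame. sse-starlette's EventSourceResponse produces this framing itself; the sketch below only illustrates what the browser's EventSource receives, and format_sse is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional


def format_sse(data: Dict[str, Any], event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event in text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits no raw newlines by default, so one "data:" line suffices.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

On the frontend, the payload after "data: " is what JSON.parse receives in the EventSource onmessage handler, matching the { state, progress, message, stems? } shape listed under API Endpoints.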

`backend/requirements.txt`

fastapi>=0.104.0
uvicorn[standard]>=0.24.0
python-multipart>=0.0.6
sse-starlette>=1.8.0
audio-separator[cpu]>=0.17.0
pydub>=0.25.1
aiofiles>=23.2.1

Frontend Details

State Machine (`useSeparation` hook)

type AppState =
  | { phase: "idle" }
  | { phase: "uploading"; progress: number }
  | { phase: "uploaded"; jobId: string; filename: string }
  | { phase: "separating"; jobId: string; state: string; progress: number; message: string }
  | { phase: "done"; jobId: string; stems: StemResult[] }
  | { phase: "error"; message: string }

Use useReducer for clean state transitions. SSE subscription in separate() action.

WaveSurfer.js v7 Configuration (SoundCloud-style)

WaveSurfer.create({
  container: containerRef.current,
  url: audioUrl,
  waveColor: color + "66",      // 40% opacity
  progressColor: color,          // full opacity for played portion
  height: 64,                    // 48 for stem rows
  barWidth: 2,
  barGap: 1,
  barRadius: 2,
  cursorWidth: 1,
  cursorColor: "#ffffff40",
  normalize: true,
  interact: true,                // click to seek
});

Import: import WaveSurfer from 'wavesurfer.js'

`WaveformPlayer.tsx` - Reusable Component

Layout: [play/pause circle] [waveform div (flex-grow)] [M:SS / M:SS] [download icon?]

Play button: circle with play/pause icon
Waveform container: flex-grow div, WaveSurfer renders into it
Time: currentTime / duration in M:SS format
Download: optional, shown via onDownload prop

Exclusive playback: When one player starts, it calls window.dispatchEvent(new CustomEvent("stem-play", { detail: instanceId })). All other players listen for "stem-play" and pause when detail is not their own instanceId.

Stem Colors

const STEM_CONFIG = {
  Vocals: { color: "#ec4899", icon: "mic" },       // pink
  Drums:  { color: "#f97316", icon: "drum" },       // orange
  Bass:   { color: "#3b82f6", icon: "music" },      // blue
  Guitar: { color: "#a855f7", icon: "guitar" },     // purple
  Piano:  { color: "#06b6d4", icon: "piano" },      // cyan
  Other:  { color: "#22c55e", icon: "waveform" },   // green
};

`UploadZone.tsx`

Drag-and-drop zone with dashed border. Accepts: wav, mp3, flac, ogg, m4a, aac (max 100MB). Shows file icon + "Drop audio file here or click to browse" + supported formats. Drag-over state: border color changes to accent. Hidden <input type="file" accept="audio/*">.

`StemRow.tsx`

Desktop layout: [colored dot + label (w-24)] [WaveformPlayer (flex-grow)]
Mobile layout: label on top row, waveform on bottom row (flex-col sm:flex-row)

Mobile Responsive Strategy

Main container: max-w-3xl mx-auto px-4
StemCheckboxes: grid-cols-2 md:grid-cols-3
StemRow: flex-col sm:flex-row (label stacks above waveform on mobile)
Waveform height: h-12 md:h-16
Touch targets: minimum 44px
Font sizes: text-sm md:text-base

Theme (index.css)

@import "tailwindcss";

@theme {
  --color-bg-primary: #0a0a0f;
  --color-bg-secondary: #13131a;
  --color-bg-card: #1a1a24;
  --color-bg-hover: #252530;
  --color-text-primary: #e8e8ef;
  --color-text-secondary: #8888a0;
  --color-accent: #7c3aed;
  --color-accent-hover: #6d28d9;
  --color-border: #2a2a38;
}

body {
  background-color: var(--color-bg-primary);
  color: var(--color-text-primary);
}

Docker Setup

Dockerfile

FROM python:3.11-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg curl && rm -rf /var/lib/apt/lists/*

RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY backend/requirements.txt backend/requirements.txt
RUN pip install --no-cache-dir -r backend/requirements.txt

COPY frontend/ frontend/
RUN cd frontend && npm ci && npm run build

COPY backend/ backend/

EXPOSE 7860

RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user PATH=/home/user/.local/bin:$PATH

CMD ["python", "-m", "uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "7860"]

README.md (HF Spaces metadata)

---
title: Stem Separator
emoji: 🎵
colorFrom: purple
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
license: mit
---

.dockerignore

frontend/node_modules
frontend/dist
**/__pycache__
*.pyc
.git

Implementation Order

Scaffold project structure (all dirs + config files)
backend/requirements.txt + backend/file_manager.py + backend/separator.py
backend/task_queue.py + backend/main.py (all API endpoints + SSE)
Frontend scaffold: package.json, vite.config.ts, tsconfig.json, index.html, index.css
types.ts + api.ts (API client + SSE subscription)
useWaveSurfer.ts hook
useSeparation.ts hook (state machine)
Components: UploadZone -> WaveformPlayer -> OriginalTrack -> StemCheckboxes -> SeparateButton -> ProgressBar -> StemRow -> StemResults -> Footer -> App.tsx
Dockerfile + README.md + .dockerignore

Verification

Local dev: cd frontend && npm run dev (with Vite proxy to backend)
Local backend: uvicorn backend.main:app --port 7860 (run from the repo root so the module path matches the Docker CMD)
Docker build: docker build -t stem-sep .
Docker run: docker run -p 7860:7860 stem-sep
Test: upload a song, select all 6 stems, verify progress + waveforms + play + download

---END PROMPT---