
Plan: Standalone Stem Separator App for HuggingFace Spaces

Context

The user wants a standalone web app (separate from Studio13-v3) deployed on HuggingFace Spaces for audio stem separation. Users upload audio, choose stems, run BS-RoFormer model inference, then play/download individual stems with SoundCloud-style waveform visualization. The app must be mobile-responsive.

Architecture

HuggingFace Docker Space (port 7860)
├── FastAPI backend (Python)
│   ├── File upload + temp storage
│   ├── audio-separator inference (BS-RoFormer 6-stem)
│   ├── SSE progress streaming
│   └── Serves built React frontend as static files
└── React frontend (Vite build → static)
    ├── Upload zone (drag-drop)
    ├── WaveSurfer.js v7 waveforms (SoundCloud-style bars)
    ├── Stem selection checkboxes
    ├── Progress bar (SSE-driven)
    └── Stem result rows (waveform + play + download)

Model: jarredou/BS-ROFO-SW-Fixed (699MB .ckpt) - BS-RoFormer, 6 stems: Vocals, Drums, Bass, Guitar, Piano, Other

Deliverable

A self-contained prompt (below) the user can paste into another Claude Code window to build the entire app from scratch.


Prompt to use in another window

The prompt is designed to be comprehensive and self-contained. Copy everything between the ---START PROMPT--- and ---END PROMPT--- markers.

---START PROMPT---

Build a standalone web app for HuggingFace Spaces that does audio stem separation. The app should be production-ready, responsive, and polished.

What the app does

  1. User uploads an audio file (drag-drop or file picker)
  2. Original track appears with a SoundCloud-style scrolling peak waveform + play button
  3. User selects which stems to separate via checkboxes (Vocals, Drums, Bass, Guitar, Piano, Other)
  4. User clicks "Separate" - progress bar shows real-time progress via SSE
  5. Once done, each stem appears in its own row: colored label + waveform (flex-grow) + play + download
  6. "Download All" button creates a ZIP of all stems

Tech Stack

  • Frontend: React 19, TypeScript, Tailwind CSS v4, Vite 5
  • Backend: Python 3.11, FastAPI, uvicorn
  • Waveform: WaveSurfer.js v7 (wavesurfer.js npm package)
  • Model: jarredou/BS-ROFO-SW-Fixed from HuggingFace (BS-RoFormer, 699MB .ckpt, 6 stems)
  • Inference: audio-separator Python package
  • Progress: SSE (Server-Sent Events) via sse-starlette
  • Deploy: HuggingFace Docker Space, port 7860

Directory Structure

stem-separator/
├── Dockerfile
├── README.md                     # HF Spaces YAML front matter
├── .dockerignore
├── backend/
│   ├── main.py                   # FastAPI: routes, SSE, static serving
│   ├── separator.py              # audio-separator wrapper with progress callback
│   ├── file_manager.py           # Temp file lifecycle, cleanup
│   ├── task_queue.py             # asyncio queue (1 concurrent separation)
│   └── requirements.txt
├── frontend/
│   ├── index.html
│   ├── package.json
│   ├── vite.config.ts
│   ├── tsconfig.json
│   └── src/
│       ├── main.tsx
│       ├── App.tsx
│       ├── index.css             # Tailwind theme (dark, music-oriented)
│       ├── api.ts                # fetch wrappers + SSE EventSource
│       ├── types.ts              # Shared interfaces
│       ├── hooks/
│       │   ├── useWaveSurfer.ts  # WaveSurfer.js v7 hook
│       │   └── useSeparation.ts  # Upload->Separate->Results state machine
│       └── components/
│           ├── UploadZone.tsx
│           ├── OriginalTrack.tsx
│           ├── StemCheckboxes.tsx
│           ├── SeparateButton.tsx
│           ├── ProgressBar.tsx
│           ├── WaveformPlayer.tsx # Reusable: [play] [waveform===] [time] [download?]
│           ├── StemRow.tsx
│           ├── StemResults.tsx
│           └── Footer.tsx

Backend Details

API Endpoints

POST   /api/upload              - Multipart file upload (max 100MB), returns { job_id, filename }
POST   /api/separate            - Body: { job_id, stems: string[] }, enqueues task (429 if the queue is full)
GET    /api/progress/{id}       - SSE stream: { state, progress, message, stems? }
GET    /api/audio/{id}/{f}      - Serve audio for WaveSurfer playback
GET    /api/download/{id}/all   - ZIP of all stems, streamed (register this route BEFORE /api/download/{id}/{f}, otherwise {f} captures "all")
GET    /api/download/{id}/{f}   - Download stem with Content-Disposition: attachment
DELETE /api/job/{id}            - Manual cleanup
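The following is a minimal sketch of the body behind GET /api/download/{id}/all. It assumes the job's stems are available as a {stem_name: path} dict, and build_stems_zip is a hypothetical helper name.

The archive is assembled in a temporary file and then yielded in chunks. This keeps memory flat, but it is not true on-the-fly zipping. In FastAPI the route could return StreamingResponse(build_stems_zip(paths), media_type="application/zip", headers={"Content-Disposition": 'attachment; filename="stems.zip"'}).

```python
import os
import tempfile
import zipfile
from typing import Dict, Iterator


def build_stems_zip(stem_paths: Dict[str, str], chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a ZIP of the given stems, e.g. {"Vocals": "/tmp/stem-sep/<id>/Vocals.wav"}."""
    with tempfile.TemporaryFile() as buf:
        # ZIP_STORED: WAV audio usually gains little from deflate, so skip the CPU cost
        with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
            for stem, path in stem_paths.items():
                # Name each entry after its stem, keeping the file's extension
                zf.write(path, arcname=f"{stem}{os.path.splitext(path)[1]}")
        buf.seek(0)
        # Hand the finished archive to the response in fixed-size chunks
        while chunk := buf.read(chunk_size):
            yield chunk
```

Starlette iterates a plain (sync) generator like this one in a thread pool, so the disk reads do not block the event loop.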

backend/separator.py - Separation Logic

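# Singleton pattern - keep model loaded between requests
# Adapted from this working pattern (audio-separator details can differ between versions):

import os
import shutil

from audio_separator.separator import Separator

# Aliases for mapping audio-separator output names to canonical stem names.
# "no_" matches labels such as "No_Vocals", which are treated as Other.
STEM_ALIASES = {
    "Vocals": ["vocals", "vocal", "voice", "singing"],
    "Drums": ["drums", "drum", "percussion"],
    "Bass": ["bass"],
    "Guitar": ["guitar", "guitars"],
    "Piano": ["piano", "keys", "keyboard"],
    "Other": ["other", "instrumental", "residual", "remainder", "no_"],
}


def match_stem(filename):
    """Map e.g. 'input_(Vocals)_BS-Rofo-SW-Fixed.wav' to 'Vocals' (None if unknown)."""
    start = filename.find("(")
    end = filename.find(")", start + 1)
    if start < 0 or end < 0:
        return None
    label = filename[start + 1:end].lower()
    for stem, aliases in STEM_ALIASES.items():
        for alias in aliases:
            if label == alias or (alias.endswith("_") and label.startswith(alias)):
                return stem
    return None


class StemSeparatorService:
    _instance = None
    _model_loaded = False

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def load_model(self):
        if self._model_loaded:
            return
        self.separator = Separator(
            output_dir="/tmp/output",
            output_format="WAV",
            output_single_stem=None,
        )
        self.separator.load_model(model_filename="BS-Rofo-SW-Fixed.ckpt")
        self._model_loaded = True

    def separate(self, input_path, output_dir, stems, progress_callback):
        """Blocking call (run it in a worker thread). Returns {stem_name: file_path}."""
        self.load_model()
        progress_callback("analyzing", 0.2)  # the tqdm patch reports 0.2-0.9 during inference

        # The loaded model may keep the output_dir it was created with, so instead of
        # reassigning separator.output_dir per job, move the outputs into output_dir.
        output_files = self.separator.separate(input_path)

        results = {}
        for path in output_files:
            # Returned names may be relative to the separator's output_dir
            src = path if os.path.isabs(path) else os.path.join(self.separator.output_dir, path)
            stem = match_stem(os.path.basename(src))
            if stem in stems and stem not in results:
                # Rename "input_(Vocals)_<model>.wav" -> "Vocals.wav"
                dest = os.path.join(output_dir, f"{stem}.wav")
                shutil.move(src, dest)
                results[stem] = dest
            elif os.path.exists(src):
                os.remove(src)  # discard stems the user did not request
        return results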

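tqdm monkey-patching for progress: before importing audio-separator, replace both tqdm.tqdm and tqdm.std.tqdm with a subclass whose update() calls progress_callback("analyzing", overall). Patching tqdm.std.tqdm alone is not enough. When tqdm is first imported, tqdm/__init__.py binds tqdm.tqdm to the original class, and "from tqdm import tqdm" reads that name. Map tqdm progress 0-1 to overall progress 0.2-0.9 (overall = 0.2 + 0.7 * fraction).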
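The following is a sketch of that patch. It assumes audio-separator reports inference progress through tqdm bars created via "from tqdm import tqdm". The names set_progress_hook, ProgressTqdm and install_tqdm_progress_patch are hypothetical.

The hook is module-global because the patch itself is global, which is acceptable with a single worker. Call install_tqdm_progress_patch() before any "from audio_separator.separator import Separator" runs, since that import binds the tqdm name at import time.

```python
from typing import Callable, Optional

import tqdm
import tqdm.std

# Hypothetical module-level hook; the queue worker would set it before each job.
_progress_hook: Optional[Callable[[str, float], None]] = None


def set_progress_hook(hook: Optional[Callable[[str, float], None]]) -> None:
    global _progress_hook
    _progress_hook = hook


class ProgressTqdm(tqdm.std.tqdm):
    def update(self, n=1):
        displayed = super().update(n)
        if _progress_hook is not None and self.total:
            fraction = min(self.n / self.total, 1.0)
            # Map inference progress 0-1 onto overall progress 0.2-0.9
            _progress_hook("analyzing", 0.2 + 0.7 * fraction)
        return displayed


def install_tqdm_progress_patch() -> None:
    # "from tqdm import tqdm" reads tqdm.tqdm, which was bound to the original class
    # when tqdm was imported, so both names must point at the subclass.
    tqdm.tqdm = ProgressTqdm
    tqdm.std.tqdm = ProgressTqdm
```

When tqdm wraps an iterable, it calls update() only when it redraws (by default at most about every 0.1 s). The hook therefore fires at that rate rather than once per step.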

backend/task_queue.py - Concurrency

  • asyncio.Queue(maxsize=5) - max 5 pending jobs, return 429 if full
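  • Single worker consuming tasks sequentially (BS-RoFormer needs ~4-6GB RAM); run the blocking separation via asyncio.to_thread so the event loop keeps serving SSE and other requests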
  • Job progress stored in a dict, consumed by SSE endpoints
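The following is a minimal sketch of task_queue.py under stated assumptions:

- TaskQueue, QueueFullError and the run_job callable are hypothetical names.
- Job records are plain dicts that the SSE endpoint polls.
- The state strings ("queued", "running", "done", "error") are illustrative.

```python
import asyncio
from typing import Callable, Dict, List

# Hypothetical signature of the blocking separation call:
# run_job(job_id, stems, progress_cb) -> {stem_name: file_path}
RunJob = Callable[[str, List[str], Callable[[str, float], None]], Dict[str, str]]


class QueueFullError(Exception):
    """Raised when the pending-job limit is reached; the route maps it to HTTP 429."""


class TaskQueue:
    def __init__(self, run_job: RunJob, maxsize: int = 5):
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.jobs: Dict[str, dict] = {}  # job_id -> progress record read by the SSE endpoint
        self.run_job = run_job

    def submit(self, job_id: str, stems: List[str]) -> None:
        try:
            self.queue.put_nowait((job_id, stems))
        except asyncio.QueueFull:
            raise QueueFullError(job_id) from None
        self.jobs[job_id] = {"state": "queued", "progress": 0.0, "message": "Waiting in queue", "stems": {}}

    async def worker(self) -> None:
        # A single consumer: one separation at a time keeps peak RAM bounded
        while True:
            job_id, stems = await self.queue.get()
            job = self.jobs[job_id]

            def on_progress(state: str, progress: float, job: dict = job) -> None:
                # Called from the worker thread; simple dict updates are enough here
                job.update(state=state, progress=progress)

            try:
                job.update(state="running", message="Separating stems")
                # Run the blocking model call off the event loop
                job["stems"] = await asyncio.to_thread(self.run_job, job_id, stems, on_progress)
                job.update(state="done", progress=1.0, message="Separation complete")
            except Exception as exc:  # report failures to the client instead of killing the worker
                job.update(state="error", message=str(exc))
            finally:
                self.queue.task_done()
```

The /api/separate route would call submit() and translate QueueFullError into HTTP 429. The worker would be started once, with asyncio.create_task(queue.worker()).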

backend/file_manager.py - File Lifecycle

  • Base dir: /tmp/stem-sep/
  • Each job: /tmp/stem-sep/{uuid}/ with input.{ext} and stem outputs
  • Auto-cleanup: background task every 5 minutes, deletes dirs older than 30 minutes
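The file lifecycle above can be sketched as follows. create_job_dir, cleanup_expired and cleanup_loop are hypothetical names.

A directory's mtime changes whenever entries are added to or removed from it. The sketch therefore treats "older than 30 minutes" as "no file written to the job dir for 30 minutes", which is an assumption about the intended age measure.

```python
import asyncio
import os
import shutil
import time
import uuid
from typing import List, Optional, Tuple

BASE_DIR = "/tmp/stem-sep"
MAX_AGE_SECONDS = 30 * 60          # delete job dirs older than 30 minutes
CLEANUP_INTERVAL_SECONDS = 5 * 60  # check every 5 minutes


def create_job_dir(base_dir: str = BASE_DIR) -> Tuple[str, str]:
    """Create {base_dir}/{uuid}/ and return (job_id, path)."""
    job_id = str(uuid.uuid4())
    path = os.path.join(base_dir, job_id)
    os.makedirs(path)
    return job_id, path


def cleanup_expired(base_dir: str = BASE_DIR, max_age: float = MAX_AGE_SECONDS,
                    now: Optional[float] = None) -> List[str]:
    """Remove job dirs whose mtime is older than max_age; return the removed job ids."""
    now = time.time() if now is None else now
    removed: List[str] = []
    if not os.path.isdir(base_dir):
        return removed
    # Snapshot the listing first so deletions do not disturb the iteration
    with os.scandir(base_dir) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir() and now - entry.stat().st_mtime > max_age:
            shutil.rmtree(entry.path, ignore_errors=True)
            removed.append(entry.name)
    return removed


async def cleanup_loop(base_dir: str = BASE_DIR) -> None:
    """Background task: run cleanup_expired off the event loop every 5 minutes."""
    while True:
        await asyncio.to_thread(cleanup_expired, base_dir)
        await asyncio.sleep(CLEANUP_INTERVAL_SECONDS)
```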

backend/main.py - FastAPI App

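  • Register API routes BEFORE the static file mount (a mount at / matches every path, so anything registered after it is unreachable)
  • Mount frontend/dist/ at / with html=True so / serves index.html; resolve the path relative to main.py rather than the working directory. StaticFiles in html mode only serves index.html for directory paths, not as a catch-all, which is enough because the app has no client-side routes
  • On startup (a lifespan handler, or the older @app.on_event("startup")): launch the queue worker + cleanup loop with asyncio.create_task and keep references to the returned tasks
  • SSE via sse-starlette's EventSourceResponse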
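The following is a sketch of the generator behind GET /api/progress/{id}. It assumes job records are dicts shaped like { state, progress, message, stems }, where stems maps stem name to file path once the job is done. progress_events is a hypothetical name.

sse-starlette's EventSourceResponse accepts an async generator of dicts with a "data" key, so the route could return EventSourceResponse(progress_events(job_id, jobs)). Polling with a short sleep keeps the worker and the stream decoupled, and only changed payloads are sent.

```python
import asyncio
import json
from typing import AsyncIterator, Dict

TERMINAL_STATES = {"done", "error"}


async def progress_events(job_id: str, jobs: Dict[str, dict],
                          poll_interval: float = 0.5) -> AsyncIterator[dict]:
    """Yield SSE event dicts until the job reaches a terminal state."""
    last_payload = None
    while True:
        job = jobs.get(job_id)
        if job is None:
            yield {"data": json.dumps({"state": "error", "progress": 0, "message": "Unknown job"})}
            return
        payload = {"state": job["state"], "progress": job["progress"], "message": job["message"]}
        if job["state"] == "done":
            payload["stems"] = list(job.get("stems", {}))  # stem names, e.g. ["Vocals", "Drums"]
        if payload != last_payload:  # only send changes
            yield {"data": json.dumps(payload)}
            last_payload = payload
        if job["state"] in TERMINAL_STATES:
            return
        await asyncio.sleep(poll_interval)
```

Errors are sent as a normal message with state "error" rather than as a named "error" event. In an addEventListener("error") handler, a named "error" event would be indistinguishable from EventSource's built-in connection-error event.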

backend/requirements.txt

fastapi>=0.104.0
uvicorn[standard]>=0.24.0
python-multipart>=0.0.6
sse-starlette>=1.8.0
audio-separator[cpu]>=0.17.0
pydub>=0.25.1
aiofiles>=23.2.1

Frontend Details

State Machine (useSeparation hook)

type AppState =
  | { phase: "idle" }
  | { phase: "uploading"; progress: number }
  | { phase: "uploaded"; jobId: string; filename: string }
  | { phase: "separating"; jobId: string; state: string; progress: number; message: string }
  | { phase: "done"; jobId: string; stems: StemResult[] }
  | { phase: "error"; message: string }

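Use useReducer for clean state transitions. Open the SSE subscription in the separate() action, and close the EventSource once the state is done or error. Otherwise the browser reconnects automatically after the server ends the stream.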

WaveSurfer.js v7 Configuration (SoundCloud-style)

WaveSurfer.create({
  container: containerRef.current,
  url: audioUrl,
  waveColor: color + "66",      // 40% opacity
  progressColor: color,          // full opacity for played portion
  height: 64,                    // 48 for stem rows
  barWidth: 2,
  barGap: 1,
  barRadius: 2,
  cursorWidth: 1,
  cursorColor: "#ffffff40",
  normalize: true,
  interact: true,                // click to seek
});

Import: import WaveSurfer from 'wavesurfer.js'

WaveformPlayer.tsx - Reusable Component

Layout: [play/pause circle] [waveform div (flex-grow)] [M:SS / M:SS] [download icon?]

  • Play button: circle with play/pause icon
  • Waveform container: flex-grow div, WaveSurfer renders into it
  • Time: currentTime / duration in M:SS format
  • Download: optional, shown only when an onDownload prop is provided

Exclusive playback: when a player starts, it calls window.dispatchEvent(new CustomEvent("stem-play", { detail: instanceId })). Every player listens for "stem-play" and pauses when detail differs from its own instanceId.

Stem Colors

const STEM_CONFIG = {
  Vocals: { color: "#ec4899", icon: "mic" },       // pink
  Drums:  { color: "#f97316", icon: "drum" },       // orange
  Bass:   { color: "#3b82f6", icon: "music" },      // blue
  Guitar: { color: "#a855f7", icon: "guitar" },     // purple
  Piano:  { color: "#06b6d4", icon: "piano" },      // cyan
  Other:  { color: "#22c55e", icon: "waveform" },   // green
};

UploadZone.tsx

Drag-and-drop zone with dashed border. Accepts: wav, mp3, flac, ogg, m4a, aac (max 100MB). Shows file icon + "Drop audio file here or click to browse" + supported formats. Drag-over state: border color changes to accent. Hidden <input type="file" accept="audio/*">.
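These checks run only in the browser, so the upload route should repeat them when it saves input.{ext}. The following is a sketch under that assumption; save_upload and UploadRejected are hypothetical names, and in FastAPI src would be UploadFile.file.

python-multipart has already received and spooled the whole body before the handler runs. This sketch therefore caps what is stored, not what is transferred.

```python
import os
from typing import BinaryIO

ALLOWED_EXTENSIONS = {"wav", "mp3", "flac", "ogg", "m4a", "aac"}
MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # 100MB, matching the UploadZone limit


class UploadRejected(ValueError):
    """Hypothetical error; the route would turn it into an HTTP 400/413 response."""


def save_upload(src: BinaryIO, filename: str, job_dir: str,
                max_bytes: int = MAX_UPLOAD_BYTES, chunk_size: int = 1 << 20) -> str:
    """Copy an uploaded file to {job_dir}/input.{ext}, enforcing format and size limits."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    if ext not in ALLOWED_EXTENSIONS:
        raise UploadRejected(f"Unsupported format: {filename}")

    dest = os.path.join(job_dir, f"input.{ext}")
    written = 0
    too_large = False
    with open(dest, "wb") as out:
        # Copy in chunks so a large upload never sits fully in memory
        while chunk := src.read(chunk_size):
            written += len(chunk)
            if written > max_bytes:
                too_large = True
                break
            out.write(chunk)

    if too_large:
        os.remove(dest)  # do not leave a truncated input behind
        raise UploadRejected(f"File is larger than {max_bytes} bytes")
    return dest
```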

StemRow.tsx

Desktop layout: [colored dot + label (w-24)] [WaveformPlayer (flex-grow)]
Mobile layout: label on top row, waveform on bottom row (flex-col sm:flex-row)

Mobile Responsive Strategy

  • Main container: max-w-3xl mx-auto px-4
  • StemCheckboxes: grid-cols-2 md:grid-cols-3
  • StemRow: flex-col sm:flex-row (label stacks above waveform on mobile)
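  • Waveform height: h-12 md:h-16 on the waveform container, with WaveSurfer's height option set to "auto" so the canvas fills the container (a numeric height ignores these classes)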
  • Touch targets: minimum 44px
  • Font sizes: text-sm md:text-base

Theme (index.css)

@import "tailwindcss";

@theme {
  --color-bg-primary: #0a0a0f;
  --color-bg-secondary: #13131a;
  --color-bg-card: #1a1a24;
  --color-bg-hover: #252530;
  --color-text-primary: #e8e8ef;
  --color-text-secondary: #8888a0;
  --color-accent: #7c3aed;
  --color-accent-hover: #6d28d9;
  --color-border: #2a2a38;
}

body {
  background-color: var(--color-bg-primary);
  color: var(--color-text-primary);
}

Docker Setup

Dockerfile

FROM python:3.11-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg curl && rm -rf /var/lib/apt/lists/*

RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY backend/requirements.txt backend/requirements.txt
RUN pip install --no-cache-dir -r backend/requirements.txt

# npm ci requires frontend/package-lock.json, so commit it alongside package.json
COPY frontend/ frontend/
RUN cd frontend && npm ci && npm run build

COPY backend/ backend/

EXPOSE 7860

RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user PATH=/home/user/.local/bin:$PATH

CMD ["python", "-m", "uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "7860"]

README.md (HF Spaces metadata)

---
title: Stem Separator
emoji: 🎡
colorFrom: purple
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
license: mit
---

.dockerignore

frontend/node_modules
frontend/dist
**/__pycache__
*.pyc
.git

Implementation Order

  1. Scaffold project structure (all dirs + config files)
  2. backend/requirements.txt + backend/file_manager.py + backend/separator.py
  3. backend/task_queue.py + backend/main.py (all API endpoints + SSE)
  4. Frontend scaffold: package.json, vite.config.ts, tsconfig.json, index.html, index.css
  5. types.ts + api.ts (API client + SSE subscription)
  6. useWaveSurfer.ts hook
  7. useSeparation.ts hook (state machine)
  8. Components: UploadZone -> WaveformPlayer -> OriginalTrack -> StemCheckboxes -> SeparateButton -> ProgressBar -> StemRow -> StemResults -> Footer -> App.tsx
  9. Dockerfile + README.md + .dockerignore

Verification

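  1. Local dev: cd frontend && npm run dev (with a Vite server.proxy entry forwarding /api to http://localhost:7860)
  2. Local backend: from the project root, uvicorn backend.main:app --port 7860 (the same module path as the Docker CMD)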
  3. Docker build: docker build -t stem-sep .
  4. Docker run: docker run -p 7860:7860 stem-sep
  5. Test: upload a song, select all 6 stems, verify progress + waveforms + play + download

---END PROMPT---