
SeparateTracks: Build Plan

Goal

Produce a running Gradio application (app.py) that accepts a YouTube ID, YouTube URL, or uploaded .wav/.mp3 audio, separates it into instrument stems via Demucs, displays the results in an AudioGallery UI, and exposes an MCP server endpoint. The app must be deployable both locally and as a HuggingFace Docker Space (Surn/SeparateTracks).


Project Map

| File | Status | Purpose |
|---|---|---|
| app.py | ✅ created | Gradio UI entry point + MCP server |
| modules/AudioGallery.py | ✅ created | AudioGallery(gr.HTML): 7-stem audio grid with play and download controls |
| modules/AudioGallery.pyi | ✅ created | Type stub for AudioGallery |
| modules/yt_audio_get_tracks.py | ✅ moved + updated | download_audio() + separate_tracks() with progress callbacks |
| modules/constants.py | exists | Env vars, shared constants |
| modules/version_info.py | exists | Footer HTML with versions |
| modules/file_utils.py | exists | File helper utilities |
| requirements.txt | ✅ updated | gradio[mcp], python-dotenv, numpy, Pillow, requests added |
| Dockerfile | ✅ updated | ffmpeg apt, git, proper pip install order |
| .gitignore | ✅ updated | .env entry added |

Removed: root-level yt_audio_get_tracks.py, replaced by modules/yt_audio_get_tracks.py.


Step 1: Fix .gitignore

Problem: .env contains real credentials (HF_TOKEN, CRYPTO_PK) and is not excluded from git tracking.

Action: Add .env to .gitignore, together with the separated/ and *.webm entries:

.env
separated/
*.webm

Warning: Rotate or regenerate the HF_TOKEN and CRYPTO_PK values in .env if they have ever been committed to git or shared publicly.
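The .gitignore entries above can be checked mechanically before committing. The following is a minimal sketch, not part of the repository: the helper name `missing_ignore_entries` is hypothetical, and it only does a literal line match rather than evaluating real gitignore pattern semantics. It also cannot tell whether .env was already committed earlier, which is why the rotation warning still applies.

```python
# Entries this plan expects in .gitignore (taken from Step 1).
REQUIRED_ENTRIES = (".env", "separated/", "*.webm")


def missing_ignore_entries(gitignore_text, required=REQUIRED_ENTRIES):
    """Return the required patterns that do not appear as their own line.

    Hypothetical helper: a literal line comparison, not a gitignore matcher.
    """
    present = {
        line.strip()
        for line in gitignore_text.splitlines()
        # Skip blank lines and comment lines.
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [entry for entry in required if entry not in present]
```

Calling it with the text of .gitignore returns an empty list when all three entries are present.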


Step 2: Fix requirements.txt

The current file is missing packages that modules/ and the planned app.py need.

# core audio pipeline
yt-dlp
demucs
pydub
youtube-transcript-api
youtube-channel-transcript-api

# gradio UI + MCP
gradio[mcp]>=5.0

# utility deps used by modules/
python-dotenv
numpy
Pillow
requests

ffmpeg must be installed at the OS level, not via pip; handle that in Dockerfile. torch and torchaudio are installed separately in Docker.
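Because ffmpeg lives outside pip, a missing binary only surfaces at runtime. A small startup check along these lines could fail early instead. This is a sketch under assumptions: the function names are hypothetical and nothing in the plan says app.py performs this check.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(binary: str = "ffmpeg") -> Optional[str]:
    """Return the absolute path of the executable, or None if it is not on PATH."""
    return shutil.which(binary)


def ffmpeg_version_line(binary: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<binary> -version`, or None if it is unavailable."""
    path = find_ffmpeg(binary)
    if path is None:
        return None
    # check=False: a non-zero exit code is reported as "no version line", not raised.
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

A caller could raise a clear error when `find_ffmpeg()` returns None, rather than letting pydub or Demucs fail later with a less obvious message.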


Step 3: Fix Dockerfile

Current Dockerfile:

  • Missing apt-get install ffmpeg.
  • Missing pip install -r requirements.txt.
  • Missing demucs, yt-dlp, and pydub installs.

Updated Dockerfile structure:

FROM python:3.12-slim

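# ffmpeg for audio decoding; curl and unzip for the Deno installer (Deno is used by yt-dlp).
RUN apt-get update && apt-get install -y --no-install-recommends \
        ffmpeg curl unzip git \
    && curl -fsSL https://deno.land/install.sh | sh \
    && cp /root/.deno/bin/deno /usr/local/bin/ \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .

# Install CPU-only torch wheels first so the later requirements install finds torch already present.
RUN pip install --no-cache-dir torch torchaudio \
    --index-url https://download.pytorch.org/whl/cpu
# Quote the extra so the shell never treats [mcp] as a glob pattern.
RUN pip install --no-cache-dir "gradio[mcp]" transformers
RUN pip install --no-cache-dir -r requirements.txt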

COPY . .

EXPOSE 7860
CMD ["python", "app.py"]

For HF Spaces GPU, the base image and torch install may be handled by the runtime instead.


Step 4: Create app.py ✅ COMPLETE

Actual implementation (differs from the original skeleton):

  • Imports from modules.AudioGallery and modules.yt_audio_get_tracks.
  • SEPARATED_DIR = Path("separated").resolve() is used in allowed_paths.
  • audio_gallery_head = f"<script>{modules.AudioGallery.GALLERY_JS}</script>" is injected via demo.launch(head=...).
  • _extract_video_id(video_input) accepts raw YouTube IDs plus supported YouTube URL formats and returns the canonical video ID.
  • _prepare_uploaded_audio(uploaded_audio) copies .wav/.mp3 uploads into separated/ and derives a sanitized local job_id.
  • Two processing functions:
    • process_video(video_id): simple, MCP-exposed tool.
    • process_video_with_progress(video_id, uploaded_audio): UI handler.
  • UI: YouTube Video ID or URL input + Separate Tracks button + Audio File Override (.wav or .mp3) upload → Progress textbox → AudioGallery HTML → footer.
  • UI handler uses progress=gr.Progress(track_tqdm=True).
  • If an upload is present, it overrides the YouTube field and skips download_audio().
  • The button's click handler is process_video_with_progress, with outputs [audio_output, progress_output].
  • demo.launch(mcp_server=True, head=audio_gallery_head, allowed_paths=[str(SEPARATED_DIR)]).
  • Audio URLs are built with modules.file_utils.make_gradio_file_url() so the /gradio_api/file= endpoint receives a safe relative path.
  • gr.set_static_paths(paths=["separated/", ".separated/"]) registers local output folders for direct Gradio serving.
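The ID normalization described for _extract_video_id can be sketched with the standard library alone. This is an illustration, not the code in app.py: the function name, the accepted hosts, and the accepted path prefixes are assumptions, as is the 11-character ID pattern, which reflects the IDs YouTube currently issues.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters drawn from letters, digits, "-" and "_".
_VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")
# Assumed hosts and path prefixes; the real helper may accept more or fewer.
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com"}
_PATH_PREFIXES = ("/shorts/", "/embed/", "/live/")


def extract_video_id(video_input: str) -> Optional[str]:
    """Return the canonical video ID, or None if the input is not recognized."""
    text = video_input.strip()
    if _VIDEO_ID_RE.match(text):
        return text  # Already a raw ID.

    try:
        # Accept inputs such as "youtu.be/<id>" that omit the scheme.
        parsed = urlparse(text if "://" in text else "https://" + text)
        host = (parsed.hostname or "").lower()
    except ValueError:
        return None

    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in _YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            for prefix in _PATH_PREFIXES:
                if parsed.path.startswith(prefix):
                    candidate = parsed.path[len(prefix):].split("/")[0]
                    break

    if candidate and _VIDEO_ID_RE.match(candidate):
        return candidate
    return None
```

Checking the host before reading the path keeps look-alike URLs on other domains from being treated as valid input.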

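The upload path (_prepare_uploaded_audio) copies the file into separated/ and derives a sanitized job_id. One way this could look is sketched below. The names, the character whitelist, and the fallback ID are all assumptions made for illustration; the actual rules in app.py may differ.

```python
import re
import shutil
from pathlib import Path
from typing import Tuple

ALLOWED_SUFFIXES = {".wav", ".mp3"}


def sanitize_job_id(filename: str, fallback: str = "upload") -> str:
    """Derive a filesystem-safe job ID from an uploaded file name."""
    stem = Path(filename).stem  # Drops directories and the extension.
    # Replace each run of characters outside [A-Za-z0-9_-] with one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    return cleaned or fallback


def prepare_uploaded_audio(uploaded_audio: str, output_dir: Path) -> Tuple[str, Path]:
    """Copy a .wav/.mp3 upload into output_dir and return (job_id, copied_path)."""
    source = Path(uploaded_audio)
    suffix = source.suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"Unsupported audio type: {suffix or '(none)'}")
    job_id = sanitize_job_id(source.name)
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / f"{job_id}{suffix}"
    shutil.copy2(source, target)
    return job_id, target
```

Using only the file stem means path components in the uploaded name never reach the target path.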
Step 5: Implement AudioGallery Component ✅ COMPLETE

Actual implementation (moved to modules/AudioGallery.py):

  • _CSS: module-level string covering the gallery grid and controls.
  • GALLERY_JS: module-level string loaded globally through demo.launch(head=...); defines formatTime(), drawWaveform(), initAudioItem(), and a MutationObserver.
  • AudioGallery(gr.HTML):
    • DEFAULT_LABELS = ["Drums", "Vocals", "Guitar", "Bass", "Other", "Piano", "Music"]
    • __init__(audio_urls, *, labels, columns=3, ...)
    • _build_html(audio_urls, labels, columns)
  • data-initialized="false" prevents double event binding on Gradio re-renders.
  • app.py calls AudioGallery._build_html(...) directly.
  • Play buttons use type="button" so they do not submit the Gradio form.
  • Each stem card also renders a download link directly below the play button.

The time display is client-side and comes from the <audio> element's runtime playback state, not from the URL string.
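The markup rules listed for this step (one card per stem, data-initialized="false", type="button" on the play control, a download link under it) can be illustrated with a small builder. This is a sketch, not the real _build_html: the function name and every CSS class name here are hypothetical, and the real component also renders waveform and time elements that are omitted.

```python
from html import escape
from typing import List


def build_gallery_html(audio_urls: List[str], labels: List[str], columns: int = 3) -> str:
    """Render one card per (url, label) pair inside a CSS grid container."""
    cards = []
    for url, label in zip(audio_urls, labels):
        # Escape both values so labels or paths cannot break out of the markup.
        safe_url = escape(url, quote=True)
        safe_label = escape(label, quote=True)
        cards.append(
            '<div class="audio-item" data-initialized="false">'
            f'<span class="audio-label">{safe_label}</span>'
            f'<audio preload="metadata" src="{safe_url}"></audio>'
            # type="button" keeps the click from submitting a surrounding form.
            '<button type="button" class="play-btn">Play</button>'
            f'<a class="download-link" href="{safe_url}" download>Download</a>'
            "</div>"
        )
    style = f"display:grid;grid-template-columns:repeat({int(columns)},1fr);"
    return f'<div class="audio-gallery" style="{style}">' + "".join(cards) + "</div>"
```

A client-side initializer would then look for cards still marked data-initialized="false", bind its listeners, and flip the flag.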


Step 6: MCP Server Integration ✅ COMPLETE

  • demo.launch(mcp_server=True) exposes /gradio_api/mcp/sse.
  • process_video() is the MCP-exposed tool.
  • jCodeMunch MCP server is also configured in .claude/settings.json.

Step 7: Fix modules/constants.py for Local Dev ✅ COMPLETE

.env is present with HF_TOKEN, so no code change was needed.

Note: constants.py also imports numpy and python-dotenv, both of which must remain in requirements.txt.


Step 8: Local Run Verification

# Prerequisites
# - Python 3.12
# - ffmpeg in PATH
# - .env file with HF_TOKEN set

pip install -r requirements.txt
python app.py
# → Open http://localhost:7860
# → Enter a YouTube video ID or full URL, or upload a .wav/.mp3 file
# → Click "Separate Tracks"
# → Verify 7 stems appear in AudioGallery
# → Verify each stem includes a working download link below the play button
# → Verify MCP endpoint at http://localhost:7860/gradio_api/mcp/sse
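The Python version and HF_TOKEN prerequisites can be checked before launching. The sketch below is an assumption-laden stand-in, not project code: the function names are hypothetical, and the .env parsing is deliberately simple (NAME=value lines only), whereas the project itself loads .env through python-dotenv.

```python
import os
import sys
from pathlib import Path
from typing import List, Set, Tuple


def read_env_keys(env_text: str) -> Set[str]:
    """Return the names assigned a non-empty value in .env-style text."""
    keys = set()
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Treat empty strings and empty quotes as "not set".
        if value.strip().strip("'\""):
            keys.add(name.strip())
    return keys


def preflight(env_path: Path = Path(".env"), min_python: Tuple[int, int] = (3, 12)) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = "{}.{}".format(*sys.version_info[:2])
        wanted = "{}.{}".format(*min_python)
        problems.append(f"Python {wanted}+ expected, found {found}")
    env_text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    # Only names are inspected; the token value is never printed or returned.
    if "HF_TOKEN" not in read_env_keys(env_text) and not os.environ.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set in .env or the environment")
    return problems
```

Running `preflight()` from the project root and printing the result gives a quick go/no-go before `python app.py`.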

Step 9: Docker Verification

docker build -t separatetracks .
docker run -p 7860:7860 --env-file .env separatetracks
# → Open http://localhost:7860 and verify the same behavior as Step 8

Step 10: HuggingFace Space Deployment

  1. README.md already has the correct HF Space header (sdk: docker, app_file: app.py).
  2. Push to the Surn/SeparateTracks HF Space repo.
  3. Set Space secrets: HF_TOKEN, CRYPTO_PK, HF_REPO_ID, SPACE_NAME.
  4. Space auto-builds from Dockerfile on push.

Dependency Map

app.py
 ├── modules/AudioGallery.py
 │    └── gradio (pip)
 ├── modules/yt_audio_get_tracks.py
 │    ├── yt-dlp (pip)
 │    ├── pydub (pip) → ffmpeg (apt)
 │    └── demucs (pip) → torch (pip)
 ├── modules/constants.py
 │    ├── python-dotenv (pip)
 │    └── numpy (pip)
 ├── modules/version_info.py
 │    └── gradio + torch (pip)
 └── modules/file_utils.py
      ├── Pillow (pip)
      └── requests (pip)
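The pip-level leaves of this map can be verified without importing anything heavy. The sketch below is illustrative only: the helper name is hypothetical and the mapping is hand-written from the tree above. Several pip names differ from their import names (yt-dlp, python-dotenv, Pillow), which is why a mapping is needed.

```python
from importlib.util import find_spec
from typing import Dict, List

# pip distribution name -> top-level importable module name.
PIP_TO_IMPORT = {
    "gradio": "gradio",
    "yt-dlp": "yt_dlp",
    "pydub": "pydub",
    "demucs": "demucs",
    "torch": "torch",
    "python-dotenv": "dotenv",
    "numpy": "numpy",
    "Pillow": "PIL",
    "requests": "requests",
}


def missing_packages(mapping: Dict[str, str] = PIP_TO_IMPORT) -> List[str]:
    """Return pip names whose top-level module cannot be located."""
    # find_spec locates a module without executing it, so torch is never loaded.
    return [name for name, module in mapping.items() if find_spec(module) is None]
```

An empty result from `missing_packages()` means every pip dependency in the map is installed; ffmpeg is an apt package and is not covered by this check.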

File Checklist

| # | File | Action | Done |
|---|---|---|---|
| 1 | .gitignore | Add .env entry | [x] |
| 2 | requirements.txt | Add gradio, dotenv, numpy, Pillow, requests | [x] |
| 3 | Dockerfile | Add ffmpeg apt, fix pip installs | [x] |
| 4 | app.py | Create Gradio app with AudioGallery + MCP | [x] |
| 5 | modules/AudioGallery.py | AudioGallery(gr.HTML) component | [x] |
| 6 | modules/AudioGallery.pyi | Type stub | [x] |
| 7 | modules/yt_audio_get_tracks.py | Moved from root + progress callbacks added | [x] |
| 8 | .claude/settings.json | jCodeMunch MCP server config | [x] |
| 9 | modules/constants.py | Verify local-safe | [x] |
| 10 | Local run | Step 8 verification | [ ] |
| 11 | Docker build | Step 9 verification | [ ] |
| 12 | HF Space deploy | Step 10 push | [ ] |

Notes

  • Deno: Required by yt-dlp for some YouTube JS extraction. Docker installs it from deno.land/install.sh. Locally, download deno.exe and add it to PATH.
  • Demucs model: htdemucs_6s downloads on first run unless pre-cached.
  • Python style: Black + ruff + isort per agent conventions.
  • AudioGallery JS: Escape literal JS braces as {{ and }} inside Python f-strings, including the ${...} placeholders of JS template literals.
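The brace-escaping note is easiest to see in a tiny example. In the sketch below, the function name and the `.audio-item` selector are hypothetical; only the escaping rule itself is standard Python f-string behavior. Single braces interpolate Python values, while doubled braces are emitted as literal JS braces.

```python
def build_init_script(selector: str) -> str:
    """Return a <script> block; doubled braces come out as single JS braces."""
    # {selector} is a Python placeholder; {{ and }} are literal braces for JS.
    # Assumption: selector is a trusted constant, so it is inserted unescaped.
    return f"""<script>
document.querySelectorAll("{selector}").forEach((el) => {{
  el.title = `stem: ${{el.dataset.label}}`;
}});
</script>"""
```

`build_init_script(".audio-item")` yields JavaScript in which the arrow-function body and the `${el.dataset.label}` placeholder each carry a single pair of braces.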