SeparateTracks: Build Plan
Goal
Produce a running Gradio application (app.py) that accepts a YouTube ID,
YouTube URL, or uploaded .wav/.mp3 audio, separates it into instrument
stems via Demucs, displays results in an AudioGallery UI, and exposes an MCP
server endpoint, deployable locally and as a HuggingFace Docker Space
(Surn/SeparateTracks).
Project Map
| File | Status | Purpose |
|---|---|---|
| `app.py` | ✅ created | Gradio UI entry point + MCP server |
| `modules/AudioGallery.py` | ✅ created | `AudioGallery(gr.HTML)`: 7-stem audio grid with play and download controls |
| `modules/AudioGallery.pyi` | ✅ created | Type stub for AudioGallery |
| `modules/yt_audio_get_tracks.py` | ✅ moved + updated | `download_audio()` + `separate_tracks()` with progress callbacks |
| `modules/constants.py` | exists | Env vars, shared constants |
| `modules/version_info.py` | exists | Footer HTML with versions |
| `modules/file_utils.py` | exists | File helper utilities |
| `requirements.txt` | ✅ updated | `gradio[mcp]`, `python-dotenv`, `numpy`, `Pillow`, `requests` added |
| `Dockerfile` | ✅ updated | ffmpeg apt, git, proper pip install order |
| `.gitignore` | ✅ updated | `.env` entry added |
Removed: Root-level `yt_audio_get_tracks.py`, replaced by `modules/yt_audio_get_tracks.py`.
Step 1: Fix .gitignore
Problem: .env contains real credentials (HF_TOKEN, CRYPTO_PK) and is not
excluded from git tracking.
Action: Add .env to .gitignore.
.env
separated/
*.webm
Warning: Rotate or regenerate the `HF_TOKEN` and `CRYPTO_PK` values in `.env` if they have ever been committed to git or shared publicly.
Step 2: Fix requirements.txt
Current file is missing packages that modules/ and the planned app.py need.
# core audio pipeline
yt-dlp
demucs
pydub
youtube-transcript-api
youtube-channel-transcript-api
# gradio UI + MCP
gradio[mcp]>=5.0
# utility deps used by modules/
python-dotenv
numpy
Pillow
requests
`ffmpeg` must be installed at the OS level, not via pip; handle that in `Dockerfile`. `torch` and `torchaudio` are installed separately in Docker.
Step 3: Fix Dockerfile
Current Dockerfile:
- Missing `apt-get install ffmpeg`.
- Missing `pip install -r requirements.txt`.
- Missing demucs, yt-dlp, and pydub installs.
Updated Dockerfile structure:
FROM python:3.12-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg curl unzip git \
&& curl -fsSL https://deno.land/install.sh | sh \
&& cp /root/.deno/bin/deno /usr/local/bin/ \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir torch torchaudio \
--index-url https://download.pytorch.org/whl/cpu
RUN pip install --no-cache-dir gradio[mcp] transformers
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
For HF Spaces GPU, the base image and torch install may be handled by the runtime instead.
Step 4: Create app.py ✅ COMPLETE
Actual implementation (differs from the original skeleton):
- Imports from `modules.AudioGallery` and `modules.yt_audio_get_tracks`.
- `SEPARATED_DIR = Path("separated").resolve()` is used in `allowed_paths`.
- `audio_gallery_head = f"<script>{modules.AudioGallery.GALLERY_JS}</script>"` is injected via `demo.launch(head=...)`.
- `_extract_video_id(video_input)` accepts raw YouTube IDs plus supported YouTube URL formats and returns the canonical video ID.
- `_prepare_uploaded_audio(uploaded_audio)` copies `.wav`/`.mp3` uploads into `separated/` and derives a sanitized local `job_id`.
- Two processing functions:
  - `process_video(video_id)`: simple, MCP-exposed tool.
  - `process_video_with_progress(video_id, uploaded_audio)`: UI handler.
- UI: `YouTube Video ID or URL` input + `Separate Tracks` button + `Audio File Override (.wav or .mp3)` upload → Progress textbox → AudioGallery HTML → footer.
- UI handler uses `progress=gr.Progress(track_tqdm=True)`.
- If an upload is present, it overrides the YouTube field and skips `download_audio()`.
- Button is wired to `process_video_with_progress` → `[audio_output, progress_output]`.
- `demo.launch(mcp_server=True, head=audio_gallery_head, allowed_paths=[str(SEPARATED_DIR)])`.
- Audio URLs are built with `modules.file_utils.make_gradio_file_url()` so the `/gradio_api/file=` endpoint receives a safe relative path.
- `gr.set_static_paths(paths=["separated/", ".separated/"])` registers local output folders for direct Gradio serving.
Step 5: Implement AudioGallery Component ✅ COMPLETE
Actual implementation (moved to `modules/AudioGallery.py`):
- `_CSS`: module-level string covering the gallery grid and controls.
- `GALLERY_JS`: module-level string loaded globally through `demo.launch(head=...)`; defines `formatTime()`, `drawWaveform()`, `initAudioItem()`, and a `MutationObserver`.
- `AudioGallery(gr.HTML)`:
  - `DEFAULT_LABELS = ["Drums", "Vocals", "Guitar", "Bass", "Other", "Piano", "Music"]`
  - `__init__(audio_urls, *, labels, columns=3, ...)`
  - `_build_html(audio_urls, labels, columns)`
- `data-initialized="false"` prevents double event binding on Gradio re-renders.
- `app.py` calls `AudioGallery._build_html(...)` directly.
- Play buttons use `type="button"` so they do not submit the Gradio form.
- Each stem card also renders a download link directly below the play button.

The time display is client-side and comes from the `<audio>` element's runtime playback state, not from the URL string.
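A reduced sketch of how `_build_html` could assemble the stem cards follows. The class names, the markup layout, the inline grid style, and the standalone function name `build_gallery_html` are illustrative assumptions. Only the `data-initialized="false"` attribute, the `type="button"` play control, and the download link below it come from the description above.

```python
from html import escape

DEFAULT_LABELS = ["Drums", "Vocals", "Guitar", "Bass", "Other", "Piano", "Music"]


def build_gallery_html(audio_urls, labels=None, columns=3):
    """Render one card per stem URL inside a CSS grid (illustrative markup)."""
    labels = labels or DEFAULT_LABELS
    cards = []
    for url, label in zip(audio_urls, labels):
        # Escape both values so a URL or label cannot break out of the markup.
        safe_url = escape(url, quote=True)
        safe_label = escape(label)
        cards.append(
            '<div class="audio-item" data-initialized="false">'
            f'<span class="audio-label">{safe_label}</span>'
            f'<audio src="{safe_url}" preload="metadata"></audio>'
            # type="button" keeps the click from submitting the Gradio form.
            '<button type="button" class="play-btn">Play</button>'
            f'<a class="download-link" href="{safe_url}" download>Download</a>'
            "</div>"
        )
    body = "".join(cards)
    style = f"display:grid;grid-template-columns:repeat({int(columns)},1fr);gap:12px"
    return f'<div class="audio-gallery" style="{style}">{body}</div>'
```

In this sketch, a script such as `GALLERY_JS` would be expected to flip `data-initialized` to `"true"` once it has bound listeners to a card, so that a later re-render does not bind them a second time.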
Step 6: MCP Server Integration ✅ COMPLETE
- `demo.launch(mcp_server=True)` exposes `/gradio_api/mcp/sse`.
- `process_video()` is the MCP-exposed tool.
- jCodeMunch MCP server is also configured in `.claude/settings.json`.
Step 7: Fix modules/constants.py for Local Dev ✅ COMPLETE
.env is present with HF_TOKEN, so no code change was needed.
Note: constants.py also imports numpy and python-dotenv, both of which
must remain in requirements.txt.
Step 8: Local Run Verification
# Prerequisites
# - Python 3.12
# - ffmpeg in PATH
# - .env file with HF_TOKEN set
pip install -r requirements.txt
python app.py
# → Open http://localhost:7860
# → Enter a YouTube video ID or full URL, or upload a .wav/.mp3 file
# → Click "Separate Tracks"
# → Verify 7 stems appear in AudioGallery
# → Verify each stem includes a working download link below the play button
# → Verify MCP endpoint at http://localhost:7860/gradio_api/mcp/sse
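The prerequisites listed for the local run can also be checked programmatically before starting the app. The following is a small sketch only: the function name `check_prerequisites`, its injectable parameters, and the choice to accept Python versions newer than 3.12 are assumptions and not part of the project.

```python
import os
import shutil
import sys


def check_prerequisites(env=None, which=shutil.which, version=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    # Assumption: anything older than 3.12 is flagged, newer versions pass.
    if version < (3, 12):
        problems.append(f"Python 3.12 expected, found {version[0]}.{version[1]}")
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found in PATH")
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (load .env or export it)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```

Passing `env`, `which`, and `version` as parameters is only there to make the check easy to exercise without touching the real environment.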
Step 9: Docker Verification
docker build -t separatetracks .
docker run -p 7860:7860 --env-file .env separatetracks
# → Open http://localhost:7860 and verify the same behavior as Step 8
Step 10: HuggingFace Space Deployment
- `README.md` already has the correct HF Space header (`sdk: docker`, `app_file: app.py`).
- Push to the `Surn/SeparateTracks` HF Space repo.
- Set Space secrets: `HF_TOKEN`, `CRYPTO_PK`, `HF_REPO_ID`, `SPACE_NAME`.
- Space auto-builds from `Dockerfile` on push.
Dependency Map
app.py
├── modules/AudioGallery.py
│   └── gradio (pip)
├── modules/yt_audio_get_tracks.py
│   ├── yt-dlp (pip)
│   ├── pydub (pip) → ffmpeg (apt)
│   └── demucs (pip) → torch (pip)
├── modules/constants.py
│   ├── python-dotenv (pip)
│   └── numpy (pip)
├── modules/version_info.py
│   └── gradio + torch (pip)
└── modules/file_utils.py
    ├── Pillow (pip)
    └── requests (pip)
File Checklist
| # | File | Action | Done |
|---|---|---|---|
| 1 | `.gitignore` | Add `.env` entry | [x] |
| 2 | `requirements.txt` | Add gradio, dotenv, numpy, Pillow, requests | [x] |
| 3 | `Dockerfile` | Add ffmpeg apt, fix pip installs | [x] |
| 4 | `app.py` | Create Gradio app with AudioGallery + MCP | [x] |
| 5 | `modules/AudioGallery.py` | `AudioGallery(gr.HTML)` component | [x] |
| 6 | `modules/AudioGallery.pyi` | Type stub | [x] |
| 7 | `modules/yt_audio_get_tracks.py` | Moved from root + progress callbacks added | [x] |
| 8 | `.claude/settings.json` | jCodeMunch MCP server config | [x] |
| 9 | `modules/constants.py` | Verify local-safe | [x] |
| 10 | Local run | Step 8 verification | [ ] |
| 11 | Docker build | Step 9 verification | [ ] |
| 12 | HF Space deploy | Step 10 push | [ ] |
Notes
- Deno: Required by yt-dlp for some YouTube JS extraction. Docker installs it from `deno.land/install.sh`. Locally, download `deno.exe` and add it to PATH.
- Demucs model: `htdemucs_6s` downloads on first run unless pre-cached.
- Python style: Black + ruff + isort per agent conventions.
- AudioGallery JS: Use `{{ }}` for JS template literals inside Python f-strings.
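
The brace-escaping rule in the last note is easy to get wrong, so here is a short illustration. The variable name `columns` and the JS function are made up for the example; `GALLERY_JS` in the project may be built differently, and a plain (non-f) string needs no escaping at all.

```python
columns = 3

# Inside an f-string, "{{" and "}}" emit literal braces, while "{columns}"
# is interpolated by Python. A JS template-literal placeholder such as
# ${label} therefore has to be written as ${{label}}.
js = f"""
function describe(label) {{
  return `${{label}} (grid of {columns} columns)`;
}}
"""

print(js)
```

The printed script contains single braces and the untouched `${label}` placeholder, with only `{columns}` replaced by its Python value.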