SeparateTracks — Build Plan

Goal

Produce a running Gradio application (app.py) that accepts a YouTube ID, YouTube URL, or uploaded .wav/.mp3 audio, separates it into instrument stems via Demucs, displays results in an AudioGallery UI, and exposes an MCP server endpoint — deployable locally and as a HuggingFace Docker Space (Surn/SeparateTracks).

Project Map

File	Status	Purpose
`app.py`	✅ created	Gradio UI entry point + MCP server
`modules/AudioGallery.py`	✅ created	`AudioGallery(gr.HTML)` — 7-stem audio grid with play and download controls
`modules/AudioGallery.pyi`	✅ created	Type stub for AudioGallery
`modules/yt_audio_get_tracks.py`	✅ moved + updated	`download_audio()` + `separate_tracks()` with progress callbacks
`modules/constants.py`	exists	Env vars, shared constants
`modules/version_info.py`	exists	Footer HTML with versions
`modules/file_utils.py`	exists	File helper utilities
`requirements.txt`	✅ updated	gradio[mcp], python-dotenv, numpy, Pillow, requests added
`Dockerfile`	✅ updated	ffmpeg apt, git, proper pip install order
`.gitignore`	✅ updated	`.env` entry added

Removed: Root-level yt_audio_get_tracks.py — replaced by modules/yt_audio_get_tracks.py.

Step 1 — Fix `.gitignore`

Problem: .env contains real credentials (HF_TOKEN, CRYPTO_PK) and is not excluded from git tracking.

Action: Add .env to .gitignore, along with the separated/ output folder and *.webm files listed below.

.env
separated/
*.webm

Warning: Rotate or regenerate the HF_TOKEN and CRYPTO_PK values in .env if they have ever been committed to git or shared publicly.
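
A small check can confirm that the ignore file still lists the entries above. This is a minimal sketch, not part of the repository: `missing_ignore_entries` is a hypothetical helper, and it compares exact lines only rather than evaluating gitignore glob or negation rules.

```python
# Entries this build plan expects in .gitignore (from Step 1).
REQUIRED_ENTRIES = (".env", "separated/", "*.webm")


def missing_ignore_entries(gitignore_text: str, required=REQUIRED_ENTRIES) -> list[str]:
    """Return the required entries that are not listed verbatim in the text.

    Hypothetical helper: blank lines and comment lines are skipped, and
    entries are matched as exact lines, not as gitignore patterns.
    """
    listed = {
        line.strip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [entry for entry in required if entry not in listed]
```

Calling it with the contents of .gitignore returns an empty list when all three entries are present. It does not detect a .env file that was committed before the ignore rule was added, so the warning about rotating credentials still applies.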

Step 2 — Fix `requirements.txt`

The original requirements.txt was missing packages that modules/ and app.py need. Updated contents:

# core audio pipeline
yt-dlp
demucs
pydub
youtube-transcript-api
youtube-channel-transcript-api

# gradio UI + MCP
gradio[mcp]>=5.0

# utility deps used by modules/
python-dotenv
numpy
Pillow
requests

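ffmpeg must be installed at the OS level rather than through pip, so the Dockerfile installs it with apt. torch and torchaudio are also left out of requirements.txt; the Dockerfile installs them in a separate step from the PyTorch CPU wheel index.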
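
Because some dependencies come from the OS and others from pip, a quick preflight check can report what is missing before the app starts. The sketch below is an assumption-labeled illustration: the helper names are hypothetical, and the module list uses import names, which differ from some pip package names (yt-dlp imports as `yt_dlp`, python-dotenv as `dotenv`, Pillow as `PIL`).

```python
import importlib.util
import shutil

# Executables that must come from the OS (apt or an installer), not from pip.
SYSTEM_TOOLS = ("ffmpeg",)
# Import names for the pip packages listed in requirements.txt.
PYTHON_MODULES = ("yt_dlp", "demucs", "pydub", "gradio", "dotenv", "numpy", "PIL", "requests")


def missing_system_tools(tools=SYSTEM_TOOLS) -> list[str]:
    """Return the executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_python_modules(modules=PYTHON_MODULES) -> list[str]:
    """Return the top-level modules that are not importable in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

Both functions return empty lists in a correctly provisioned environment; any name they return points at either the apt step or the pip step.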

Step 3 — Fix `Dockerfile`

The current Dockerfile is missing:

- apt-get install ffmpeg
- pip install -r requirements.txt
- installs of demucs, yt-dlp, and pydub

Updated Dockerfile structure:

FROM python:3.12-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
        ffmpeg curl unzip git \
    && curl -fsSL https://deno.land/install.sh | sh \
    && cp /root/.deno/bin/deno /usr/local/bin/ \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .

RUN pip install --no-cache-dir torch torchaudio \
    --index-url https://download.pytorch.org/whl/cpu
RUN pip install --no-cache-dir "gradio[mcp]" transformers
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 7860
CMD ["python", "app.py"]

On GPU-backed HF Spaces, the base image and the torch install may be supplied by the runtime instead.

Step 4 — Create `app.py` ✅ COMPLETE

Actual implementation (differs from the original skeleton):

Imports from modules.AudioGallery and modules.yt_audio_get_tracks.
SEPARATED_DIR = Path("separated").resolve() is used in allowed_paths.
audio_gallery_head = f"<script>{modules.AudioGallery.GALLERY_JS}</script>" is injected via demo.launch(head=...).
_extract_video_id(video_input) accepts raw YouTube IDs plus supported YouTube URL formats and returns the canonical video ID.
_prepare_uploaded_audio(uploaded_audio) copies .wav/.mp3 uploads into separated/ and derives a sanitized local job_id.
Two processing functions:
- process_video(video_id) — simple, MCP-exposed tool.
- process_video_with_progress(video_id, uploaded_audio) — UI handler.
UI: YouTube Video ID or URL input + Separate Tracks button + Audio File Override (.wav or .mp3) upload → Progress textbox → AudioGallery HTML → footer.
UI handler uses progress=gr.Progress(track_tqdm=True).
If an upload is present, it overrides the YouTube field and skips download_audio().
Button is wired to process_video_with_progress → [audio_output, progress_output].
demo.launch(mcp_server=True, head=audio_gallery_head, allowed_paths=[str(SEPARATED_DIR)]).
Audio URLs are built with modules.file_utils.make_gradio_file_url() so the /gradio_api/file= endpoint receives a safe relative path.
gr.set_static_paths(paths=["separated/", ".separated/"]) registers local output folders for direct Gradio serving.
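
The bodies of `_extract_video_id` and `_prepare_uploaded_audio` are not shown in this plan, so the following is only a sketch of how such helpers could work. The URL forms handled, the 11-character ID rule, the sanitization pattern, and the `upload` fallback name are all assumptions, and the function names drop the leading underscore to make clear they are not the real implementations.

```python
import re
import shutil
from pathlib import Path
from urllib.parse import parse_qs, urlparse

# Assumption: YouTube video IDs are 11 characters from [A-Za-z0-9_-].
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")
ALLOWED_SUFFIXES = {".wav", ".mp3"}


def extract_video_id(video_input: str) -> str:
    """Return the video ID from a raw ID or a common YouTube URL form."""
    text = video_input.strip()
    if _ID_RE.match(text):
        return text
    # Accept URLs pasted without a scheme, e.g. "youtube.com/watch?v=...".
    parsed = urlparse(text if "://" in text else f"https://{text}")
    host = (parsed.hostname or "").lower().removeprefix("www.")
    candidate = ""
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "m.youtube.com", "music.youtube.com"}:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            parts = [part for part in parsed.path.split("/") if part]
            if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
                candidate = parts[1]
    if not _ID_RE.match(candidate):
        raise ValueError(f"Could not find a YouTube video ID in {video_input!r}")
    return candidate


def sanitize_job_id(stem: str) -> str:
    """Reduce a file stem to characters that are safe in paths and URLs."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_") or "upload"


def prepare_uploaded_audio(uploaded_audio: str, output_dir: Path) -> tuple[str, Path]:
    """Copy a .wav/.mp3 upload into output_dir and return (job_id, copied_path)."""
    source = Path(uploaded_audio)
    suffix = source.suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"Unsupported audio type: {suffix or '(none)'}")
    job_id = sanitize_job_id(source.stem)
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / f"{job_id}{suffix}"
    shutil.copyfile(source, target)
    return job_id, target
```

With helpers like these, the override rule in the list above reduces to a single branch: when an upload is present, call `prepare_uploaded_audio` and skip the download; otherwise call `extract_video_id` and then `download_audio()`.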

Step 5 — Implement `AudioGallery` Component ✅ COMPLETE

Actual implementation — moved to modules/AudioGallery.py:

_CSS — module-level string covering the gallery grid and controls.
GALLERY_JS — module-level string loaded globally through demo.launch(head=...); defines formatTime(), drawWaveform(), initAudioItem(), and a MutationObserver.
AudioGallery(gr.HTML):
- DEFAULT_LABELS = ["Drums", "Vocals", "Guitar", "Bass", "Other", "Piano", "Music"]
- __init__(audio_urls, *, labels, columns=3, ...)
- _build_html(audio_urls, labels, columns)
data-initialized="false" prevents double event binding on Gradio re-renders.
app.py calls AudioGallery._build_html(...) directly.
Play buttons use type="button" so they do not submit the Gradio form.
Each stem card also renders a download link directly below the play button.

The time display is client-side and comes from the <audio> element runtime playback state, not from the URL string.
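
To make the markup conventions above concrete, here is a rough sketch of what an HTML builder along the lines of `_build_html` could look like. It is not the code in modules/AudioGallery.py: the function name, CSS class names, and inline grid style are assumptions chosen for illustration.

```python
from html import escape

DEFAULT_LABELS = ["Drums", "Vocals", "Guitar", "Bass", "Other", "Piano", "Music"]


def build_gallery_html(audio_urls, labels=None, columns=3):
    """Render one card per stem URL inside a CSS grid (illustrative markup only)."""
    labels = labels or DEFAULT_LABELS
    cards = []
    for index, url in enumerate(audio_urls):
        label = labels[index] if index < len(labels) else f"Track {index + 1}"
        # Escape both values so a stray quote or angle bracket cannot break the markup.
        safe_url = escape(url, quote=True)
        safe_label = escape(label)
        cards.append(
            # data-initialized lets client-side JS skip cards it has already bound.
            f'<div class="audio-item" data-initialized="false">'
            f'<span class="audio-label">{safe_label}</span>'
            f'<audio src="{safe_url}" preload="metadata"></audio>'
            # type="button" keeps the click from submitting an enclosing form.
            f'<button type="button" class="play-btn">Play</button>'
            f'<a class="download-link" href="{safe_url}" download>Download</a>'
            f"</div>"
        )
    body = "".join(cards)
    style = f"display:grid;grid-template-columns:repeat({int(columns)}, 1fr);gap:12px"
    return f'<div class="audio-gallery" style="{style}">{body}</div>'
```

Each card starts with `data-initialized="false"`, so an initializer run from a MutationObserver can flip the flag after binding events and ignore the card on later re-renders.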

Step 6 — MCP Server Integration ✅ COMPLETE

demo.launch(mcp_server=True) exposes /gradio_api/mcp/sse.
process_video() is the MCP-exposed tool.
jCodeMunch MCP server is also configured in .claude/settings.json.
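
Gradio's MCP support describes each tool using the function's type hints and docstring, so the shape of `process_video()` matters as much as its body. The stub below is a sketch under assumptions: the return type and docstring wording are illustrative, the body is a placeholder, and `describe_tool` is a hypothetical helper that only shows which metadata is available for a tool listing.

```python
import inspect


def process_video(video_id: str) -> list[str]:
    """Separate a YouTube video's audio into instrument stems.

    Args:
        video_id: A YouTube video ID or URL.

    Returns:
        Paths or URLs of the separated stem files.
    """
    # Placeholder body: the real function is expected to call download_audio()
    # and separate_tracks() from modules/yt_audio_get_tracks.py.
    raise NotImplementedError("illustrative stub only")


def describe_tool(func) -> dict:
    """Collect the name, one-line summary, and parameter types of a function."""
    doc = inspect.getdoc(func) or ""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "summary": doc.splitlines()[0] if doc else "",
        "parameters": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }
```

Keeping the MCP-exposed function free of UI-only arguments such as a progress tracker, as the plan does by splitting it from `process_video_with_progress`, keeps the resulting tool signature small.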

Step 7 — Fix `modules/constants.py` for Local Dev ✅ COMPLETE

.env is present with HF_TOKEN, so no code change was needed.

Note: constants.py also imports numpy and dotenv (provided by the python-dotenv package), so both packages must remain in requirements.txt.
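
For reference, a local-safe settings loader can be sketched as follows. This is not the content of modules/constants.py: `read_settings` is a hypothetical function, and the variable names are taken from the secrets listed elsewhere in this plan. The import is wrapped so the sketch also runs where python-dotenv is not installed.

```python
import os

try:
    # python-dotenv is imported under the name "dotenv".
    from dotenv import load_dotenv
except ImportError:  # keeps the sketch runnable without python-dotenv
    load_dotenv = None


def read_settings(environ=None) -> dict:
    """Return the settings this plan mentions, read from environment variables."""
    if environ is None:
        if load_dotenv is not None:
            # Reads a local .env if present; variables already set are not overridden.
            load_dotenv()
        environ = os.environ
    return {
        "HF_TOKEN": environ.get("HF_TOKEN"),
        "CRYPTO_PK": environ.get("CRYPTO_PK"),
        "HF_REPO_ID": environ.get("HF_REPO_ID"),
        "SPACE_NAME": environ.get("SPACE_NAME"),
    }
```

Missing values come back as `None` instead of raising, which lets the same code run locally with a .env file and on a Space where the values arrive as secrets.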

Step 8 — Local Run Verification

# Prerequisites
# - Python 3.12
# - ffmpeg in PATH
# - .env file with HF_TOKEN set

pip install -r requirements.txt
python app.py
# → Open http://localhost:7860
# → Enter a YouTube video ID or full URL, or upload a .wav/.mp3 file
# → Click "Separate Tracks"
# → Verify 7 stems appear in AudioGallery
# → Verify each stem includes a working download link below the play button
# → Verify MCP endpoint at http://localhost:7860/gradio_api/mcp/sse
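
The stem check in the steps above can also be scripted. The sketch below assumes one audio file per stem, named after the lower-cased gallery label, in a single output folder; the actual file naming and folder layout are not specified in this plan, and `missing_stems` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: one audio file per stem, named after the lower-cased gallery label.
EXPECTED_STEMS = ("drums", "vocals", "guitar", "bass", "other", "piano", "music")


def missing_stems(stem_dir, expected=EXPECTED_STEMS, suffixes=(".wav", ".mp3")) -> list[str]:
    """Return the expected stem names that have no non-empty audio file in stem_dir."""
    found = {
        path.stem.lower()
        for path in Path(stem_dir).glob("*")
        if path.is_file() and path.suffix.lower() in suffixes and path.stat().st_size > 0
    }
    return [name for name in expected if name not in found]
```

An empty result means every expected stem was written; zero-byte files are treated as missing so that an interrupted separation run is not counted as a success.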

Step 9 — Docker Verification

docker build -t separatetracks .
docker run -p 7860:7860 --env-file .env separatetracks
# → Open http://localhost:7860 and verify the same behavior as Step 8

Step 10 — HuggingFace Space Deployment

README.md already has the correct HF Space header (sdk: docker, app_file: app.py).
Push to the Surn/SeparateTracks HF Space repo.
Set Space secrets: HF_TOKEN, CRYPTO_PK, HF_REPO_ID, SPACE_NAME.
Space auto-builds from Dockerfile on push.

Dependency Map

app.py
 ├── modules/AudioGallery.py
 │    └── gradio (pip)
 ├── modules/yt_audio_get_tracks.py
 │    ├── yt-dlp (pip)
 │    ├── pydub (pip) → ffmpeg (apt)
 │    └── demucs (pip) → torch (pip)
 ├── modules/constants.py
 │    ├── python-dotenv (pip)
 │    └── numpy (pip)
 ├── modules/version_info.py
 │    └── gradio + torch (pip)
 └── modules/file_utils.py
      ├── Pillow (pip)
      └── requests (pip)

File Checklist

#	File	Action	Done
1	`.gitignore`	Add `.env` entry	[x]
2	`requirements.txt`	Add gradio, dotenv, numpy, Pillow, requests	[x]
3	`Dockerfile`	Add ffmpeg apt, fix pip installs	[x]
4	`app.py`	Create Gradio app with AudioGallery + MCP	[x]
5	`modules/AudioGallery.py`	AudioGallery(gr.HTML) component	[x]
6	`modules/AudioGallery.pyi`	Type stub	[x]
7	`modules/yt_audio_get_tracks.py`	Moved from root + progress callbacks added	[x]
8	`.claude/settings.json`	jCodeMunch MCP server config	[x]
9	`modules/constants.py`	Verify local-safe	[x]
10	Local run	Step 8 verification	[ ]
11	Docker build	Step 9 verification	[ ]
12	HF Space deploy	Step 10 push	[ ]

Notes

Deno: Required by yt-dlp for some YouTube JS extraction. Docker installs it from deno.land/install.sh. Locally, install Deno (deno.exe on Windows) and add it to PATH.
Demucs model: The htdemucs_6s weights are downloaded on first run unless they are already cached.
Python style: Black + ruff + isort per agent conventions.
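AudioGallery JS: When JS is built inside a Python f-string, double every literal brace ({{ and }}), including the braces of JS template-literal placeholders, so that `${{name}}` in Python becomes `${name}` in the emitted JS.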
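
The brace rule is easiest to see in a short example. The snippet below is illustrative only: the function name, the CSS class, and the `data-label` attribute are placeholders, not names taken from GALLERY_JS.

```python
def build_snippet(css_class: str) -> str:
    """Build a small JS snippet from a Python f-string."""
    # Single braces interpolate Python values; doubled braces emit literal JS braces.
    return (
        f"document.querySelectorAll('.{css_class}').forEach((el) => {{\n"
        f"  el.title = `Stem: ${{el.dataset.label}}`;\n"
        f"}});"
    )


print(build_snippet("audio-item"))
```

Here `{css_class}` is replaced by Python, while `${{el.dataset.label}}` reaches the browser as `${el.dataset.label}`. A module-level string such as GALLERY_JS that needs no Python interpolation can avoid the problem entirely by being a plain string rather than an f-string.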