# SeparateTracks — Build Plan

## Goal

Produce a running Gradio application (`app.py`) that accepts a YouTube ID, YouTube URL, or uploaded `.wav`/`.mp3` audio, separates it into instrument stems via Demucs, displays results in an `AudioGallery` UI, and exposes an MCP server endpoint — deployable locally and as a HuggingFace Docker Space (`Surn/SeparateTracks`).

---

## Project Map

| File | Status | Purpose |
| ---- | ------ | ------- |
| `app.py` | ✅ created | Gradio UI entry point + MCP server |
| `modules/AudioGallery.py` | ✅ created | `AudioGallery(gr.HTML)` — 7-stem audio grid with play and download controls |
| `modules/AudioGallery.pyi` | ✅ created | Type stub for AudioGallery |
| `modules/yt_audio_get_tracks.py` | ✅ moved + updated | `download_audio()` + `separate_tracks()` with progress callbacks |
| `modules/constants.py` | exists | Env vars, shared constants |
| `modules/version_info.py` | exists | Footer HTML with versions |
| `modules/file_utils.py` | exists | File helper utilities |
| `requirements.txt` | ✅ updated | gradio[mcp], python-dotenv, numpy, Pillow, requests added |
| `Dockerfile` | ✅ updated | ffmpeg apt, git, proper pip install order |
| `.gitignore` | ✅ updated | `.env` entry added |

> **Removed:** Root-level `yt_audio_get_tracks.py` — replaced by
> `modules/yt_audio_get_tracks.py`.

---

## Step 1 — Fix `.gitignore`

**Problem:** `.env` contains real credentials (`HF_TOKEN`, `CRYPTO_PK`) and is not excluded from git tracking.

**Action:** Add `.env` to `.gitignore`.

```gitignore
.env
separated/
*.webm
```

> **Warning:** Rotate or regenerate the `HF_TOKEN` and `CRYPTO_PK` values in `.env`
> if they have ever been committed to git or shared publicly.

---

## Step 2 — Fix `requirements.txt`

Current file is missing packages that `modules/` and the planned `app.py` need.
```txt
# core audio pipeline
yt-dlp
demucs
pydub
youtube-transcript-api
youtube-channel-transcript-api

# gradio UI + MCP
gradio[mcp]>=5.0

# utility deps used by modules/
python-dotenv
numpy
Pillow
requests
```

> `ffmpeg` must be installed at the OS level, not via pip; handle that in
> `Dockerfile`. `torch` and `torchaudio` are installed separately in Docker.

---

## Step 3 — Fix `Dockerfile`

Current Dockerfile:

- Missing `apt-get install ffmpeg`.
- Missing `pip install -r requirements.txt`.
- Missing demucs, yt-dlp, and pydub installs.

Updated Dockerfile structure:

```dockerfile
FROM python:3.12-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg curl unzip git \
    && curl -fsSL https://deno.land/install.sh | sh \
    && cp /root/.deno/bin/deno /usr/local/bin/ \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt .

RUN pip install --no-cache-dir torch torchaudio \
    --index-url https://download.pytorch.org/whl/cpu

RUN pip install --no-cache-dir gradio[mcp] transformers
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 7860

CMD ["python", "app.py"]
```

> For HF Spaces GPU, the base image and torch install may be handled by the
> runtime instead.

---

## Step 4 — Create `app.py` ✅ COMPLETE

**Actual implementation** (differs from the original skeleton):

- Imports from `modules.AudioGallery` and `modules.yt_audio_get_tracks`.
- `SEPARATED_DIR = Path("separated").resolve()` is used in `allowed_paths`.
- `audio_gallery_head = f""` is injected via `demo.launch(head=...)`.
- `_extract_video_id(video_input)` accepts raw YouTube IDs plus supported YouTube URL formats and returns the canonical video ID.
- `_prepare_uploaded_audio(uploaded_audio)` copies `.wav`/`.mp3` uploads into `separated/` and derives a sanitized local `job_id`.
- Two processing functions:
  - `process_video(video_id)` — simple, MCP-exposed tool.
  - `process_video_with_progress(video_id, uploaded_audio)` — UI handler.
- UI: `YouTube Video ID or URL` input + `Separate Tracks` button + `Audio File Override (.wav or .mp3)` upload → Progress textbox → AudioGallery HTML → footer.
- UI handler uses `progress=gr.Progress(track_tqdm=True)`.
- If an upload is present, it overrides the YouTube field and skips `download_audio()`.
- Button is wired to `process_video_with_progress` → `[audio_output, progress_output]`.
- `demo.launch(mcp_server=True, head=audio_gallery_head, allowed_paths=[str(SEPARATED_DIR)])`.
- Audio URLs are built with `modules.file_utils.make_gradio_file_url()` so the `/gradio_api/file=` endpoint receives a safe relative path.
- `gr.set_static_paths(paths=["separated/", ".separated/"])` registers local output folders for direct Gradio serving.

---

## Step 5 — Implement `AudioGallery` Component ✅ COMPLETE

**Actual implementation** — moved to `modules/AudioGallery.py`:

- `_CSS` — module-level string covering the gallery grid and controls.
- `GALLERY_JS` — module-level string loaded globally through `demo.launch(head=...)`; defines `formatTime()`, `drawWaveform()`, `initAudioItem()`, and a `MutationObserver`.
- `AudioGallery(gr.HTML)`:
  - `DEFAULT_LABELS = ["Drums", "Vocals", "Guitar", "Bass", "Other", "Piano", "Music"]`
  - `__init__(audio_urls, *, labels, columns=3, ...)`
  - `_build_html(audio_urls, labels, columns)`
- `data-initialized="false"` prevents double event binding on Gradio re-renders.
- `app.py` calls `AudioGallery._build_html(...)` directly.
- Play buttons use `type="button"` so they do not submit the Gradio form.
- Each stem card also renders a download link directly below the play button.

> The time display is client-side and comes from the `