# SeparateTracks — Build Plan
## Goal
Produce a running Gradio application (`app.py`) that accepts a YouTube ID,
YouTube URL, or uploaded `.wav`/`.mp3` audio, separates it into instrument
stems via Demucs, displays results in an `AudioGallery` UI, and exposes an MCP
server endpoint — deployable locally and as a HuggingFace Docker Space
(`Surn/SeparateTracks`).
---
## Project Map
| File | Status | Purpose |
| ---- | ------ | ------- |
| `app.py` | ✅ created | Gradio UI entry point + MCP server |
| `modules/AudioGallery.py` | ✅ created | `AudioGallery(gr.HTML)` — 7-stem audio grid with play and download controls |
| `modules/AudioGallery.pyi` | ✅ created | Type stub for AudioGallery |
| `modules/yt_audio_get_tracks.py` | ✅ moved + updated | `download_audio()` + `separate_tracks()` with progress callbacks |
| `modules/constants.py` | exists | Env vars, shared constants |
| `modules/version_info.py` | exists | Footer HTML with versions |
| `modules/file_utils.py` | exists | File helper utilities |
| `requirements.txt` | ✅ updated | gradio[mcp], python-dotenv, numpy, Pillow, requests added |
| `Dockerfile` | ✅ updated | ffmpeg apt, git, proper pip install order |
| `.gitignore` | ✅ updated | `.env` entry added |
> **Removed:** Root-level `yt_audio_get_tracks.py` — replaced by
> `modules/yt_audio_get_tracks.py`.
---
## Step 1 — Fix `.gitignore`
**Problem:** `.env` contains real credentials (`HF_TOKEN`, `CRYPTO_PK`) and is not
excluded from git tracking.
**Action:** Add `.env` to `.gitignore`, together with the `separated/` output
directory and `*.webm` files.
```gitignore
.env
separated/
*.webm
```
> **Warning:** Rotate or regenerate the `HF_TOKEN` and `CRYPTO_PK` values in `.env`
> if they have ever been committed to git or shared publicly.
---
## Step 2 — Fix `requirements.txt`
The current file is missing packages that `modules/` and the planned `app.py` need.
```txt
# core audio pipeline
yt-dlp
demucs
pydub
youtube-transcript-api
youtube-channel-transcript-api
# gradio UI + MCP
gradio[mcp]>=5.0
# utility deps used by modules/
python-dotenv
numpy
Pillow
requests
```
> `ffmpeg` must be installed at the OS level, not via pip; handle that in
> `Dockerfile`. `torch` and `torchaudio` are installed separately in Docker.
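
Because `ffmpeg` lives outside pip, a startup check can fail fast when it is missing. The sketch below is one way to do that with the standard library. `find_missing_tools` is a hypothetical helper name, not something in the current modules, and including `ffprobe` in the default list is an assumption.

```python
import shutil


def find_missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; the default tool list is an
    assumption, not taken from the project.
    """
    return [name for name in tools if shutil.which(name) is None]


# Example use at startup (left as a comment so importing this has no side effects):
# missing = find_missing_tools()
# if missing:
#     raise SystemExit(f"Missing OS-level tools: {', '.join(missing)}")
```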
---
## Step 3 — Fix `Dockerfile`
Current Dockerfile:
- Missing `apt-get install ffmpeg`.
- Missing `pip install -r requirements.txt`.
- Missing demucs, yt-dlp, and pydub installs.
Updated Dockerfile structure:
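```dockerfile
FROM python:3.12-slim
# OS packages: ffmpeg for the audio pipeline, git, plus curl and unzip for the
# Deno installer. Deno is copied into /usr/local/bin so it is on PATH.
RUN apt-get update && apt-get install -y --no-install-recommends \
        ffmpeg curl unzip git \
    && curl -fsSL https://deno.land/install.sh | sh \
    && cp /root/.deno/bin/deno /usr/local/bin/ \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
# Install the CPU-only torch wheels first, before the remaining dependencies.
RUN pip install --no-cache-dir torch torchaudio \
        --index-url https://download.pytorch.org/whl/cpu
# Quote the extra so the shell never treats the brackets as a glob pattern.
RUN pip install --no-cache-dir "gradio[mcp]" transformers
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
```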
> For HF Spaces GPU, the base image and torch install may be handled by the
> runtime instead.
---
## Step 4 — Create `app.py` ✅ COMPLETE
**Actual implementation** (differs from the original skeleton):
- Imports from `modules.AudioGallery` and `modules.yt_audio_get_tracks`.
- `SEPARATED_DIR = Path("separated").resolve()` is used in `allowed_paths`.
- `audio_gallery_head`, an f-string that wraps `GALLERY_JS` for the page head,
is injected via `demo.launch(head=...)`.
- `_extract_video_id(video_input)` accepts raw YouTube IDs plus supported
YouTube URL formats and returns the canonical video ID.
- `_prepare_uploaded_audio(uploaded_audio)` copies `.wav`/`.mp3` uploads into
`separated/` and derives a sanitized local `job_id`.
- Two processing functions:
  - `process_video(video_id)`: simple, MCP-exposed tool.
  - `process_video_with_progress(video_id, uploaded_audio)`: UI handler.
- UI: `YouTube Video ID or URL` input + `Separate Tracks` button +
`Audio File Override (.wav or .mp3)` upload → Progress textbox →
AudioGallery HTML → footer.
- UI handler uses `progress=gr.Progress(track_tqdm=True)`.
- If an upload is present, it overrides the YouTube field and skips
`download_audio()`.
- Button is wired to `process_video_with_progress` →
`[audio_output, progress_output]`.
- `demo.launch(mcp_server=True, head=audio_gallery_head, allowed_paths=[str(SEPARATED_DIR)])`.
- Audio URLs are built with `modules.file_utils.make_gradio_file_url()` so the
`/gradio_api/file=` endpoint receives a safe relative path.
- `gr.set_static_paths(paths=["separated/", ".separated/"])` registers local
output folders for direct Gradio serving.
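
The input handling described above (raw ID or URL in, canonical ID out; upload copied under a sanitized `job_id`) can be sketched as follows. This is an illustration under stated assumptions, not the code in `app.py`: the function names drop the leading underscore to mark them as hypothetical, the set of accepted URL shapes and the 11-character ID pattern are assumptions, and `prepare_uploaded_audio` takes the target directory as an explicit argument to keep the example self-contained.

```python
import re
import shutil
from pathlib import Path
from urllib.parse import parse_qs, urlparse

# Assumption: YouTube video IDs are 11 URL-safe characters.
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
ALLOWED_UPLOAD_SUFFIXES = {".wav", ".mp3"}


def extract_video_id(video_input: str) -> str:
    """Return the video ID from a raw ID or a common YouTube URL (hypothetical sketch)."""
    text = video_input.strip()
    if _ID_RE.fullmatch(text):
        return text
    # Accept URLs typed without a scheme, e.g. "youtube.com/watch?v=...".
    parsed = urlparse(text if "://" in text else f"https://{text}")
    host = (parsed.hostname or "").lower().removeprefix("www.").removeprefix("m.")
    candidate = ""
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "music.youtube.com"}:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
                candidate = parts[1]
    if not _ID_RE.fullmatch(candidate):
        raise ValueError(f"Could not find a YouTube video ID in {video_input!r}")
    return candidate


def sanitize_job_id(filename: str) -> str:
    """Reduce a file name to letters, digits, '_' and '-' so it is safe as a folder name."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", Path(filename).stem).strip("_")
    return cleaned or "upload"


def prepare_uploaded_audio(uploaded_audio: str, separated_dir: Path) -> tuple[str, Path]:
    """Copy a .wav/.mp3 upload into `separated_dir` and return (job_id, copied path)."""
    src = Path(uploaded_audio)
    suffix = src.suffix.lower()
    if suffix not in ALLOWED_UPLOAD_SUFFIXES:
        raise ValueError(f"Unsupported audio type: {src.suffix!r}")
    job_id = sanitize_job_id(src.name)
    separated_dir.mkdir(parents=True, exist_ok=True)
    dest = separated_dir / f"{job_id}{suffix}"
    shutil.copy2(src, dest)
    return job_id, dest
```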
---
## Step 5 — Implement `AudioGallery` Component ✅ COMPLETE
**Actual implementation** — moved to `modules/AudioGallery.py`:
- `_CSS` — module-level string covering the gallery grid and controls.
- `GALLERY_JS` — module-level string loaded globally through
`demo.launch(head=...)`; defines `formatTime()`, `drawWaveform()`,
`initAudioItem()`, and a `MutationObserver`.
- `AudioGallery(gr.HTML)`:
  - `DEFAULT_LABELS = ["Drums", "Vocals", "Guitar", "Bass", "Other", "Piano", "Music"]`
  - `__init__(audio_urls, *, labels, columns=3, ...)`
  - `_build_html(audio_urls, labels, columns)`
- `data-initialized="false"` prevents double event binding on Gradio re-renders.
- `app.py` calls `AudioGallery._build_html(...)` directly.
- Play buttons use `type="button"` so they do not submit the Gradio form.
- Each stem card also renders a download link directly below the play button.
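
To make the markup conventions above concrete, here is a minimal sketch of what a `_build_html`-style function might produce. It is not the code in `modules/AudioGallery.py`: the name `build_gallery_html`, the CSS class names, the inline grid style, and the `Track N` fallback label are all assumptions made for illustration. Only the `data-initialized="false"` marker, the `type="button"` play control, the download link, and the default labels come from the plan.

```python
from html import escape

DEFAULT_LABELS = ["Drums", "Vocals", "Guitar", "Bass", "Other", "Piano", "Music"]


def build_gallery_html(audio_urls, labels=None, columns=3):
    """Render one card per stem URL inside a CSS grid (hypothetical sketch)."""
    labels = labels or DEFAULT_LABELS
    cards = []
    for index, url in enumerate(audio_urls):
        # Fall back to a generic label if there are more URLs than labels.
        label = labels[index] if index < len(labels) else f"Track {index + 1}"
        safe_url = escape(url, quote=True)
        cards.append(
            # data-initialized="false" lets client-side JS bind events only once.
            '<div class="audio-item" data-initialized="false">'
            f'<span class="audio-label">{escape(label)}</span>'
            f'<audio src="{safe_url}" preload="metadata"></audio>'
            # type="button" keeps the click from submitting an enclosing form.
            '<button type="button" class="play-btn">Play</button>'
            f'<a class="download-link" href="{safe_url}" download>Download</a>'
            "</div>"
        )
    style = f"display:grid;grid-template-columns:repeat({int(columns)},1fr);gap:1rem"
    return f'<div class="audio-gallery" style="{style}">' + "".join(cards) + "</div>"
```

Escaping both the label and the URL keeps file names with quotes or ampersands from breaking the attribute values.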
> The time display is client-side and comes from the `