# SeparateTracks: Build Plan
## Goal
Produce a running Gradio application (`app.py`) that accepts a YouTube ID,
YouTube URL, or uploaded `.wav`/`.mp3` audio, separates it into instrument
stems via Demucs, displays the results in an `AudioGallery` UI, and exposes an
MCP server endpoint. The app must be deployable both locally and as a
HuggingFace Docker Space (`Surn/SeparateTracks`).
---
## Project Map
| File | Status | Purpose |
| ---- | ------ | ------- |
| `app.py` | ✅ created | Gradio UI entry point + MCP server |
| `modules/AudioGallery.py` | ✅ created | `AudioGallery(gr.HTML)`: 7-stem audio grid with play and download controls |
| `modules/AudioGallery.pyi` | ✅ created | Type stub for AudioGallery |
| `modules/yt_audio_get_tracks.py` | ✅ moved + updated | `download_audio()` + `separate_tracks()` with progress callbacks |
| `modules/constants.py` | exists | Env vars, shared constants |
| `modules/version_info.py` | exists | Footer HTML with versions |
| `modules/file_utils.py` | exists | File helper utilities |
| `requirements.txt` | ✅ updated | gradio[mcp], python-dotenv, numpy, Pillow, requests added |
| `Dockerfile` | ✅ updated | ffmpeg and git via apt, correct pip install order |
| `.gitignore` | ✅ updated | `.env` entry added |
> **Removed:** Root-level `yt_audio_get_tracks.py` β€” replaced by
> `modules/yt_audio_get_tracks.py`.
---
## Step 1: Fix `.gitignore`
**Problem:** `.env` contains real credentials (`HF_TOKEN`, `CRYPTO_PK`) and is not
excluded from git tracking.
**Action:** Add `.env` to `.gitignore`.
```gitignore
.env
separated/
*.webm
```
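> **Warning:** `.gitignore` only affects untracked files. If `.env` has ever been
> committed, remove it from the index with `git rm --cached .env`, and rotate or
> regenerate the `HF_TOKEN` and `CRYPTO_PK` values, since they remain in git
> history. Rotate them as well if they have ever been shared publicly.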
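A quick way to confirm that the entries above are present is a literal line check. This sketch compares lines only and does not evaluate gitignore globbing or negation rules; `git check-ignore -v .env` remains the authoritative test of whether git actually ignores the file. `missing_ignore_entries` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path

# Entries Step 1 expects in .gitignore.
REQUIRED_ENTRIES = (".env", "separated/", "*.webm")


def missing_ignore_entries(gitignore_text: str, required=REQUIRED_ENTRIES) -> list[str]:
    """Return required entries that do not appear as literal lines.

    Blank lines and comments are skipped; gitignore pattern semantics
    (globs, negation with "!") are deliberately not evaluated.
    """
    present = {
        line.strip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [entry for entry in required if entry not in present]


if __name__ == "__main__":
    path = Path(".gitignore")
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    print("missing entries:", missing_ignore_entries(text) or "none")
```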
---
## Step 2: Fix `requirements.txt`
Current file is missing packages that `modules/` and the planned `app.py` need.
```txt
# core audio pipeline
yt-dlp
demucs
pydub
youtube-transcript-api
youtube-channel-transcript-api
# gradio UI + MCP
gradio[mcp]>=5.0
# utility deps used by modules/
python-dotenv
numpy
Pillow
requests
```
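> `ffmpeg` is a system binary, not a pip package, so it is installed in the
> `Dockerfile` (or on the host for local runs). In Docker, `torch` and
> `torchaudio` are installed separately from the CPU wheel index; locally, pip
> pulls them in as Demucs dependencies.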
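Because several pip names differ from their import names (`python-dotenv` imports as `dotenv`, `Pillow` as `PIL`), a pre-flight import check has to map between the two. A minimal sketch using `importlib.util.find_spec`, which locates a module without importing it; `missing_packages` is a hypothetical helper, and `youtube-channel-transcript-api` is left out because its import name is not confirmed here.

```python
import importlib.util

# pip distribution name -> top-level import name.
IMPORT_NAMES = {
    "yt-dlp": "yt_dlp",
    "demucs": "demucs",
    "pydub": "pydub",
    "youtube-transcript-api": "youtube_transcript_api",
    "gradio": "gradio",
    "python-dotenv": "dotenv",
    "numpy": "numpy",
    "Pillow": "PIL",
    "requests": "requests",
}


def missing_packages(import_names: dict[str, str] = IMPORT_NAMES) -> list[str]:
    """Return pip names whose import module cannot be found in this interpreter."""
    return [
        pip_name
        for pip_name, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    print("missing packages:", missing_packages() or "none")
```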
---
## Step 3: Fix `Dockerfile`
Current Dockerfile:
- Missing `apt-get install ffmpeg`.
- Missing `pip install -r requirements.txt`.
- Missing demucs, yt-dlp, and pydub installs.
Updated Dockerfile structure:
```dockerfile
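FROM python:3.12-slim
# ffmpeg: audio decoding for pydub; curl + unzip: required by the Deno installer
RUN apt-get update && apt-get install -y --no-install-recommends \
        ffmpeg curl unzip git \
    && curl -fsSL https://deno.land/install.sh | sh \
    && cp /root/.deno/bin/deno /usr/local/bin/ \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
# Install CPU-only torch first so Demucs' torch dependency is usually already satisfied
RUN pip install --no-cache-dir torch torchaudio \
        --index-url https://download.pytorch.org/whl/cpu
# Quote the extra so the shell never treats [mcp] as a glob pattern
RUN pip install --no-cache-dir "gradio[mcp]" transformers
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Gradio binds to 127.0.0.1 by default; listen on all interfaces in the container
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860
CMD ["python", "app.py"]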
```
> For HF Spaces GPU, the base image and torch install may be handled by the
> runtime instead.
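The system-level tools the image installs can be verified with `shutil.which`, either locally (Step 8 expects `ffmpeg` in PATH, and the Notes call for Deno) or inside the built container, for example `docker run --rm separatetracks python check_tools.py`, where `check_tools.py` is a hypothetical file name. A minimal sketch; `missing_tools` is a hypothetical helper, and listing `git` simply mirrors the apt line rather than a documented runtime need.

```python
import shutil

# Executables the Dockerfile installs via apt or the Deno install script.
REQUIRED_TOOLS = ("ffmpeg", "deno", "git")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
```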
---
## Step 4: Create `app.py` ✅ COMPLETE
**Actual implementation** (differs from the original skeleton):
- Imports from `modules.AudioGallery` and `modules.yt_audio_get_tracks`.
- `SEPARATED_DIR = Path("separated").resolve()` is used in `allowed_paths`.
- `audio_gallery_head = f"<script>{modules.AudioGallery.GALLERY_JS}</script>"`
is injected via `demo.launch(head=...)`.
- `_extract_video_id(video_input)` accepts raw YouTube IDs plus supported
YouTube URL formats and returns the canonical video ID.
- `_prepare_uploaded_audio(uploaded_audio)` copies `.wav`/`.mp3` uploads into
`separated/` and derives a sanitized local `job_id`.
- Two processing functions:
  - `process_video(video_id)`: simple, MCP-exposed tool.
  - `process_video_with_progress(video_id, uploaded_audio)`: UI handler.
- UI: `YouTube Video ID or URL` input + `Separate Tracks` button +
  `Audio File Override (.wav or .mp3)` upload → Progress textbox →
  AudioGallery HTML → footer.
- UI handler uses `progress=gr.Progress(track_tqdm=True)`.
- If an upload is present, it overrides the YouTube field and skips
`download_audio()`.
- Button is wired to `process_video_with_progress` →
  `[audio_output, progress_output]`.
- `demo.launch(mcp_server=True, head=audio_gallery_head, allowed_paths=[str(SEPARATED_DIR)])`.
- Audio URLs are built with `modules.file_utils.make_gradio_file_url()` so the
`/gradio_api/file=` endpoint receives a safe relative path.
- `gr.set_static_paths(paths=["separated/", ".separated/"])` registers local
output folders for direct Gradio serving.
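The list above describes `_extract_video_id()` and the job-ID sanitizing in `_prepare_uploaded_audio()` without showing them. A minimal sketch of both, assuming the accepted URL forms are `watch?v=`, `youtu.be/`, `/shorts/`, `/embed/`, `/live/` and `/v/` (the supported formats are not listed here) and that a safe job ID may contain only letters, digits, `_` and `-`. `extract_video_id` and `sanitize_job_id` are hypothetical stand-ins, not the app's actual functions.

```python
import re
from pathlib import Path
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from [A-Za-z0-9_-].
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
_PATH_PREFIXES = ("/shorts/", "/embed/", "/live/", "/v/")


def extract_video_id(video_input: str) -> str:
    """Return the 11-character video ID from a raw ID or a YouTube URL."""
    text = video_input.strip()
    if _ID_RE.fullmatch(text):
        return text
    if "://" not in text:
        text = "https://" + text  # tolerate "youtu.be/..." typed without a scheme
    parsed = urlparse(text)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            for prefix in _PATH_PREFIXES:
                if parsed.path.startswith(prefix):
                    candidate = parsed.path[len(prefix):].split("/")[0]
                    break
    if _ID_RE.fullmatch(candidate):
        return candidate
    raise ValueError(f"Not a recognized YouTube ID or URL: {video_input!r}")


def sanitize_job_id(filename: str, max_len: int = 64) -> str:
    """Derive a filesystem-safe job ID from an uploaded file's name."""
    stem = Path(filename).stem  # drops directories and the extension
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    return (safe or "upload")[:max_len]
```

Taking only `Path(filename).stem` discards any directory components, so a name such as `../../evil.wav` cannot steer the copy outside `separated/`.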
---
## Step 5: Implement `AudioGallery` Component ✅ COMPLETE
**Actual implementation** (moved to `modules/AudioGallery.py`):
- `_CSS` β€” module-level string covering the gallery grid and controls.
- `GALLERY_JS` β€” module-level string loaded globally through
`demo.launch(head=...)`; defines `formatTime()`, `drawWaveform()`,
`initAudioItem()`, and a `MutationObserver`.
- `AudioGallery(gr.HTML)`:
- `DEFAULT_LABELS = ["Drums", "Vocals", "Guitar", "Bass", "Other",
"Piano", "Music"]`
- `__init__(audio_urls, *, labels, columns=3, ...)`
- `_build_html(audio_urls, labels, columns)`
- `data-initialized="false"` prevents double event binding on Gradio re-renders.
- `app.py` calls `AudioGallery._build_html(...)` directly.
- Play buttons use `type="button"` so they do not submit the Gradio form.
- Each stem card also renders a download link directly below the play button.
> The time display is client-side and comes from the `<audio>` element runtime
> playback state, not from the URL string.
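A minimal sketch of the kind of markup `_build_html()` is described as producing, written as a standalone function so it runs without Gradio. The class names, the inline grid style and skipping empty URLs are assumptions; only `data-initialized="false"`, `type="button"` and the per-card download link come from the list above. `build_gallery_html` is a hypothetical name, and `GALLERY_JS` would presumably flip `data-initialized` after binding events to a card.

```python
from html import escape

DEFAULT_LABELS = ["Drums", "Vocals", "Guitar", "Bass", "Other", "Piano", "Music"]


def build_gallery_html(audio_urls, labels=None, columns=3) -> str:
    """Render one card per stem: label, play button, <audio>, download link."""
    labels = labels or DEFAULT_LABELS
    cards = []
    for url, label in zip(audio_urls, labels):
        if not url:
            continue  # skip stems that were not produced
        u, name = escape(url, quote=True), escape(label)
        cards.append(
            '<div class="audio-item" data-initialized="false">'
            f'<span class="audio-label">{name}</span>'
            # type="button" keeps the click from submitting the Gradio form
            '<button type="button" class="play-btn">Play</button>'
            f'<audio preload="metadata" src="{u}"></audio>'
            f'<a class="download-link" href="{u}" download>Download</a>'
            "</div>"
        )
    return (
        '<div class="audio-gallery" '
        f'style="display:grid;grid-template-columns:repeat({int(columns)},1fr)">'
        + "".join(cards)
        + "</div>"
    )
```

Escaping both the URL and the label keeps file names with quotes or angle brackets from breaking out of the attribute or the card.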
---
## Step 6: MCP Server Integration ✅ COMPLETE
- `demo.launch(mcp_server=True)` exposes `/gradio_api/mcp/sse`.
- `process_video()` is the MCP-exposed tool.
- jCodeMunch MCP server is also configured in `.claude/settings.json`.
---
## Step 7: Fix `modules/constants.py` for Local Dev ✅ COMPLETE
`.env` is present with `HF_TOKEN`, so no code change was needed.
**Note:** `constants.py` also imports `numpy` and `python-dotenv`, both of which
must remain in `requirements.txt`.
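To confirm the environment is complete before launching, a small check can help. This is a sketch: `REQUIRED_VARS` is taken from the Space secrets listed in Step 10, not from `constants.py` (whose variable list is not shown here), and local development may need only `HF_TOKEN`. `missing_env_vars` is a hypothetical helper; `load_dotenv` is python-dotenv's real loader.

```python
import os

try:
    from dotenv import load_dotenv  # python-dotenv, listed in requirements.txt
except ImportError:  # keep the sketch usable without python-dotenv installed
    load_dotenv = None

# Assumed list: the secrets Step 10 sets on the Space.
REQUIRED_VARS = ("HF_TOKEN", "CRYPTO_PK", "HF_REPO_ID", "SPACE_NAME")


def missing_env_vars(names=REQUIRED_VARS, load_file: bool = True) -> list[str]:
    """Return variables that are unset or empty, optionally loading .env first."""
    if load_file and load_dotenv is not None:
        # Searches for a .env file; by default it does not override
        # variables that are already set in os.environ.
        load_dotenv()
    return [name for name in names if not os.environ.get(name)]
```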
---
## Step 8: Local Run Verification
```bash
# Prerequisites
# - Python 3.12
# - ffmpeg in PATH
# - .env file with HF_TOKEN set
pip install -r requirements.txt
python app.py
# → Open http://localhost:7860
# → Enter a YouTube video ID or full URL, or upload a .wav/.mp3 file
# → Click "Separate Tracks"
# → Verify 7 stems appear in AudioGallery
# → Verify each stem includes a working download link below the play button
# → Verify MCP endpoint at http://localhost:7860/gradio_api/mcp/sse
```
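The stem check can also be scripted. This sketch assumes `separate_tracks()` keeps Demucs' default output layout, `separated/<model>/<track>/<stem>.<ext>`; the real function may write elsewhere. It checks only the six sources `htdemucs_6s` produces; the seventh gallery card (`Music`) is not a Demucs source and is outside this check. `missing_stems` is a hypothetical helper.

```python
import sys
from pathlib import Path

# The six sources produced by the htdemucs_6s model.
HTDEMUCS_6S_STEMS = ("drums", "bass", "other", "vocals", "guitar", "piano")


def missing_stems(track_dir, stems=HTDEMUCS_6S_STEMS, exts=(".wav", ".mp3")) -> list[str]:
    """Return stems that have no audio file (with any of `exts`) in track_dir."""
    track_dir = Path(track_dir)
    return [
        stem
        for stem in stems
        if not any((track_dir / f"{stem}{ext}").is_file() for ext in exts)
    ]


if __name__ == "__main__":
    # Hypothetical usage: python check_stems.py separated/htdemucs_6s/<job_id>
    if len(sys.argv) > 1:
        print("missing stems:", missing_stems(sys.argv[1]) or "none")
```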
---
## Step 9: Docker Verification
```bash
docker build -t separatetracks .
docker run -p 7860:7860 --env-file .env separatetracks
# → Open http://localhost:7860 and verify the same behavior as Step 8
```
---
## Step 10: HuggingFace Space Deployment
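1. `README.md` already has the correct HF Space header (`sdk: docker`,
   `app_file: app.py`). For Docker Spaces, the app must listen on the port set
   by `app_port` (7860 by default), which matches `EXPOSE 7860`.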
2. Push to the `Surn/SeparateTracks` HF Space repo.
3. Set Space secrets: `HF_TOKEN`, `CRYPTO_PK`, `HF_REPO_ID`, `SPACE_NAME`.
4. Space auto-builds from `Dockerfile` on push.
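Secrets can be entered in the Space settings page, or pushed from a script with `huggingface_hub`. A sketch, assuming the values are already in the local environment (for example loaded from `.env`) and that the token in use has write access to the Space; `push_space_secrets` is a hypothetical helper wrapping `HfApi.add_space_secret`.

```python
import os

SPACE_SECRETS = ("HF_TOKEN", "CRYPTO_PK", "HF_REPO_ID", "SPACE_NAME")


def push_space_secrets(api, repo_id: str, names=SPACE_SECRETS, env=None) -> list[str]:
    """Copy values from the environment into Space secrets; return the keys set.

    `api` is expected to be a huggingface_hub.HfApi, or any object with a
    compatible add_space_secret method. Nothing is pushed if a value is missing.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing values for: {', '.join(missing)}")
    for name in names:
        api.add_space_secret(repo_id=repo_id, key=name, value=env[name])
    return list(names)
```

With the real client, call `push_space_secrets(HfApi(), "Surn/SeparateTracks")` after `from huggingface_hub import HfApi`. Passing `api` in as a parameter keeps the helper testable with a stand-in object.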
---
## Dependency Map
```text
app.py
├── modules/AudioGallery.py
│   └── gradio (pip)
├── modules/yt_audio_get_tracks.py
│   ├── yt-dlp (pip)
│   ├── pydub (pip) → ffmpeg (apt)
│   └── demucs (pip) → torch (pip)
├── modules/constants.py
│   ├── python-dotenv (pip)
│   └── numpy (pip)
├── modules/version_info.py
│   └── gradio + torch (pip)
└── modules/file_utils.py
    ├── Pillow (pip)
    └── requests (pip)
```
---
## File Checklist
| # | File | Action | Done |
| - | ---- | ------ | ---- |
| 1 | `.gitignore` | Add `.env` entry | [x] |
| 2 | `requirements.txt` | Add gradio, dotenv, numpy, Pillow, requests | [x] |
| 3 | `Dockerfile` | Add ffmpeg apt, fix pip installs | [x] |
| 4 | `app.py` | Create Gradio app with AudioGallery + MCP | [x] |
| 5 | `modules/AudioGallery.py` | AudioGallery(gr.HTML) component | [x] |
| 6 | `modules/AudioGallery.pyi` | Type stub | [x] |
| 7 | `modules/yt_audio_get_tracks.py` | Moved from root + progress callbacks added | [x] |
| 8 | `.claude/settings.json` | jCodeMunch MCP server config | [x] |
| 9 | `modules/constants.py` | Verify local-safe | [x] |
| 10 | Local run | Step 8 verification | [ ] |
| 11 | Docker build | Step 9 verification | [ ] |
| 12 | HF Space deploy | Step 10 push | [ ] |
---
## Notes
- **Deno:** Required by yt-dlp for some YouTube JS extraction. Docker installs it
from `deno.land/install.sh`. Locally, download `deno.exe` and add it to PATH.
- **Demucs model:** `htdemucs_6s` downloads on first run unless pre-cached.
- **Python style:** Black + ruff + isort per agent conventions.
- **AudioGallery JS:** Inside Python f-strings, double every literal JS brace
  (`{{` and `}}`); a JS template-literal placeholder `${x}` is written `${{x}}`.
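A short illustration of the brace rule: Python interpolates single braces in an f-string, so JS code embedded in one must double its own braces. The `formatTime` body and `stem_count` below are hypothetical, not the real `GALLERY_JS`. If `GALLERY_JS` is a plain (non-f) string, no doubling is needed.

```python
# Hypothetical value interpolated by Python; everything else is literal JS.
stem_count = 7

# Doubled braces become single braces in the output; {stem_count} is interpolated.
js = f"""
function formatTime(seconds) {{
  const m = Math.floor(seconds / 60);
  const s = Math.floor(seconds % 60).toString().padStart(2, "0");
  return `${{m}}:${{s}}`;
}}
const STEM_COUNT = {stem_count};
"""
```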