	# SeparateTracks — Build Plan

	## Goal

	Produce a running Gradio application (`app.py`) that accepts a YouTube ID,
	YouTube URL, or uploaded `.wav`/`.mp3` audio, separates it into instrument
	stems via Demucs, displays results in an `AudioGallery` UI, and exposes an MCP
	server endpoint — deployable locally and as a HuggingFace Docker Space
	(`Surn/SeparateTracks`).

	---

	## Project Map

	| File | Status | Purpose |
	| ---- | ------ | ------- |
	| `app.py` | ✅ created | Gradio UI entry point + MCP server |
	| `modules/AudioGallery.py` | ✅ created | `AudioGallery(gr.HTML)`: 7-stem audio grid with play and download controls |
	| `modules/AudioGallery.pyi` | ✅ created | Type stub for AudioGallery |
	| `modules/yt_audio_get_tracks.py` | ✅ moved + updated | `download_audio()` + `separate_tracks()` with progress callbacks |
	| `modules/constants.py` | exists | Env vars, shared constants |
	| `modules/version_info.py` | exists | Footer HTML with versions |
	| `modules/file_utils.py` | exists | File helper utilities |
	| `requirements.txt` | ✅ updated | gradio[mcp], python-dotenv, numpy, Pillow, requests added |
	| `Dockerfile` | ✅ updated | ffmpeg apt, git, proper pip install order |
	| `.gitignore` | ✅ updated | `.env` entry added |

	> Removed: Root-level `yt_audio_get_tracks.py` — replaced by
	> `modules/yt_audio_get_tracks.py`.

	---

	## Step 1 — Fix `.gitignore`

	Problem: `.env` contains real credentials (`HF_TOKEN`, `CRYPTO_PK`) and is not
	excluded from git tracking.

	Action: Add `.env` to `.gitignore`, together with the `separated/` output
	directory and `*.webm` files.

	```gitignore
	.env
	separated/
	*.webm
	```

	> Warning: Rotate or regenerate the `HF_TOKEN` and `CRYPTO_PK` values in `.env`
	> if they have ever been committed to git or shared publicly.

	---

	## Step 2 — Fix `requirements.txt`

	The current file is missing packages that `modules/` and `app.py` need.

	```txt
	# core audio pipeline
	yt-dlp
	demucs
	pydub
	youtube-transcript-api
	youtube-channel-transcript-api

	# gradio UI + MCP
	gradio[mcp]>=5.0

	# utility deps used by modules/
	python-dotenv
	numpy
	Pillow
	requests
	```

	> `ffmpeg` must be installed at the OS level, not via pip; handle that in
	> `Dockerfile`. `torch` and `torchaudio` are installed separately in Docker.
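
Because `ffmpeg` lives outside `requirements.txt`, a missing binary only shows up when audio processing starts. A minimal sketch of a startup check, assuming a hypothetical helper named `missing_system_tools` (not part of the current modules), could look like this:

```python
import shutil


def missing_system_tools(tools=("ffmpeg",)):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; it only checks that an
    executable with that name is discoverable, not that it works.
    """
    return [name for name in tools if shutil.which(name) is None]
```

Calling `missing_system_tools(("ffmpeg", "deno"))` early in `app.py` and failing fast on a non-empty result is one way to surface the problem before any download starts.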

	---

	## Step 3 — Fix `Dockerfile`

	Current Dockerfile:

	- Missing `apt-get install ffmpeg`.
	- Missing `pip install -r requirements.txt`.
	- Missing demucs, yt-dlp, and pydub installs.

	Updated Dockerfile structure:

	```dockerfile
	FROM python:3.12-slim

	# System packages: ffmpeg for audio, plus Deno (used by yt-dlp).
	RUN apt-get update && apt-get install -y --no-install-recommends \
	        ffmpeg curl unzip git \
	    && curl -fsSL https://deno.land/install.sh | sh \
	    && cp /root/.deno/bin/deno /usr/local/bin/ \
	    && rm -rf /var/lib/apt/lists/*

	WORKDIR /app
	COPY requirements.txt .

	# CPU-only torch wheels first, then the remaining Python dependencies.
	RUN pip install --no-cache-dir torch torchaudio \
	    --index-url https://download.pytorch.org/whl/cpu
	RUN pip install --no-cache-dir "gradio[mcp]" transformers
	RUN pip install --no-cache-dir -r requirements.txt

	COPY . .

	EXPOSE 7860
	CMD ["python", "app.py"]
	```

	> For HF Spaces GPU, the base image and torch install may be handled by the
	> runtime instead.

	---

	## Step 4 — Create `app.py` ✅ COMPLETE

	Actual implementation (differs from the original skeleton):

	- Imports from `modules.AudioGallery` and `modules.yt_audio_get_tracks`.
	- `SEPARATED_DIR = Path("separated").resolve()` is used in `allowed_paths`.
	- `audio_gallery_head = f"<script>{modules.AudioGallery.GALLERY_JS}</script>"`
	  is injected via `demo.launch(head=...)`.
	- `_extract_video_id(video_input)` accepts raw YouTube IDs plus supported
	  YouTube URL formats and returns the canonical video ID.
	- `_prepare_uploaded_audio(uploaded_audio)` copies `.wav`/`.mp3` uploads into
	  `separated/` and derives a sanitized local `job_id`.
	- Two processing functions:
	  - `process_video(video_id)`: simple, MCP-exposed tool.
	  - `process_video_with_progress(video_id, uploaded_audio)`: UI handler.
	- UI: `YouTube Video ID or URL` input + `Separate Tracks` button +
	  `Audio File Override (.wav or .mp3)` upload → Progress textbox →
	  AudioGallery HTML → footer.
	- UI handler uses `progress=gr.Progress(track_tqdm=True)`.
	- If an upload is present, it overrides the YouTube field and skips
	  `download_audio()`.
	- Button is wired to `process_video_with_progress` →
	  `[audio_output, progress_output]`.
	- `demo.launch(mcp_server=True, head=audio_gallery_head, allowed_paths=[str(SEPARATED_DIR)])`.
	- Audio URLs are built with `modules.file_utils.make_gradio_file_url()` so the
	  `/gradio_api/file=` endpoint receives a safe relative path.
	- `gr.set_static_paths(paths=["separated/", ".separated/"])` registers local
	  output folders for direct Gradio serving.
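
The ID handling described for `_extract_video_id` can be illustrated with a short sketch. This is not the code in `app.py`; the function name `extract_video_id`, the list of accepted hosts and path prefixes, and the assumption that IDs are 11 characters of letters, digits, `-`, and `_` are all illustrative choices.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters of letters, digits, "-" and "_".
_VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(video_input: str) -> str:
    """Return the video ID from a raw ID or a YouTube URL.

    Hypothetical stand-in for `_extract_video_id`; raises ValueError
    when no ID can be found.
    """
    text = video_input.strip()
    if _VIDEO_ID_RE.fullmatch(text):
        return text

    # Tolerate URLs pasted without a scheme, e.g. "youtu.be/<id>".
    parsed = urlparse(text if "://" in text else f"https://{text}")
    host = (parsed.hostname or "").lower()
    host = host.removeprefix("www.").removeprefix("m.")
    segments = [part for part in parsed.path.split("/") if part]

    candidate = ""
    if host == "youtu.be" and segments:
        # Short links carry the ID as the first path segment.
        candidate = segments[0]
    elif host in {"youtube.com", "music.youtube.com"}:
        if parsed.path == "/watch":
            # Standard watch URLs carry the ID in the "v" query parameter.
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif len(segments) >= 2 and segments[0] in {"embed", "shorts", "live"}:
            candidate = segments[1]

    if not _VIDEO_ID_RE.fullmatch(candidate):
        raise ValueError(f"No YouTube video ID found in {video_input!r}")
    return candidate
```

Validating the host before reading the path or query keeps look-alike URLs on other domains from being passed on to the downloader.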

	---

	## Step 5 — Implement `AudioGallery` Component ✅ COMPLETE

	Actual implementation — moved to `modules/AudioGallery.py`:

	- `_CSS`: module-level string covering the gallery grid and controls.
	- `GALLERY_JS`: module-level string loaded globally through
	  `demo.launch(head=...)`; defines `formatTime()`, `drawWaveform()`,
	  `initAudioItem()`, and a `MutationObserver`.
	- `AudioGallery(gr.HTML)`:
	  - `DEFAULT_LABELS = ["Drums", "Vocals", "Guitar", "Bass", "Other", "Piano", "Music"]`
	  - `__init__(audio_urls, *, labels, columns=3, ...)`
	  - `_build_html(audio_urls, labels, columns)`
	- `data-initialized="false"` prevents double event binding on Gradio re-renders.
	- `app.py` calls `AudioGallery._build_html(...)` directly.
	- Play buttons use `type="button"` so they do not submit the Gradio form.
	- Each stem card also renders a download link directly below the play button.

	> The time display is client-side and comes from the `<audio>` element runtime
	> playback state, not from the URL string.
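
The markup conventions listed above (the `data-initialized` flag, `type="button"`, and a download link per card) can be sketched as follows. This is an illustrative stand-in for `_build_html`, not the actual implementation: the function name `build_gallery_html`, the CSS class names, and the inline grid style are assumptions.

```python
from html import escape

DEFAULT_LABELS = ["Drums", "Vocals", "Guitar", "Bass", "Other", "Piano", "Music"]


def build_gallery_html(audio_urls, labels=None, columns=3):
    """Render one card per stem (hypothetical stand-in for `_build_html`)."""
    labels = labels or DEFAULT_LABELS
    cards = []
    for url, label in zip(audio_urls, labels):
        safe_url = escape(url, quote=True)
        safe_label = escape(label)
        cards.append(
            # data-initialized lets the page script skip cards it already bound.
            f'<div class="audio-item" data-initialized="false">'
            f"<span>{safe_label}</span>"
            f'<audio src="{safe_url}" preload="metadata"></audio>'
            # type="button" keeps a click from submitting an enclosing form.
            f'<button type="button" class="play-btn">Play</button>'
            f'<a href="{safe_url}" download>Download</a>'
            f"</div>"
        )
    grid_style = f"display:grid;grid-template-columns:repeat({int(columns)},1fr)"
    return f'<div class="audio-gallery" style="{grid_style}">{"".join(cards)}</div>'
```

Escaping both the label and the URL matters here because the strings end up inside HTML attributes and text nodes that Gradio renders as-is.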

	---

	## Step 6 — MCP Server Integration ✅ COMPLETE

	- `demo.launch(mcp_server=True)` exposes `/gradio_api/mcp/sse`.
	- `process_video()` is the MCP-exposed tool.
	- jCodeMunch MCP server is also configured in `.claude/settings.json`.

	---

	## Step 7 — Fix `modules/constants.py` for Local Dev ✅ COMPLETE

	`.env` is present with `HF_TOKEN`, so no code change was needed.

	Note: `constants.py` also imports `numpy` and `dotenv` (provided by the
	`python-dotenv` package), so both packages must remain in `requirements.txt`.

	---

	## Step 8 — Local Run Verification

	```bash
	# Prerequisites
	# - Python 3.12
	# - ffmpeg in PATH
	# - .env file with HF_TOKEN set

	pip install -r requirements.txt
	python app.py
	# → Open http://localhost:7860
	# → Enter a YouTube video ID or full URL, or upload a .wav/.mp3 file
	# → Click "Separate Tracks"
	# → Verify 7 stems appear in AudioGallery
	# → Verify each stem includes a working download link below the play button
	# → Verify MCP endpoint at http://localhost:7860/gradio_api/mcp/sse
	```

	---

	## Step 9 — Docker Verification

	```bash
	docker build -t separatetracks .
	docker run -p 7860:7860 --env-file .env separatetracks
	# → Open http://localhost:7860 and verify the same behavior as Step 8
	```

	---

	## Step 10 — HuggingFace Space Deployment

	1. `README.md` already has the correct HF Space header (`sdk: docker`,
	   `app_file: app.py`).
	2. Push to the `Surn/SeparateTracks` HF Space repo.
	3. Set Space secrets: `HF_TOKEN`, `CRYPTO_PK`, `HF_REPO_ID`, `SPACE_NAME`.
	4. Space auto-builds from `Dockerfile` on push.

	---

	## Dependency Map

	```text
	app.py
	├── modules/AudioGallery.py
	│   └── gradio (pip)
	├── modules/yt_audio_get_tracks.py
	│   ├── yt-dlp (pip)
	│   ├── pydub (pip) → ffmpeg (apt)
	│   └── demucs (pip) → torch (pip)
	├── modules/constants.py
	│   ├── python-dotenv (pip)
	│   └── numpy (pip)
	├── modules/version_info.py
	│   └── gradio + torch (pip)
	└── modules/file_utils.py
	    ├── Pillow (pip)
	    └── requests (pip)
	```

	---

	## File Checklist

	| # | File | Action | Done |
	| - | ---- | ------ | ---- |
	| 1 | `.gitignore` | Add `.env` entry | [x] |
	| 2 | `requirements.txt` | Add gradio, dotenv, numpy, Pillow, requests | [x] |
	| 3 | `Dockerfile` | Add ffmpeg apt, fix pip installs | [x] |
	| 4 | `app.py` | Create Gradio app with AudioGallery + MCP | [x] |
	| 5 | `modules/AudioGallery.py` | AudioGallery(gr.HTML) component | [x] |
	| 6 | `modules/AudioGallery.pyi` | Type stub | [x] |
	| 7 | `modules/yt_audio_get_tracks.py` | Moved from root + progress callbacks added | [x] |
	| 8 | `.claude/settings.json` | jCodeMunch MCP server config | [x] |
	| 9 | `modules/constants.py` | Verify local-safe | [x] |
	| 10 | Local run | Step 8 verification | [ ] |
	| 11 | Docker build | Step 9 verification | [ ] |
	| 12 | HF Space deploy | Step 10 push | [ ] |

	---

	## Notes

	- Deno: Required by yt-dlp for some YouTube JS extraction. Docker installs it
	  from `deno.land/install.sh`. Locally, download `deno.exe` and add it to PATH.
	- Demucs model: `htdemucs_6s` downloads on first run unless pre-cached.
	- Python style: Black + ruff + isort per agent conventions.
	- AudioGallery JS: Inside Python f-strings, write literal braces as `{{` and
	  `}}`; a JS template-literal placeholder `${name}` becomes `${{name}}`.
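
The brace rule in the AudioGallery JS note is easy to get wrong, so here is a small self-contained sketch. The function `build_snippet` and the JS it emits are made up for illustration and are not taken from `GALLERY_JS`.

```python
def build_snippet(prefix: str) -> str:
    """Return a tiny JS function; only `prefix` is filled in by Python."""
    return (
        # "{{" and "}}" produce literal braces in the output.
        f"function label(name) {{\n"
        # "{prefix}" is substituted by Python; "${{name}}" stays as "${name}" for JS.
        f"  return `{prefix}: ${{name}}`;\n"
        f"}}"
    )
```

`build_snippet("Stem")` yields a JS function whose body is ``return `Stem: ${name}`;``, so the Python value is baked in while the JS placeholder survives for the browser to evaluate.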