# SeparateTracks: Build Plan

## Goal

Produce a running Gradio application (`app.py`) that accepts a YouTube ID,
YouTube URL, or uploaded `.wav`/`.mp3` audio, separates it into instrument
stems via Demucs, displays results in an `AudioGallery` UI, and exposes an MCP
server endpoint. The app must be deployable locally and as a HuggingFace Docker
Space (`Surn/SeparateTracks`).

---
## Project Map

| File | Status | Purpose |
| ---- | ------ | ------- |
| `app.py` | ✅ created | Gradio UI entry point + MCP server |
| `modules/AudioGallery.py` | ✅ created | `AudioGallery(gr.HTML)`: 7-stem audio grid with play and download controls |
| `modules/AudioGallery.pyi` | ✅ created | Type stub for AudioGallery |
| `modules/yt_audio_get_tracks.py` | ✅ moved + updated | `download_audio()` + `separate_tracks()` with progress callbacks |
| `modules/constants.py` | exists | Env vars, shared constants |
| `modules/version_info.py` | exists | Footer HTML with versions |
| `modules/file_utils.py` | exists | File helper utilities |
| `requirements.txt` | ✅ updated | gradio[mcp], python-dotenv, numpy, Pillow, requests added |
| `Dockerfile` | ✅ updated | ffmpeg apt, git, proper pip install order |
| `.gitignore` | ✅ updated | `.env` entry added |

> **Removed:** Root-level `yt_audio_get_tracks.py`, replaced by
> `modules/yt_audio_get_tracks.py`.

---
## Step 1: Fix `.gitignore`

**Problem:** `.env` contains real credentials (`HF_TOKEN`, `CRYPTO_PK`) and is not
excluded from git tracking.

**Action:** Add `.env` to `.gitignore`. If `.env` is already tracked, also run
`git rm --cached .env`, because `.gitignore` has no effect on files that are
already in the index.

```gitignore
.env
separated/
*.webm
```

> **Warning:** Rotate or regenerate the `HF_TOKEN` and `CRYPTO_PK` values in `.env`
> if they have ever been committed to git or shared publicly.

---
## Step 2: Fix `requirements.txt`

The current file is missing packages that `modules/` and the planned `app.py` need.

```txt
# core audio pipeline
yt-dlp
demucs
pydub
youtube-transcript-api
youtube-channel-transcript-api
# gradio UI + MCP
gradio[mcp]>=5.0
# utility deps used by modules/
python-dotenv
numpy
Pillow
requests
```

> `ffmpeg` must be installed at the OS level, not via pip; handle that in the
> `Dockerfile`. `torch` and `torchaudio` are installed separately in Docker.
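
To confirm that an environment actually provides the packages listed above, a small script can parse the requirement lines and ask `importlib.metadata` which distributions are missing. This is a minimal sketch and not part of the project; `requirement_name` and `missing_distributions` are hypothetical helper names, and the parsing only handles the simple line shapes used in this file (no `-r` includes or URLs).

```python
from __future__ import annotations

import re
from importlib import metadata


def requirement_name(line: str) -> str | None:
    """Return the bare distribution name from one requirements.txt line."""
    line = line.split("#", 1)[0].strip()  # drop comments and whitespace
    if not line:
        return None
    # Cut at the first extras bracket, version operator, marker, or space.
    return re.split(r"[\[<>=!~; ]", line, maxsplit=1)[0]


def missing_distributions(lines: list[str]) -> list[str]:
    """List required distributions that are not installed in this environment."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Passing the lines of `requirements.txt` to `missing_distributions()` after `pip install -r requirements.txt` should return an empty list. `ffmpeg`, being an OS-level tool, is outside what this check can see.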

---

## Step 3: Fix `Dockerfile`

Current Dockerfile:

- Missing `apt-get install ffmpeg`.
- Missing `pip install -r requirements.txt`.
- Missing demucs, yt-dlp, and pydub installs.

Updated Dockerfile structure:

```dockerfile
FROM python:3.12-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg curl unzip git \
    && curl -fsSL https://deno.land/install.sh | sh \
    && cp /root/.deno/bin/deno /usr/local/bin/ \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir torch torchaudio \
    --index-url https://download.pytorch.org/whl/cpu
RUN pip install --no-cache-dir "gradio[mcp]" transformers
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
```

> For HF Spaces GPU, the base image and torch install may be handled by the
> runtime instead.

---
## Step 4: Create `app.py` ✅ COMPLETE

**Actual implementation** (differs from the original skeleton):

- Imports from `modules.AudioGallery` and `modules.yt_audio_get_tracks`.
- `SEPARATED_DIR = Path("separated").resolve()` is used in `allowed_paths`.
- `audio_gallery_head = f"<script>{modules.AudioGallery.GALLERY_JS}</script>"`
  is injected via `demo.launch(head=...)`.
- `_extract_video_id(video_input)` accepts raw YouTube IDs plus supported
  YouTube URL formats and returns the canonical video ID.
- `_prepare_uploaded_audio(uploaded_audio)` copies `.wav`/`.mp3` uploads into
  `separated/` and derives a sanitized local `job_id`.
- Two processing functions:
  - `process_video(video_id)`: simple, MCP-exposed tool.
  - `process_video_with_progress(video_id, uploaded_audio)`: UI handler.
- UI: `YouTube Video ID or URL` input + `Separate Tracks` button +
  `Audio File Override (.wav or .mp3)` upload → Progress textbox →
  AudioGallery HTML → footer.
- UI handler uses `progress=gr.Progress(track_tqdm=True)`.
- If an upload is present, it overrides the YouTube field and skips
  `download_audio()`.
- Button is wired to `process_video_with_progress` →
  `[audio_output, progress_output]`.
- `demo.launch(mcp_server=True, head=audio_gallery_head, allowed_paths=[str(SEPARATED_DIR)])`.
- Audio URLs are built with `modules.file_utils.make_gradio_file_url()` so the
  `/gradio_api/file=` endpoint receives a safe relative path.
- `gr.set_static_paths(paths=["separated/", ".separated/"])` registers local
  output folders for direct Gradio serving.
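
The plan does not show how `_extract_video_id()` recognizes its inputs, so the following is only one way such a helper could work. It assumes that video IDs are 11 characters from `[A-Za-z0-9_-]` and that the "supported URL formats" are `watch?v=`, `youtu.be/`, `/shorts/`, `/embed/`, and `/live/`; the function name `extract_video_id` and the host list are illustrative stand-ins, not the project's code.

```python
from __future__ import annotations

import re
from urllib.parse import parse_qs, urlparse

# Assumption: YouTube video IDs are 11 characters from this alphabet.
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com"}


def extract_video_id(video_input: str) -> str | None:
    """Return the video ID for a raw ID or a recognized YouTube URL, else None."""
    text = video_input.strip()
    if _ID_RE.fullmatch(text):
        return text  # already a bare ID

    try:
        # urlparse only fills in the host when a scheme is present.
        url = urlparse(text if "://" in text else f"https://{text}")
    except ValueError:
        return None
    host = (url.hostname or "").lower()
    parts = [p for p in url.path.split("/") if p]

    candidate = None
    if host == "youtu.be" and parts:
        candidate = parts[0]
    elif host in _YOUTUBE_HOSTS:
        if url.path == "/watch":
            candidate = parse_qs(url.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
            candidate = parts[1]

    return candidate if candidate and _ID_RE.fullmatch(candidate) else None
```

Returning `None` for anything unrecognized lets the UI handler report a clear input error before `download_audio()` is ever called.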

---

## Step 5: Implement `AudioGallery` Component ✅ COMPLETE

**Actual implementation**, moved to `modules/AudioGallery.py`:

- `_CSS`: module-level string covering the gallery grid and controls.
- `GALLERY_JS`: module-level string loaded globally through
  `demo.launch(head=...)`; defines `formatTime()`, `drawWaveform()`,
  `initAudioItem()`, and a `MutationObserver`.
- `AudioGallery(gr.HTML)`:
  - `DEFAULT_LABELS = ["Drums", "Vocals", "Guitar", "Bass", "Other", "Piano", "Music"]`
  - `__init__(audio_urls, *, labels, columns=3, ...)`
  - `_build_html(audio_urls, labels, columns)`
- `data-initialized="false"` prevents double event binding on Gradio re-renders.
- `app.py` calls `AudioGallery._build_html(...)` directly.
- Play buttons use `type="button"` so they do not submit the Gradio form.
- Each stem card also renders a download link directly below the play button.

> The time display is client-side and comes from the `<audio>` element runtime
> playback state, not from the URL string.
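
To make the markup conventions above concrete, here is a rough sketch of what a `_build_html()`-style function could emit. Only the `data-initialized="false"` attribute, the `type="button"` play control, the per-stem download link, and the `columns` parameter come from the plan; the function name `build_gallery_html`, the CSS class names, and the inline grid style are assumptions made for illustration.

```python
from html import escape


def build_gallery_html(audio_urls: list[str], labels: list[str], columns: int = 3) -> str:
    """Render one card per stem URL inside a CSS grid container."""
    cards = []
    for url, label in zip(audio_urls, labels):  # extra labels are ignored
        safe_url = escape(url, quote=True)
        safe_label = escape(label)
        cards.append(
            # data-initialized lets the page script skip cards it already bound.
            f'<div class="audio-item" data-initialized="false">'
            f'<span class="audio-label">{safe_label}</span>'
            f'<audio preload="metadata" src="{safe_url}"></audio>'
            # type="button" keeps the click from submitting an enclosing form.
            f'<button type="button" class="play-btn">Play</button>'
            f'<a class="download-link" href="{safe_url}" download>Download</a>'
            f"</div>"
        )
    grid_style = f"display:grid;grid-template-columns:repeat({columns},1fr);gap:12px"
    return f'<div class="audio-gallery" style="{grid_style}">{"".join(cards)}</div>'
```

In this sketch the client script would flip `data-initialized` to `"true"` after attaching listeners, so a `MutationObserver` callback that fires again on re-render can skip cards that are already bound.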

---

## Step 6: MCP Server Integration ✅ COMPLETE

- `demo.launch(mcp_server=True)` exposes `/gradio_api/mcp/sse`.
- `process_video()` is the MCP-exposed tool.
- jCodeMunch MCP server is also configured in `.claude/settings.json`.

---

## Step 7: Fix `modules/constants.py` for Local Dev ✅ COMPLETE

`.env` is present with `HF_TOKEN`, so no code change was needed.

**Note:** `constants.py` also imports `numpy` and `python-dotenv`, both of which
must remain in `requirements.txt`.
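
For context on why a local `.env` is enough, the sketch below shows a common pattern for reading settings that works both locally and in a Space, where secrets arrive as ordinary environment variables. It is not the contents of `constants.py`; `read_setting` is a hypothetical helper, and the `ImportError` fallback exists only so the example runs without `python-dotenv` installed.

```python
from __future__ import annotations

import os

try:
    # python-dotenv copies KEY=value pairs from a local .env into os.environ.
    from dotenv import load_dotenv
except ImportError:  # sketch-only fallback so the example runs without the package
    def load_dotenv(*args, **kwargs) -> bool:
        return False

# By default, variables that are already set win, so Space secrets are not overwritten.
load_dotenv()


def read_setting(name: str, default: str | None = None, *, required: bool = False) -> str | None:
    """Read one setting from the environment, optionally failing when it is absent."""
    value = os.environ.get(name, default)
    if required and not value:
        raise RuntimeError(f"{name} is not set; add it to .env or the Space secrets")
    return value
```

With this pattern, `read_setting("HF_TOKEN", required=True)` would fail early with a readable message instead of surfacing later as an authentication error.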

---

## Step 8: Local Run Verification

```bash
# Prerequisites
# - Python 3.12
# - ffmpeg in PATH
# - .env file with HF_TOKEN set
pip install -r requirements.txt
python app.py
# → Open http://localhost:7860
# → Enter a YouTube video ID or full URL, or upload a .wav/.mp3 file
# → Click "Separate Tracks"
# → Verify 7 stems appear in AudioGallery
# → Verify each stem includes a working download link below the play button
# → Verify MCP endpoint at http://localhost:7860/gradio_api/mcp/sse
```
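
The prerequisites listed in the block above can also be checked programmatically before launching. The helper below is a hypothetical preflight sketch, not something the project ships. It treats Python 3.12 as a minimum version, which is an assumption, and it only sees `HF_TOKEN` if the variable is already in the process environment (exported in the shell or loaded from `.env` beforehand).

```python
from __future__ import annotations

import os
import shutil
import sys


def preflight_problems(
    min_python: tuple[int, int] = (3, 12),
    tools: tuple[str, ...] = ("ffmpeg",),
    env_vars: tuple[str, ...] = ("HF_TOKEN",),
) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = sys.version.split()[0]
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ expected, found {found}")
    for tool in tools:
        if shutil.which(tool) is None:  # executable not found on PATH
            problems.append(f"{tool} not found in PATH")
    for name in env_vars:
        if not os.environ.get(name):
            problems.append(f"{name} is not set")
    return problems
```

Printing the result of `preflight_problems()` before `python app.py` gives a quick answer on whether a failed run is an environment issue or an application bug.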

---

## Step 9: Docker Verification

```bash
docker build -t separatetracks .
docker run -p 7860:7860 --env-file .env separatetracks
# → Open http://localhost:7860 and verify the same behavior as Step 8
```

---

## Step 10: HuggingFace Space Deployment

1. `README.md` already has the correct HF Space header (`sdk: docker`,
   `app_file: app.py`).
2. Push to the `Surn/SeparateTracks` HF Space repo.
3. Set Space secrets: `HF_TOKEN`, `CRYPTO_PK`, `HF_REPO_ID`, `SPACE_NAME`.
4. Space auto-builds from `Dockerfile` on push.

---
## Dependency Map

```text
app.py
├── modules/AudioGallery.py
│   └── gradio (pip)
├── modules/yt_audio_get_tracks.py
│   ├── yt-dlp (pip)
│   ├── pydub (pip) → ffmpeg (apt)
│   └── demucs (pip) → torch (pip)
├── modules/constants.py
│   ├── python-dotenv (pip)
│   └── numpy (pip)
├── modules/version_info.py
│   └── gradio + torch (pip)
└── modules/file_utils.py
    ├── Pillow (pip)
    └── requests (pip)
```

---
## File Checklist

| # | File | Action | Done |
| - | ---- | ------ | ---- |
| 1 | `.gitignore` | Add `.env` entry | [x] |
| 2 | `requirements.txt` | Add gradio, dotenv, numpy, Pillow, requests | [x] |
| 3 | `Dockerfile` | Add ffmpeg apt, fix pip installs | [x] |
| 4 | `app.py` | Create Gradio app with AudioGallery + MCP | [x] |
| 5 | `modules/AudioGallery.py` | AudioGallery(gr.HTML) component | [x] |
| 6 | `modules/AudioGallery.pyi` | Type stub | [x] |
| 7 | `modules/yt_audio_get_tracks.py` | Moved from root + progress callbacks added | [x] |
| 8 | `.claude/settings.json` | jCodeMunch MCP server config | [x] |
| 9 | `modules/constants.py` | Verify local-safe | [x] |
| 10 | Local run | Step 8 verification | [ ] |
| 11 | Docker build | Step 9 verification | [ ] |
| 12 | HF Space deploy | Step 10 push | [ ] |

---
## Notes

- **Deno:** Required by yt-dlp for some YouTube JS extraction. Docker installs it
  from `deno.land/install.sh`. Locally, download `deno.exe` and add it to PATH.
- **Demucs model:** `htdemucs_6s` downloads on first run unless pre-cached.
- **Python style:** Black + ruff + isort per agent conventions.
- **AudioGallery JS:** Inside Python f-strings, double every literal JS brace
  (`{{ }}`), including the braces of `${...}` in JS template literals.
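
The brace-doubling rule from the last note is easiest to see in a small example. The function below is purely illustrative (`waveform_script` and the `describe` JS function are made-up names): doubled braces come out as single literal braces, while single braces still interpolate Python values.

```python
import json


def waveform_script(canvas_id: str) -> str:
    """Build a <script> tag; doubled braces become literal JS braces."""
    return (
        f"<script>"
        # {{ and }} emit literal { and }, so `${{el.id}}` renders as `${el.id}`.
        f"function describe(el) {{ return `${{el.id}} ready`; }}"
        # Single braces interpolate; json.dumps yields a quoted JS string literal.
        f"describe(document.getElementById({json.dumps(canvas_id)}));"
        f"</script>"
    )
```

A module-level constant written as a plain string rather than an f-string needs no doubling at all, which is one reason to keep large scripts such as `GALLERY_JS` out of f-strings and interpolate only the outer `<script>` wrapper.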