# VoiceGate HF Space Deployment Plan

## Goal

Deploy VoiceGate to a Hugging Face Space with a Gradio interface, using ZeroGPU for the GPU-heavy inference path.

The initial target is a short-audio workflow that proves the full chain:

audio input -> source separation -> ASR/SRT -> LLM translation -> VoxCPM TTS -> SRT-aligned audio merge -> audio and subtitle outputs.

## Repository Roles

Use three clear ownership boundaries:

- `VoiceGate`: upstream project assets, README, diagrams, and source workflows.
- `comfyui_voicebridge`: the VoiceBridge ComfyUI custom node repository.
- `VoiceGate-hf`: this repository, the Hugging Face Space deployment wrapper.

The Space repository should not depend on nested git repositories at runtime. For deployment, copy or vendor only the required workflow files, custom nodes, bootstrap scripts, and Gradio application code into the Space layout.

Current local state:

- The outer `VoiceGate-hf` repository is connected to the Hugging Face Space remote `build-small-hackathon/VoiceGate`.
- `VoiceGate/` is present as a local upstream checkout only. It is ignored by the Space repository and must not be treated as runtime content.
- `VoiceGate/.gitmodules` references `comfyui_voicebridge`, but the local `VoiceGate/comfyui_voicebridge/` directory is currently empty.
- `VoiceGate/workflows/VoiceGate-Workflow.json` is the UI workflow.
- `VoiceGate/workflows/VoiceGate-Workflow_api.json` exists and has been confirmed as valid JSON. It still needs parameterization before Gradio can submit it to ComfyUI.
- `workflows/voicegate_api.json` is the deployment copy of the API workflow.
- `workflows/voicegate_ui.json` is the deployment reference copy of the UI workflow.

## Repository Hygiene

The Space repository should stay small and deterministic:

- Keep `VoiceGate/` as a local-only upstream checkout.
- Copy deployment-ready workflow files into `workflows/`.
- Copy or install custom nodes through an explicit bootstrap step.
- Do not commit nested `.git` directories, model weights, API keys, uploaded media, generated audio, generated subtitles, or ComfyUI runtime caches.
- Keep `.gitattributes` LFS rules for future model or binary assets, but prefer downloading model files at runtime instead of committing them.

## Hugging Face Space Constraints

ZeroGPU Spaces are intended for Gradio SDK Spaces. The Gradio app should expose a normal `app.py`, and GPU-heavy functions should be wrapped with `@spaces.GPU`.

This means the first implementation should prefer:

- Gradio Space root files: `README.md`, `app.py`, `requirements.txt`, `packages.txt`.
- A Python bootstrap that installs or prepares ComfyUI and custom nodes.
- A workflow client that calls the local ComfyUI API from inside the Gradio handler.

Avoid starting with a Docker Space for ZeroGPU, even though Docker would be a cleaner fit for a long-running ComfyUI service.

## Proposed Space Layout

```text
VoiceGate-hf/
|-- README.md
|-- app.py
|-- requirements.txt
|-- packages.txt
|-- scripts/
|   |-- bootstrap_comfy.py
|   |-- run_comfy.py
|   `-- workflow_client.py
|-- workflows/
|   |-- voicegate_api.json
|   `-- voicegate_ui.json
|-- custom_nodes/
|   `-- comfyui_voicebridge/
|-- assets/
`-- docs/
    `-- deployment-plan.md
```

The current repository has the root scaffold, planning docs, and deployment workflow copies. Later steps should add bootstrap scripts and either copy deployment-ready custom nodes into `custom_nodes/` or install pinned node repositories during Space startup.
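
As a rough illustration of the ZeroGPU constraint, `app.py` in the layout above might wrap its GPU-heavy handler along these lines. The handler name `run_voicegate`, its parameters, the 120-second duration, and the placeholder return value are all assumptions made for this sketch; the fallback decorator exists only so the module still imports where the `spaces` package is not installed.

```python
"""Sketch of a GPU-wrapped handler for app.py (illustrative, not the final app)."""

try:
    import spaces  # available on Hugging Face Spaces

    # Assumed time budget in seconds; tune it to the real workflow length.
    gpu = spaces.GPU(duration=120)
except ImportError:
    def gpu(fn):
        """No-op stand-in so the module imports outside a Space."""
        return fn


@gpu
def run_voicegate(audio_path: str, target_language: str) -> dict:
    """Hypothetical end-to-end handler.

    A real implementation would call the workflow client here; this
    placeholder only echoes its inputs so the wiring can be tested.
    """
    return {"audio_path": audio_path, "target_language": target_language}
```

A Gradio callback would then call `run_voicegate` directly, so only that call is scheduled onto ZeroGPU hardware while the rest of the app runs on CPU.
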
## Known Workflow Nodes

The API workflow references these important node classes:

- `LoadAudio`
- `MelBandRoFormerModelLoader`
- `MelBandRoFormerSampler`
- `VoiceBridgeASRLoader`
- `VoiceBridgeASRTranscribe`
- `GenerateSRT`
- `RH_LLMAPI_NODE`
- `VoiceBridgeSRTSplitter`
- `RunningHub_VoxCPM_LoadModel`
- `RunningHub_VoxCPM_Generate`
- `VoiceBridgeAudioListMergerBySRT`
- `MergeAudioMW`
- `SaveAudioMP3`
- `SaveSRTFromString`
- `TrimAudioDuration`
- `Any Switch (rgthree)`
- `easy showAnything`
- `easy string`
- `CR Text`
- `ReplaceText`

This implies dependencies on VoiceBridge, VoxCPM/RunningHub nodes, MelBandRoFormer nodes/models, rgthree, easy-use, and the LLM API node package.

## Model and Secret Inventory

Expected model assets:

- `Qwen/Qwen3-ASR-1.7B`
- `Qwen/Qwen3-ForcedAligner-0.6B`
- `VoxCPM2`
- `MelBandRoFormer_comfy/MelBandRoformer_fp32.safetensors`

Expected Space secrets:

- `HF_TOKEN`, if private or gated model downloads are needed.
- `DEEPSEEK_API_KEY` or another LLM provider key.
- Optional LLM base URL and model name configuration.

Do not commit model weights, API keys, generated audio, or generated subtitles.

## Implementation Phases

### Phase 1: Scaffold and Repository Hygiene

Done:

- Add HF Space root files.
- Add minimal Gradio placeholder.
- Add deployment plan.
- Add ignore rules for runtime and generated artifacts.
- Add a TODO checklist.
- Copy the API workflow to `workflows/voicegate_api.json`.
- Copy the UI workflow to `workflows/voicegate_ui.json`.
- Confirm the API workflow is valid JSON.
- Confirm the workflow files do not contain real API keys.

### Phase 2: Dependency Inventory

Done:

- Identify the ComfyUI and custom node repositories needed by the API workflow.
- Pin the current candidate commits in `docs/dependency-inventory.md`.
- Identify initial Python, system package, model, and secret requirements.
- Decide to install custom nodes from pinned git URLs during bootstrap instead of vendoring them into this Space repo.
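
The pinned-install decision above can be sketched as a clone followed by a checkout of a fixed commit. This is a minimal sketch and not the contents of `scripts/bootstrap_comfy.py`: the function names, the example URL, and the example commit hash are hypothetical, and the real pins live in `docs/dependency-inventory.md`.

```python
"""Sketch: install a custom node repository at a pinned commit."""
import subprocess
from pathlib import Path


def pinned_clone_commands(repo_url: str, commit: str, dest: Path) -> list[list[str]]:
    """Return the git invocations for a clone followed by a pinned checkout."""
    return [
        ["git", "clone", repo_url, str(dest)],
        ["git", "-C", str(dest), "checkout", commit],
    ]


def install_pinned(repo_url: str, commit: str, dest: Path) -> None:
    """Clone if the directory is missing, then check out the pinned commit.

    Raises subprocess.CalledProcessError if either git command fails.
    """
    clone, checkout = pinned_clone_commands(repo_url, commit, dest)
    if not dest.exists():
        subprocess.run(clone, check=True)
    subprocess.run(checkout, check=True)


if __name__ == "__main__":
    # Hypothetical URL and commit, shown only to illustrate the call shape.
    for cmd in pinned_clone_commands(
        "https://example.com/org/example_node.git",
        "abc1234",
        Path("custom_nodes") / "example_node",
    ):
        print(" ".join(cmd))
```

Separating command construction from execution keeps the pin list easy to review and lets the bootstrap be dry-run without network access.
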
### Phase 3: Runtime Bootstrap

Create scripts that can:

- Clone or install ComfyUI.
- Install Python dependencies.
- Install required custom nodes at pinned commits.
- Download or locate required model files.
- Start ComfyUI locally inside the Space process.

Current script status:

- `scripts/bootstrap_comfy.py` clones ComfyUI and all pinned custom node repositories, installs their requirements, prepares model directories, and can optionally download the VoxCPM2 and MelBand RoFormer assets.
- `scripts/run_comfy.py` starts ComfyUI and waits for `/system_stats`.
- `scripts/workflow_client.py` uploads audio, patches the VoiceGate API workflow, submits it through `/prompt`, and waits on `/history/{prompt_id}`.

Remaining runtime bootstrap work:

- Wire bootstrap/startup behavior into `app.py`.
- Validate the bootstrap and ComfyUI startup in the actual Space container.
- Confirm the upload endpoint used by `LoadAudio` accepts the audio files we send from Gradio.

### Phase 4: Workflow Parameterization

Parameterize `workflows/voicegate_api.json` before submitting it to ComfyUI.

Required edits:

- Patch hard-coded audio filenames with Gradio-uploaded input files.
- Patch API keys from environment variables.
- Patch target language, LLM model, and provider base URL.
- Ensure output nodes produce deterministic job-specific file paths.

These are implemented in `scripts/workflow_client.py`, but still need to be connected to the Gradio UI and verified against a running ComfyUI process.

### Phase 5: Gradio Integration

Build the first real interface:

- Input audio file.
- Target language selector/text input.
- Source language, default `auto`.
- Optional prompt override.
- Output audio.
- Output translated/adjusted SRT.
- Runtime log.

Wrap the end-to-end function with `@spaces.GPU(duration=...)` and start with a short maximum input duration.

### Phase 6: Verification

Verify in this order:

1. ComfyUI starts and exposes its local API.
2. TTS-only minimal workflow runs.
3. ASR-only short audio workflow runs.
4. SRT splitter + VoxCPM + merger runs.
5. Full VoiceGate short-audio workflow runs.
6. Video input support is added after the audio path is stable.

## Immediate Next Step

Continue Phase 3 by wiring bootstrap/startup behavior into `app.py`, then test the scripts inside the running Hugging Face Space container.
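
When wiring startup into `app.py`, a readiness check against the local ComfyUI API is one plausible first piece. The sketch below is not the contents of `scripts/run_comfy.py`: the function names, the timeout values, and the base URL are assumptions (port 8188 is ComfyUI's default and may differ in the Space), and only the `/system_stats` route comes from this plan.

```python
"""Sketch: wait for a local ComfyUI process to answer on /system_stats."""
import time
import urllib.error
import urllib.request
from typing import Callable

# Assumed address; adjust if ComfyUI is launched on another host or port.
COMFY_URL = "http://127.0.0.1:8188"


def comfy_is_up(base_url: str = COMFY_URL) -> bool:
    """Return True if GET /system_stats answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/system_stats", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as not ready yet.
        return False


def wait_until_ready(
    probe: Callable[[], bool] = comfy_is_up,
    timeout: float = 120.0,
    interval: float = 1.0,
) -> bool:
    """Poll `probe` until it succeeds or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Passing the probe as a parameter keeps the polling loop testable without a running ComfyUI process, which may help when validating startup behavior before the Space container is available.
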