VoiceGate HF Space Deployment Plan
Goal
Deploy VoiceGate to a Hugging Face Space with a Gradio interface, using ZeroGPU for the GPU-heavy inference path.
The initial target is a short-audio workflow that proves the full chain: audio input -> source separation -> ASR/SRT -> LLM translation -> VoxCPM TTS -> SRT-aligned audio merge -> audio and subtitle outputs.
Repository Roles
Use three clear ownership boundaries:
- VoiceGate: upstream project assets, README, diagrams, and source workflows.
- comfyui_voicebridge: the VoiceBridge ComfyUI custom node repository.
- VoiceGate-hf: this repository, the Hugging Face Space deployment wrapper.
The Space repository should not depend on nested git repositories at runtime. For deployment, copy or vendor only the required workflow files, custom nodes, bootstrap scripts, and Gradio application code into the Space layout.
Current local state:
- The outer VoiceGate-hf repository is connected to the Hugging Face Space remote build-small-hackathon/VoiceGate.
- VoiceGate/ is present as a local upstream checkout only. It is ignored by the Space repository and must not be treated as runtime content.
- VoiceGate/.gitmodules references comfyui_voicebridge, but the local VoiceGate/comfyui_voicebridge/ directory is currently empty.
- VoiceGate/workflows/VoiceGate-Workflow.json is the UI workflow.
- VoiceGate/workflows/VoiceGate-Workflow_api.json exists and has been confirmed as valid JSON. It still needs parameterization before Gradio can submit it to ComfyUI.
- workflows/voicegate_api.json is the deployment copy of the API workflow.
- workflows/voicegate_ui.json is the deployment reference copy of the UI workflow.
Repository Hygiene
The Space repository should stay small and deterministic:
- Keep VoiceGate/ as a local-only upstream checkout.
- Copy deployment-ready workflow files into workflows/.
- Copy or install custom nodes through an explicit bootstrap step.
- Do not commit nested .git directories, model weights, API keys, uploaded media, generated audio, generated subtitles, or ComfyUI runtime caches.
- Keep .gitattributes LFS rules for future model or binary assets, but prefer downloading model files at runtime instead of committing them.
Hugging Face Space Constraints
ZeroGPU Spaces are intended for Gradio SDK Spaces. The Gradio app should expose
a normal app.py, and GPU-heavy functions should be wrapped with @spaces.GPU.
This means the first implementation should prefer:
- Gradio Space root files: README.md, app.py, requirements.txt, packages.txt.
- A Python bootstrap that installs or prepares ComfyUI and custom nodes.
- A workflow client that calls the local ComfyUI API from inside the Gradio handler.
Avoid starting with a Docker Space for ZeroGPU, even though Docker would be a cleaner fit for a long-running ComfyUI service.
Proposed Space Layout
VoiceGate-hf/
|-- README.md
|-- app.py
|-- requirements.txt
|-- packages.txt
|-- scripts/
|   |-- bootstrap_comfy.py
|   |-- run_comfy.py
|   `-- workflow_client.py
|-- workflows/
|   |-- voicegate_api.json
|   `-- voicegate_ui.json
|-- custom_nodes/
|   `-- comfyui_voicebridge/
|-- assets/
`-- docs/
    `-- deployment-plan.md
The current repository has the root scaffold, planning docs, and deployment
workflow copies. Later steps should add bootstrap scripts and either copy
deployment-ready custom nodes into custom_nodes/ or install pinned node
repositories during Space startup.
Known Workflow Nodes
The API workflow references these important node classes:
- LoadAudio
- MelBandRoFormerModelLoader
- MelBandRoFormerSampler
- VoiceBridgeASRLoader
- VoiceBridgeASRTranscribe
- GenerateSRT
- RH_LLMAPI_NODE
- VoiceBridgeSRTSplitter
- RunningHub_VoxCPM_LoadModel
- RunningHub_VoxCPM_Generate
- VoiceBridgeAudioListMergerBySRT
- MergeAudioMW
- SaveAudioMP3
- SaveSRTFromString
- TrimAudioDuration
- Any Switch (rgthree)
- easy showAnything
- easy string
- CR Text
- ReplaceText
This implies dependencies on VoiceBridge, VoxCPM/RunningHub nodes, MelBandRoFormer nodes/models, rgthree, easy-use, and the LLM API node package.
Model and Secret Inventory
Expected model assets:
- Qwen/Qwen3-ASR-1.7B
- Qwen/Qwen3-ForcedAligner-0.6B
- VoxCPM2
- MelBandRoFormer_comfy/MelBandRoformer_fp32.safetensors
Expected Space secrets:
- HF_TOKEN, if private or gated model downloads are needed.
- DEEPSEEK_API_KEY or another LLM provider key.
- Optional LLM base URL and model name configuration.
Do not commit model weights, API keys, generated audio, or generated subtitles.
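The secrets above could be read once at startup so a missing key fails early instead of halfway through a job. The following is a minimal sketch, not the project's actual code: HF_TOKEN and DEEPSEEK_API_KEY come from the list above, while LLM_BASE_URL, LLM_MODEL, LLMConfig, and load_llm_config are hypothetical names chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    """Secrets and optional settings read from the Space environment."""
    api_key: str
    base_url: Optional[str]
    model: Optional[str]
    hf_token: Optional[str]


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Read secrets from the environment and fail early if the LLM key is missing."""
    api_key = env.get("DEEPSEEK_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("DEEPSEEK_API_KEY is not set; add it as a Space secret.")
    return LLMConfig(
        api_key=api_key,
        base_url=env.get("LLM_BASE_URL") or None,  # hypothetical variable name
        model=env.get("LLM_MODEL") or None,        # hypothetical variable name
        hf_token=env.get("HF_TOKEN") or None,      # only needed for private or gated models
    )
```

Passing the environment as a mapping keeps the function testable without touching real secrets.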
Implementation Phases
Phase 1: Scaffold and Repository Hygiene
Done:
- Add HF Space root files.
- Add minimal Gradio placeholder.
- Add deployment plan.
- Add ignore rules for runtime and generated artifacts.
- Add a TODO checklist.
- Copy the API workflow to workflows/voicegate_api.json.
- Copy the UI workflow to workflows/voicegate_ui.json.
- Confirm the API workflow is valid JSON.
- Confirm the workflow files do not contain real API keys.
Phase 2: Dependency Inventory
Done:
- Identify the ComfyUI and custom node repositories needed by the API workflow.
- Pin the current candidate commits in docs/dependency-inventory.md.
- Identify initial Python, system package, model, and secret requirements.
- Decide to install custom nodes from pinned git URLs during bootstrap instead of vendoring them into this Space repo.
Phase 3: Runtime Bootstrap
Create scripts that can:
- Clone or install ComfyUI.
- Install Python dependencies.
- Install required custom nodes at pinned commits.
- Download or locate required model files.
- Start ComfyUI locally inside the Space process.
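One possible shape for the pinned-commit step is sketched below. It is an illustration under stated assumptions, not the contents of scripts/bootstrap_comfy.py: PinnedRepo, clone_commands, and install_repo are hypothetical names, and the real URLs and commits would come from docs/dependency-inventory.md.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class PinnedRepo:
    """A custom node repository pinned to one commit (hypothetical structure)."""
    name: str
    url: str
    commit: str


def clone_commands(repo: PinnedRepo, custom_nodes_dir: Path) -> List[List[str]]:
    """Build the git commands that clone a repository and check out its pinned commit."""
    dest = str(custom_nodes_dir / repo.name)
    return [
        ["git", "clone", repo.url, dest],
        ["git", "-C", dest, "checkout", "--detach", repo.commit],
    ]


def install_repo(repo: PinnedRepo, custom_nodes_dir: Path) -> None:
    """Clone the repository if it is missing, then always move it to the pinned commit."""
    clone, checkout = clone_commands(repo, custom_nodes_dir)
    if not (custom_nodes_dir / repo.name / ".git").exists():
        subprocess.run(clone, check=True)
    subprocess.run(checkout, check=True)
```

Separating command construction from execution makes the pinning logic easy to check without network access, and re-running the checkout keeps an existing clone on the pinned commit.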
Current script status:
- scripts/bootstrap_comfy.py clones ComfyUI and all pinned custom node repositories, installs their requirements, prepares model directories, and can optionally download the VoxCPM2 and MelBand RoFormer assets.
- scripts/run_comfy.py starts ComfyUI and waits for /system_stats.
- scripts/workflow_client.py uploads audio, patches the VoiceGate API workflow, submits it through /prompt, and waits on /history/{prompt_id}.
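Both waits described above, for /system_stats at startup and for /history/{prompt_id} after submission, follow the same polling pattern. A minimal sketch using only the standard library is shown here. The helper names are hypothetical, and it assumes the history endpoint returns an empty JSON object until the prompt has finished, which should be confirmed against the running ComfyUI version.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(probe: Callable[[], Optional[T]], timeout: float, interval: float = 1.0) -> T:
    """Call probe() until it returns a non-None value or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        time.sleep(interval)


def get_json(url: str) -> Optional[dict]:
    """Return the decoded JSON body, or None if the server is unreachable or replies with an error."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        return None


def wait_for_comfy(base_url: str, timeout: float = 300.0) -> dict:
    """Block until ComfyUI answers on /system_stats."""
    return poll_until(lambda: get_json(f"{base_url}/system_stats"), timeout)


def wait_for_history(base_url: str, prompt_id: str, timeout: float = 600.0) -> dict:
    """Block until /history/{prompt_id} contains an entry for the prompt."""
    def probe() -> Optional[dict]:
        history = get_json(f"{base_url}/history/{prompt_id}")
        # Assumption: an unfinished prompt yields an empty object.
        return history.get(prompt_id) if history else None
    return poll_until(probe, timeout)
```

Because the probe is injected, the timeout behaviour can be exercised without a running server.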
Remaining runtime bootstrap work:
- Wire bootstrap/startup behavior into app.py.
- Validate the bootstrap and ComfyUI startup in the actual Space container.
- Confirm the upload endpoint used by LoadAudio accepts the audio files we send from Gradio.
Phase 4: Workflow Parameterization
Parameterize workflows/voicegate_api.json before submitting it to ComfyUI.
Required edits:
- Patch hard-coded audio filenames with Gradio-uploaded input files.
- Patch API keys from environment variables.
- Patch target language, LLM model, and provider base URL.
- Ensure output nodes produce deterministic job-specific file paths.
These are implemented in scripts/workflow_client.py, but still need to be
connected to the Gradio UI and verified against a running ComfyUI process.
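To illustrate the kind of patching listed above, the sketch below edits a workflow in ComfyUI's API format, where each node id maps to an object with class_type and inputs. It is not the code in scripts/workflow_client.py: patch_workflow is a hypothetical helper, and the input names a caller passes (for example audio on LoadAudio or filename_prefix on SaveAudioMP3) are assumptions to check against the exported workflow file.

```python
import copy
from typing import Any, Dict, Mapping

Workflow = Dict[str, Dict[str, Any]]


def patch_workflow(workflow: Workflow, overrides: Mapping[str, Mapping[str, Any]]) -> Workflow:
    """Return a copy of the workflow with inputs replaced on nodes of the given class types.

    overrides maps a node class_type to {input_name: new_value}. Only inputs that
    already exist are replaced, so a misspelled name raises instead of silently
    adding an input the node would ignore.
    """
    patched = copy.deepcopy(workflow)  # never mutate the template loaded from disk
    seen = set()
    for node in patched.values():
        class_type = node.get("class_type")
        wanted = overrides.get(class_type)
        if not wanted:
            continue
        seen.add(class_type)
        inputs = node.setdefault("inputs", {})
        for name, value in wanted.items():
            if name not in inputs:
                raise KeyError(f"{class_type} has no input {name!r}")
            inputs[name] = value
    missing = set(overrides) - seen
    if missing:
        raise KeyError(f"workflow has no nodes of type: {sorted(missing)}")
    return patched
```

A caller could pass a job-specific value such as {"SaveAudioMP3": {"filename_prefix": "voicegate/<job_id>/audio"}} to keep outputs from different jobs apart. Note that this sketch patches every node of a matching class type, so classes that appear several times in the workflow would need to be addressed by node id instead.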
Phase 5: Gradio Integration
Build the first real interface:
- Input audio file.
- Target language selector/text input.
- Source language, default auto.
- Optional prompt override.
- Output audio.
- Output translated/adjusted SRT.
- Runtime log.
Wrap the end-to-end function with @spaces.GPU(duration=...) and start with a
short maximum input duration.
Phase 6: Verification
Verify in this order:
- ComfyUI starts and exposes its local API.
- TTS-only minimal workflow runs.
- ASR-only short audio workflow runs.
- SRT splitter + VoxCPM + merger runs.
- Full VoiceGate short-audio workflow runs.
- Video input support is added after the audio path is stable.
Immediate Next Step
Continue Phase 3 by wiring bootstrap/startup behavior into app.py, then test
the scripts inside the running Hugging Face Space container.