Last commit: 683b147 by YanTianlong (Add ComfyUI runtime bootstrap scripts)

VoiceGate HF Space Deployment Plan

Goal

Deploy VoiceGate to a Hugging Face Space with a Gradio interface, using ZeroGPU for the GPU-heavy inference path.

The initial target is a short-audio workflow that proves the full chain: audio input -> source separation -> ASR/SRT -> LLM translation -> VoxCPM TTS -> SRT-aligned audio merge -> audio and subtitle outputs.

Repository Roles

Use three clear ownership boundaries:

  • VoiceGate: upstream project assets, README, diagrams, and source workflows.
  • comfyui_voicebridge: the VoiceBridge ComfyUI custom node repository.
  • VoiceGate-hf: this repository, the Hugging Face Space deployment wrapper.

The Space repository should not depend on nested git repositories at runtime. For deployment, copy or vendor only the required workflow files, custom nodes, bootstrap scripts, and Gradio application code into the Space layout.

Current local state:

  • The outer VoiceGate-hf repository is connected to the Hugging Face Space remote build-small-hackathon/VoiceGate.
  • VoiceGate/ is present as a local upstream checkout only. It is ignored by the Space repository and must not be treated as runtime content.
  • VoiceGate/.gitmodules references comfyui_voicebridge, but the local VoiceGate/comfyui_voicebridge/ directory is currently empty.
  • VoiceGate/workflows/VoiceGate-Workflow.json is the UI workflow.
  • VoiceGate/workflows/VoiceGate-Workflow_api.json exists and has been confirmed as valid JSON. It still needs parameterization before Gradio can submit it to ComfyUI.
  • workflows/voicegate_api.json is the deployment copy of the API workflow.
  • workflows/voicegate_ui.json is the deployment reference copy of the UI workflow.

Repository Hygiene

The Space repository should stay small and deterministic:

  • Keep VoiceGate/ as a local-only upstream checkout.
  • Copy deployment-ready workflow files into workflows/.
  • Copy or install custom nodes through an explicit bootstrap step.
  • Do not commit nested .git directories, model weights, API keys, uploaded media, generated audio, generated subtitles, or ComfyUI runtime caches.
  • Keep .gitattributes LFS rules for future model or binary assets, but prefer downloading model files at runtime instead of committing them.
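
The hygiene rules above can be enforced with a small pre-commit check. The sketch below is an assumption, not an existing script. It walks the working tree, skips the outer .git and the local-only VoiceGate/ and ComfyUI/ checkouts (the ComfyUI/ name is a hypothetical bootstrap target), and reports nested git repositories and files whose suffixes suggest model weights or generated media. The suffix list is illustrative and should be aligned with the real ignore rules.

```python
import os
from pathlib import Path

# Assumed suffixes for model weights and generated media; align with .gitignore.
FORBIDDEN_SUFFIXES = {".safetensors", ".ckpt", ".pt", ".pth", ".bin", ".wav", ".mp3", ".flac", ".srt"}
# Hypothetical top-level directories that exist only locally and are ignored.
LOCAL_ONLY_DIRS = {"VoiceGate", "ComfyUI"}


def find_hygiene_violations(repo_root):
    """Return sorted, human-readable hygiene violations found under repo_root."""
    root = Path(repo_root)
    violations = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root)
        if rel_dir == Path("."):
            # Skip the outer repository's own .git and the local-only checkouts.
            dirnames[:] = [d for d in dirnames if d != ".git" and d not in LOCAL_ONLY_DIRS]
        else:
            # A .git directory (or gitlink file) below the root means a nested repository.
            if ".git" in dirnames or ".git" in filenames:
                violations.append(f"nested git repository: {rel_dir.as_posix()}")
            # Never descend into nested .git directories.
            dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if Path(name).suffix.lower() in FORBIDDEN_SUFFIXES:
                violations.append(f"weights or generated media: {(rel_dir / name).as_posix()}")
    return sorted(violations)


if __name__ == "__main__":
    # Assumes this file lives in scripts/, one level below the repository root.
    problems = find_hygiene_violations(Path(__file__).resolve().parent.parent)
    for problem in problems:
        print(problem)
    raise SystemExit(1 if problems else 0)
```

A non-zero exit code lets the check run as a git pre-commit hook or a CI step.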

Hugging Face Space Constraints

ZeroGPU is only available on Gradio SDK Spaces. The Gradio app should expose a standard app.py, and GPU-heavy functions should be wrapped with the @spaces.GPU decorator.

This means the first implementation should prefer:

  • Gradio Space root files: README.md, app.py, requirements.txt, packages.txt.
  • A Python bootstrap that installs or prepares ComfyUI and custom nodes.
  • A workflow client that calls the local ComfyUI API from inside the Gradio handler.

Do not start with a Docker Space: Docker would be a cleaner fit for a long-running ComfyUI service, but ZeroGPU does not support Docker Spaces.

Proposed Space Layout

VoiceGate-hf/
|-- README.md
|-- app.py
|-- requirements.txt
|-- packages.txt
|-- scripts/
|   |-- bootstrap_comfy.py
|   |-- run_comfy.py
|   `-- workflow_client.py
|-- workflows/
|   |-- voicegate_api.json
|   `-- voicegate_ui.json
|-- custom_nodes/
|   `-- comfyui_voicebridge/
|-- assets/
`-- docs/
    `-- deployment-plan.md

The current repository has the root scaffold, planning docs, deployment workflow copies, and the bootstrap scripts under scripts/. Custom nodes can either be copied into custom_nodes/ or installed from pinned repositories during Space startup; Phase 2 chose pinned installs during bootstrap.
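
Progress against this layout can be tracked with a simple path check. This is a hypothetical helper: the required-path list mirrors the tree above and leaves out custom_nodes/ and assets/, because custom nodes may be installed during startup instead of copied into the repository.

```python
from pathlib import Path

# Paths from the proposed layout; custom_nodes/ and assets/ are intentionally optional.
REQUIRED_PATHS = (
    "README.md",
    "app.py",
    "requirements.txt",
    "packages.txt",
    "scripts/bootstrap_comfy.py",
    "scripts/run_comfy.py",
    "scripts/workflow_client.py",
    "workflows/voicegate_api.json",
    "workflows/voicegate_ui.json",
    "docs/deployment-plan.md",
)


def missing_layout_paths(repo_root):
    """Return the required layout paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_layout_paths(Path.cwd())
    print("layout complete" if not missing else "missing:\n  " + "\n  ".join(missing))
```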

Known Workflow Nodes

The API workflow references these important node classes:

  • LoadAudio
  • MelBandRoFormerModelLoader
  • MelBandRoFormerSampler
  • VoiceBridgeASRLoader
  • VoiceBridgeASRTranscribe
  • GenerateSRT
  • RH_LLMAPI_NODE
  • VoiceBridgeSRTSplitter
  • RunningHub_VoxCPM_LoadModel
  • RunningHub_VoxCPM_Generate
  • VoiceBridgeAudioListMergerBySRT
  • MergeAudioMW
  • SaveAudioMP3
  • SaveSRTFromString
  • TrimAudioDuration
  • Any Switch (rgthree)
  • easy showAnything
  • easy string
  • CR Text
  • ReplaceText

This implies dependencies on VoiceBridge, VoxCPM/RunningHub nodes, MelBandRoFormer nodes/models, rgthree, easy-use, and the LLM API node package.
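
In an API-format ComfyUI workflow, every node is a JSON object with a class_type field. That means the list above can be regenerated from workflows/voicegate_api.json whenever the upstream workflow changes. The sketch below assumes that standard format (node id mapped to an object with "class_type" and "inputs"). KNOWN_NODE_CLASSES is the list above, and any class outside it points to a dependency that still needs an inventory entry. The helper names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path

# The node classes listed above.
KNOWN_NODE_CLASSES = {
    "LoadAudio", "MelBandRoFormerModelLoader", "MelBandRoFormerSampler",
    "VoiceBridgeASRLoader", "VoiceBridgeASRTranscribe", "GenerateSRT",
    "RH_LLMAPI_NODE", "VoiceBridgeSRTSplitter", "RunningHub_VoxCPM_LoadModel",
    "RunningHub_VoxCPM_Generate", "VoiceBridgeAudioListMergerBySRT", "MergeAudioMW",
    "SaveAudioMP3", "SaveSRTFromString", "TrimAudioDuration", "Any Switch (rgthree)",
    "easy showAnything", "easy string", "CR Text", "ReplaceText",
}


def class_type_counts(workflow):
    """Count class_type values in an API-format workflow ({node_id: {"class_type", "inputs"}})."""
    return Counter(
        node["class_type"]
        for node in workflow.values()
        if isinstance(node, dict) and "class_type" in node
    )


def unlisted_classes(workflow):
    """Return node classes used by the workflow that are missing from KNOWN_NODE_CLASSES."""
    return sorted(set(class_type_counts(workflow)) - KNOWN_NODE_CLASSES)


if __name__ == "__main__":
    wf = json.loads(Path("workflows/voicegate_api.json").read_text(encoding="utf-8"))
    for name, count in sorted(class_type_counts(wf).items()):
        print(f"{count:3d}  {name}")
    print("unlisted:", unlisted_classes(wf) or "none")
```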

Model and Secret Inventory

Expected model assets:

  • Qwen/Qwen3-ASR-1.7B
  • Qwen/Qwen3-ForcedAligner-0.6B
  • VoxCPM2
  • MelBandRoFormer_comfy/MelBandRoformer_fp32.safetensors
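
A minimal sketch of the "download or locate" step follows. It assumes the huggingface_hub library's hf_hub_download and snapshot_download functions and a local directory layout chosen by the bootstrap script. The helper names and destination paths are hypothetical. Only the Qwen entries above are full Hub repository IDs, so the sources for VoxCPM2 and the MelBandRoFormer file should come from the dependency inventory rather than be guessed. Treating a non-empty directory as "already downloaded" is a cheap heuristic and will not detect a partial download.

```python
import os
from pathlib import Path


def ensure_model_file(repo_id, filename, dest_dir, token=None):
    """Return dest_dir/filename, downloading it from the Hub only when it is missing."""
    dest_dir = Path(dest_dir)
    local = dest_dir / filename
    if local.is_file():
        return local
    # Imported lazily so the local-file path works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    return Path(hf_hub_download(repo_id=repo_id, filename=filename, local_dir=dest_dir, token=token))


def ensure_model_snapshot(repo_id, dest_dir, token=None):
    """Return dest_dir if it already holds files, otherwise download the whole repository there."""
    dest_dir = Path(dest_dir)
    if dest_dir.is_dir() and any(dest_dir.iterdir()):
        return dest_dir
    from huggingface_hub import snapshot_download
    return Path(snapshot_download(repo_id=repo_id, local_dir=dest_dir, token=token))


if __name__ == "__main__":
    # Hypothetical destination; HF_TOKEN is only needed for private or gated repositories.
    path = ensure_model_snapshot(
        "Qwen/Qwen3-ASR-1.7B", Path("models/Qwen3-ASR-1.7B"), token=os.environ.get("HF_TOKEN")
    )
    print(path)
```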

Expected Space secrets:

  • HF_TOKEN, if private or gated model downloads are needed.
  • DEEPSEEK_API_KEY or another LLM provider key.
  • Optional LLM base URL and model name configuration.

Do not commit model weights, API keys, generated audio, or generated subtitles.
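
The secrets can be read once at startup into a single object. In this sketch, DEEPSEEK_API_KEY and HF_TOKEN come from the list above. LLM_API_KEY, LLM_BASE_URL, and LLM_MODEL are hypothetical names for the generic provider key and the optional base URL and model settings. The secret fields are excluded from repr, so logging the config object does not leak them.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class SpaceConfig:
    # repr=False keeps secrets out of logs if the config object is printed.
    llm_api_key: str = field(repr=False)
    hf_token: Optional[str] = field(repr=False)
    llm_base_url: Optional[str]
    llm_model: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> SpaceConfig:
    """Read Space secrets and optional LLM settings from environment variables."""
    # LLM_API_KEY, LLM_BASE_URL, and LLM_MODEL are hypothetical variable names.
    api_key = env.get("DEEPSEEK_API_KEY") or env.get("LLM_API_KEY")
    if not api_key:
        raise RuntimeError("Set DEEPSEEK_API_KEY (or LLM_API_KEY) as a Space secret.")
    return SpaceConfig(
        llm_api_key=api_key,
        hf_token=env.get("HF_TOKEN") or None,
        llm_base_url=env.get("LLM_BASE_URL") or None,
        llm_model=env.get("LLM_MODEL") or None,
    )
```

Failing at startup when the key is missing surfaces a misconfigured Space immediately, instead of partway through a GPU call.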

Implementation Phases

Phase 1: Scaffold and Repository Hygiene

Done:

  • Add HF Space root files.
  • Add a minimal Gradio placeholder.
  • Add the deployment plan.
  • Add ignore rules for runtime and generated artifacts.
  • Add a TODO checklist.
  • Copy the API workflow to workflows/voicegate_api.json.
  • Copy the UI workflow to workflows/voicegate_ui.json.
  • Confirm the API workflow is valid JSON.
  • Confirm the workflow files do not contain real API keys.
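
The last two checks can be repeated automatically before each commit. This sketch parses each workflow file (json.loads raises on invalid JSON) and scans every string for shapes that look like real credentials. The patterns are assumptions: "sk-" followed by at least 16 characters (the prefix used by DeepSeek and OpenAI-style keys), and "hf_" followed by at least 20 alphanumerics for Hugging Face tokens. Matches are truncated so the report does not echo a key.

```python
import json
import re
import sys
from pathlib import Path

# Assumed credential shapes: "sk-" keys (DeepSeek/OpenAI style) and Hugging Face "hf_" tokens.
SECRET_PATTERNS = (
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),
    re.compile(r"\bhf_[A-Za-z0-9]{20,}"),
)


def iter_strings(value):
    """Yield every string (keys and values) inside a parsed JSON document."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield key
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def check_workflow_file(path):
    """Parse the file as JSON (raises ValueError if invalid) and return truncated suspected secrets."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    hits = []
    for text in iter_strings(data):
        for pattern in SECRET_PATTERNS:
            hits.extend(match.group(0)[:6] + "..." for match in pattern.finditer(text))
    return hits


if __name__ == "__main__":
    failed = False
    for name in ("workflows/voicegate_api.json", "workflows/voicegate_ui.json"):
        hits = check_workflow_file(name)
        if hits:
            failed = True
            print(f"{name}: possible secrets {hits}")
    sys.exit(1 if failed else 0)
```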

Phase 2: Dependency Inventory

Done:

  • Identify the ComfyUI and custom node repositories needed by the API workflow.
  • Pin the current candidate commits in docs/dependency-inventory.md.
  • Identify initial Python, system package, model, and secret requirements.
  • Decide to install custom nodes from pinned git URLs during bootstrap instead of vendoring them into this Space repo.

Phase 3: Runtime Bootstrap

Create scripts that can:

  • Clone or install ComfyUI.
  • Install Python dependencies.
  • Install required custom nodes at pinned commits.
  • Download or locate required model files.
  • Start ComfyUI locally inside the Space process.

Current script status:

  • scripts/bootstrap_comfy.py clones ComfyUI and all pinned custom node repositories, installs their requirements, prepares model directories, and can optionally download the VoxCPM2 and MelBandRoFormer assets.
  • scripts/run_comfy.py starts ComfyUI and waits until its /system_stats endpoint responds.
  • scripts/workflow_client.py uploads audio, patches the VoiceGate API workflow, submits it through /prompt, and waits on /history/{prompt_id}.
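
The HTTP side of run_comfy.py and workflow_client.py can be sketched with the standard library alone. This assumes ComfyUI's usual local API: GET /system_stats answers once the server is up, POST /prompt accepts {"prompt": <API workflow>, "client_id": ...} and returns a prompt_id, and GET /history/{prompt_id} returns an empty object until the job finishes. The function names are hypothetical. The audio upload step is left out because the endpoint used for LoadAudio has not been confirmed yet.

```python
import json
import time
import urllib.request
import uuid


def get_json(url, timeout=10.0):
    """GET url and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_comfy(base_url, timeout=120.0, interval=1.0):
    """Poll /system_stats until ComfyUI answers; return False if it never does."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            get_json(f"{base_url}/system_stats", timeout=interval)
            return True
        except (OSError, ValueError):
            # Connection refused, timeout, HTTP error, or a non-JSON reply: retry.
            time.sleep(interval)
    return False


def submit_prompt(base_url, workflow, client_id=None):
    """POST an API-format workflow to /prompt and return the prompt_id."""
    payload = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    request = urllib.request.Request(
        f"{base_url}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))["prompt_id"]


def wait_for_history(base_url, prompt_id, timeout=600.0, interval=2.0):
    """Poll /history/{prompt_id} until the entry appears; return that entry."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        history = get_json(f"{base_url}/history/{prompt_id}")
        if prompt_id in history:
            return history[prompt_id]  # typically includes "outputs" and "status"
        time.sleep(interval)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout} s")
```

Callers should inspect the returned entry's status before reading outputs, because a failed run also produces a history entry.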

Remaining runtime bootstrap work:

  • Wire bootstrap/startup behavior into app.py.
  • Validate the bootstrap and ComfyUI startup in the actual Space container.
  • Confirm the upload endpoint used by LoadAudio accepts the audio files we send from Gradio.

Phase 4: Workflow Parameterization

Parameterize workflows/voicegate_api.json before submitting it to ComfyUI.

Required edits:

  • Replace hard-coded audio filenames with the names of the input files uploaded from Gradio.
  • Fill in API keys from environment variables instead of storing them in the workflow.
  • Patch target language, LLM model, and provider base URL.
  • Ensure output nodes produce deterministic job-specific file paths.

These are implemented in scripts/workflow_client.py, but still need to be connected to the Gradio UI and verified against a running ComfyUI process.
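
The patching step can be expressed as "merge these inputs into every node of a given class", applied to a deep copy of the workflow. The input names here are assumptions to verify against workflows/voicegate_api.json. The audio input on LoadAudio and filename_prefix on SaveAudioMP3 follow ComfyUI core, while api_key on RH_LLMAPI_NODE is a hypothetical name. Target language, LLM model, and base URL would be patched the same way once their nodes and inputs are identified.

```python
import copy
import os


def patch_nodes(workflow, class_type, updates):
    """Return a deep copy of workflow with updates merged into every node of class_type."""
    patched = copy.deepcopy(workflow)
    matched = 0
    for node in patched.values():
        if node.get("class_type") == class_type:
            node.setdefault("inputs", {}).update(updates)
            matched += 1
    if matched == 0:
        # Fail loudly if the upstream workflow renamed or dropped the node.
        raise KeyError(f"no {class_type} node in workflow")
    return patched


def parameterize(workflow, audio_name, job_id, env=os.environ):
    """Apply per-job edits; input names are assumptions to check against voicegate_api.json."""
    wf = patch_nodes(workflow, "LoadAudio", {"audio": audio_name})
    # A job-specific prefix keeps outputs from different jobs apart ("dubbed" is hypothetical).
    wf = patch_nodes(wf, "SaveAudioMP3", {"filename_prefix": f"voicegate/{job_id}/dubbed"})
    # "api_key" is a hypothetical input name for RH_LLMAPI_NODE.
    wf = patch_nodes(wf, "RH_LLMAPI_NODE", {"api_key": env["DEEPSEEK_API_KEY"]})
    return wf
```

Returning a copy leaves the loaded template untouched, so one parsed workflow can serve many jobs.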

Phase 5: Gradio Integration

Build the first real interface:

  • Input audio file.
  • Target language selector/text input.
  • Source language, default auto.
  • Optional prompt override.
  • Output audio.
  • Output translated/adjusted SRT.
  • Runtime log.

Wrap the end-to-end function with @spaces.GPU(duration=...) and start with a short maximum input duration, so that each call completes within the requested GPU time.
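
A minimal app.py sketch for this interface rests on several assumptions. Input is limited to WAV so its duration can be checked with the standard wave module. GPU_SECONDS and MAX_INPUT_SECONDS are placeholder values, not tested limits. gpu_pipeline is a hypothetical hook where the workflow client would be called. spaces.GPU(duration=...) is applied when the spaces package is available, with a no-op fallback for local runs. The duration check runs outside the decorated function so oversized inputs are rejected before a GPU is requested. Gradio is imported inside build_ui so the helpers can be exercised without it.

```python
import wave

try:
    import spaces  # provided on Hugging Face Spaces
except ImportError:  # local runs without the spaces package
    spaces = None

GPU_SECONDS = 120       # placeholder for @spaces.GPU(duration=...)
MAX_INPUT_SECONDS = 30  # placeholder short-audio limit

# Use ZeroGPU when available, otherwise leave the function undecorated.
gpu = spaces.GPU(duration=GPU_SECONDS) if spaces is not None else (lambda fn: fn)


def wav_duration_seconds(path):
    """Duration of a WAV file in seconds (this sketch only accepts WAV input)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def check_input(path, max_seconds=MAX_INPUT_SECONDS):
    """Reject missing or overly long input before any GPU work is requested."""
    if not path:
        raise ValueError("Upload an audio file first.")
    duration = wav_duration_seconds(path)
    if duration > max_seconds:
        raise ValueError(f"Input is {duration:.1f} s; the limit is {max_seconds} s.")
    return duration


@gpu
def gpu_pipeline(audio_path, target_language, source_language, prompt_override):
    """Hypothetical hook: patch and submit workflows/voicegate_api.json, return (audio, srt)."""
    raise NotImplementedError("connect scripts/workflow_client.py here")


def on_run(audio_path, target_language, source_language, prompt_override):
    duration = check_input(audio_path)  # runs before a ZeroGPU slot is requested
    source = source_language or "auto"
    audio_out, srt_out = gpu_pipeline(audio_path, target_language, source, prompt_override)
    log = f"input: {duration:.1f} s, source: {source}, target: {target_language}"
    return audio_out, srt_out, log


def build_ui():
    import gradio as gr  # imported here so the helpers above can be tested without Gradio

    with gr.Blocks(title="VoiceGate") as demo:
        audio_in = gr.Audio(type="filepath", label="Input audio (WAV)")
        target = gr.Textbox(label="Target language", placeholder="e.g. English")
        source = gr.Textbox(label="Source language", value="auto")
        prompt = gr.Textbox(label="Prompt override (optional)", lines=3)
        run = gr.Button("Run")
        audio_out = gr.Audio(type="filepath", label="Output audio")
        srt_out = gr.File(label="Translated SRT")
        log_out = gr.Textbox(label="Runtime log", lines=6)
        run.click(on_run, inputs=[audio_in, target, source, prompt], outputs=[audio_out, srt_out, log_out])
    return demo


if __name__ == "__main__":
    build_ui().launch()
```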

Phase 6: Verification

Verify in this order:

  1. ComfyUI starts and exposes its local API.
  2. TTS-only minimal workflow runs.
  3. ASR-only short audio workflow runs.
  4. SRT splitter + VoxCPM + merger runs.
  5. Full VoiceGate short-audio workflow runs.
  6. Video input support is added after the audio path is stable.
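
Each stage in this order depends on the previous one. A runner that stops at the first failure therefore keeps the log focused on the earliest broken layer. This is a hypothetical helper. Each check callable would wrap one of the smoke tests above, returning True on success.

```python
from typing import Callable, Iterable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_staged_checks(stages: Iterable[Stage], log: Callable[[str], None] = print) -> List[str]:
    """Run checks in order and stop at the first failure; return the names that passed."""
    passed: List[str] = []
    for name, check in stages:
        try:
            ok = bool(check())
        except Exception as exc:  # a crashing check counts as a failure
            log(f"FAIL {name}: {exc!r}")
            return passed
        if not ok:
            log(f"FAIL {name}")
            return passed
        log(f"PASS {name}")
        passed.append(name)
    return passed
```

For stage 1, the check could poll http://127.0.0.1:8188/system_stats (8188 is ComfyUI's default port) and return whether it answered.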

Immediate Next Step

Continue Phase 3 by wiring bootstrap/startup behavior into app.py, then test the scripts inside the running Hugging Face Space container.