VoiceGate / docs /deployment-plan.md

YanTianlong

Add ComfyUI runtime bootstrap scripts

683b147 25 days ago

	# VoiceGate HF Space Deployment Plan

	## Goal

	Deploy VoiceGate to a Hugging Face Space with a Gradio interface, using ZeroGPU
	for the GPU-heavy inference path.

	The initial target is a short-audio workflow that proves the full chain:
	audio input -> source separation -> ASR/SRT -> LLM translation -> VoxCPM TTS ->
	SRT-aligned audio merge -> audio and subtitle outputs.

	## Repository Roles

	Use three clear ownership boundaries:

	- `VoiceGate`: upstream project assets, README, diagrams, and source workflows.
	- `comfyui_voicebridge`: the VoiceBridge ComfyUI custom node repository.
	- `VoiceGate-hf`: this repository, the Hugging Face Space deployment wrapper.

	The Space repository should not depend on nested git repositories at runtime.
	For deployment, copy or vendor only the required workflow files, custom nodes,
	bootstrap scripts, and Gradio application code into the Space layout.

	Current local state:

	- The outer `VoiceGate-hf` repository is connected to the Hugging Face Space
	remote `build-small-hackathon/VoiceGate`.
	- `VoiceGate/` is present as a local upstream checkout only. It is ignored by
	the Space repository and must not be treated as runtime content.
	- `VoiceGate/.gitmodules` references `comfyui_voicebridge`, but the local
	`VoiceGate/comfyui_voicebridge/` directory is currently empty.
	- `VoiceGate/workflows/VoiceGate-Workflow.json` is the UI workflow.
	- `VoiceGate/workflows/VoiceGate-Workflow_api.json` exists and has been
	confirmed as valid JSON. It still needs parameterization before Gradio can
	submit it to ComfyUI.
	- `workflows/voicegate_api.json` is the deployment copy of the API workflow.
	- `workflows/voicegate_ui.json` is the deployment reference copy of the UI
	workflow.

	## Repository Hygiene

	The Space repository should stay small and deterministic:

	- Keep `VoiceGate/` as a local-only upstream checkout.
	- Copy deployment-ready workflow files into `workflows/`.
	- Copy or install custom nodes through an explicit bootstrap step.
	- Do not commit nested `.git` directories, model weights, API keys, uploaded
	media, generated audio, generated subtitles, or ComfyUI runtime caches.
	- Keep `.gitattributes` LFS rules for future model or binary assets, but prefer
	downloading model files at runtime instead of committing them.

	## Hugging Face Space Constraints

	ZeroGPU is intended for Gradio SDK Spaces. The Gradio app should expose
	a normal `app.py`, and GPU-heavy functions should be wrapped with `@spaces.GPU`.

	This means the first implementation should prefer:

	- Gradio Space root files: `README.md`, `app.py`, `requirements.txt`,
	`packages.txt`.
	- A Python bootstrap that installs or prepares ComfyUI and custom nodes.
	- A workflow client that calls the local ComfyUI API from inside the Gradio
	handler.

	Avoid starting with a Docker Space, since that would give up ZeroGPU, even
	though Docker would be a cleaner fit for a long-running ComfyUI service.

	## Proposed Space Layout

	```text
	VoiceGate-hf/
	|-- README.md
	|-- app.py
	|-- requirements.txt
	|-- packages.txt
	|-- scripts/
	|   |-- bootstrap_comfy.py
	|   |-- run_comfy.py
	|   `-- workflow_client.py
	|-- workflows/
	|   |-- voicegate_api.json
	|   `-- voicegate_ui.json
	|-- custom_nodes/
	|   `-- comfyui_voicebridge/
	|-- assets/
	`-- docs/
	    `-- deployment-plan.md
	```

	The current repository has the root scaffold, planning docs, deployment
	workflow copies, and the bootstrap scripts under `scripts/`. Custom nodes are
	installed from pinned repositories during Space startup (see Phase 2) rather
	than copied into `custom_nodes/`.

	## Known Workflow Nodes

	The API workflow references these important node classes:

	- `LoadAudio`
	- `MelBandRoFormerModelLoader`
	- `MelBandRoFormerSampler`
	- `VoiceBridgeASRLoader`
	- `VoiceBridgeASRTranscribe`
	- `GenerateSRT`
	- `RH_LLMAPI_NODE`
	- `VoiceBridgeSRTSplitter`
	- `RunningHub_VoxCPM_LoadModel`
	- `RunningHub_VoxCPM_Generate`
	- `VoiceBridgeAudioListMergerBySRT`
	- `MergeAudioMW`
	- `SaveAudioMP3`
	- `SaveSRTFromString`
	- `TrimAudioDuration`
	- `Any Switch (rgthree)`
	- `easy showAnything`
	- `easy string`
	- `CR Text`
	- `ReplaceText`

	This implies dependencies on VoiceBridge, VoxCPM/RunningHub nodes,
	MelBandRoFormer nodes/models, rgthree, easy-use, and the LLM API node package.
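A quick way to catch a missing node package before submitting a job is to compare the workflow's node classes with the classes the running ComfyUI instance has registered. The sketch below assumes the API-format workflow is a JSON object that maps node ids to entries with a `class_type` field, and that the set of available classes is fetched separately (for example from ComfyUI's `/object_info` endpoint). The helper names are hypothetical and are not part of the existing scripts.

```python
def workflow_class_types(workflow: dict) -> set[str]:
    """Collect the class_type of every node in an API-format workflow."""
    return {
        node["class_type"]
        for node in workflow.values()
        if isinstance(node, dict) and "class_type" in node
    }


def missing_node_classes(workflow: dict, available: set[str]) -> set[str]:
    """Return the node classes the workflow needs but ComfyUI has not registered."""
    return workflow_class_types(workflow) - available


# Toy workflow with two nodes; only LoadAudio is reported as installed.
example = {
    "1": {"class_type": "LoadAudio", "inputs": {"audio": "input.wav"}},
    "2": {"class_type": "GenerateSRT", "inputs": {}},
}
print(sorted(missing_node_classes(example, {"LoadAudio"})))  # ['GenerateSRT']
```

Running this check right after ComfyUI starts would turn a missing custom node into an explicit startup error instead of a failed job.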

	## Model and Secret Inventory

	Expected model assets:

	- `Qwen/Qwen3-ASR-1.7B`
	- `Qwen/Qwen3-ForcedAligner-0.6B`
	- `VoxCPM2`
	- `MelBandRoFormer_comfy/MelBandRoformer_fp32.safetensors`

	Expected Space secrets:

	- `HF_TOKEN`, if private or gated model downloads are needed.
	- `DEEPSEEK_API_KEY` or another LLM provider key.
	- Optional LLM base URL and model name configuration.

	Do not commit model weights, API keys, generated audio, or generated subtitles.
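Space secrets are exposed to the app as environment variables, so the LLM settings can be read once at startup and validated there. This is a minimal sketch: `DEEPSEEK_API_KEY` comes from the inventory above, while `LLM_BASE_URL`, `LLM_MODEL`, and the `LLMConfig` type are hypothetical names chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    api_key: str
    base_url: Optional[str]  # None means "use the provider default"
    model: Optional[str]     # None means "use the workflow default"


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Read LLM settings from the environment and fail early if the key is missing."""
    api_key = env.get("DEEPSEEK_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("DEEPSEEK_API_KEY is not set; add it as a Space secret.")
    return LLMConfig(
        api_key=api_key,
        # Hypothetical variable names for the optional overrides.
        base_url=env.get("LLM_BASE_URL") or None,
        model=env.get("LLM_MODEL") or None,
    )
```

Passing the environment as a parameter keeps the function testable without touching real secrets.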

	## Implementation Phases

	### Phase 1: Scaffold and Repository Hygiene

	Done:

	- Add HF Space root files.
	- Add minimal Gradio placeholder.
	- Add deployment plan.
	- Add ignore rules for runtime and generated artifacts.
	- Add a TODO checklist.
	- Copy the API workflow to `workflows/voicegate_api.json`.
	- Copy the UI workflow to `workflows/voicegate_ui.json`.
	- Confirm the API workflow is valid JSON.
	- Confirm the workflow files do not contain real API keys.

	### Phase 2: Dependency Inventory

	Done:

	- Identify the ComfyUI and custom node repositories needed by the API workflow.
	- Pin the current candidate commits in `docs/dependency-inventory.md`.
	- Identify initial Python, system package, model, and secret requirements.
	- Decide to install custom nodes from pinned git URLs during bootstrap instead
	of vendoring them into this Space repo.

	### Phase 3: Runtime Bootstrap

	Create scripts that can:

	- Clone or install ComfyUI.
	- Install Python dependencies.
	- Install required custom nodes at pinned commits.
	- Download or locate required model files.
	- Start ComfyUI locally inside the Space process.
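Installing a repository at a pinned commit can be sketched as a clone followed by a detached checkout. The function names, the URL, and the commit value used with them are placeholders, not the contents of `scripts/bootstrap_comfy.py`. The real pins live in `docs/dependency-inventory.md`.

```python
import subprocess
from pathlib import Path


def pinned_clone_commands(repo_url: str, commit: str, dest: Path) -> list[list[str]]:
    """Build the git commands that place `repo_url` at `commit` inside `dest`."""
    return [
        # Clone without populating the working tree; the checkout below does that.
        ["git", "clone", "--no-checkout", repo_url, str(dest)],
        # Detached HEAD at the exact pin, so a moving branch cannot change the install.
        ["git", "-C", str(dest), "checkout", "--detach", commit],
    ]


def install_pinned(repo_url: str, commit: str, dest: Path) -> None:
    """Clone on first run, then make sure the checkout sits at the pinned commit."""
    clone_cmd, checkout_cmd = pinned_clone_commands(repo_url, commit, dest)
    if not dest.exists():
        subprocess.run(clone_cmd, check=True)
    subprocess.run(checkout_cmd, check=True)
```

Keeping command construction separate from execution makes the pin logic easy to inspect without network access.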

	Current script status:

	- `scripts/bootstrap_comfy.py` clones ComfyUI and all pinned custom node
	repositories, installs their requirements, prepares model directories, and
	can optionally download the VoxCPM2 and MelBandRoFormer assets.
	- `scripts/run_comfy.py` starts ComfyUI and waits for `/system_stats`.
	- `scripts/workflow_client.py` uploads audio, patches the VoiceGate API
	workflow, submits it through `/prompt`, and waits on `/history/{prompt_id}`.
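The readiness check and the submit-then-poll loop described above can be sketched with the standard library alone. This illustrates the pattern, not the code in `scripts/run_comfy.py` or `scripts/workflow_client.py`. It assumes ComfyUI listens on its default local port 8188, that `/prompt` accepts a JSON body of the form `{"prompt": workflow}` and returns a `prompt_id`, and that `/history/{prompt_id}` stays empty until the job finishes. The function names and timeouts are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://127.0.0.1:8188"  # assumption: ComfyUI's default local address


def fetch_json(url: str, payload: Optional[dict] = None) -> dict:
    """GET `url`, or POST `payload` as JSON when one is given, and decode the reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    request = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_ready(fetch: Callable = fetch_json, timeout_s: float = 120.0,
                     interval_s: float = 1.0) -> None:
    """Poll /system_stats until ComfyUI answers or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            fetch(f"{BASE_URL}/system_stats")
            return
        except OSError:  # connection refused, URL errors, socket timeouts
            time.sleep(interval_s)
    raise TimeoutError("ComfyUI did not answer /system_stats in time")


def run_workflow(workflow: dict, fetch: Callable = fetch_json,
                 timeout_s: float = 600.0, interval_s: float = 2.0) -> dict:
    """Submit a workflow through /prompt and wait for its /history entry."""
    prompt_id = fetch(f"{BASE_URL}/prompt", {"prompt": workflow})["prompt_id"]
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        history = fetch(f"{BASE_URL}/history/{prompt_id}")
        if prompt_id in history:  # the entry appears once execution has finished
            return history[prompt_id]
        time.sleep(interval_s)
    raise TimeoutError(f"workflow {prompt_id} did not finish in time")
```

The `fetch` parameter exists so both loops can be exercised with a stub before a real ComfyUI process is available in the Space container.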

	Remaining runtime bootstrap work:

	- Wire bootstrap/startup behavior into `app.py`.
	- Validate the bootstrap and ComfyUI startup in the actual Space container.
	- Confirm the upload endpoint used by `LoadAudio` accepts the audio files we
	send from Gradio.

	### Phase 4: Workflow Parameterization

	Parameterize `workflows/voicegate_api.json` before submitting it to ComfyUI.

	Required edits:

	- Patch hard-coded audio filenames with Gradio-uploaded input files.
	- Patch API keys from environment variables.
	- Patch target language, LLM model, and provider base URL.
	- Ensure output nodes produce deterministic job-specific file paths.

	These are implemented in `scripts/workflow_client.py`, but still need to be
	connected to the Gradio UI and verified against a running ComfyUI process.
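The required edits amount to copying the workflow template and overwriting selected node inputs by node class. The sketch below shows one way to do that, and it is not the logic in `scripts/workflow_client.py`. The input names (`audio`, `api_key`, `filename_prefix`) and the output prefix layout are assumptions that should be checked against the actual node definitions in `workflows/voicegate_api.json`.

```python
import copy


def patch_inputs(workflow: dict, class_type: str, **inputs) -> int:
    """Set `inputs` on every node of `class_type`; return how many nodes changed."""
    patched = 0
    for node in workflow.values():
        if isinstance(node, dict) and node.get("class_type") == class_type:
            node.setdefault("inputs", {}).update(inputs)
            patched += 1
    return patched


def build_job_workflow(template: dict, audio_name: str, api_key: str, job_id: str) -> dict:
    """Return a per-job copy of the template with inputs, secrets, and outputs patched."""
    workflow = copy.deepcopy(template)  # never mutate the shared template
    if patch_inputs(workflow, "LoadAudio", audio=audio_name) == 0:
        raise ValueError("workflow has no LoadAudio node to receive the upload")
    # Assumed input names; verify them against the real node definitions.
    patch_inputs(workflow, "RH_LLMAPI_NODE", api_key=api_key)
    patch_inputs(workflow, "SaveAudioMP3", filename_prefix=f"voicegate/{job_id}/audio")
    return workflow
```

Deriving the output prefix from the job id keeps concurrent requests from overwriting each other's files.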

	### Phase 5: Gradio Integration

	Build the first real interface:

	- Input audio file.
	- Target language selector/text input.
	- Source language, default `auto`.
	- Optional prompt override.
	- Output audio.
	- Output translated/adjusted SRT.
	- Runtime log.

	Wrap the end-to-end function with `@spaces.GPU(duration=...)` and start with a
	short maximum input duration.
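A minimal sketch of that wrapping is shown below, assuming the `spaces` package is importable on the Space and falling back to a no-op decorator elsewhere. The 120-second budget, the 30-second input limit, and all function names are placeholder assumptions. The pipeline body is a stand-in for the real ComfyUI call.

```python
try:
    import spaces  # available on Hugging Face ZeroGPU Spaces

    gpu_task = spaces.GPU(duration=120)  # assumed GPU budget in seconds
except ImportError:
    def gpu_task(func):
        """Fallback so the sketch can also be exercised outside a Space."""
        return func

MAX_INPUT_SECONDS = 30.0  # assumed limit for the first short-audio target


def check_duration(duration_s: float, limit_s: float = MAX_INPUT_SECONDS) -> None:
    """Reject empty or over-long input before any GPU time is requested."""
    if duration_s <= 0:
        raise ValueError("audio is empty")
    if duration_s > limit_s:
        raise ValueError(f"audio is {duration_s:.1f}s; the limit is {limit_s:.0f}s")


@gpu_task
def run_pipeline(audio_path: str, target_language: str) -> dict:
    """Stand-in body: the real handler would submit the patched workflow to ComfyUI."""
    return {"audio_path": audio_path, "target_language": target_language}


def handle_request(audio_path: str, duration_s: float, target_language: str) -> dict:
    """Validate cheaply on CPU first, then enter the GPU-decorated function."""
    check_duration(duration_s)
    return run_pipeline(audio_path, target_language)
```

Keeping the duration check outside the decorated function means rejected inputs never consume the GPU allocation.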

	### Phase 6: Verification

	Verify in this order:

	1. ComfyUI starts and exposes its local API.
	2. TTS-only minimal workflow runs.
	3. ASR-only short audio workflow runs.
	4. SRT splitter + VoxCPM + merger runs.
	5. Full VoiceGate short-audio workflow runs.
	6. Video input support is added after the audio path is stable.

	## Immediate Next Step

	Continue Phase 3 by wiring bootstrap/startup behavior into `app.py`, then test
	the scripts inside the running Hugging Face Space container.