VoiceGate HF Space TODO

This is the execution checklist for bringing the VoiceGate Hugging Face Space from scaffold to a working Gradio app.

Create the VoiceGate-hf Space wrapper repository.
Configure and push to build-small-hackathon/VoiceGate.
Replace the default HF template with a VoiceGate scaffold.
Keep the upstream VoiceGate/ checkout local-only and ignored.
Preserve Hugging Face LFS rules in .gitattributes.
Confirm VoiceGate/workflows/VoiceGate-Workflow_api.json is valid JSON.
Copy the validated API workflow into workflows/voicegate_api.json.
Confirm SSH access to the running Space container and document the runbook.
Confirm DEEPSEEK_API_KEY is visible in the Space without printing it.
Download VoxCPM2 and MelBand RoFormer to persistent Space storage.
Confirm ZeroGPU CUDA can be invoked from the normal Gradio runtime through @spaces.GPU.
Confirm the full short-audio workflow can run on the duplicated personal T4 Small Space.
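
The DEEPSEEK_API_KEY check above can be done without echoing the value. A minimal sketch, assuming the secret reaches the process as an environment variable; `secret_status` is a hypothetical helper name, not something in the repository:

```python
import os


def secret_status(name: str = "DEEPSEEK_API_KEY") -> dict:
    """Report whether a secret is set without revealing its value."""
    value = os.environ.get(name, "")
    return {
        "name": name,
        "present": bool(value.strip()),
        # Only the length is reported; it helps spot empty or truncated values.
        "length": len(value),
    }


if __name__ == "__main__":
    print(secret_status())
```

Reporting only presence and length keeps the key out of Space logs and SSH session history.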

Copy VoiceGate/workflows/VoiceGate-Workflow_api.json to workflows/voicegate_api.json.
Copy VoiceGate/workflows/VoiceGate-Workflow.json to workflows/voicegate_ui.json for reference.
Update docs so VoiceGate/ is clearly documented as a local upstream checkout, not Space runtime content.
Commit and push the workflow files after confirming they contain no secrets or large media.
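
Before committing, the JSON validity and no-secrets checks can be scripted. The sketch below assumes the API-format layout, where the top level is an object keyed by node id and each node has an `inputs` object. The list of sensitive input names and the helper name `check_workflow` are assumptions for illustration:

```python
import json
from pathlib import Path

# ASSUMPTION: input names that should be empty in a committed workflow.
SENSITIVE_INPUTS = ("api_key", "token", "password")


def check_workflow(path: str) -> list[str]:
    """Return a list of problems found in an API-format workflow file."""
    try:
        workflow = json.loads(Path(path).read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(workflow, dict):
        return ["top level is not an object keyed by node id"]

    problems = []
    for node_id, node in workflow.items():
        inputs = node.get("inputs", {}) if isinstance(node, dict) else {}
        for key, value in inputs.items():
            # Flag any sensitive input that carries a non-empty string.
            if key.lower() in SENSITIVE_INPUTS and isinstance(value, str) and value.strip():
                problems.append(f"node {node_id}: input '{key}' is not empty")
    return problems
```

An empty list means the file parsed and no listed input carried a value. It does not check file size or embedded media, which still need a manual look.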

Identify exact repositories and pinned commits for every required custom node package:
- comfyui_voicebridge
- RunningHub VoxCPM nodes
- MelBandRoFormer nodes
- rgthree nodes
- comfyui-easy-use
- Comfyroll text nodes or equivalent provider for CR Text
- RH_LLMAPI_NODE provider
- ReplaceText provider
- MergeAudioMW provider
Identify Python dependency constraints for ComfyUI and all custom nodes.
Identify system packages beyond ffmpeg and git, if any.
Decide whether custom nodes are vendored in custom_nodes/ or installed by pinned git URL during bootstrap.
Decide where model files are downloaded and cached in the Space.
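
If the pinned-git-URL option is chosen, one way to express it is a small manifest plus a function that builds the git commands. This is a sketch only: `NodePin` and `install_commands` are hypothetical names, and the real repository URLs and commit hashes are still open items in this list, so none are filled in here.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class NodePin:
    name: str    # directory name under custom_nodes/
    url: str     # git URL of the node package (to be identified)
    commit: str  # full commit hash to pin


def install_commands(pin: NodePin, custom_nodes: Path) -> list[list[str]]:
    """Build the git commands that fetch one node package at its pinned commit."""
    dest = str(custom_nodes / pin.name)
    return [
        ["git", "clone", pin.url, dest],
        # Detached checkout so the working tree matches the pin exactly.
        ["git", "-C", dest, "checkout", "--detach", pin.commit],
    ]
```

Returning argument lists instead of running them keeps the function easy to test; a bootstrap script could pass each list to `subprocess.run(cmd, check=True)`.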

Add scripts/bootstrap_comfy.py.
Add scripts/run_comfy.py.
Add scripts/workflow_client.py.
Install or prepare ComfyUI on Space startup.
Add bootstrap support for installing custom node dependencies.
Add opt-in model directory preparation and download commands.
Verify ComfyUI can start locally in the Space in CPU mode.
Verify ComfyUI can start from a Gradio @spaces.GPU function and report CUDA through /system_stats.
Verify ComfyUI_AudioTools imports successfully in the Space after adding its missing system and Python dependencies.
Verify ComfyUI API endpoints are reachable:
- /system_stats
- /upload/image or the audio upload equivalent used by LoadAudio
- /prompt
- /history/{prompt_id}
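
A reachability check for these endpoints needs only the standard library. The sketch below assumes ComfyUI listens on its default address `http://127.0.0.1:8188` and that `POST /prompt` accepts a JSON body with `prompt` and `client_id` and answers with a `prompt_id`. The function names are illustrative and are not the contents of scripts/workflow_client.py.

```python
import json
import urllib.request
import uuid
from typing import Optional

# ASSUMPTION: ComfyUI's default host and port; adjust if the Space uses another.
BASE_URL = "http://127.0.0.1:8188"


def build_prompt_request(workflow: dict, client_id: Optional[str] = None) -> urllib.request.Request:
    """Build the POST /prompt request for an API-format workflow."""
    payload = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return urllib.request.Request(
        f"{BASE_URL}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_json(path: str, timeout: float = 10.0) -> dict:
    """GET a JSON endpoint such as /system_stats or /history/{prompt_id}."""
    with urllib.request.urlopen(f"{BASE_URL}{path}", timeout=timeout) as resp:
        return json.load(resp)


def submit(workflow: dict) -> str:
    """Queue a workflow and return the prompt_id from the response."""
    with urllib.request.urlopen(build_prompt_request(workflow), timeout=30) as resp:
        return json.load(resp)["prompt_id"]
```

With ComfyUI running, `get_json("/system_stats")` covers the first endpoint and `get_json(f"/history/{submit(workflow)}")` exercises the last two.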

Add Python-side patching for node 16 LoadAudio.inputs.audio.
Add Python-side patching for node 105 api_key from DEEPSEEK_API_KEY.
Add Python-side patching for node 105 api_baseurl.
Add Python-side patching for node 105 model.
Add Python-side patching for node 110 target language.
Add unique job-specific output prefixes for node 180 and node 214.
Decide which user controls are exposed first:
- source language
- target language
- LLM model
- max input duration
- keep or drop background audio
Remove or ignore display-only easy showAnything nodes if they are not needed for API execution.
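
The patching steps above could be collected into one function that works on a copy of the loaded workflow. The node ids and the `audio`, `api_key`, `api_baseurl`, and `model` input names come from this checklist. The input names used for node 110 and for the output prefixes on nodes 180 and 214 are assumptions and must be checked against workflows/voicegate_api.json.

```python
import copy
import os
import uuid
from typing import Optional

# ASSUMPTION: node 110 holds the target language in an input with this name.
TARGET_LANGUAGE_INPUT = "text"
# ASSUMPTION: nodes 180 and 214 take their output prefix from this input.
PREFIX_INPUT = "filename_prefix"


def patch_workflow(workflow: dict, audio_name: str, target_language: str,
                   model: str, api_baseurl: str,
                   job_id: Optional[str] = None) -> dict:
    """Return a patched copy of the workflow; the input dict is not modified."""
    patched = copy.deepcopy(workflow)
    job_id = job_id or uuid.uuid4().hex[:8]

    # Node 16: LoadAudio reads the uploaded file by name.
    patched["16"]["inputs"]["audio"] = audio_name

    # Node 105: LLM settings; the key is read from the environment only.
    llm = patched["105"]["inputs"]
    llm["api_key"] = os.environ["DEEPSEEK_API_KEY"]
    llm["api_baseurl"] = api_baseurl
    llm["model"] = model

    # Node 110: target language.
    patched["110"]["inputs"][TARGET_LANGUAGE_INPUT] = target_language

    # Nodes 180 and 214: job-specific prefixes so concurrent runs do not collide.
    for node_id in ("180", "214"):
        patched[node_id]["inputs"][PREFIX_INPUT] = f"voicegate/{job_id}/{node_id}"
    return patched
```

Deep-copying first means the template loaded from disk never holds the API key, so it cannot leak into logs or a later commit.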

Replace the placeholder app.py with the first VoiceGate interface.
Add short audio upload input.
Add target language input.
Add status/log output.
Add generated audio output.
Add generated SRT file outputs.
Wrap GPU-heavy execution with @spaces.GPU(duration=...).
Add and run a minimal GPU smoke test button.
Keep a diagnostics tab for internal workflow tests.
Add automatic model-path preparation for the user-facing run path.
Add user-facing TTS segment trim control for node 268 TrimAudioDuration.start_index.
Add guardrails for file type and duration.
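
The file-type and duration guardrails can run before any GPU time is requested. A minimal sketch follows. The allowed extensions and the 30-second limit are placeholder assumptions, since the maximum duration is still an open question, and the caller is assumed to have measured the duration already (for example with ffprobe).

```python
from pathlib import Path

# ASSUMPTION: placeholder values; the real limits are not decided yet.
ALLOWED_SUFFIXES = {".wav", ".mp3", ".flac", ".m4a"}
MAX_DURATION_S = 30.0


def validate_upload(path: str, duration_s: float,
                    max_duration_s: float = MAX_DURATION_S) -> None:
    """Raise ValueError with a user-readable message if the upload is rejected."""
    suffix = Path(path).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        allowed = ", ".join(sorted(ALLOWED_SUFFIXES))
        raise ValueError(f"Unsupported file type '{suffix}'. Allowed: {allowed}")
    if duration_s <= 0:
        raise ValueError("Could not determine a positive audio duration.")
    if duration_s > max_duration_s:
        raise ValueError(
            f"Audio is {duration_s:.1f}s long; the limit is {max_duration_s:.1f}s."
        )
```

Calling this at the top of the Gradio handler, outside the `@spaces.GPU` function, rejects bad uploads without spending ZeroGPU quota.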

Start Space and confirm Gradio loads.
Start ComfyUI from the Space process.
Submit a minimal prompt to ComfyUI and receive a response.
Submit a minimal prompt to ComfyUI from the Gradio GPU runtime and receive a response.
Run a DeepSeek LLM node smoke test.
Run a MelBand RoFormer smoke test.
Run a MelBand RoFormer smoke test inside ZeroGPU.
Run a tiny TTS-only workflow inside ZeroGPU.
Run a short ASR-only workflow inside ZeroGPU.
Verify the Space API workflow connection graph against VoiceGate/workflows/VoiceGate-Workflow.json.
Run SRT split -> VoxCPM -> SRT merge.
Run the full short-audio VoiceGate workflow on the personal T4 Small Space.
Run the full short-audio VoiceGate workflow on the organization ZeroGPU Space after quota recovers.
Confirm output audio is downloadable/playable from Gradio.
Confirm SRT files are downloadable from Gradio after redeploy.
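
For the submit-and-receive tests, the waiting side can be sketched as a polling loop over `/history/{prompt_id}`. This assumes the history response stays empty until the prompt finishes and is then keyed by the prompt id. The fetch function is injected so the loop can be tested without a running ComfyUI, and the default timeout mirrors the 800s app wait window mentioned in this checklist.

```python
import time
from typing import Callable


def wait_for_result(prompt_id: str,
                    fetch_history: Callable[[str], dict],
                    timeout_s: float = 800.0,
                    poll_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the prompt appears in history, then return its entry."""
    deadline = time.monotonic() + timeout_s
    while True:
        history = fetch_history(prompt_id)
        if prompt_id in history:
            return history[prompt_id]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prompt {prompt_id} not finished after {timeout_s}s")
        sleep(poll_s)
```

In the app, `fetch_history` would be a small function that performs the HTTP GET and returns the parsed JSON.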

Which exact node repository provides RH_LLMAPI_NODE?
Which exact node repository provides RunningHub_VoxCPM_*?
Is flash_attention_2 available and reliable in the ZeroGPU environment?
Can ASR run without flash_attention_2? Yes. The ASR-only smoke test used attention=sdpa.
Does VoxCPM2 fit comfortably in ZeroGPU memory with ASR and MelBandRoFormer in the same run? Unresolved. A full run reached the heavy nodes but timed out at the 800s app wait window, and the shortened retry that followed was blocked by ZeroGPU quota before it could execute.
Can a full short-audio workflow run on T4 Small? Yes. On the duplicated personal Space with a warm ComfyUI process, the test completed in 24.4s.
How do we avoid user-facing failures after hardware changes? The user path now checks required model paths and runs bootstrap_comfy.py --with-models before submitting to ComfyUI.
Where should large model files live? /data/voicegate_models, with symlinks into ComfyUI's expected model directories.
Should the first public demo disable background separation to reduce runtime and memory pressure?
What maximum uploaded audio/video duration should the first version allow?
Can SSH access the ZeroGPU CUDA device directly? No. SSH enters the normal running Space container without CUDA; GPU work must run inside a @spaces.GPU function.
Has the Space successfully invoked ZeroGPU CUDA from Gradio? Yes. The gpu_smoke_test button returned cuda_available=True on one CUDA device.
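
The storage answer above, files under /data/voicegate_models with symlinks into ComfyUI's model directories, could be implemented along these lines. This is a sketch under assumptions: `link_models` is a hypothetical helper, and the subdirectory names passed to it depend on what each custom node expects.

```python
from pathlib import Path
from typing import Iterable


def link_models(store: Path, comfy_models: Path, subdirs: Iterable[str]) -> list[Path]:
    """Symlink comfy_models/<name> to store/<name>; return the links created."""
    created = []
    comfy_models.mkdir(parents=True, exist_ok=True)
    for name in subdirs:
        source = store / name
        source.mkdir(parents=True, exist_ok=True)
        link = comfy_models / name
        if link.is_symlink():
            if link.resolve() == source.resolve():
                continue  # already points at the persistent store
            link.unlink()  # stale link, for example after a storage path change
        elif link.exists():
            # A real directory is present; leave it alone rather than delete data.
            continue
        link.symlink_to(source, target_is_directory=True)
        created.append(link)
    return created
```

The function is idempotent, so it is safe to call on every startup and before each user-facing run.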