VoiceGate HF Space TODO
This is the execution checklist for bringing the VoiceGate Hugging Face Space from scaffold to a working Gradio app.
Current Status
- Create the VoiceGate-hfSpace wrapper repository.
- Configure and push to build-small-hackathon/VoiceGate.
- Replace the default HF template with a VoiceGate scaffold.
- Keep the upstream VoiceGate/ checkout local-only and ignored.
- Preserve Hugging Face LFS rules in .gitattributes.
- Confirm VoiceGate/workflows/VoiceGate-Workflow_api.json is valid JSON.
- Copy the validated API workflow into workflows/voicegate_api.json.
- Confirm SSH access to the running Space container and document the runbook.
- Confirm DEEPSEEK_API_KEY is visible in the Space without printing it.
- Download VoxCPM2 and MelBand RoFormer to persistent Space storage.
- Confirm ZeroGPU CUDA can be invoked from the normal Gradio runtime through @spaces.GPU.
- Confirm the full short-audio workflow can run on the duplicated personal T4 Small Space.
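The secret check in the list above can be done without ever echoing the value. The following is a minimal sketch; the helper name `secret_status` is hypothetical and only the variable name `DEEPSEEK_API_KEY` comes from this checklist.

```python
import os


def secret_status(name: str = "DEEPSEEK_API_KEY") -> dict:
    """Report whether an environment secret is set, without revealing it."""
    value = os.environ.get(name, "")
    # Only a boolean leaves this function; the value itself is never returned.
    return {"name": name, "present": bool(value.strip())}
```

Printing `secret_status()` from an SSH session or a diagnostics tab shows only `present: True` or `present: False`, which is safe to paste into a runbook.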
Phase 1: Repository Hygiene
- Copy VoiceGate/workflows/VoiceGate-Workflow_api.json to workflows/voicegate_api.json.
- Copy VoiceGate/workflows/VoiceGate-Workflow.json to workflows/voicegate_ui.json for reference.
- Update docs so VoiceGate/ is clearly documented as a local upstream checkout, not Space runtime content.
- Commit and push the workflow files after confirming they contain no secrets or large media.
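Before committing, the copied API workflow can be checked mechanically. This sketch assumes the file uses ComfyUI's API export layout (a JSON object mapping node ids to entries with `class_type` and `inputs`); the function names and the list of secret-like input names are hypothetical.

```python
import json
from pathlib import Path

# Assumption: input names that should never hold a value in a committed file.
SECRET_INPUT_NAMES = ("api_key",)


def load_api_workflow(path):
    """Parse the workflow file and check the basic API-format shape."""
    workflow = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(workflow, dict):
        raise ValueError("API workflow must be a JSON object keyed by node id")
    for node_id, node in workflow.items():
        if "class_type" not in node or "inputs" not in node:
            raise ValueError(f"node {node_id} lacks class_type or inputs")
    return workflow


def find_committed_secrets(workflow, names=SECRET_INPUT_NAMES):
    """Return (node_id, input_name) pairs whose secret-like input is non-empty."""
    hits = []
    for node_id, node in workflow.items():
        for name in names:
            value = node.get("inputs", {}).get(name)
            if isinstance(value, str) and value.strip():
                hits.append((node_id, name))
    return hits
```

An empty result from `find_committed_secrets(load_api_workflow("workflows/voicegate_api.json"))` covers only the named inputs; it does not replace a manual review for large media or other credentials.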
Phase 2: Dependency Inventory
- Identify exact repositories and pinned commits for every required custom node package:
  - comfyui_voicebridge
  - RunningHub VoxCPM nodes
  - MelBandRoFormer nodes
  - rgthree nodes
  - comfyui-easy-use
  - Comfyroll text nodes or equivalent provider for CR Text
  - RH_LLMAPI_NODE provider
  - ReplaceText provider
  - MergeAudioMW provider
- Identify Python dependency constraints for ComfyUI and all custom nodes.
- Identify system packages beyond ffmpeg and git, if any.
- Decide whether custom nodes are vendored in custom_nodes/ or installed by pinned git URL during bootstrap.
- Decide where model files are downloaded and cached in the Space.
Phase 3: Runtime Bootstrap
- Add scripts/bootstrap_comfy.py.
- Add scripts/run_comfy.py.
- Add scripts/workflow_client.py.
- Install or prepare ComfyUI on Space startup.
- Add bootstrap support for installing custom node dependencies.
- Add opt-in model directory preparation and download commands.
- Verify ComfyUI can start locally in the Space in CPU mode.
- Verify ComfyUI can start from a Gradio @spaces.GPU function and report CUDA through /system_stats.
- Verify ComfyUI_AudioTools imports successfully in the Space after adding its missing system and Python dependencies.
- Verify ComfyUI API endpoints are reachable:
  - /system_stats
  - /upload/image or the audio upload equivalent used by LoadAudio
  - /prompt
  - /history/{prompt_id}
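The endpoint checks above could be driven by a small standard-library client along these lines. This is a sketch, not the contents of scripts/workflow_client.py: the base URL assumes ComfyUI is listening on its default local port, the function names are hypothetical, and the 800 second default simply mirrors the app wait window mentioned under Open Questions.

```python
import json
import time
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # assumption: ComfyUI's default listen address


def get_json(url, timeout=10.0):
    """GET a URL and decode the JSON body, e.g. f"{COMFY_URL}/system_stats"."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def build_prompt_request(workflow, client_id, base_url=COMFY_URL):
    """Build the POST /prompt request that queues an API-format workflow."""
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def wait_for_history(prompt_id, fetch=get_json, base_url=COMFY_URL,
                     timeout_s=800.0, poll_s=2.0):
    """Poll /history/{prompt_id} until the job appears or the wait window ends."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        history = fetch(f"{base_url}/history/{prompt_id}")
        if prompt_id in history:
            return history[prompt_id]
        time.sleep(poll_s)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout_s}s")
```

Passing `fetch` as a parameter keeps the polling loop testable without a running ComfyUI process; the request built by `build_prompt_request` would be sent with `urllib.request.urlopen`.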
Phase 4: Workflow Parameterization
- Add Python-side patching for node 16 LoadAudio.inputs.audio.
- Add Python-side patching for node 105 api_key from DEEPSEEK_API_KEY.
- Add Python-side patching for node 105 api_baseurl.
- Add Python-side patching for node 105 model.
- Add Python-side patching for node 110 target language.
- Add unique job-specific output prefixes for node 180 and node 214.
- Decide which user controls are exposed first:
  - source language
  - target language
  - LLM model
  - max input duration
  - keep or drop background audio
- Remove or ignore display-only easy showAnything nodes if they are not needed for API execution.
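The patching steps above amount to copying the loaded workflow and overwriting a handful of inputs per job. A minimal sketch follows. The node ids and the `audio`, `api_key`, `api_baseurl`, and `model` input names come from this checklist; the `text` input on node 110 and the `filename_prefix` input on nodes 180 and 214 are assumptions that must be checked against workflows/voicegate_api.json, and the function name is hypothetical.

```python
import copy
import uuid


def patch_workflow(workflow, audio_name, target_language, api_key,
                   api_base_url, model, job_id=None):
    """Return a patched copy of the API workflow for a single job."""
    patched = copy.deepcopy(workflow)  # never mutate the shared template
    job_id = job_id or uuid.uuid4().hex[:8]

    patched["16"]["inputs"]["audio"] = audio_name

    llm_inputs = patched["105"]["inputs"]
    llm_inputs["api_key"] = api_key  # caller reads this from DEEPSEEK_API_KEY
    llm_inputs["api_baseurl"] = api_base_url
    llm_inputs["model"] = model

    # Assumption: node 110 carries the target language in an input named "text".
    patched["110"]["inputs"]["text"] = target_language

    # Assumption: both output nodes accept a "filename_prefix" input.
    for node_id in ("180", "214"):
        patched[node_id]["inputs"]["filename_prefix"] = f"voicegate_{job_id}_{node_id}"
    return patched
```

Working on a deep copy keeps the API key out of the in-memory template, so a later job or a debug dump of the template cannot leak it.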
Phase 5: Gradio Interface
- Replace the placeholder app.py with the first VoiceGate interface.
- Add short audio upload input.
- Add target language input.
- Add status/log output.
- Add generated audio output.
- Add generated SRT file outputs.
- Wrap GPU-heavy execution with @spaces.GPU(duration=...).
- Add and run a minimal GPU smoke test button.
- Keep a diagnostics tab for internal workflow tests.
- Add automatic model-path preparation for the user-facing run path.
- Add user-facing TTS segment trim control for node 268 TrimAudioDuration.start_index.
- Add guardrails for file type and duration.
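The file type and duration guardrails could start as a pure function that the Gradio handler calls before any GPU time is requested. This is a sketch only: the allowed suffixes and the 60 second limit are placeholders (the maximum duration is still an open question), the function name is hypothetical, and the caller is assumed to measure the duration itself.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".wav", ".mp3", ".flac", ".m4a"}  # assumption: placeholder list
MAX_DURATION_S = 60.0  # assumption: placeholder until the limit is decided


def check_upload(path, duration_s, max_duration_s=MAX_DURATION_S):
    """Return user-facing problems; an empty list means the file is accepted."""
    problems = []
    suffix = Path(path).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"Unsupported file type: {suffix or 'none'}")
    if duration_s <= 0:
        problems.append("Audio is empty or its duration could not be read")
    elif duration_s > max_duration_s:
        problems.append(
            f"Audio is {duration_s:.1f}s; the limit is {max_duration_s:.1f}s"
        )
    return problems
```

Returning a list rather than raising lets the status/log output show every problem at once, and rejecting early avoids spending ZeroGPU quota on inputs that cannot succeed.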
Phase 6: Minimal Runtime Tests
- Start Space and confirm Gradio loads.
- Start ComfyUI from the Space process.
- Submit a minimal prompt to ComfyUI and receive a response.
- Submit a minimal prompt to ComfyUI from the Gradio GPU runtime and receive a response.
- Run a DeepSeek LLM node smoke test.
- Run a MelBand RoFormer smoke test.
- Run a MelBand RoFormer smoke test inside ZeroGPU.
- Run a tiny TTS-only workflow inside ZeroGPU.
- Run a short ASR-only workflow inside ZeroGPU.
- Verify the Space API workflow connection graph against VoiceGate/workflows/VoiceGate-Workflow.json.
- Run SRT split -> VoxCPM -> SRT merge.
- Run the full short-audio VoiceGate workflow on the personal T4 Small Space.
- Run the full short-audio VoiceGate workflow on the organization ZeroGPU Space after quota recovers.
- Confirm output audio is downloadable/playable from Gradio.
- Confirm SRT files are downloadable from Gradio after redeploy.
Phase 7: Full VoiceGate Path
- Add video input support after the audio path is stable.
- Extract audio from video with ffmpeg.
- Merge generated audio back into the original video if required.
- Add subtitle download and optional subtitle burn-in.
- Add examples with small sample files, if allowed by Space storage limits.
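The extract and merge steps in this phase map onto two ffmpeg invocations. The sketch below only builds the argument lists so they can be inspected or logged before running; the function names are hypothetical, and the choice of 16-bit PCM WAV for the extracted audio is an assumption rather than a requirement of the workflow.

```python
import subprocess


def extract_audio_cmd(video_path, wav_path):
    """Drop the video stream (-vn) and write the audio as 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", video_path, "-vn",
            "-acodec", "pcm_s16le", wav_path]


def mux_audio_cmd(video_path, audio_path, out_path):
    """Keep the original video stream and replace its audio track."""
    return ["ffmpeg", "-y", "-i", video_path, "-i", audio_path,
            "-map", "0:v:0", "-map", "1:a:0",
            "-c:v", "copy", "-shortest", out_path]


def run(cmd):
    """Run a command, raising CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(cmd, check=True, capture_output=True)
```

Copying the video stream with `-c:v copy` avoids a re-encode, and `-shortest` ends the output when the shorter of the two streams ends.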
Open Questions
- Which exact node repository provides RH_LLMAPI_NODE?
- Which exact node repository provides RunningHub_VoxCPM_*?
- Is flash_attention_2 available and reliable in the ZeroGPU environment?
- Can ASR run without flash_attention_2? Yes. The ASR-only smoke test used attention=sdpa.
- Does VoxCPM2 fit comfortably in ZeroGPU memory with ASR and MelBandRoFormer in the same run? A full run reached the heavy nodes but timed out at the 800s app wait window; the next shortened retry was blocked by ZeroGPU quota before execution.
- Can a full short-audio workflow run on T4 Small? Yes, on the duplicated personal Space with a warm ComfyUI process, the test completed in 24.4s.
- How do we avoid user-facing failure after hardware changes? The user path now checks required model paths and runs bootstrap_comfy.py --with-models before submitting to ComfyUI.
- Where should large model files live? /data/voicegate_models, with symlinks into ComfyUI's expected model directories.
- Should the first public demo disable background separation to reduce runtime and memory pressure?
- What maximum uploaded audio/video duration should the first version allow?
- Can SSH access the ZeroGPU CUDA device directly? No. SSH enters the normal running Space container without CUDA; GPU work must run inside a @spaces.GPU function.
- Has the Space successfully invoked ZeroGPU CUDA from Gradio? Yes. The gpu_smoke_test button returned cuda_available=True on one CUDA device.
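The model storage answer above (files under /data/voicegate_models, symlinked into ComfyUI's model directories) together with the required-path check could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, the entry names passed in are whatever the bootstrap decides on, and it is not the actual logic of bootstrap_comfy.py.

```python
from pathlib import Path


def link_model_dirs(store, comfy_models, names):
    """Symlink each stored model entry into ComfyUI's model tree.

    Returns the names missing from the store so the caller can decide
    whether to trigger a download before submitting a job.
    """
    store, comfy_models = Path(store), Path(comfy_models)
    missing = []
    for name in names:
        source = store / name
        if not source.exists():
            missing.append(name)
            continue
        link = comfy_models / name
        if link.is_symlink() or link.exists():
            continue  # already prepared; leave it untouched
        link.parent.mkdir(parents=True, exist_ok=True)
        link.symlink_to(source, target_is_directory=source.is_dir())
    return missing
```

A non-empty return value is the signal that the user path should prepare models first, which is the situation that arises after a hardware change resets the container.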