# VoiceGate HF Space TODO

This is the execution checklist for bringing the VoiceGate Hugging Face Space
from scaffold to a working Gradio app.

## Current Status

- [x] Create the `VoiceGate-hf` Space wrapper repository.
- [x] Configure and push to `build-small-hackathon/VoiceGate`.
- [x] Replace the default HF template with a VoiceGate scaffold.
- [x] Keep the upstream `VoiceGate/` checkout local-only and ignored.
- [x] Preserve Hugging Face LFS rules in `.gitattributes`.
- [x] Confirm `VoiceGate/workflows/VoiceGate-Workflow_api.json` is valid JSON.
- [x] Copy the validated API workflow into `workflows/voicegate_api.json`.
- [x] Confirm SSH access to the running Space container and document the
  runbook.
- [x] Confirm `DEEPSEEK_API_KEY` is visible in the Space without printing it.
- [x] Download VoxCPM2 and MelBand RoFormer to persistent Space storage.
- [x] Confirm ZeroGPU CUDA can be invoked from the normal Gradio runtime through
  `@spaces.GPU`.
- [x] Confirm the full short-audio workflow can run on the duplicated personal
  T4 Small Space.

## Phase 1: Repository Hygiene

- [x] Copy `VoiceGate/workflows/VoiceGate-Workflow_api.json` to
  `workflows/voicegate_api.json`.
- [x] Copy `VoiceGate/workflows/VoiceGate-Workflow.json` to
  `workflows/voicegate_ui.json` for reference.
- [x] Update docs so `VoiceGate/` is clearly documented as a local upstream
  checkout, not Space runtime content.
- [x] Commit and push the workflow files after confirming they contain no
  secrets or large media.
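
The JSON-validity and no-secrets checks above can be scripted so they run before every commit. This is a minimal sketch. It assumes the API-format workflow is a JSON object keyed by node ID, and that each node carries an `inputs` mapping. `SECRET_INPUT_NAMES` is a hypothetical list chosen for illustration; the upstream workflow does not define it.

```python
import json
from pathlib import Path

# Hypothetical set of input names treated as secrets; adjust for the real nodes.
SECRET_INPUT_NAMES = {"api_key", "token", "secret", "password"}


def load_api_workflow(path):
    """Parse an API-format workflow; raises json.JSONDecodeError if invalid."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object keyed by node ID")
    return data


def find_embedded_secrets(workflow):
    """Return sorted (node_id, input_name) pairs whose secret-like input is non-empty."""
    hits = []
    for node_id, node in workflow.items():
        inputs = node.get("inputs", {}) if isinstance(node, dict) else {}
        for name, value in inputs.items():
            is_secret_name = name.lower() in SECRET_INPUT_NAMES
            if is_secret_name and isinstance(value, str) and value.strip():
                hits.append((node_id, name))
    return sorted(hits)
```

An empty list from `find_embedded_secrets(load_api_workflow("workflows/voicegate_api.json"))` signals that the file is safe to push. The key is then patched in at runtime from `DEEPSEEK_API_KEY`.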

## Phase 2: Dependency Inventory

- [x] Identify exact repositories and pinned commits for every required custom
  node package:
  - [x] `comfyui_voicebridge`
  - [x] RunningHub VoxCPM nodes
  - [x] MelBandRoFormer nodes
  - [x] rgthree nodes
  - [x] comfyui-easy-use
  - [x] Comfyroll text nodes or equivalent provider for `CR Text`
  - [x] `RH_LLMAPI_NODE` provider
  - [x] `ReplaceText` provider
  - [x] `MergeAudioMW` provider
- [x] Identify Python dependency constraints for ComfyUI and all custom nodes.
- [x] Identify system packages beyond `ffmpeg` and `git`, if any.
- [x] Decide whether custom nodes are vendored in `custom_nodes/` or installed
  by pinned git URL during bootstrap.
- [x] Decide where model files are downloaded and cached in the Space.
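
If custom nodes are installed by pinned git URL during bootstrap rather than vendored, each package reduces to three steps:

1. Clone the repository.
2. Check out the pinned commit.
3. Optionally run `pip install -r requirements.txt`.

This is a minimal sketch of that sequence. `NodeSpec`, the helper names, and the example URL and SHA below are hypothetical placeholders. They are not the pins recorded for VoiceGate, nor the contents of `scripts/bootstrap_comfy.py`.

```python
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical record for one custom node package pinned to a commit."""
    name: str      # directory name under custom_nodes/
    repo_url: str  # git URL of the node repository
    commit: str    # full commit SHA to pin


def clone_commands(spec, custom_nodes_dir):
    """argv lists that clone the repo and detach HEAD at the pinned commit."""
    dest = Path(custom_nodes_dir) / spec.name
    return [
        ["git", "clone", spec.repo_url, str(dest)],
        ["git", "-C", str(dest), "checkout", "--detach", spec.commit],
    ]


def requirements_command(node_dir):
    """pip argv for the node's requirements.txt, or None if it ships none."""
    req = Path(node_dir) / "requirements.txt"
    if not req.is_file():
        return None
    return [sys.executable, "-m", "pip", "install", "-r", str(req)]


def install_node(spec, custom_nodes_dir):
    """Run clone, checkout, and the optional pip step; stop on the first failure."""
    for argv in clone_commands(spec, custom_nodes_dir):
        subprocess.run(argv, check=True)
    pip_argv = requirements_command(Path(custom_nodes_dir) / spec.name)
    if pip_argv is not None:
        subprocess.run(pip_argv, check=True)
```

For example: `install_node(NodeSpec("example_node", "https://example.com/example_node.git", "<full-sha>"), "ComfyUI/custom_nodes")`. Checking out a detached commit instead of a branch keeps rebuilds reproducible even if the upstream branch moves.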

## Phase 3: Runtime Bootstrap

- [x] Add `scripts/bootstrap_comfy.py`.
- [x] Add `scripts/run_comfy.py`.
- [x] Add `scripts/workflow_client.py`.
- [ ] Install or prepare ComfyUI on Space startup.
- [x] Add bootstrap support for installing custom node dependencies.
- [x] Add opt-in model directory preparation and download commands.
- [x] Verify ComfyUI can start locally in the Space in CPU mode.
- [x] Verify ComfyUI can start from a Gradio `@spaces.GPU` function and report
  CUDA through `/system_stats`.
- [x] Verify `ComfyUI_AudioTools` imports successfully in the Space after
  adding its missing system and Python dependencies.
- [ ] Verify ComfyUI API endpoints are reachable:
  - [x] `/system_stats`
  - [x] `/upload/image` or the audio upload equivalent used by `LoadAudio`
  - [x] `/prompt`
  - [x] `/history/{prompt_id}`
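
Three of the endpoints above can be reached with a small client built only on the standard library. This sketch makes three assumptions:

- ComfyUI listens on its default `127.0.0.1:8188`.
- `/prompt` accepts a body of the form `{"prompt": <API workflow>, "client_id": ...}`.
- `/prompt` replies with a `prompt_id`.

The function names are hypothetical, and this is not the code in `scripts/workflow_client.py`. The multipart upload to `/upload/image` is omitted for brevity.

```python
import json
import urllib.request
import uuid

# Assumed address: ComfyUI's default listen host and port.
COMFY_URL = "http://127.0.0.1:8188"


def _get_json(url, timeout=10):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def system_stats(base_url=COMFY_URL):
    """GET /system_stats; its 'devices' list shows whether CUDA is visible."""
    return _get_json(f"{base_url}/system_stats")


def build_prompt_request(workflow, base_url=COMFY_URL, client_id=None):
    """Build the POST /prompt request carrying an API-format workflow."""
    body = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_prompt(workflow, base_url=COMFY_URL):
    """Queue the workflow and return the prompt_id ComfyUI assigns to it."""
    request = build_prompt_request(workflow, base_url)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)["prompt_id"]


def fetch_history(prompt_id, base_url=COMFY_URL):
    """GET /history/{prompt_id}; typically {} until the prompt has finished."""
    return _get_json(f"{base_url}/history/{prompt_id}")
```

A smoke test is then `pid = submit_prompt(workflow)`, followed by `fetch_history(pid)` once the prompt has run. When `system_stats()` is called inside a `@spaces.GPU` function, its `devices` list should include the CUDA device.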

## Phase 4: Workflow Parameterization

- [x] Add Python-side patching for node `16` `LoadAudio.inputs.audio`.
- [x] Add Python-side patching for node `105` `api_key` from
  `DEEPSEEK_API_KEY`.
- [x] Add Python-side patching for node `105` `api_baseurl`.
- [x] Add Python-side patching for node `105` `model`.
- [x] Add Python-side patching for node `110` target language.
- [x] Add unique job-specific output prefixes for node `180` and node `214`.
- [x] Decide which user controls are exposed first:
  - [ ] source language
  - [x] target language
  - [ ] LLM model
  - [ ] max input duration
  - [ ] keep or drop background audio
- [ ] Remove or ignore display-only `easy showAnything` nodes if they are not
  needed for API execution.
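
The per-job patching described above can be sketched as a pure function over the API JSON, plus a filter for display-only nodes.

Some names come straight from this checklist: the node IDs `16`, `105`, `110`, `180`, `214`, and `268`, and the inputs `audio`, `api_key`, `api_baseurl`, `model`, and `start_index`. Two are assumptions to check against `workflows/voicegate_api.json`: the input name on node `110` (`TARGET_LANGUAGE_INPUT = "text"`) and the `filename_prefix` input on nodes `180` and `214`.

In API format, a link to another node's output is a two-element `[node_id, output_index]` list. The filter uses these links to decide whether a node is still consumed.

```python
import copy
import os
import uuid

# Assumption: the input on node 110 that holds the target language.
TARGET_LANGUAGE_INPUT = "text"


def patch_workflow(workflow, audio_name, target_language, *, api_baseurl, model,
                   api_key=None, trim_start=None, job_id=None):
    """Return a patched deep copy of the API workflow for one job."""
    wf = copy.deepcopy(workflow)
    job_id = job_id or uuid.uuid4().hex[:12]
    wf["16"]["inputs"]["audio"] = audio_name  # LoadAudio: name returned by the upload
    llm = wf["105"]["inputs"]
    llm["api_key"] = api_key or os.environ["DEEPSEEK_API_KEY"]  # never stored in the file
    llm["api_baseurl"] = api_baseurl
    llm["model"] = model
    wf["110"]["inputs"][TARGET_LANGUAGE_INPUT] = target_language
    for node_id in ("180", "214"):
        # Assumed input name; gives every job its own output file prefix.
        wf[node_id]["inputs"]["filename_prefix"] = f"voicegate/{job_id}_{node_id}"
    if trim_start is not None:
        wf["268"]["inputs"]["start_index"] = trim_start  # TrimAudioDuration
    return wf


def drop_display_nodes(workflow, class_type="easy showAnything"):
    """Remove nodes of class_type unless another node links to their outputs."""
    referenced = {
        value[0]
        for node in workflow.values()
        for value in node.get("inputs", {}).values()
        if isinstance(value, list) and len(value) == 2 and isinstance(value[0], str)
    }
    return {
        node_id: node
        for node_id, node in workflow.items()
        if node.get("class_type") != class_type or node_id in referenced
    }
```

Working on a deep copy keeps the loaded template reusable across concurrent jobs. Keeping any `easy showAnything` node that another node links to avoids breaking the graph if one of them turns out not to be display-only.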

## Phase 5: Gradio Interface

- [x] Replace the placeholder `app.py` with the first VoiceGate interface.
- [x] Add short audio upload input.
- [x] Add target language input.
- [x] Add status/log output.
- [x] Add generated audio output.
- [x] Add generated SRT file outputs.
- [ ] Wrap GPU-heavy execution with `@spaces.GPU(duration=...)`.
- [x] Add and run a minimal GPU smoke test button.
- [x] Keep a diagnostics tab for internal workflow tests.
- [x] Add automatic model-path preparation for the user-facing run path.
- [x] Add user-facing TTS segment trim control for node `268`
  `TrimAudioDuration.start_index`.
- [ ] Add guardrails for file type and duration.
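
The file-type and duration guardrails can run in plain Python before the `@spaces.GPU` function is entered, so rejected uploads do not spend ZeroGPU quota. This is a minimal sketch.

- `ALLOWED_SUFFIXES` and `MAX_SECONDS` are placeholders, since the maximum duration is still an open question.
- PCM WAV durations are read with the standard-library `wave` module.
- Other formats fall back to `ffprobe`, which ships with the `ffmpeg` system package.

```python
import subprocess
import wave
from pathlib import Path

# Placeholder limits: the real maximum duration is still an open question.
ALLOWED_SUFFIXES = {".wav", ".mp3", ".flac", ".m4a"}
MAX_SECONDS = 60.0


def audio_duration_seconds(path):
    """Duration in seconds: stdlib wave for PCM WAV, otherwise ffprobe."""
    path = Path(path)
    if path.suffix.lower() == ".wav":
        try:
            with wave.open(str(path), "rb") as wav:
                return wav.getnframes() / wav.getframerate()
        except wave.Error:
            pass  # e.g. float or compressed WAV: fall back to ffprobe
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())


def validate_upload(path, max_seconds=MAX_SECONDS):
    """Return the duration, or raise ValueError with a user-readable message."""
    suffix = Path(path).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        allowed = ", ".join(sorted(ALLOWED_SUFFIXES))
        raise ValueError(f"Unsupported file type {suffix or '(none)'}; use {allowed}.")
    seconds = audio_duration_seconds(path)
    if seconds > max_seconds:
        raise ValueError(f"Audio is {seconds:.1f}s long; the limit is {max_seconds:.0f}s.")
    return seconds
```

In `app.py`, call `validate_upload(path)` first and show any `ValueError` message in the status output. That keeps the failure readable and keeps it outside the GPU allocation.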

## Phase 6: Minimal Runtime Tests

- [x] Start Space and confirm Gradio loads.
- [x] Start ComfyUI from the Space process.
- [x] Submit a minimal prompt to ComfyUI and receive a response.
- [x] Submit a minimal prompt to ComfyUI from the Gradio GPU runtime and receive
  a response.
- [x] Run a DeepSeek LLM node smoke test.
- [x] Run a MelBand RoFormer smoke test.
- [x] Run a MelBand RoFormer smoke test inside ZeroGPU.
- [x] Run a tiny TTS-only workflow inside ZeroGPU.
- [x] Run a short ASR-only workflow inside ZeroGPU.
- [x] Verify the Space API workflow connection graph against
  `VoiceGate/workflows/VoiceGate-Workflow.json`.
- [ ] Run SRT split -> VoxCPM -> SRT merge.
- [x] Run the full short-audio VoiceGate workflow on the personal T4 Small
  Space.
- [ ] Run the full short-audio VoiceGate workflow on the organization ZeroGPU
  Space after quota recovers.
- [x] Confirm output audio is downloadable/playable from Gradio.
- [ ] Confirm SRT files are downloadable from Gradio after redeploy.
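
Each runtime test above ends by waiting for `/history/{prompt_id}` to contain the finished prompt. The ZeroGPU full run hit the `800s` app wait window noted under Open Questions. This is a minimal sketch of a bounded poll.

- `fetch_history` is any callable that returns the parsed JSON of `GET /history/{prompt_id}`.
- The `status.status_str` field and the `filename`/`subfolder`/`type` output entries reflect ComfyUI's history format as commonly returned. Confirm them against the running version.

```python
import time


def wait_for_prompt(fetch_history, prompt_id, timeout_s=800.0, poll_s=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until prompt_id appears in history, then return its history entry."""
    deadline = clock() + timeout_s
    while True:
        entry = fetch_history(prompt_id).get(prompt_id)
        if entry is not None:
            if entry.get("status", {}).get("status_str") == "error":
                raise RuntimeError(f"ComfyUI reported an error for {prompt_id}")
            return entry
        if clock() >= deadline:
            raise TimeoutError(f"{prompt_id} not finished after {timeout_s:.0f}s")
        sleep(poll_s)


def output_files(entry):
    """List (node_id, filename, subfolder, type) for every file-like output."""
    files = []
    for node_id, node_out in entry.get("outputs", {}).items():
        for items in node_out.values():
            if not isinstance(items, list):
                continue
            for item in items:
                if isinstance(item, dict) and "filename" in item:
                    files.append((node_id, item["filename"],
                                  item.get("subfolder", ""), item.get("type", "output")))
    return files
```

Injecting `sleep` and `clock` lets the timeout path be tested without a server or real waiting.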

## Phase 7: Full VoiceGate Path

- [ ] Add video input support after the audio path is stable.
- [ ] Extract audio from video with `ffmpeg`.
- [ ] Merge generated audio back into the original video if required.
- [ ] Add subtitle download and optional subtitle burn-in.
- [ ] Add examples with small sample files, if allowed by Space storage limits.
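
The video steps in Phase 7 reduce to a few `ffmpeg` invocations. This is a minimal sketch that only builds the argument lists, under these assumptions:

- The function names are hypothetical.
- `-c:v copy` assumes the original video stream can be kept as-is.
- The burn-in command needs an `ffmpeg` build with libass and a subtitle path without characters that require filter escaping, such as `:` or quotes.

```python
def extract_audio_cmd(video, wav_out, sample_rate=None):
    """ffmpeg argv: drop the video stream and decode audio to 16-bit PCM WAV."""
    cmd = ["ffmpeg", "-y", "-i", str(video), "-vn", "-acodec", "pcm_s16le"]
    if sample_rate is not None:
        cmd += ["-ar", str(sample_rate)]  # resample only when a rate is requested
    cmd.append(str(wav_out))
    return cmd


def mux_audio_cmd(video, dubbed_audio, video_out):
    """ffmpeg argv: keep the first video stream, replace audio with the dub."""
    return ["ffmpeg", "-y", "-i", str(video), "-i", str(dubbed_audio),
            "-map", "0:v:0", "-map", "1:a:0",
            "-c:v", "copy", "-c:a", "aac", "-shortest", str(video_out)]


def burn_subtitles_cmd(video, srt, video_out):
    """ffmpeg argv: render SRT into the frames (re-encodes the video stream)."""
    return ["ffmpeg", "-y", "-i", str(video), "-vf", f"subtitles={srt}",
            "-c:a", "copy", str(video_out)]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Building argv lists instead of shell strings avoids quoting problems with user-supplied file names.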

## Open Questions

- [x] Which exact node repository provides `RH_LLMAPI_NODE`?
- [x] Which exact node repository provides `RunningHub_VoxCPM_*`?
- [ ] Is `flash_attention_2` available and reliable in the ZeroGPU environment?
- [x] Can ASR run without `flash_attention_2`? Yes. The ASR-only smoke test
  used `attention=sdpa`.
- [ ] Does VoxCPM2 fit comfortably in ZeroGPU memory with ASR and
  MelBandRoFormer in the same run? A full run reached the heavy nodes but timed
  out at the `800s` app wait window; the next shortened retry was blocked by
  ZeroGPU quota before execution.
- [x] Can a full short-audio workflow run on T4 Small? Yes. On the duplicated
  personal Space, with a warm ComfyUI process, the test completed in `24.4s`.
- [x] How do we avoid user-facing failures after hardware changes? The user path
  now checks required model paths and runs `bootstrap_comfy.py --with-models`
  before submitting to ComfyUI.
- [x] Where should large model files live? In `/data/voicegate_models`, with
  symlinks into ComfyUI's expected model directories.
- [ ] Should the first public demo disable background separation to reduce
  runtime and memory pressure?
- [ ] What maximum uploaded audio/video duration should the first version allow?
- [x] Can SSH access the ZeroGPU CUDA device directly? No. SSH enters the
  normal running Space container, which has no CUDA; GPU work must run inside a
  `@spaces.GPU` function.
- [x] Has the Space successfully invoked ZeroGPU CUDA from Gradio? Yes. The
  `gpu_smoke_test` button returned `cuda_available=True` and reported one CUDA
  device.
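
One possible layout for the storage answer and the model-path check above:

- Keep the real directories under `/data/voicegate_models`.
- Expose them to ComfyUI through directory symlinks.
- Re-run `scripts/bootstrap_comfy.py --with-models` only when a required path is missing.

This is a minimal sketch. The helper names and the per-directory symlink granularity are assumptions, and the list of required model paths depends on the workflow.

```python
import subprocess
import sys
from pathlib import Path

MODEL_STORE = Path("/data/voicegate_models")  # persistent storage, per this checklist


def link_model_dir(comfy_models_root, name, store_root=MODEL_STORE):
    """Point <comfy_models_root>/<name> at <store_root>/<name> via a symlink."""
    target = Path(store_root) / name
    link = Path(comfy_models_root) / name
    target.mkdir(parents=True, exist_ok=True)
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        if link.resolve() == target.resolve():
            return link  # already correct: nothing to do
        link.unlink()  # stale link from an earlier layout
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink; not replacing it")
    link.symlink_to(target, target_is_directory=True)
    return link


def missing_models(required):
    """Required paths that do not exist; exists() follows links, so dangling ones count."""
    return [p for p in map(Path, required) if not p.exists()]


def ensure_models(required, bootstrap_argv=None):
    """Run the model bootstrap once if anything is missing, then re-check."""
    if not missing_models(required):
        return
    argv = bootstrap_argv or [sys.executable, "scripts/bootstrap_comfy.py", "--with-models"]
    subprocess.run(argv, check=True)
    still_missing = missing_models(required)
    if still_missing:
        raise RuntimeError(f"models still missing after bootstrap: {still_missing}")
```

Because `Path.exists()` follows symlinks, a missing link and a dangling one (link present, target gone) are both reported as missing, and either case triggers the bootstrap before anything is submitted to ComfyUI.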