# VoiceGate HF Space TODO

This is the execution checklist for bringing the VoiceGate Hugging Face Space from scaffold to a working Gradio app.

## Current Status

- [x] Create the `VoiceGate-hf` Space wrapper repository.
- [x] Configure and push to `build-small-hackathon/VoiceGate`.
- [x] Replace the default HF template with a VoiceGate scaffold.
- [x] Keep the upstream `VoiceGate/` checkout local-only and ignored.
- [x] Preserve Hugging Face LFS rules in `.gitattributes`.
- [x] Confirm `VoiceGate/workflows/VoiceGate-Workflow_api.json` is valid JSON.
- [x] Copy the validated API workflow into `workflows/voicegate_api.json`.
- [x] Confirm SSH access to the running Space container and document the runbook.
- [x] Confirm `DEEPSEEK_API_KEY` is visible in the Space without printing it.
- [x] Download VoxCPM2 and MelBand RoFormer to persistent Space storage.
- [x] Confirm ZeroGPU CUDA can be invoked from the normal Gradio runtime through `@spaces.GPU`.
- [x] Confirm the full short-audio workflow can run on the duplicated personal T4 Small Space.

## Phase 1: Repository Hygiene

- [x] Copy `VoiceGate/workflows/VoiceGate-Workflow_api.json` to `workflows/voicegate_api.json`.
- [x] Copy `VoiceGate/workflows/VoiceGate-Workflow.json` to `workflows/voicegate_ui.json` for reference.
- [x] Update docs so `VoiceGate/` is clearly documented as a local upstream checkout, not Space runtime content.
- [x] Commit and push the workflow files after confirming they contain no secrets or large media.

## Phase 2: Dependency Inventory

- [x] Identify exact repositories and pinned commits for every required custom node package:
  - [x] `comfyui_voicebridge`
  - [x] RunningHub VoxCPM nodes
  - [x] MelBandRoFormer nodes
  - [x] rgthree nodes
  - [x] comfyui-easy-use
  - [x] Comfyroll text nodes or equivalent provider for `CR Text`
  - [x] `RH_LLMAPI_NODE` provider
  - [x] `ReplaceText` provider
  - [x] `MergeAudioMW` provider
- [x] Identify Python dependency constraints for ComfyUI and all custom nodes.
- [x] Identify system packages beyond `ffmpeg` and `git`, if any.
- [x] Decide whether custom nodes are vendored in `custom_nodes/` or installed by pinned git URL during bootstrap.
- [x] Decide where model files are downloaded and cached in the Space.

## Phase 3: Runtime Bootstrap

- [x] Add `scripts/bootstrap_comfy.py`.
- [x] Add `scripts/run_comfy.py`.
- [x] Add `scripts/workflow_client.py`.
- [ ] Install or prepare ComfyUI on Space startup.
- [x] Add bootstrap support for installing custom node dependencies.
- [x] Add opt-in model directory preparation and download commands.
- [x] Verify ComfyUI can start locally in the Space in CPU mode.
- [x] Verify ComfyUI can start from a Gradio `@spaces.GPU` function and report CUDA through `/system_stats`.
- [x] Verify `ComfyUI_AudioTools` imports successfully in the Space after adding its missing system and Python dependencies.
- [ ] Verify ComfyUI API endpoints are reachable:
  - [x] `/system_stats`
  - [x] `/upload/image` or the audio upload equivalent used by `LoadAudio`
  - [x] `/prompt`
  - [x] `/history/{prompt_id}`

## Phase 4: Workflow Parameterization

- [x] Add Python-side patching for node `16` `LoadAudio.inputs.audio`.
- [x] Add Python-side patching for node `105` `api_key` from `DEEPSEEK_API_KEY`.
- [x] Add Python-side patching for node `105` `api_baseurl`.
- [x] Add Python-side patching for node `105` `model`.
- [x] Add Python-side patching for node `110` target language.
- [x] Add unique job-specific output prefixes for node `180` and node `214`.
- [x] Decide which user controls are exposed first:
  - [ ] source language
  - [x] target language
  - [ ] LLM model
  - [ ] max input duration
  - [ ] keep or drop background audio
- [ ] Remove or ignore display-only `easy showAnything` nodes if they are not needed for API execution.

## Phase 5: Gradio Interface

- [x] Replace the placeholder `app.py` with the first VoiceGate interface.
- [x] Add short audio upload input.
- [x] Add target language input.
- [x] Add status/log output.
- [x] Add generated audio output.
- [x] Add generated SRT file outputs.
- [ ] Wrap GPU-heavy execution with `@spaces.GPU(duration=...)`.
- [x] Add and run a minimal GPU smoke test button.
- [x] Keep a diagnostics tab for internal workflow tests.
- [x] Add automatic model-path preparation for the user-facing run path.
- [x] Add user-facing TTS segment trim control for node `268` `TrimAudioDuration.start_index`.
- [ ] Add guardrails for file type and duration.

## Phase 6: Minimal Runtime Tests

- [x] Start Space and confirm Gradio loads.
- [x] Start ComfyUI from the Space process.
- [x] Submit a minimal prompt to ComfyUI and receive a response.
- [x] Submit a minimal prompt to ComfyUI from the Gradio GPU runtime and receive a response.
- [x] Run a DeepSeek LLM node smoke test.
- [x] Run a MelBand RoFormer smoke test.
- [x] Run a MelBand RoFormer smoke test inside ZeroGPU.
- [x] Run a tiny TTS-only workflow inside ZeroGPU.
- [x] Run a short ASR-only workflow inside ZeroGPU.
- [x] Verify the Space API workflow connection graph against `VoiceGate/workflows/VoiceGate-Workflow.json`.
- [ ] Run SRT split -> VoxCPM -> SRT merge.
- [x] Run the full short-audio VoiceGate workflow on the personal T4 Small Space.
- [ ] Run the full short-audio VoiceGate workflow on the organization ZeroGPU Space after quota recovers.
- [x] Confirm output audio is downloadable/playable from Gradio.
- [ ] Confirm SRT files are downloadable from Gradio after redeploy.

## Phase 7: Full VoiceGate Path

- [ ] Add video input support after the audio path is stable.
- [ ] Extract audio from video with `ffmpeg`.
- [ ] Merge generated audio back into the original video if required.
- [ ] Add subtitle download and optional subtitle burn-in.
- [ ] Add examples with small sample files, if allowed by Space storage limits.

## Open Questions

- [x] Which exact node repository provides `RH_LLMAPI_NODE`?
- [x] Which exact node repository provides `RunningHub_VoxCPM_*`?
- [ ] Is `flash_attention_2` available and reliable in the ZeroGPU environment?
- [x] Can ASR run without `flash_attention_2`? Yes. The ASR-only smoke test used `attention=sdpa`.
- [ ] Does VoxCPM2 fit comfortably in ZeroGPU memory with ASR and MelBandRoFormer in the same run? A full run reached the heavy nodes but timed out at the `800s` app wait window; the next, shortened retry was blocked by ZeroGPU quota before execution.
- [x] Can a full short-audio workflow run on T4 Small? Yes. On the duplicated personal Space with a warm ComfyUI process, the test completed in `24.4s`.
- [x] How do we avoid user-facing failures after hardware changes? The user path now checks the required model paths and runs `bootstrap_comfy.py --with-models` before submitting to ComfyUI.
- [x] Where should large model files live? In `/data/voicegate_models`, with symlinks into ComfyUI's expected model directories.
- [ ] Should the first public demo disable background separation to reduce runtime and memory pressure?
- [ ] What maximum uploaded audio/video duration should the first version allow?
- [x] Can SSH access the ZeroGPU CUDA device directly? No. SSH enters the normal running Space container, which has no CUDA; GPU work must run inside a `@spaces.GPU` function.
- [x] Has the Space successfully invoked ZeroGPU CUDA from Gradio? Yes. The `gpu_smoke_test` button returned `cuda_available=True` on one CUDA device.
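The model-storage answers in the Open Questions (weights under `/data/voicegate_models`, symlinked into ComfyUI's model directories, plus a pre-submit check that falls back to `bootstrap_comfy.py --with-models`) can be sketched roughly as follows. This is not the code in `scripts/bootstrap_comfy.py`. The folder names in `MODEL_LINKS`, the `ComfyUI/models` layout, the helper names, and the exact bootstrap invocation are all hypothetical and only illustrate the pattern.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Persistent model root, as documented in the Open Questions.
MODEL_ROOT = Path("/data/voicegate_models")

# HYPOTHETICAL mapping: folder under MODEL_ROOT -> folder under ComfyUI's
# models/ directory. The real names depend on what each custom node expects.
MODEL_LINKS = {
    "voxcpm": "voxcpm",
    "melband_roformer": "MelBandRoFormer",
}


def link_model_dirs(model_root: Path, comfy_models: Path, links: dict[str, str]) -> list[Path]:
    """Symlink ComfyUI model folders to persistent storage; return new links."""
    created: list[Path] = []
    for src_name, dst_name in links.items():
        src = model_root / src_name
        dst = comfy_models / dst_name
        src.mkdir(parents=True, exist_ok=True)
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.is_symlink() and dst.resolve() == src.resolve():
            continue  # already linked correctly; safe to call on every run
        if dst.exists() or dst.is_symlink():
            # Refuse to clobber a real directory or a link pointing elsewhere.
            raise FileExistsError(f"{dst} exists and is not a link to {src}")
        dst.symlink_to(src, target_is_directory=True)
        created.append(dst)
    return created


def missing_models(model_root: Path, required: list[str]) -> list[str]:
    """Return required paths (relative to model_root) that are absent or empty."""
    missing = []
    for rel in required:
        path = model_root / rel
        if not path.exists() or (path.is_file() and path.stat().st_size == 0):
            missing.append(rel)
    return missing


def ensure_models(model_root: Path, required: list[str], bootstrap_script: Path) -> None:
    """Run the bootstrap download only when a required model is missing."""
    if not missing_models(model_root, required):
        return
    # Assumed invocation; the real script may need a different cwd or flags.
    subprocess.run([sys.executable, str(bootstrap_script), "--with-models"], check=True)
    still_missing = missing_models(model_root, required)
    if still_missing:
        raise RuntimeError(f"models still missing after bootstrap: {still_missing}")
```

On the user-facing run path, a caller would invoke `link_model_dirs(MODEL_ROOT, comfy_models_dir, MODEL_LINKS)` and then `ensure_models(...)` with the real list of required weight files before submitting the prompt to ComfyUI. Large weights stay in one persistent location, and only the cheap links and existence checks are repeated on each run.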