# VoiceGate HF Space TODO

This is the execution checklist for bringing the VoiceGate Hugging Face Space
from scaffold to a working Gradio app.

## Current Status

- [x] Create the `VoiceGate-hf` Space wrapper repository.
- [x] Configure and push to `build-small-hackathon/VoiceGate`.
- [x] Replace the default HF template with a VoiceGate scaffold.
- [x] Keep the upstream `VoiceGate/` checkout local-only and ignored.
- [x] Preserve Hugging Face LFS rules in `.gitattributes`.
- [x] Confirm `VoiceGate/workflows/VoiceGate-Workflow_api.json` is valid JSON.
- [x] Copy the validated API workflow into `workflows/voicegate_api.json`.
- [x] Confirm SSH access to the running Space container and document the
  runbook.
- [x] Confirm `DEEPSEEK_API_KEY` is visible in the Space without printing it.
- [x] Download VoxCPM2 and MelBand RoFormer to persistent Space storage.
- [x] Confirm ZeroGPU CUDA can be invoked from the normal Gradio runtime through
  `@spaces.GPU`.
- [x] Confirm the full short-audio workflow can run on the duplicated personal
  T4 Small Space.

## Phase 1: Repository Hygiene

- [x] Copy `VoiceGate/workflows/VoiceGate-Workflow_api.json` to
  `workflows/voicegate_api.json`.
- [x] Copy `VoiceGate/workflows/VoiceGate-Workflow.json` to
  `workflows/voicegate_ui.json` for reference.
- [x] Update docs so `VoiceGate/` is clearly documented as a local upstream
  checkout, not Space runtime content.
- [x] Commit and push the workflow files after confirming they contain no
  secrets or large media.

## Phase 2: Dependency Inventory

- [x] Identify exact repositories and pinned commits for every required custom
  node package:
  - [x] `comfyui_voicebridge`
  - [x] RunningHub VoxCPM nodes
  - [x] MelBandRoFormer nodes
  - [x] rgthree nodes
  - [x] comfyui-easy-use
  - [x] Comfyroll text nodes or equivalent provider for `CR Text`
  - [x] `RH_LLMAPI_NODE` provider
  - [x] `ReplaceText` provider
  - [x] `MergeAudioMW` provider
- [x] Identify Python dependency constraints for ComfyUI and all custom nodes.
- [x] Identify system packages beyond `ffmpeg` and `git`, if any.
- [x] Decide whether custom nodes are vendored in `custom_nodes/` or installed
  by pinned git URL during bootstrap.
- [x] Decide where model files are downloaded and cached in the Space.

## Phase 3: Runtime Bootstrap

- [x] Add `scripts/bootstrap_comfy.py`.
- [x] Add `scripts/run_comfy.py`.
- [x] Add `scripts/workflow_client.py`.
- [ ] Install or prepare ComfyUI on Space startup.
- [x] Add bootstrap support for installing custom node dependencies.
- [x] Add opt-in model directory preparation and download commands.
- [x] Verify ComfyUI can start locally in the Space in CPU mode.
- [x] Verify ComfyUI can start from a Gradio `@spaces.GPU` function and report
  CUDA through `/system_stats`.
- [x] Verify `ComfyUI_AudioTools` imports successfully in the Space after
  adding its missing system and Python dependencies.
- [x] Verify ComfyUI API endpoints are reachable:
  - [x] `/system_stats`
  - [x] `/upload/image` or the audio upload equivalent used by `LoadAudio`
  - [x] `/prompt`
  - [x] `/history/{prompt_id}`
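
The endpoint check above can be sketched as follows. This is a minimal illustration, assuming ComfyUI listens on its default `127.0.0.1:8188`; the helper names `endpoint_urls` and `probe` are hypothetical and are not taken from `scripts/workflow_client.py`.

```python
import urllib.error
import urllib.request

# Assumption: ComfyUI's default listen address; adjust if run_comfy.py uses another port.
COMFY_BASE_URL = "http://127.0.0.1:8188"


def endpoint_urls(base_url, prompt_id="example-id"):
    """Build the URLs named in the checklist above."""
    base = base_url.rstrip("/")
    return {
        "system_stats": f"{base}/system_stats",
        "upload_image": f"{base}/upload/image",
        "prompt": f"{base}/prompt",
        "history": f"{base}/history/{prompt_id}",
    }


def probe(url, timeout=5.0):
    """Return the HTTP status for a GET, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, so the route is reachable even if it rejects a GET.
        return exc.code
    except (urllib.error.URLError, OSError):
        return None
```

A `None` result means nothing is listening; any numeric status, including an error status from a POST-only route, shows that the server is up and answering.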

## Phase 4: Workflow Parameterization

- [x] Add Python-side patching for node `16` `LoadAudio.inputs.audio`.
- [x] Add Python-side patching for node `105` `api_key` from
  `DEEPSEEK_API_KEY`.
- [x] Add Python-side patching for node `105` `api_baseurl`.
- [x] Add Python-side patching for node `105` `model`.
- [x] Add Python-side patching for node `110` target language.
- [x] Add unique job-specific output prefixes for node `180` and node `214`.
- [x] Decide which user controls are exposed first:
  - [ ] source language
  - [x] target language
  - [ ] LLM model
  - [ ] max input duration
  - [ ] keep or drop background audio
- [ ] Remove or ignore display-only `easy showAnything` nodes if they are not
  needed for API execution.
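
The Python-side patching listed above could look roughly like this. It is a sketch rather than the code in this repository: `patch_workflow` and `build_patches` are hypothetical names, and the `filename_prefix` input on nodes `180` and `214` is an assumption that should be checked against `workflows/voicegate_api.json`.

```python
import copy
import os
import uuid


def patch_workflow(workflow, patches):
    """Return a copy of an API-format workflow with selected node inputs replaced.

    `patches` maps (node_id, input_name) to the new value.
    """
    patched = copy.deepcopy(workflow)
    for (node_id, input_name), value in patches.items():
        node = patched.get(str(node_id))
        if node is None:
            raise KeyError(f"workflow has no node {node_id!r}")
        node.setdefault("inputs", {})[input_name] = value
    return patched


def build_patches(audio_name, job_id=None):
    """Collect per-job values for the nodes listed above."""
    job_id = job_id or uuid.uuid4().hex[:8]
    return {
        ("16", "audio"): audio_name,
        # Read the key from the environment so it never lands in the repository.
        ("105", "api_key"): os.environ.get("DEEPSEEK_API_KEY", ""),
        # Assumption: "filename_prefix" is the prefix input on nodes 180 and 214.
        ("180", "filename_prefix"): f"voicegate/{job_id}/audio",
        ("214", "filename_prefix"): f"voicegate/{job_id}/srt",
    }
```

Copying the workflow before patching keeps the template loaded from disk unchanged, so concurrent jobs cannot overwrite each other's inputs.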

## Phase 5: Gradio Interface

- [x] Replace the placeholder `app.py` with the first VoiceGate interface.
- [x] Add short audio upload input.
- [x] Add target language input.
- [x] Add status/log output.
- [x] Add generated audio output.
- [x] Add generated SRT file outputs.
- [ ] Wrap GPU-heavy execution with `@spaces.GPU(duration=...)`.
- [x] Add and run a minimal GPU smoke test button.
- [x] Keep a diagnostics tab for internal workflow tests.
- [x] Add automatic model-path preparation for the user-facing run path.
- [x] Add user-facing TTS segment trim control for node `268`
  `TrimAudioDuration.start_index`.
- [ ] Add guardrails for file type and duration.
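
One possible shape for the file type and duration guardrails is sketched below. The extension list and the 60-second limit are placeholders, since this checklist has not fixed either value, and `check_upload` is a hypothetical helper name.

```python
from pathlib import Path

# Assumptions: neither the extension list nor the limit is decided in this checklist.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a"}
MAX_DURATION_SECONDS = 60.0


def check_upload(path, duration_seconds,
                 allowed=ALLOWED_EXTENSIONS, max_seconds=MAX_DURATION_SECONDS):
    """Return None if the upload is acceptable, else a user-facing error message."""
    suffix = Path(path).suffix.lower()
    if suffix not in allowed:
        return f"Unsupported file type {suffix or '(none)'}; expected one of {sorted(allowed)}."
    if duration_seconds <= 0:
        return "The audio file appears to be empty."
    if duration_seconds > max_seconds:
        return f"Audio is {duration_seconds:.1f}s long; the limit is {max_seconds:.0f}s."
    return None
```

In the Gradio handler, a non-`None` message could be raised as `gr.Error(message)` before any GPU time is requested, so rejected uploads do not consume ZeroGPU quota.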

## Phase 6: Minimal Runtime Tests

- [x] Start Space and confirm Gradio loads.
- [x] Start ComfyUI from the Space process.
- [x] Submit a minimal prompt to ComfyUI and receive a response.
- [x] Submit a minimal prompt to ComfyUI from the Gradio GPU runtime and receive
  a response.
- [x] Run a DeepSeek LLM node smoke test.
- [x] Run a MelBand RoFormer smoke test.
- [x] Run a MelBand RoFormer smoke test inside ZeroGPU.
- [x] Run a tiny TTS-only workflow inside ZeroGPU.
- [x] Run a short ASR-only workflow inside ZeroGPU.
- [x] Verify the Space API workflow connection graph against
  `VoiceGate/workflows/VoiceGate-Workflow.json`.
- [ ] Run SRT split -> VoxCPM -> SRT merge.
- [x] Run the full short-audio VoiceGate workflow on the personal T4 Small
  Space.
- [ ] Run the full short-audio VoiceGate workflow on the organization ZeroGPU
  Space after quota recovers.
- [x] Confirm output audio is downloadable/playable from Gradio.
- [ ] Confirm SRT files are downloadable from Gradio after redeploy.
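
The "submit a prompt and receive a response" tests above follow a queue-then-poll pattern, sketched here under the assumption that a prompt's entry appears in `/history/{prompt_id}` only once execution has ended. `http_json` and `run_prompt` are hypothetical names; the `800s` default mirrors the app wait window mentioned in this document.

```python
import json
import time
import urllib.request


def http_json(url, payload=None, timeout=30.0):
    """GET when payload is None, otherwise POST the payload as JSON; return parsed JSON."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_prompt(base_url, workflow, wait_seconds=800.0, poll_seconds=2.0,
               fetch=http_json, sleep=time.sleep):
    """Queue a workflow, then poll its history entry until it appears or the wait expires."""
    queued = fetch(f"{base_url}/prompt", {"prompt": workflow})
    prompt_id = queued["prompt_id"]
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        history = fetch(f"{base_url}/history/{prompt_id}")
        if prompt_id in history:
            return history[prompt_id]
        sleep(poll_seconds)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {wait_seconds:.0f}s")
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without a running ComfyUI process.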

## Phase 7: Full VoiceGate Path

- [ ] Add video input support after the audio path is stable.
- [ ] Extract audio from video with `ffmpeg`.
- [ ] Merge generated audio back into the original video if required.
- [ ] Add subtitle download and optional subtitle burn-in.
- [ ] Add examples with small sample files, if allowed by Space storage limits.
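
The two `ffmpeg` steps above might be built as argument lists like the following. This is a sketch only: the function names are hypothetical, and the codec choices (16-bit PCM WAV for extraction, stream-copied video for the merge) are assumptions rather than decisions recorded in this checklist.

```python
import subprocess


def extract_audio_cmd(video_path, wav_path):
    """ffmpeg arguments that drop the video stream and write 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-acodec", "pcm_s16le", wav_path]


def remux_audio_cmd(video_path, audio_path, out_path):
    """ffmpeg arguments that keep the original video stream and swap in new audio."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", audio_path,
        "-map", "0:v:0",   # video from the first input
        "-map", "1:a:0",   # audio from the second input
        "-c:v", "copy",    # do not re-encode the video
        "-shortest",       # stop at the end of the shorter stream
        out_path,
    ]


def run(cmd):
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True, capture_output=True)
```

Building the command as a list and passing it to `subprocess.run` without a shell avoids quoting problems with user-supplied file names.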

## Open Questions

- [x] Which exact node repository provides `RH_LLMAPI_NODE`?
- [x] Which exact node repository provides `RunningHub_VoxCPM_*`?
- [ ] Is `flash_attention_2` available and reliable in the ZeroGPU environment?
- [x] Can ASR run without `flash_attention_2`? Yes. The ASR-only smoke test
  used `attention=sdpa`.
- [ ] Does VoxCPM2 fit comfortably in ZeroGPU memory with ASR and
  MelBandRoFormer in the same run? A full run reached the heavy nodes but timed
  out at the `800s` app wait window; the next shortened retry was blocked by
  ZeroGPU quota before execution.
- [x] Can a full short-audio workflow run on T4 Small? Yes. On the duplicated
  personal Space with a warm ComfyUI process, the test completed in `24.4s`.
- [x] How do we avoid user-facing failure after hardware changes? The user path
  now checks required model paths and runs `bootstrap_comfy.py --with-models`
  before submitting to ComfyUI.
- [x] Where should large model files live? `/data/voicegate_models`, with
  symlinks into ComfyUI's expected model directories.
- [ ] Should the first public demo disable background separation to reduce
  runtime and memory pressure?
- [ ] What maximum uploaded audio/video duration should the first version allow?
- [x] Can SSH access the ZeroGPU CUDA device directly? No. SSH enters the
  normal running Space container without CUDA; GPU work must run inside a
  `@spaces.GPU` function.
- [x] Has the Space successfully invoked ZeroGPU CUDA from Gradio? Yes. The
  `gpu_smoke_test` button returned `cuda_available=True` on one CUDA device.
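
The model storage answer above (files under `/data/voicegate_models`, symlinked into ComfyUI's model directories, with a path check before each run) can be illustrated with the sketch below. Both function names are hypothetical, and the assumption that each storage subdirectory maps to a same-named ComfyUI model directory may not match what `bootstrap_comfy.py` actually does.

```python
import os
from pathlib import Path


def link_model_dirs(storage_root, comfy_models_root):
    """Symlink each subdirectory of persistent storage into ComfyUI's models directory."""
    storage_root = Path(storage_root)
    comfy_models_root = Path(comfy_models_root)
    comfy_models_root.mkdir(parents=True, exist_ok=True)
    linked = []
    for source in sorted(p for p in storage_root.iterdir() if p.is_dir()):
        target = comfy_models_root / source.name
        if target.is_symlink() or target.exists():
            continue  # leave existing links and real directories untouched
        os.symlink(source, target, target_is_directory=True)
        linked.append(target)
    return linked


def missing_model_paths(required):
    """Return the required paths that do not exist (broken symlinks count as missing)."""
    return [str(p) for p in map(Path, required) if not p.exists()]
```

A non-empty result from `missing_model_paths` is the condition under which the user path would run `bootstrap_comfy.py --with-models` before submitting to ComfyUI.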