| # VoiceGate Dependency Inventory | |
| This document tracks the ComfyUI, custom node, Python package, model, and | |
| runtime dependencies required by `workflows/voicegate_api.json`. | |
| ## Workflow Node Sources | |
| | Workflow node(s) | Source | Repository | Version / pin status | Notes | | |
| | --- | --- | --- | --- | --- | | |
| | `LoadAudio`, `SaveAudioMP3`, `TrimAudioDuration` | ComfyUI core | `https://github.com/comfyanonymous/ComfyUI.git` | current checked HEAD: `5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e`; workflow recorded `comfy-core` `0.12.1` | Core audio nodes. | | |
| | `VoiceBridgeASRLoader`, `VoiceBridgeASRTranscribe`, `GenerateSRT`, `VoiceBridgeSRTSplitter`, `VoiceBridgeAudioListMergerBySRT`, `SaveSRTFromString` | VoiceBridge | `https://github.com/YanTianlong-01/comfyui_voicebridge.git` | current checked HEAD: `3728962c0db7b9e05a1d0b341e3dbbd8adba4409`; workflow recorded `ddefcc0082ab9591f9b613f0de565f25f85d8f2a` and `5149c68df1d156794999bd77ff6a86fcab0314ed` | Required. Existing workflow uses newer SRT split/merge nodes plus ASR/SRT generation. | | |
| | `RunningHub_VoxCPM_LoadModel`, `RunningHub_VoxCPM_Generate` | RunningHub VoxCPM | Prefer `https://github.com/RH-RunningHub/ComfyUI_RH_VoxCPM.git` | current checked HEAD: `8365fe0e1fa60d7547f83ae5db53453a3d9c627d`; mirror/fork `https://github.com/HM-RunningHub/ComfyUI_RH_VoxCPM.git` HEAD: `1cd7b29fb6596588319fc5ad49cd78f5b5375d76` | Required for VoxCPM2 TTS. Both candidate repos contain the needed node mappings; README points to `RH-RunningHub`. | | |
| | `MelBandRoFormerModelLoader`, `MelBandRoFormerSampler` | MelBand RoFormer | `https://github.com/kijai/ComfyUI-MelBandRoFormer.git` | current checked HEAD: `92c86854e6654f4aacc97484471af95c98ea16d4`; workflow recorded `b40e263224778ec417114d91d8b3b39934e30de5` | Required for vocal/background separation. | | |
| | `RH_LLMAPI_NODE` | RunningHub LLM API | `https://github.com/HM-RunningHub/ComfyUI_RH_LLM_API.git` | current checked HEAD: `26e18d1a769bd08e115b59bfdf170f8a2166c0df` | Required for DeepSeek/OpenAI-compatible SRT translation. No requirements file; code imports `openai`. | | |
| | `Any Switch (rgthree)`, `Fast Groups Bypasser (rgthree)` | rgthree | `https://github.com/rgthree/rgthree-comfy.git` | current checked HEAD: `738105af5fb14e96fbecaf406dc356e284797e8c` | Required by API workflow for reference audio/text switches. | | |
| | `easy showAnything`, `easy string` | Easy Use | `https://github.com/yolain/ComfyUI-Easy-Use.git` | current checked HEAD: `625efbfa2fc20c31797dfffcbb41a26b6d91ab7b` | `easy showAnything` is display-oriented and may be removable later, but current workflow references it. | | |
| | `CR Text` | Comfyroll | `https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes.git` | current checked HEAD: `d78b780ae43fcf8c6b7c6505e6ffb4584281ceca` | Required for the translation prompt text node. No requirements file found. | | |
| | `ReplaceText` | ComfyUI core extra nodes | `https://github.com/comfyanonymous/ComfyUI.git` | covered by the ComfyUI pin | Confirmed in `comfy_extras/nodes_dataset.py` as node id `ReplaceText`. | | |
| | `MergeAudioMW` | MW AudioTools | `https://github.com/billwuhao/ComfyUI_AudioTools.git` | current checked HEAD: `41463715b476aa1d44de617119a68d8841aa04bd` | Required for merging generated speech with separated background audio. | | |
| ## Python Requirements Observed | |
| ### ComfyUI Core | |
| Install from ComfyUI's own `requirements.txt` after cloning the selected | |
| ComfyUI commit. | |
| ### VoiceBridge | |
| `comfyui_voicebridge/requirements.txt`: | |
| ```text | |
| torch | |
| numpy | |
| qwen-asr | |
| transformers | |
| accelerate | |
| modelscope | |
| soundfile | |
| openai | |
| ``` | |
| ### RunningHub VoxCPM | |
| `ComfyUI_RH_VoxCPM/requirements.txt`: | |
| ```text | |
| voxcpm | |
| soundfile | |
| librosa | |
| wetext | |
| modelscope>=1.22.0 | |
| funasr | |
| inflect | |
| addict | |
| simplejson | |
| sortedcontainers | |
| pydantic | |
| transformers | |
| datasets | |
| safetensors | |
| argbind | |
| ``` | |
| The workflow only uses inference nodes, so training-only dependencies may be | |
| avoidable later. Keep the upstream requirements initially for compatibility, | |
| then trim once the runtime is known. | |
| ### MW AudioTools | |
| `ComfyUI_AudioTools/requirements.txt`: | |
| ```text | |
| sox | |
| librosa | |
| pydub | |
| pyyaml | |
| rotary-embedding-torch | |
| typeguard | |
| git+https://github.com/SesameAILabs/silentcipher | |
| ``` | |
| ### MelBand RoFormer | |
| `ComfyUI-MelBandRoFormer/requirements.txt`: | |
| ```text | |
| rotary_embedding_torch | |
| einops | |
| ``` | |
| ### Easy Use | |
| `ComfyUI-Easy-Use/requirements.txt`: | |
| ```text | |
| diffusers | |
| accelerate | |
| clip_interrogator>=0.6.0 | |
| lark | |
| onnxruntime | |
| opencv-python-headless | |
| sentencepiece | |
| spandrel | |
| matplotlib | |
| peft | |
| ``` | |
| ### rgthree | |
| `rgthree-comfy/requirements.txt` exists but is empty. | |
| ### RunningHub LLM API | |
| No `requirements.txt` found. The node imports `openai`, which the VoiceBridge | |
| requirements already cover. | |
| ### Comfyroll | |
| No `requirements.txt` found. | |
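
Several of the files above overlap, and two spell the same package differently (`rotary-embedding-torch` and `rotary_embedding_torch`). A minimal sketch of how a bootstrap step might merge the lists is shown here. `merge_requirements` and its helpers are hypothetical names, not part of any listed repository, and the parsing is deliberately naive: it ignores extras and environment markers and does not resolve conflicting version specifiers.

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a distribution name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def split_requirement(line: str) -> tuple[str, str]:
    """Split 'name>=1.0' into ('name', '>=1.0'). Naive: no extras or markers."""
    match = re.match(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(.*)$", line)
    if match is None:
        raise ValueError(f"unrecognized requirement: {line!r}")
    return match.group(1), match.group(2).strip()


def merge_requirements(*requirement_lists: list[str]) -> list[str]:
    """Merge requirement lines, deduplicating by normalized package name."""
    merged: dict[str, str] = {}
    passthrough: list[str] = []
    for lines in requirement_lists:
        for raw in lines:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            if "://" in line:
                # Direct URL requirements (e.g. git+https://...) are kept verbatim.
                if line not in passthrough:
                    passthrough.append(line)
                continue
            name, spec = split_requirement(line)
            key = normalize_name(name)
            # Keep the first non-empty specifier seen for each package.
            if key not in merged or (spec and not merged[key]):
                merged[key] = spec
    return [f"{name}{spec}" for name, spec in sorted(merged.items())] + passthrough
```

Merging the MW AudioTools and MelBand RoFormer lists this way yields a single `rotary-embedding-torch` entry instead of two.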
| ## Model Inventory | |
| | Model / asset | Source | Target location | Notes | | |
| | --- | --- | --- | --- | | |
| | Qwen ASR | `Qwen/Qwen3-ASR-1.7B` | persistent target `/data/voicegate_models/Qwen3-ASR/Qwen3-ASR-1.7B`; ComfyUI link root `ComfyUI/models/Qwen3-ASR` | Loaded by `VoiceBridgeASRLoader`. | | |
| | Qwen forced aligner | `Qwen/Qwen3-ForcedAligner-0.6B` | persistent target `/data/voicegate_models/Qwen3-ASR/Qwen3-ForcedAligner-0.6B`; ComfyUI link root `ComfyUI/models/Qwen3-ASR` | Loaded by `VoiceBridgeASRLoader`. | | |
| | VoxCPM2 | `openbmb/VoxCPM2` | persistent target `/data/voicegate_models/voxcpm/VoxCPM2`; ComfyUI link `ComfyUI/models/voxcpm/VoxCPM2` | RunningHub VoxCPM README documents the ComfyUI location. Approx. 4.6 GB. | | |
| | MelBand RoFormer | `Kijai/MelBandRoFormer_comfy` | persistent target `/data/voicegate_models/diffusion_models/MelBandRoFormer_comfy`; ComfyUI link `ComfyUI/models/diffusion_models/MelBandRoFormer_comfy` | Workflow references `MelBandRoFormer_comfy/MelBandRoformer_fp32.safetensors`. | | |
| ## System Packages | |
| Current `packages.txt` contains: | |
| ```text | |
| ffmpeg | |
| git | |
| ``` | |
| These two packages are likely sufficient for the first bootstrap pass. Revisit | |
| the list if dependency installation fails on a missing system library. | |
| ## Space Secrets | |
| Required: | |
| - `DEEPSEEK_API_KEY`, or an OpenAI-compatible API key, for `RH_LLMAPI_NODE`. | |
| Optional: | |
| - `DEEPSEEK_BASE_URL`, default `https://api.deepseek.com`. | |
| - `DEEPSEEK_MODEL`, which currently defaults to `deepseek-v4-flash` in the workflow. | |
| - `HF_TOKEN`, if model downloads require authenticated access. | |
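
As a rough illustration of how these secrets could be read at startup, the sketch below loads them from the environment and applies the documented defaults. `LLMSettings` and `load_llm_settings` are hypothetical names chosen for this example. How the values are then passed to `RH_LLMAPI_NODE` is not shown.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    api_key: str
    base_url: str
    model: str
    hf_token: Optional[str]


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read Space secrets, failing early if the required key is missing."""
    api_key = env.get("DEEPSEEK_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("DEEPSEEK_API_KEY is required for RH_LLMAPI_NODE")
    return LLMSettings(
        api_key=api_key,
        # Defaults mirror the values documented in the list above.
        base_url=env.get("DEEPSEEK_BASE_URL", "https://api.deepseek.com"),
        model=env.get("DEEPSEEK_MODEL", "deepseek-v4-flash"),
        # HF_TOKEN is optional; an empty string is treated as unset.
        hf_token=env.get("HF_TOKEN") or None,
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching real secrets.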
| ## Pinning Strategy | |
| Use explicit git URLs and commits in the bootstrap script. | |
| Recommended initial pins: | |
| ```text | |
| ComfyUI 5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e | |
| comfyui_voicebridge 3728962c0db7b9e05a1d0b341e3dbbd8adba4409 | |
| ComfyUI_RH_VoxCPM 8365fe0e1fa60d7547f83ae5db53453a3d9c627d | |
| ComfyUI-MelBandRoFormer 92c86854e6654f4aacc97484471af95c98ea16d4 | |
| ComfyUI_RH_LLM_API 26e18d1a769bd08e115b59bfdf170f8a2166c0df | |
| rgthree-comfy 738105af5fb14e96fbecaf406dc356e284797e8c | |
| ComfyUI-Easy-Use 625efbfa2fc20c31797dfffcbb41a26b6d91ab7b | |
| ComfyUI_Comfyroll_CustomNodes d78b780ae43fcf8c6b7c6505e6ffb4584281ceca | |
| ComfyUI_AudioTools 41463715b476aa1d44de617119a68d8841aa04bd | |
| ``` | |
| Important: the workflow-embedded commits for VoiceBridge and MelBand RoFormer | |
| differ from the current checked HEADs. Bootstrap should try the current checked | |
| HEADs first, then fall back to the workflow-embedded commits only if node API | |
| compatibility breaks. | |
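
The pinning approach could look roughly like the sketch below, which clones a repository and checks out a pinned commit in detached-HEAD state. It is an illustration under assumptions, not the contents of the actual bootstrap script: `Pin`, `checkout_commands`, and `install_pins` are hypothetical names, and only two of the pins listed above are included for brevity.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Pin:
    name: str
    url: str
    commit: str


# Two entries taken from the pin list above; the remaining pins follow the same shape.
CUSTOM_NODE_PINS = [
    Pin(
        "comfyui_voicebridge",
        "https://github.com/YanTianlong-01/comfyui_voicebridge.git",
        "3728962c0db7b9e05a1d0b341e3dbbd8adba4409",
    ),
    Pin(
        "ComfyUI-MelBandRoFormer",
        "https://github.com/kijai/ComfyUI-MelBandRoFormer.git",
        "92c86854e6654f4aacc97484471af95c98ea16d4",
    ),
]


def checkout_commands(pin: Pin, parent: str) -> list[list[str]]:
    """Build the git commands that clone a repo and pin it to one commit."""
    dest = str(Path(parent) / pin.name)
    return [
        ["git", "clone", "--no-checkout", pin.url, dest],
        ["git", "-C", dest, "checkout", "--detach", pin.commit],
    ]


def install_pins(pins: list[Pin], parent: str) -> None:
    """Clone each pinned repo, or refresh and re-pin an existing clone."""
    for pin in pins:
        dest = Path(parent) / pin.name
        clone, checkout = checkout_commands(pin, parent)
        if (dest / ".git").exists():
            # Already cloned: fetch so the pinned commit is available locally.
            commands = [["git", "-C", str(dest), "fetch", "origin"], checkout]
        else:
            commands = [clone, checkout]
        for command in commands:
            subprocess.run(command, check=True)
```

Keeping command construction separate from execution lets the pin table be checked without network access. Falling back to a workflow-embedded commit would then amount to calling `install_pins` again with a different `commit` value.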
| ## Installation Strategy | |
| Do not vendor custom node repositories into this Space yet. Install them during | |
| bootstrap from explicit git URLs and commit pins. This keeps the Space repo | |
| small and makes it easier to update or swap a node package when a ComfyUI API | |
| change breaks compatibility. | |
| Runtime layout target: | |
| ```text | |
| ComfyUI/ | |
| |-- custom_nodes/ | |
| | |-- comfyui_voicebridge/ | |
| | |-- ComfyUI_RH_VoxCPM/ | |
| | |-- ComfyUI-MelBandRoFormer/ | |
| | |-- ComfyUI_RH_LLM_API/ | |
| | |-- rgthree-comfy/ | |
| | |-- ComfyUI-Easy-Use/ | |
| | |-- ComfyUI_Comfyroll_CustomNodes/ | |
| | `-- ComfyUI_AudioTools/ | |
| `-- models/ | |
| |-- voxcpm/VoxCPM2/ | |
| |-- diffusion_models/MelBandRoFormer_comfy/ | |
| `-- Qwen3-ASR/ | |
| |-- Qwen3-ASR-1.7B/ | |
| `-- Qwen3-ForcedAligner-0.6B/ | |
| ``` | |
| Model files should be downloaded at runtime or first startup with | |
| `huggingface_hub`/`hf download`. The Space has persistent storage mounted at | |
| `/data`, so `scripts/bootstrap_comfy.py` defaults to storing large model files | |
| under `/data/voicegate_models` and creating symlinks at the ComfyUI paths | |
| expected by custom nodes. Override this with `VOICEGATE_MODEL_ROOT` if needed. | |
| Do not commit model weights to git. | |
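
The storage-plus-symlink layout described above can be sketched as follows. This is an assumption-laden illustration rather than the real `scripts/bootstrap_comfy.py`: `model_root` and `link_models` are hypothetical names, and the download step (for example via `huggingface_hub`) is omitted so that only the path handling is shown.

```python
import os
from pathlib import Path
from typing import Mapping

DEFAULT_MODEL_ROOT = "/data/voicegate_models"

# Relative paths shared by the persistent store and ComfyUI/models,
# taken from the model inventory table.
MODEL_DIRS = [
    "Qwen3-ASR",
    "voxcpm/VoxCPM2",
    "diffusion_models/MelBandRoFormer_comfy",
]


def model_root(env: Mapping[str, str] = os.environ) -> Path:
    """Return the persistent model root, honoring VOICEGATE_MODEL_ROOT."""
    return Path(env.get("VOICEGATE_MODEL_ROOT") or DEFAULT_MODEL_ROOT)


def link_models(root: Path, comfy_models: Path) -> list[Path]:
    """Create symlinks under ComfyUI/models that point into the persistent store."""
    links = []
    for relative in MODEL_DIRS:
        target = root / relative
        link = comfy_models / relative
        target.mkdir(parents=True, exist_ok=True)
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            # Replace a stale link so repeated bootstraps stay idempotent.
            link.unlink()
        elif link.exists():
            raise FileExistsError(f"{link} exists and is not a symlink")
        link.symlink_to(target, target_is_directory=True)
        links.append(link)
    return links
```

Refusing to overwrite a real directory at the link path is a deliberate choice in this sketch: it avoids silently discarding model files that were downloaded into the ComfyUI tree by hand.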
| ## Remaining Compatibility Risks | |
| - Can the current `comfyui_voicebridge` HEAD run the workflow created with | |
| earlier recorded VoiceBridge commits? | |
| - Is `flash_attention_2` available in the target ZeroGPU environment, or should | |
| the workflow patcher downgrade attention mode automatically? | |
| - Should Easy Use and Comfyroll be removed from the API workflow by replacing | |
| `easy string`, `easy showAnything`, `CR Text`, and `ReplaceText` with plain | |
| Python-side prompt patching? | |
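
For the `flash_attention_2` question, one possible shape for an automatic downgrade in the workflow patcher is sketched below. Several details are assumptions: the workflow is taken to be an API-format dictionary of nodes with an `inputs` mapping, the input name `attention` is hypothetical, and the fallback value `sdpa` is borrowed from the attention implementation names used by `transformers` and may not match what the nodes accept. The availability check only confirms that the `flash_attn` package is importable, not that it works on the allocated GPU.

```python
import importlib.util
from typing import Optional


def flash_attn_available() -> bool:
    """Report whether the flash_attn package can be found (not whether it runs)."""
    return importlib.util.find_spec("flash_attn") is not None


def patch_attention(
    workflow: dict,
    input_name: str = "attention",  # hypothetical input name
    fallback: str = "sdpa",  # assumed fallback value
    available: Optional[bool] = None,
) -> int:
    """Downgrade flash_attention_2 inputs in place; return how many nodes changed."""
    if available is None:
        available = flash_attn_available()
    if available:
        return 0
    changed = 0
    for node in workflow.values():
        inputs = node.get("inputs", {})
        if inputs.get(input_name) == "flash_attention_2":
            inputs[input_name] = fallback
            changed += 1
    return changed
```

The `available` parameter exists so the behavior can be exercised on a machine without a GPU; in the Space it would normally be left as `None`.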