VoiceGate Dependency Inventory

This document tracks the ComfyUI, custom node, Python package, model, and runtime dependencies required by workflows/voicegate_api.json.

Workflow Node Sources

| Workflow node(s) | Source | Repository | Version / pin status | Notes |
|---|---|---|---|---|
| LoadAudio, SaveAudioMP3, TrimAudioDuration | ComfyUI core | https://github.com/comfyanonymous/ComfyUI.git | current checked HEAD: 5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e; workflow recorded comfy-core 0.12.1 | Core audio nodes. |
| VoiceBridgeASRLoader, VoiceBridgeASRTranscribe, GenerateSRT, VoiceBridgeSRTSplitter, VoiceBridgeAudioListMergerBySRT, SaveSRTFromString | VoiceBridge | https://github.com/YanTianlong-01/comfyui_voicebridge.git | current checked HEAD: 3728962c0db7b9e05a1d0b341e3dbbd8adba4409; workflow recorded ddefcc0082ab9591f9b613f0de565f25f85d8f2a and 5149c68df1d156794999bd77ff6a86fcab0314ed | Required. Existing workflow uses newer SRT split/merge nodes plus ASR/SRT generation. |
| RunningHub_VoxCPM_LoadModel, RunningHub_VoxCPM_Generate | RunningHub VoxCPM | Prefer https://github.com/RH-RunningHub/ComfyUI_RH_VoxCPM.git | current checked HEAD: 8365fe0e1fa60d7547f83ae5db53453a3d9c627d; mirror/fork https://github.com/HM-RunningHub/ComfyUI_RH_VoxCPM.git HEAD: 1cd7b29fb6596588319fc5ad49cd78f5b5375d76 | Required for VoxCPM2 TTS. Both candidate repos contain the needed node mappings; README points to RH-RunningHub. |
| MelBandRoFormerModelLoader, MelBandRoFormerSampler | MelBand RoFormer | https://github.com/kijai/ComfyUI-MelBandRoFormer.git | current checked HEAD: 92c86854e6654f4aacc97484471af95c98ea16d4; workflow recorded b40e263224778ec417114d91d8b3b39934e30de5 | Required for vocal/background separation. |
| RH_LLMAPI_NODE | RunningHub LLM API | https://github.com/HM-RunningHub/ComfyUI_RH_LLM_API.git | current checked HEAD: 26e18d1a769bd08e115b59bfdf170f8a2166c0df | Required for DeepSeek/OpenAI-compatible SRT translation. No requirements file; code imports openai. |
| Any Switch (rgthree), Fast Groups Bypasser (rgthree) | rgthree | https://github.com/rgthree/rgthree-comfy.git | current checked HEAD: 738105af5fb14e96fbecaf406dc356e284797e8c | Required by API workflow for reference audio/text switches. |
| easy showAnything, easy string | Easy Use | https://github.com/yolain/ComfyUI-Easy-Use.git | current checked HEAD: 625efbfa2fc20c31797dfffcbb41a26b6d91ab7b | easy showAnything is display-oriented and may be removable later, but current workflow references it. |
| CR Text | Comfyroll | https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes.git | current checked HEAD: d78b780ae43fcf8c6b7c6505e6ffb4584281ceca | Required for the translation prompt text node. No requirements file found. |
| ReplaceText | ComfyUI core extra nodes | https://github.com/comfyanonymous/ComfyUI.git | covered by the ComfyUI pin | Confirmed in comfy_extras/nodes_dataset.py as node id ReplaceText. |
| MergeAudioMW | MW AudioTools | https://github.com/billwuhao/ComfyUI_AudioTools.git | current checked HEAD: 41463715b476aa1d44de617119a68d8841aa04bd | Required for merging generated speech with separated background audio. |
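One way to keep the table above honest is to compare it against the node types the workflow actually references. The sketch below assumes workflows/voicegate_api.json is in ComfyUI's API export format, where each top-level entry is a node with a `class_type` field. The `PACKAGE_NODES` mapping and the helper names are illustrative and are not part of the repository.

```python
# Node types per source, transcribed from the table above.
PACKAGE_NODES = {
    "ComfyUI core": ["LoadAudio", "SaveAudioMP3", "TrimAudioDuration", "ReplaceText"],
    "comfyui_voicebridge": [
        "VoiceBridgeASRLoader", "VoiceBridgeASRTranscribe", "GenerateSRT",
        "VoiceBridgeSRTSplitter", "VoiceBridgeAudioListMergerBySRT", "SaveSRTFromString",
    ],
    "ComfyUI_RH_VoxCPM": ["RunningHub_VoxCPM_LoadModel", "RunningHub_VoxCPM_Generate"],
    "ComfyUI-MelBandRoFormer": ["MelBandRoFormerModelLoader", "MelBandRoFormerSampler"],
    "ComfyUI_RH_LLM_API": ["RH_LLMAPI_NODE"],
    "rgthree-comfy": ["Any Switch (rgthree)", "Fast Groups Bypasser (rgthree)"],
    "ComfyUI-Easy-Use": ["easy showAnything", "easy string"],
    "ComfyUI_Comfyroll_CustomNodes": ["CR Text"],
    "ComfyUI_AudioTools": ["MergeAudioMW"],
}

# Inverted view: node type -> source package.
NODE_SOURCES = {node: pkg for pkg, nodes in PACKAGE_NODES.items() for node in nodes}


def node_types(workflow: dict) -> set:
    """Collect the class_type of every node in an API-format workflow dict."""
    return {
        node["class_type"]
        for node in workflow.values()
        if isinstance(node, dict) and "class_type" in node
    }


def unmapped_node_types(workflow: dict, sources: dict = NODE_SOURCES) -> list:
    """Return node types with no recorded source package, sorted by name."""
    return sorted(node_types(workflow) - set(sources))
```

To use it, load the workflow with `json.loads(Path("workflows/voicegate_api.json").read_text(encoding="utf-8"))` and pass the result to `unmapped_node_types`. An empty list would mean every referenced node type has a row in the table.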

Python Requirements Observed

ComfyUI Core

Install from ComfyUI's own requirements.txt after cloning the selected ComfyUI commit.

VoiceBridge

comfyui_voicebridge/requirements.txt:

torch
numpy
qwen-asr
transformers
accelerate
modelscope
soundfile
openai

RunningHub VoxCPM

ComfyUI_RH_VoxCPM/requirements.txt:

voxcpm
soundfile
librosa
wetext
modelscope>=1.22.0
funasr
inflect
addict
simplejson
sortedcontainers
pydantic
transformers
datasets
safetensors
argbind

The workflow only uses inference nodes, so training-only dependencies may be avoidable later. Keep the upstream requirements initially for compatibility, then trim once the runtime is known.

MW AudioTools

ComfyUI_AudioTools/requirements.txt:

sox
librosa
pydub
pyyaml
rotary-embedding-torch
typeguard
git+https://github.com/SesameAILabs/silentcipher

MelBand RoFormer

ComfyUI-MelBandRoFormer/requirements.txt:

rotary_embedding_torch
einops

Easy Use

ComfyUI-Easy-Use/requirements.txt:

diffusers
accelerate
clip_interrogator>=0.6.0
lark
onnxruntime
opencv-python-headless
sentencepiece
spandrel
matplotlib
peft

rgthree

rgthree-comfy/requirements.txt exists but is empty.

RunningHub LLM API

No requirements.txt found. The node imports openai, already covered by VoiceBridge.

Comfyroll

No requirements.txt found.
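Several of the lists above overlap, and the same package appears under two spellings (rotary-embedding-torch and rotary_embedding_torch). A minimal sketch of how the overlap could be surfaced before trimming, assuming simple one-requirement-per-line files and PEP 503 style name normalization. The function names and the abbreviated `GROUPS` sample are illustrative only.

```python
import re


def canonical_name(requirement: str) -> str:
    """Return the normalized project name of a simple requirement line."""
    # Keep only the leading name; drop version specifiers, extras, and markers.
    name = re.split(r"[<>=!~;\[\s]", requirement.strip(), maxsplit=1)[0]
    # PEP 503 style: runs of '-', '_' and '.' collapse to '-', lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def merge_requirements(groups: dict) -> dict:
    """Map each normalized package name to the node packages that request it."""
    requested_by = {}
    for package, lines in groups.items():
        for line in lines:
            line = line.strip()
            # Skip blanks, comments, and direct URL requirements (git+https://...).
            if not line or line.startswith("#") or "://" in line:
                continue
            requested_by.setdefault(canonical_name(line), []).append(package)
    return requested_by


# Abbreviated sample taken from the requirement lists above.
GROUPS = {
    "ComfyUI_AudioTools": ["sox", "librosa", "pydub", "rotary-embedding-torch"],
    "ComfyUI-MelBandRoFormer": ["rotary_embedding_torch", "einops"],
    "ComfyUI_RH_VoxCPM": ["soundfile", "librosa", "modelscope>=1.22.0", "transformers"],
    "comfyui_voicebridge": ["soundfile", "modelscope", "transformers", "openai"],
}

# Packages requested by more than one custom node package.
shared = {name: pkgs for name, pkgs in merge_requirements(GROUPS).items() if len(pkgs) > 1}
```

With this sample, `shared` contains librosa, modelscope, rotary-embedding-torch, soundfile, and transformers. The sketch reports overlap only; it does not resolve conflicting version constraints.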

Model Inventory

| Model / asset | Source | Target location | Notes |
|---|---|---|---|
| Qwen ASR | Qwen/Qwen3-ASR-1.7B | persistent target /data/voicegate_models/Qwen3-ASR/Qwen3-ASR-1.7B; ComfyUI link root ComfyUI/models/Qwen3-ASR | Loaded by VoiceBridgeASRLoader. |
| Qwen forced aligner | Qwen/Qwen3-ForcedAligner-0.6B | persistent target /data/voicegate_models/Qwen3-ASR/Qwen3-ForcedAligner-0.6B; ComfyUI link root ComfyUI/models/Qwen3-ASR | Loaded by VoiceBridgeASRLoader. |
| VoxCPM2 | openbmb/VoxCPM2 | persistent target /data/voicegate_models/voxcpm/VoxCPM2; ComfyUI link ComfyUI/models/voxcpm/VoxCPM2 | RunningHub VoxCPM README documents the ComfyUI location. Approx. 4.6 GB. |
| MelBand RoFormer | Kijai/MelBandRoFormer_comfy | persistent target /data/voicegate_models/diffusion_models/MelBandRoFormer_comfy; ComfyUI link ComfyUI/models/diffusion_models/MelBandRoFormer_comfy | Workflow references MelBandRoFormer_comfy/MelBandRoformer_fp32.safetensors. |

System Packages

Current packages.txt contains:

ffmpeg
git

These two packages are likely sufficient for the first bootstrap pass. Revisit the list if dependency installation reports errors.
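A bootstrap script could confirm that these tools are actually on PATH before cloning anything. This is a small sketch using only the standard library; `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools=("ffmpeg", "git")) -> list:
    """Return the executables from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

A non-empty result from `missing_tools()` would be a reasonable point to stop and report which entry is missing from packages.txt.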

Space Secrets

Required:

  • DEEPSEEK_API_KEY or compatible OpenAI API key for RH_LLMAPI_NODE.

Optional:

  • DEEPSEEK_BASE_URL, default https://api.deepseek.com.
  • DEEPSEEK_MODEL, default currently deepseek-v4-flash in the workflow.
  • HF_TOKEN, if model downloads require authenticated access.
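The secrets above could be read in one place, with the documented defaults applied. The following is a sketch, not the repository's code: `LLMConfig` and `load_llm_config` are illustrative names, and how the values reach RH_LLMAPI_NODE is left out because the source does not describe it.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    api_key: str
    base_url: str
    model: str


def load_llm_config(env=None) -> LLMConfig:
    """Read the translation LLM settings from environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("DEEPSEEK_API_KEY", "").strip()
    if not api_key:
        # The key is the only required secret; fail early with a clear message.
        raise RuntimeError("DEEPSEEK_API_KEY is required for RH_LLMAPI_NODE")
    return LLMConfig(
        api_key=api_key,
        # Defaults as listed in the Optional secrets above.
        base_url=env.get("DEEPSEEK_BASE_URL", "https://api.deepseek.com"),
        model=env.get("DEEPSEEK_MODEL", "deepseek-v4-flash"),
    )
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real Space secrets.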

Pinning Strategy

Use explicit git URLs and commits in the bootstrap script.

Recommended initial pins:

ComfyUI                         5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e
comfyui_voicebridge             3728962c0db7b9e05a1d0b341e3dbbd8adba4409
ComfyUI_RH_VoxCPM               8365fe0e1fa60d7547f83ae5db53453a3d9c627d
ComfyUI-MelBandRoFormer         92c86854e6654f4aacc97484471af95c98ea16d4
ComfyUI_RH_LLM_API              26e18d1a769bd08e115b59bfdf170f8a2166c0df
rgthree-comfy                   738105af5fb14e96fbecaf406dc356e284797e8c
ComfyUI-Easy-Use                625efbfa2fc20c31797dfffcbb41a26b6d91ab7b
ComfyUI_Comfyroll_CustomNodes   d78b780ae43fcf8c6b7c6505e6ffb4584281ceca
ComfyUI_AudioTools              41463715b476aa1d44de617119a68d8841aa04bd

Important: the commits embedded in the workflow for VoiceBridge and MelBand RoFormer differ from the current HEADs. Bootstrap should try the current tested HEADs first and fall back to the workflow-embedded commits only if node API compatibility breaks.
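The try-then-fall-back order can be sketched as an ordered candidate list per repository. In the code below, `is_compatible` is a hypothetical callback standing in for whatever compatibility check the bootstrap ends up using (for example, a smoke test of the workflow); the source does not define one. The `run` parameter exists only so the git calls can be replaced in tests.

```python
import subprocess
from pathlib import Path

# Candidate commits in the order to try them: current tested HEAD first,
# then the workflow-embedded commits.
CANDIDATE_PINS = {
    "https://github.com/YanTianlong-01/comfyui_voicebridge.git": [
        "3728962c0db7b9e05a1d0b341e3dbbd8adba4409",
        "ddefcc0082ab9591f9b613f0de565f25f85d8f2a",
        "5149c68df1d156794999bd77ff6a86fcab0314ed",
    ],
    "https://github.com/kijai/ComfyUI-MelBandRoFormer.git": [
        "92c86854e6654f4aacc97484471af95c98ea16d4",
        "b40e263224778ec417114d91d8b3b39934e30de5",
    ],
}


def repo_dir_name(url: str) -> str:
    """Derive the custom_nodes directory name from a git URL."""
    return url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")


def install_first_compatible(url, pins, custom_nodes: Path, is_compatible, run=subprocess.run):
    """Clone once, then check out candidate commits until one passes is_compatible."""
    target = custom_nodes / repo_dir_name(url)
    if not target.exists():
        run(["git", "clone", url, str(target)], check=True)
    for commit in pins:
        run(["git", "-C", str(target), "checkout", "--detach", commit], check=True)
        if is_compatible(target):
            return commit
    raise RuntimeError(f"no compatible commit found for {url}")
```

Repositories with a single pin fit the same function with a one-element list, so the bootstrap would not need a separate code path for them.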

Installation Strategy

Do not vendor custom node repositories into this Space yet. Install them during bootstrap from explicit git URLs and commit pins. This keeps the Space repo small and makes it easier to update or swap a node package when a ComfyUI API change breaks compatibility.

Runtime layout target:

ComfyUI/
|-- custom_nodes/
|   |-- comfyui_voicebridge/
|   |-- ComfyUI_RH_VoxCPM/
|   |-- ComfyUI-MelBandRoFormer/
|   |-- ComfyUI_RH_LLM_API/
|   |-- rgthree-comfy/
|   |-- ComfyUI-Easy-Use/
|   |-- ComfyUI_Comfyroll_CustomNodes/
|   `-- ComfyUI_AudioTools/
`-- models/
    |-- voxcpm/VoxCPM2/
    |-- diffusion_models/MelBandRoFormer_comfy/
    `-- Qwen3-ASR/
        |-- Qwen3-ASR-1.7B/
        `-- Qwen3-ForcedAligner-0.6B/

Model files should be downloaded at runtime or first startup with huggingface_hub/hf download. The Space has persistent storage mounted at /data, so scripts/bootstrap_comfy.py defaults to storing large model files under /data/voicegate_models and creating symlinks at the ComfyUI paths expected by custom nodes. Override this with VOICEGATE_MODEL_ROOT if needed. Do not commit model weights to git.
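The storage-plus-symlink arrangement described above might look roughly like the following. This is a sketch under stated assumptions, not the contents of scripts/bootstrap_comfy.py: the function names and `LINKED_DIRS` are illustrative, the link locations follow the Model Inventory table, and the download step itself is omitted.

```python
import os
from pathlib import Path

DEFAULT_MODEL_ROOT = "/data/voicegate_models"

# Paths, relative to both the persistent root and ComfyUI/models, that get a symlink.
LINKED_DIRS = ["Qwen3-ASR", "voxcpm/VoxCPM2", "diffusion_models/MelBandRoFormer_comfy"]


def model_root(env=None) -> Path:
    """Resolve the persistent model root, honoring VOICEGATE_MODEL_ROOT."""
    env = os.environ if env is None else env
    return Path(env.get("VOICEGATE_MODEL_ROOT") or DEFAULT_MODEL_ROOT)


def link_models(root: Path, comfy_models: Path, linked_dirs=LINKED_DIRS) -> list:
    """Create ComfyUI-side symlinks into the persistent root; return new links."""
    created = []
    for rel in linked_dirs:
        source = root / rel
        link = comfy_models / rel
        source.mkdir(parents=True, exist_ok=True)
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            if link.resolve() == source.resolve():
                continue  # already points at the right place
            link.unlink()  # stale link from an earlier root
        elif link.exists():
            continue  # a real directory is present; leave its data alone
        # Use an absolute target so the link does not depend on the working directory.
        link.symlink_to(source.resolve(), target_is_directory=True)
        created.append(link)
    return created
```

Calling `link_models(model_root(), Path("ComfyUI/models"))` is idempotent: a second run creates nothing and returns an empty list.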

Remaining Compatibility Risks

  • Can the current comfyui_voicebridge HEAD run the workflow created with earlier recorded VoiceBridge commits?
  • Is flash_attention_2 available in the target ZeroGPU environment, or should the workflow patcher downgrade attention mode automatically?
  • Should EasyUse and Comfyroll be removed from the API workflow by replacing easy string, easy showAnything, CR Text, and ReplaceText with plain Python-side prompt patching?
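For the flash_attention_2 question, one possible shape for an automatic downgrade is sketched below. It only checks whether the `flash_attn` package is importable, which does not guarantee it works on the target GPU. The fallback value `sdpa` and the input name `attention` are assumptions; the actual option names accepted by the workflow's nodes should be checked before using this.

```python
import importlib.util


def pick_attention_mode(preferred: str = "flash_attention_2", fallback: str = "sdpa") -> str:
    """Keep flash_attention_2 only if the flash_attn package is installed."""
    if preferred != "flash_attention_2":
        return preferred
    installed = importlib.util.find_spec("flash_attn") is not None
    return preferred if installed else fallback


def patch_attention(workflow: dict, mode: str, input_name: str = "attention") -> int:
    """Rewrite node inputs that request flash_attention_2; return how many changed."""
    patched = 0
    for node in workflow.values():
        inputs = node.get("inputs", {}) if isinstance(node, dict) else {}
        # input_name is a hypothetical field; adjust it to the real node input.
        if inputs.get(input_name) == "flash_attention_2":
            inputs[input_name] = mode
            patched += 1
    return patched
```

A patcher built this way leaves workflows untouched when they already request another attention mode, so it can run unconditionally at startup.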