	# VoiceGate Dependency Inventory

	This document tracks the ComfyUI, custom node, Python package, model, and
	runtime dependencies required by `workflows/voicegate_api.json`.

	## Workflow Node Sources

	| Workflow node(s) | Source | Repository | Version / pin status | Notes |
	| --- | --- | --- | --- | --- |
	| `LoadAudio`, `SaveAudioMP3`, `TrimAudioDuration` | ComfyUI core | `https://github.com/comfyanonymous/ComfyUI.git` | current checked HEAD: `5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e`; workflow recorded `comfy-core` `0.12.1` | Core audio nodes. |
	| `VoiceBridgeASRLoader`, `VoiceBridgeASRTranscribe`, `GenerateSRT`, `VoiceBridgeSRTSplitter`, `VoiceBridgeAudioListMergerBySRT`, `SaveSRTFromString` | VoiceBridge | `https://github.com/YanTianlong-01/comfyui_voicebridge.git` | current checked HEAD: `3728962c0db7b9e05a1d0b341e3dbbd8adba4409`; workflow recorded `ddefcc0082ab9591f9b613f0de565f25f85d8f2a` and `5149c68df1d156794999bd77ff6a86fcab0314ed` | Required. Existing workflow uses newer SRT split/merge nodes plus ASR/SRT generation. |
	| `RunningHub_VoxCPM_LoadModel`, `RunningHub_VoxCPM_Generate` | RunningHub VoxCPM | Prefer `https://github.com/RH-RunningHub/ComfyUI_RH_VoxCPM.git` | current checked HEAD: `8365fe0e1fa60d7547f83ae5db53453a3d9c627d`; mirror/fork `https://github.com/HM-RunningHub/ComfyUI_RH_VoxCPM.git` HEAD: `1cd7b29fb6596588319fc5ad49cd78f5b5375d76` | Required for VoxCPM2 TTS. Both candidate repos contain the needed node mappings; README points to `RH-RunningHub`. |
	| `MelBandRoFormerModelLoader`, `MelBandRoFormerSampler` | MelBand RoFormer | `https://github.com/kijai/ComfyUI-MelBandRoFormer.git` | current checked HEAD: `92c86854e6654f4aacc97484471af95c98ea16d4`; workflow recorded `b40e263224778ec417114d91d8b3b39934e30de5` | Required for vocal/background separation. |
	| `RH_LLMAPI_NODE` | RunningHub LLM API | `https://github.com/HM-RunningHub/ComfyUI_RH_LLM_API.git` | current checked HEAD: `26e18d1a769bd08e115b59bfdf170f8a2166c0df` | Required for DeepSeek/OpenAI-compatible SRT translation. No requirements file; code imports `openai`. |
	| `Any Switch (rgthree)`, `Fast Groups Bypasser (rgthree)` | rgthree | `https://github.com/rgthree/rgthree-comfy.git` | current checked HEAD: `738105af5fb14e96fbecaf406dc356e284797e8c` | Required by API workflow for reference audio/text switches. |
	| `easy showAnything`, `easy string` | Easy Use | `https://github.com/yolain/ComfyUI-Easy-Use.git` | current checked HEAD: `625efbfa2fc20c31797dfffcbb41a26b6d91ab7b` | `easy showAnything` is display-oriented and may be removable later, but current workflow references it. |
	| `CR Text` | Comfyroll | `https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes.git` | current checked HEAD: `d78b780ae43fcf8c6b7c6505e6ffb4584281ceca` | Required for the translation prompt text node. No requirements file found. |
	| `ReplaceText` | ComfyUI core extra nodes | `https://github.com/comfyanonymous/ComfyUI.git` | covered by the ComfyUI pin | Confirmed in `comfy_extras/nodes_dataset.py` as node id `ReplaceText`. |
	| `MergeAudioMW` | MW AudioTools | `https://github.com/billwuhao/ComfyUI_AudioTools.git` | current checked HEAD: `41463715b476aa1d44de617119a68d8841aa04bd` | Required for merging generated speech with separated background audio. |
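
A quick way to keep this table honest is to compare it against the node types the workflow actually uses. The sketch below assumes `workflows/voicegate_api.json` is in ComfyUI's API export format, where each top-level entry is a node with a `class_type` field. `NODE_SOURCES` and `unknown_nodes` are hypothetical names, and only a few rows of the table are reproduced.

```python
import json

# Hypothetical lookup built from a few rows of the table above.
NODE_SOURCES = {
    "LoadAudio": "ComfyUI core",
    "SaveAudioMP3": "ComfyUI core",
    "VoiceBridgeASRLoader": "comfyui_voicebridge",
    "RunningHub_VoxCPM_Generate": "ComfyUI_RH_VoxCPM",
    "MergeAudioMW": "ComfyUI_AudioTools",
}


def unknown_nodes(workflow: dict, known: dict) -> list:
    """Return sorted class_type names used by the workflow but missing from `known`."""
    used = {
        node["class_type"]
        for node in workflow.values()
        if isinstance(node, dict) and "class_type" in node
    }
    return sorted(used - set(known))


# Tiny stand-in for the real workflow file (API format assumed).
sample = json.loads("""
{
  "1": {"class_type": "LoadAudio", "inputs": {}},
  "2": {"class_type": "VoiceBridgeASRLoader", "inputs": {}},
  "3": {"class_type": "SomeUnlistedNode", "inputs": {}}
}
""")
print(unknown_nodes(sample, NODE_SOURCES))
```

Any name this prints would need a new row in the table, or removal from the workflow.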

	## Python Requirements Observed

	### ComfyUI Core

	Install from ComfyUI's own `requirements.txt` after cloning the selected
	ComfyUI commit.

	### VoiceBridge

	`comfyui_voicebridge/requirements.txt`:

	```text
	torch
	numpy
	qwen-asr
	transformers
	accelerate
	modelscope
	soundfile
	openai
	```

	### RunningHub VoxCPM

	`ComfyUI_RH_VoxCPM/requirements.txt`:

	```text
	voxcpm
	soundfile
	librosa
	wetext
	modelscope>=1.22.0
	funasr
	inflect
	addict
	simplejson
	sortedcontainers
	pydantic
	transformers
	datasets
	safetensors
	argbind
	```

	The workflow only uses inference nodes, so training-only dependencies may be
	avoidable later. Keep the upstream requirements initially for compatibility,
	then trim once the runtime is known.

	### MW AudioTools

	`ComfyUI_AudioTools/requirements.txt`:

	```text
	sox
	librosa
	pydub
	pyyaml
	rotary-embedding-torch
	typeguard
	git+https://github.com/SesameAILabs/silentcipher
	```

	### MelBand RoFormer

	`ComfyUI-MelBandRoFormer/requirements.txt`:

	```text
	rotary_embedding_torch
	einops
	```

	### Easy Use

	`ComfyUI-Easy-Use/requirements.txt`:

	```text
	diffusers
	accelerate
	clip_interrogator>=0.6.0
	lark
	onnxruntime
	opencv-python-headless
	sentencepiece
	spandrel
	matplotlib
	peft
	```

	### rgthree

	`rgthree-comfy/requirements.txt` exists but is empty.

	### RunningHub LLM API

	No `requirements.txt` found. The node imports `openai`, already covered by
	VoiceBridge.

	### Comfyroll

	No `requirements.txt` found.
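
The lists above overlap, and the same package sometimes appears under two spellings (`rotary-embedding-torch` and `rotary_embedding_torch`) or with and without a version constraint (`modelscope`). The following is a minimal sketch of how a bootstrap step might merge them, assuming the name normalization described in PEP 503. `merge_requirements` is a hypothetical helper, not part of any listed package, and it does not reconcile conflicting version constraints.

```python
import re

# Characters that end the bare package name in a requirement line.
SPEC = re.compile(r"[<>=!~;\[\s]")


def normalize(name: str) -> str:
    """Normalize a distribution name: runs of -, _ and . become a single -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def merge_requirements(*lists):
    """Merge requirement lines, one entry per normalized package name."""
    merged = {}
    for lines in lists:
        for raw in lines:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith("git+"):
                # Direct git URLs have no simple name; key them by the full line.
                merged.setdefault(line, line)
                continue
            key = normalize(SPEC.split(line, maxsplit=1)[0])
            old = merged.get(key)
            # Keep the first spelling, but let a constrained line replace
            # a bare name seen earlier.
            if old is None or (SPEC.search(old) is None and SPEC.search(line)):
                merged[key] = line
    return list(merged.values())


voicebridge = ["modelscope", "openai"]
voxcpm = ["librosa", "modelscope>=1.22.0"]
audiotools = ["librosa", "rotary-embedding-torch"]
melband = ["rotary_embedding_torch", "einops"]
print(merge_requirements(voicebridge, voxcpm, audiotools, melband))
```

With these inputs the merged list holds five entries, and `modelscope>=1.22.0` takes the place of the bare `modelscope`.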

	## Model Inventory

	| Model / asset | Source | Target location | Notes |
	| --- | --- | --- | --- |
	| Qwen ASR | `Qwen/Qwen3-ASR-1.7B` | persistent target `/data/voicegate_models/Qwen3-ASR/Qwen3-ASR-1.7B`; ComfyUI link root `ComfyUI/models/Qwen3-ASR` | Loaded by `VoiceBridgeASRLoader`. |
	| Qwen forced aligner | `Qwen/Qwen3-ForcedAligner-0.6B` | persistent target `/data/voicegate_models/Qwen3-ASR/Qwen3-ForcedAligner-0.6B`; ComfyUI link root `ComfyUI/models/Qwen3-ASR` | Loaded by `VoiceBridgeASRLoader`. |
	| VoxCPM2 | `openbmb/VoxCPM2` | persistent target `/data/voicegate_models/voxcpm/VoxCPM2`; ComfyUI link `ComfyUI/models/voxcpm/VoxCPM2` | RunningHub VoxCPM README documents the ComfyUI location. Approx. 4.6 GB. |
	| MelBand RoFormer | `Kijai/MelBandRoFormer_comfy` | persistent target `/data/voicegate_models/diffusion_models/MelBandRoFormer_comfy`; ComfyUI link `ComfyUI/models/diffusion_models/MelBandRoFormer_comfy` | Workflow references `MelBandRoFormer_comfy/MelBandRoformer_fp32.safetensors`. |

	## System Packages

	Current `packages.txt` contains:

	```text
	ffmpeg
	git
	```

	These two packages are likely sufficient for the first bootstrap pass. Revisit
	the list if dependency installation fails.

	## Space Secrets

	Required:

	- `DEEPSEEK_API_KEY`, or a key for another OpenAI-compatible API, for `RH_LLMAPI_NODE`.

	Optional:

	- `DEEPSEEK_BASE_URL`, default `https://api.deepseek.com`.
	- `DEEPSEEK_MODEL`, default currently `deepseek-v4-flash` in the workflow.
	- `HF_TOKEN`, if model downloads require authenticated access.
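
The secrets and defaults listed above could be read on the Python side roughly as follows. This is a sketch only: `load_llm_settings` is a hypothetical helper, and how the values reach `RH_LLMAPI_NODE` is not shown here.

```python
import os


def load_llm_settings(env=None):
    """Collect LLM settings from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("DEEPSEEK_API_KEY")
    if not api_key:
        # The only required secret; fail early with a clear message.
        raise RuntimeError("DEEPSEEK_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("DEEPSEEK_BASE_URL", "https://api.deepseek.com"),
        "model": env.get("DEEPSEEK_MODEL", "deepseek-v4-flash"),
        "hf_token": env.get("HF_TOKEN"),  # optional, None when absent
    }


# Example with an explicit mapping instead of the real environment.
settings = load_llm_settings({"DEEPSEEK_API_KEY": "sk-example"})
print(settings["base_url"], settings["model"])
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise without touching real Space secrets.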

	## Pinning Strategy

	Use explicit git URLs and commits in the bootstrap script.

	Recommended initial pins:

	```text
	ComfyUI 5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e
	comfyui_voicebridge 3728962c0db7b9e05a1d0b341e3dbbd8adba4409
	ComfyUI_RH_VoxCPM 8365fe0e1fa60d7547f83ae5db53453a3d9c627d
	ComfyUI-MelBandRoFormer 92c86854e6654f4aacc97484471af95c98ea16d4
	ComfyUI_RH_LLM_API 26e18d1a769bd08e115b59bfdf170f8a2166c0df
	rgthree-comfy 738105af5fb14e96fbecaf406dc356e284797e8c
	ComfyUI-Easy-Use 625efbfa2fc20c31797dfffcbb41a26b6d91ab7b
	ComfyUI_Comfyroll_CustomNodes d78b780ae43fcf8c6b7c6505e6ffb4584281ceca
	ComfyUI_AudioTools 41463715b476aa1d44de617119a68d8841aa04bd
	```

	Important: the workflow-embedded commits for VoiceBridge and MelBand RoFormer
	differ from the current HEADs. The bootstrap script should try the current
	checked HEADs first and fall back to the workflow-embedded commits only if node
	API compatibility breaks.
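
The preference order described above can be written down as data plus two small helpers. This is a sketch under assumptions: `PINS`, `candidate_commits`, and `checkout_commands` are hypothetical names, only two packages are shown, and the order in which the two recorded VoiceBridge commits are tried is a guess. The functions only build `git` command lines; running them is left to the caller.

```python
from pathlib import Path

# Preferred commit first, workflow-recorded commits as fallbacks.
PINS = {
    "comfyui_voicebridge": {
        "url": "https://github.com/YanTianlong-01/comfyui_voicebridge.git",
        "commit": "3728962c0db7b9e05a1d0b341e3dbbd8adba4409",
        "fallbacks": [
            "ddefcc0082ab9591f9b613f0de565f25f85d8f2a",
            "5149c68df1d156794999bd77ff6a86fcab0314ed",
        ],
    },
    "ComfyUI-MelBandRoFormer": {
        "url": "https://github.com/kijai/ComfyUI-MelBandRoFormer.git",
        "commit": "92c86854e6654f4aacc97484471af95c98ea16d4",
        "fallbacks": ["b40e263224778ec417114d91d8b3b39934e30de5"],
    },
}


def candidate_commits(name):
    """Commits to try in order: current checked HEAD, then recorded fallbacks."""
    pin = PINS[name]
    return [pin["commit"], *pin["fallbacks"]]


def checkout_commands(name, dest_root, commit=None):
    """Build the git commands that clone `name` and detach at `commit`."""
    commit = commit or candidate_commits(name)[0]
    dest = str(Path(dest_root) / name)
    return [
        ["git", "clone", "--no-checkout", PINS[name]["url"], dest],
        ["git", "-C", dest, "checkout", "--detach", commit],
    ]


for cmd in checkout_commands("comfyui_voicebridge", "ComfyUI/custom_nodes"):
    print(" ".join(cmd))
```

A bootstrap script could pass each command to `subprocess.run(cmd, check=True)` and move on to the next candidate commit only when a compatibility check fails.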

	## Installation Strategy

	Do not vendor custom node repositories into this Space yet. Install them during
	bootstrap from explicit git URLs and commit pins. This keeps the Space repo
	small and makes it easier to update or swap a node package when a ComfyUI API
	change breaks compatibility.

	Runtime layout target:

	```text
	ComfyUI/
	|-- custom_nodes/
	|   |-- comfyui_voicebridge/
	|   |-- ComfyUI_RH_VoxCPM/
	|   |-- ComfyUI-MelBandRoFormer/
	|   |-- ComfyUI_RH_LLM_API/
	|   |-- rgthree-comfy/
	|   |-- ComfyUI-Easy-Use/
	|   |-- ComfyUI_Comfyroll_CustomNodes/
	|   `-- ComfyUI_AudioTools/
	`-- models/
	    |-- voxcpm/VoxCPM2/
	    |-- diffusion_models/MelBandRoFormer_comfy/
	    `-- Qwen3-ASR/
	        |-- Qwen3-ASR-1.7B/
	        `-- Qwen3-ForcedAligner-0.6B/
	```

	Model files should be downloaded at runtime or first startup with
	`huggingface_hub`/`hf download`. The Space has persistent storage mounted at
	`/data`, so `scripts/bootstrap_comfy.py` defaults to storing large model files
	under `/data/voicegate_models` and creating symlinks at the ComfyUI paths
	expected by custom nodes. Override this with `VOICEGATE_MODEL_ROOT` if needed.
	Do not commit model weights to git.
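
The storage-plus-symlink layout described above can be illustrated with the standard library alone. This is not the code in `scripts/bootstrap_comfy.py`; `model_root` and `link_model` are hypothetical helpers, and the actual download step is omitted.

```python
import os
import tempfile
from pathlib import Path


def model_root(env=None):
    """Resolve the persistent model directory, honoring VOICEGATE_MODEL_ROOT."""
    env = os.environ if env is None else env
    return Path(env.get("VOICEGATE_MODEL_ROOT", "/data/voicegate_models"))


def link_model(root: Path, comfy_models: Path, relative: str) -> Path:
    """Make comfy_models/relative a symlink to root/relative and return the link."""
    target = root / relative
    link = comfy_models / relative
    target.mkdir(parents=True, exist_ok=True)
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # refresh a stale link
    if not link.exists():
        # A real directory already at this path is left untouched.
        link.symlink_to(target, target_is_directory=True)
    return link


# Demonstration in a temporary directory instead of /data.
with tempfile.TemporaryDirectory() as tmp:
    root = model_root({"VOICEGATE_MODEL_ROOT": os.path.join(tmp, "store")})
    link = link_model(root, Path(tmp) / "ComfyUI" / "models", "voxcpm/VoxCPM2")
    print(link.is_symlink())
```

Because only the link lives inside the ComfyUI tree, a rebuilt Space can recreate the links and reuse whatever is already stored under the persistent root.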

	## Remaining Compatibility Risks

	- Can the current `comfyui_voicebridge` HEAD run the workflow created with
	earlier recorded VoiceBridge commits?
	- Is `flash_attention_2` available in the target ZeroGPU environment, or should
	the workflow patcher downgrade attention mode automatically?
	- Should Easy Use and Comfyroll be removed from the API workflow by replacing
	`easy string`, `easy showAnything`, `CR Text`, and `ReplaceText` with plain
	Python-side prompt patching?
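
For the `flash_attention_2` question, one possible shape for an automatic downgrade in the workflow patcher is sketched below. Everything here is an assumption: the input name `attention`, the fallback value `sdpa`, and the helper names are hypothetical, and an importable `flash_attn` package does not prove that it works on the ZeroGPU hardware.

```python
import importlib.util


def pick_attention(requested: str, available=None) -> str:
    """Return `requested` unless it needs flash-attn and the package is missing."""
    if available is None:
        available = importlib.util.find_spec("flash_attn") is not None
    if requested == "flash_attention_2" and not available:
        return "sdpa"  # assumed fallback value
    return requested


def patch_attention(workflow: dict, input_name: str = "attention", available=None) -> dict:
    """Return a copy of an API-format workflow with attention inputs downgraded."""
    patched = {}
    for node_id, node in workflow.items():
        inputs = dict(node.get("inputs", {}))
        if isinstance(inputs.get(input_name), str):
            inputs[input_name] = pick_attention(inputs[input_name], available)
        patched[node_id] = {**node, "inputs": inputs}
    return patched


sample = {"7": {"class_type": "VoiceBridgeASRLoader",
                "inputs": {"attention": "flash_attention_2"}}}
print(patch_attention(sample, available=False)["7"]["inputs"]["attention"])
```

The original workflow dictionary is left unmodified, so the patcher can log the before and after values when it downgrades.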