Space: build-small-hackathon/VoiceGate

Harden MelBand model loading on Spaces

by YanTianlong - opened 9 days ago

base: refs/heads/main ← from: refs/pr/1

Files changed (3): +735 -622

app.py +15 -1
docs/work-log.md +640 -620
scripts/bootstrap_comfy.py +80 -1

app.py CHANGED

@@ -25,6 +25,7 @@ import spaces
 import torch
 import websocket
 from scripts.workflow_client import load_workflow, patch_voicegate_workflow
@@ -39,6 +40,7 @@ COMFY_PORT = "8188"
 COMFY_PROCESS: subprocess.Popen | None = None
 PREPARE_PROCESS: subprocess.Popen | None = None
 BOOTSTRAPPED = False
 BOOTSTRAP_LOG = Path("/tmp/voicegate_bootstrap.log")
 USER_OUTPUT_DIR = ROOT / "user_outputs"
 REQUIRED_MODEL_PATHS = [
@@ -487,7 +489,18 @@ def run_bootstrap(lines: list[str], *, allow_heavy: bool = True) -> None:
 def missing_required_models() -> list[Path]:
-    return [path for path in REQUIRED_MODEL_PATHS if not path.exists()]
 def ensure_runtime_assets(lines: list[str]) -> None:
@@ -533,6 +546,7 @@ def ensure_comfy(lines: list[str], *, timeout: float = 240) -> dict[str, Any]:
             raise RuntimeError(f"Runtime preparation failed with return code {returncode}.")
     run_bootstrap(lines, allow_heavy=False)
     try:
         stats = wait_for_comfy(timeout=5)

@@ -25,6 +25,7 @@ import spaces
 import torch
 import websocket
+from scripts.bootstrap_comfy import patch_melband_loader, validate_melband_model
 from scripts.workflow_client import load_workflow, patch_voicegate_workflow
@@ -39,6 +40,7 @@ COMFY_PORT = "8188"
 COMFY_PROCESS: subprocess.Popen | None = None
 PREPARE_PROCESS: subprocess.Popen | None = None
 BOOTSTRAPPED = False
+MODELS_VALIDATED = False
 BOOTSTRAP_LOG = Path("/tmp/voicegate_bootstrap.log")
 USER_OUTPUT_DIR = ROOT / "user_outputs"
 REQUIRED_MODEL_PATHS = [
@@ -487,7 +489,18 @@ def run_bootstrap(lines: list[str], *, allow_heavy: bool = True) -> None:
 def missing_required_models() -> list[Path]:
+    global MODELS_VALIDATED
+    missing = [path for path in REQUIRED_MODEL_PATHS if not path.exists()]
+    if missing:
+        MODELS_VALIDATED = False
+        return missing
+    if not MODELS_VALIDATED:
+        melband_valid, _reason = validate_melband_model(verify_hash=True)
+        if not melband_valid:
+            return [REQUIRED_MODEL_PATHS[0]]
+        MODELS_VALIDATED = True
+    return []
 def ensure_runtime_assets(lines: list[str]) -> None:
@@ -533,6 +546,7 @@ def ensure_comfy(lines: list[str], *, timeout: float = 240) -> dict[str, Any]:
             raise RuntimeError(f"Runtime preparation failed with return code {returncode}.")
     run_bootstrap(lines, allow_heavy=False)
+    patch_melband_loader()
     try:
         stats = wait_for_comfy(timeout=5)
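`validate_melband_model` is imported from `scripts/bootstrap_comfy.py`, whose diff is not shown in this view. As a rough illustration only, a file check that returns the same `(ok, reason)` shape unpacked in `missing_required_models` might look like the sketch below. `validate_model_file`, `expected_sha256`, and `min_bytes` are hypothetical names chosen for this example, not the PR's actual implementation.

```python
import hashlib
from pathlib import Path
from typing import Optional, Tuple


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weights never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_model_file(
    path: Path,
    *,
    expected_sha256: Optional[str] = None,  # hypothetical: supplied by the caller
    min_bytes: int = 1,                     # hypothetical: rejects empty or truncated files
) -> Tuple[bool, str]:
    """Return (ok, reason), mirroring the tuple unpacked in app.py."""
    if not path.is_file():
        return False, f"missing: {path}"
    size = path.stat().st_size
    if size < min_bytes:
        return False, f"too small: {size} bytes"
    if expected_sha256 is not None and sha256_of(path) != expected_sha256.lower():
        return False, "sha256 mismatch"
    return True, "ok"
```

Hashing a multi-gigabyte file is slow, which is presumably why `app.py` caches a successful result in `MODELS_VALIDATED` and resets it when a required path goes missing.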

docs/work-log.md CHANGED

@@ -1,38 +1,38 @@
-# VoiceGate HF Space Work Log
-This document records the effective work completed while preparing the
-`build-small-hackathon/VoiceGate` Hugging Face Space, plus the pitfalls found
-and how they were resolved.
-## Current Snapshot
-- Space: `https://huggingface.co/spaces/build-small-hackathon/VoiceGate`
-- Space git remote: `https://huggingface.co/spaces/build-small-hackathon/VoiceGate`
-- Runtime hardware: ZeroGPU / `zero-a10g`
-- Space SDK: Gradio
-- Local Space wrapper repo: `VoiceGate-hf`
-- Local upstream reference checkout: `VoiceGate/`
 - Latest confirmed normal runtime commit: `316b35db739d74d05543d6c8c9dd9c16e0580b17`
-- Current expected Space secret: `DEEPSEEK_API_KEY`
-- Default persistent model root: `/data/voicegate_models`
-Do not commit API keys, model weights, uploaded media, generated outputs, or the
-local `VoiceGate/` upstream checkout.
-## Executive Summary
-The Space is no longer just a blank scaffold. It can now run Gradio, invoke
-ZeroGPU, prepare a ComfyUI runtime, start ComfyUI from a GPU-backed Gradio
-function, and submit several segmented ComfyUI workflows.
-Confirmed working:
-- Hugging Face Space git push and normal rebuild flow.
-- Dev Mode SSH for CPU/container diagnostics.
-- ZeroGPU invocation from Gradio through `@spaces.GPU`.
-- ComfyUI startup from inside a `@spaces.GPU` function.
-- ComfyUI API calls from the Gradio process.
-- DeepSeek-compatible LLM node with the Space secret.
 - MelBand RoFormer smoke tests in CPU mode and ZeroGPU mode.
 - VoxCPM2 TTS-only smoke test in ZeroGPU mode.
 - VoiceBridge ASR-only smoke test in ZeroGPU mode.
@@ -44,223 +44,223 @@ Not yet confirmed at the start of 2026-06-06:
 - SRT split -> VoxCPM -> SRT merge.
 - Full short-audio VoiceGate workflow.
 - Final user-facing Gradio upload/download UI.
-## Repository Setup Completed
-- Created and pushed the Space wrapper repository.
-- Kept `VoiceGate/` as a local-only upstream reference and ignored it in git.
-- Preserved Hugging Face LFS rules.
-- Copied deployment workflows:
-  - `workflows/voicegate_api.json`
-  - `workflows/voicegate_ui.json`
-- Confirmed the API workflow JSON is valid.
-- Confirmed workflow files contain no committed API key.
-## Dependency Inventory Completed
-Required workflow node providers were identified and pinned:
-- ComfyUI core:
-  `comfyanonymous/ComfyUI`
-- VoiceBridge:
-  `YanTianlong-01/comfyui_voicebridge`
-- RunningHub VoxCPM:
-  `RH-RunningHub/ComfyUI_RH_VoxCPM`
-- MelBand RoFormer:
-  `kijai/ComfyUI-MelBandRoFormer`
-- RunningHub LLM API:
-  `HM-RunningHub/ComfyUI_RH_LLM_API`
-- rgthree:
-  `rgthree/rgthree-comfy`
-- Easy Use:
-  `yolain/ComfyUI-Easy-Use`
-- Comfyroll:
-  `Suzie1/ComfyUI_Comfyroll_CustomNodes`
-- MW AudioTools:
-  `billwuhao/ComfyUI_AudioTools`
-Important node source confirmations:
-- `ReplaceText` is provided by ComfyUI core extra nodes.
-- `MergeAudioMW` is provided by `ComfyUI_AudioTools`.
-- `RH_LLMAPI_NODE` is provided by `ComfyUI_RH_LLM_API`.
-## Runtime Bootstrap Added
-The following scripts were added:
-- `scripts/bootstrap_comfy.py`
-  - Clones ComfyUI.
-  - Checks out pinned commits.
-  - Clones required custom node repositories.
-  - Installs ComfyUI and custom node Python requirements.
-  - Prepares expected model directories.
-  - Optionally downloads large model assets with `--with-models`.
-- `scripts/run_comfy.py`
-  - Starts ComfyUI.
-  - Waits for `/system_stats`.
-  - Supports `--cpu` for SSH diagnostics.
-- `scripts/workflow_client.py`
-  - Loads `workflows/voicegate_api.json`.
-  - Uploads audio through the ComfyUI API.
-  - Patches workflow inputs.
-  - Submits `/prompt`.
-  - Waits for `/history/{prompt_id}`.
-Workflow patching currently covers:
-- Node `16`: uploaded audio filename.
-- Node `105`: `DEEPSEEK_API_KEY`.
-- Node `105`: API base URL.
-- Node `105`: LLM model name.
-- Node `110`: target language.
-- Node `180`: job-specific audio output prefix.
-- Node `214`: job-specific SRT output prefix.
-## Hugging Face Space Runtime Findings
-### Dev Mode and SSH
-SSH target:
-```text
-build-small-hackathon-voicegate@ssh.hf.space
-```
-Local private key:
-```text
-C:\Users\yantianlong\.ssh\codex_space_voicegate
-```
-SSH is only available while the Space is in Dev Mode. Normal running Spaces do
-not accept SSH and return:
-```text
-Bad request: SSH in only allowed in Dev mode
-```
-Dev Mode can be toggled through the Hugging Face API endpoint:
-```text
-POST /api/spaces/build-small-hackathon/VoiceGate/dev-mode
-```
-Use Dev Mode for diagnostics only. Persistent fixes must be committed locally
-and pushed.
-### Dev Mode Stale Commit Pitfall
-The running container initially stayed on the original template commit:
-```text
-a94117f35a42cb17f654ae70cbe619c15345d057
-```
-even after newer commits were pushed. `restart_space` alone did not move it to
-the latest repository state while Dev Mode was enabled.
-Fix:
-- Disable Dev Mode.
-- Use `factory_reboot=True` or push a new commit to trigger a normal rebuild.
-- Confirm runtime metadata reports the latest commit.
-### ZeroGPU Startup Requirement
-When Dev Mode was disabled, the Space entered `RUNTIME_ERROR` with:
-```text
-No @spaces.GPU function detected during startup
-```
-Fix:
-- Import `spaces`.
-- Add at least one `@spaces.GPU(duration=...)` function in `app.py`.
-Initial placeholder fix:
-```python
-@spaces.GPU(duration=30)
-def placeholder():
-    ...
-```
-Later this placeholder was replaced by real diagnostic functions:
-```python
-@spaces.GPU(duration=60)
-def gpu_smoke_test():
-    ...
-@spaces.GPU(duration=900)
-def comfy_runtime_test():
-    ...
-```
-### SSH Does Not Expose ZeroGPU CUDA
-Starting ComfyUI normally through SSH failed with:
-```text
-RuntimeError: No CUDA GPUs are available
-```
-Conclusion:
-- SSH is useful for CPU-mode diagnostics.
-- Real GPU work must run from the Gradio process inside a `@spaces.GPU`
-  function.
-CPU diagnostic command:
-```bash
-python scripts/run_comfy.py --cpu
-```
-### Gradio Request Timeout During Bootstrap
-Long bootstrap work should not run synchronously inside a Gradio request. The
-first attempt did this:
-```text
-Gradio click -> bootstrap_comfy.py -> clone repos -> pip install -> start ComfyUI
-```
-The request was interrupted by Gradio/ZeroGPU's outer queue after roughly 2.5
-minutes and returned:
-```text
-event: error
-data: {"error": null}
-```
-Fix:
-- Add a non-GPU `Prepare` action that starts `scripts/bootstrap_comfy.py` as a
-  background process.
-- Add `Prepare Status` to poll `/tmp/voicegate_bootstrap.log`.
-- Keep GPU actions focused on starting ComfyUI and running actual CUDA work.
-This avoids wasting ZeroGPU time on clone/install steps and prevents the request
-from being killed before diagnostics can return useful logs.
 ### Runtime Pip Install Pitfall
-The background bootstrap installed a large dependency set and upgraded the
-on-disk Torch package. The already-running Gradio process continued to report:
-```text
-torch=2.11.0+cu130
-```
-while the ComfyUI subprocess started afterwards reported:
-```text
-pytorch_version=2.12.0+cu130
-```
 This is workable for diagnostics, but final production should avoid heavy
 runtime `pip install` where possible. Prefer moving stable dependencies into
 Space build-time requirements or explicitly controlling pins.
@@ -295,295 +295,295 @@ The working diagnostic used:
 For future tests, keep diagnostic durations conservative and increase only when
 the workflow has already proven it needs more time.
-## Dependency Pitfalls and Fixes
-`ComfyUI_AudioTools` initially failed to import.
-First failure:
-```text
-SoX could not be found
-ModuleNotFoundError: No module named 'sounddevice'
-```
-Second failure after adding `sounddevice`:
-```text
-OSError: PortAudio library not found
-```
-Third failure:
-```text
-ModuleNotFoundError: No module named 'easydict'
-```
-Fourth failure:
-```text
-ModuleNotFoundError: No module named 'pytorch_lightning'
-```
-Fixes added:
-- `packages.txt`
-  - `sox`
-  - `libportaudio2`
-  - `portaudio19-dev`
-- `requirements.txt`
-  - `sounddevice`
-  - `easydict`
-  - `pytorch-lightning`
-Final verification:
-```text
-0.4 seconds: /home/user/app/ComfyUI/custom_nodes/ComfyUI_AudioTools
-```
-with no `IMPORT FAILED` entry.
-## ComfyUI API Smoke Test
-Test audio source:
-```text
-D:\voicebridge-test-audio\test_audio\2-坤哥.MP3
-```
-The first upload attempt used a plain PowerShell byte pipeline and corrupted the
-binary file. The remote file was identified as text instead of MP3, and
-`LoadAudio` failed with:
-```text
-Invalid data found when processing input: 'avcodec_send_packet()'
-```
-Fix:
-- Upload binary test media through a binary-safe method.
-- Verify remote `sha256sum` before using the file.
-Successful upload result:
-```text
-/tmp/voicegate_test_audio.mp3: Audio file with ID3 version 2.3.0
-```
-ComfyUI API endpoints verified in Dev Mode:
-- `/system_stats`
-- `/upload/image`
-- `/prompt`
-- `/history/{prompt_id}`
-Minimal test workflow:
-```text
-LoadAudio -> SaveAudioMP3
-```
-Successful `/history/{prompt_id}` result:
-```text
-status_str: success
-completed: true
-```
-Output reported by ComfyUI:
-```text
-audio/api_smoke_voicegate_00001.mp3
-```
-## Segmented Workflow Smoke Tests
-### ComfyUI From Gradio ZeroGPU
-On 2026-06-05, `app.py` was expanded with diagnostic Gradio actions:
-- `prepare_runtime`: starts `scripts/bootstrap_comfy.py` in the background and
-  writes progress to `/tmp/voicegate_bootstrap.log`.
-- `prepare_status`: reports the background bootstrap status and log tail.
-- `comfy_runtime_test`: runs inside `@spaces.GPU`, starts ComfyUI, and calls
-  `/system_stats`.
-- `melband_gpu_test`: runs a tiny MelBand workflow inside `@spaces.GPU`.
-- `voxcpm_tts_gpu_test`: runs a tiny VoxCPM2 TTS-only workflow inside
-  `@spaces.GPU`.
-The first attempt ran the full bootstrap synchronously inside a Gradio request
-and the request was interrupted by the outer queue with `event: error` and no
-function payload after roughly 2.5 minutes. The fix was to start bootstrap as a
-background process and poll a status endpoint.
-The background prepare completed successfully. It installed a large dependency
-set and upgraded the on-disk Torch package from `2.11.0` to `2.12.0`. The
-already-running Gradio process still reported its originally imported
-`torch=2.11.0+cu130`, while the newly started ComfyUI subprocess reported:
-```text
-pytorch_version=2.12.0+cu130
-```
-This is acceptable for the smoke test, but runtime pip installs are not ideal
-for the final app. A later pass should move heavy Python dependencies into the
-Space build/install phase or pin the root requirements more deliberately.
-`comfy_runtime_test` result:
-```text
-cuda_available=True
-comfy_ready=true
-comfy_elapsed_sec=16.0
-ComfyUI version=0.24.0
-device=cuda:0 NVIDIA RTX PRO 6000 Blackwell Server Edition MIG 2g.48gb
-vram_total=50868518912
-```
-Observed behavior: separate `@spaces.GPU` calls may run in separate worker
-processes, so the ComfyUI subprocess should not be assumed to persist across
-different button/API calls.
-### ZeroGPU Gradio Invocation
-On 2026-06-05, the Space was tested in normal runtime, with Dev Mode off, using
-a Gradio button backed by:
-```python
-@spaces.GPU(duration=60)
-def gpu_smoke_test():
-    ...
-```
-The private Space API was called with the local Hugging Face token through:
-```text
-POST /gradio_api/call/gpu_smoke_test
-GET /gradio_api/call/gpu_smoke_test/{event_id}
-```
-Result:
-```text
-torch=2.11.0+cu130
-cuda_available=True
-cuda_device_count=1
-device_name=NVIDIA RTX PRO 6000 Blackwell Server Edition MIG 2g.48gb
-total_memory_gb=47.38
-tensor_result=240.0
-memory_reserved_mb=2.00
-```
-This confirms ZeroGPU CUDA is available from the normal Gradio runtime when the
-work is executed inside a `@spaces.GPU` function. SSH still should be treated as
-CPU-only diagnostic access.
-### DeepSeek LLM Node
-On 2026-06-05, `RH_LLMAPI_NODE` was tested through ComfyUI in Dev Mode using
-the Space `DEEPSEEK_API_KEY` secret. The key was not printed.
-Minimal workflow:
-```text
-RH_LLMAPI_NODE -> easy showAnything
-```
-Prompt:
-```text
-Translate to Simplified Chinese: VoiceGate smoke test.
-```
-Result:
-```text
-status_str: success
-output: VoiceGate 冒烟测试。
-```
-This confirms the RunningHub LLM node can read the Space secret and call the
-DeepSeek-compatible API endpoint.
-### MelBand RoFormer
-On 2026-06-05, `MelBandRoFormerModelLoader` and `MelBandRoFormerSampler` were
-tested through ComfyUI in CPU mode.
-Input:
-```text
-1 second synthetic 440 Hz WAV generated with ffmpeg
-```
-Minimal workflow:
-```text
-LoadAudio -> MelBandRoFormerModelLoader -> MelBandRoFormerSampler
-  -> SaveAudioMP3(vocals)
-  -> SaveAudioMP3(instruments)
-```
-Result:
-```text
-status_str: success
-audio/melband_smoke_vocals_00001.mp3
-audio/melband_smoke_instruments_00001.mp3
-```
-CPU-mode runtime for the 1 second smoke input was about 51 seconds. Real runs
-should execute inside a `@spaces.GPU` function.
-Later on 2026-06-05, the same kind of tiny MelBand smoke test was run from the
-normal Gradio runtime inside `@spaces.GPU`.
-Input:
-```text
-1 second synthetic 440 Hz WAV written to ComfyUI/input
-```
-Result:
-```text
-status_str=success
-completed=True
-audio/melband_gpu_32459bea_instruments_00001.mp3
-audio/melband_gpu_32459bea_vocals_00001.mp3
-elapsed_sec=78.3
-```
-This confirms the MelBand custom node and model can execute from the Space
-ZeroGPU path.
 ### VoxCPM2 TTS-only
-On 2026-06-05, a minimal VoxCPM2 TTS-only workflow was run from the normal
-Gradio runtime inside `@spaces.GPU`.
-Minimal workflow:
-```text
-RunningHub_VoxCPM_LoadModel -> RunningHub_VoxCPM_Generate -> SaveAudioMP3
-```
-Prompt text:
-```text
-你好，VoiceGate GPU 语音合成测试。
-```
-Result:
-```text
-status_str=success
-completed=True
-audio/voxcpm_tts_gpu_cda209ec_00001.mp3
-elapsed_sec=766.2
-```
 This confirms VoxCPM2 fits and executes in ZeroGPU, but the first cold TTS-only
 run was very slow. The final app should minimize cold starts, avoid repeated
 ComfyUI/model reloads where possible, and use shorter diagnostic prompts while
@@ -653,103 +653,103 @@ This confirms the Qwen3-ASR model, forced aligner, VoiceBridge ASR nodes, and
 SRT generation can run in the Space ZeroGPU path. The smoke test intentionally
 used `attention=sdpa` instead of `flash_attention_2`; `flash_attention_2`
 availability remains unverified.
-## Secrets and API Keys
-`DEEPSEEK_API_KEY` should be stored only as a Hugging Face Space Secret.
-Current expected secret:
-```text
-DEEPSEEK_API_KEY
-```
-Optional variables:
-```text
-DEEPSEEK_BASE_URL=https://api.deepseek.com
-DEEPSEEK_MODEL=deepseek-v4-flash
-```
-Never store these values in:
-- `app.py`
-- workflow JSON files
-- README files
-- docs
-- `.env` files committed to git
-`scripts/workflow_client.py` reads these from environment variables.
-`scripts/check_space_env.py` verifies whether these environment variables are
-present without printing their values.
-## Model Storage
-Large model files should live on the Space persistent storage volume instead of
-inside `/home/user/app`, because `/home/user/app` can be replaced during Space
-rebuilds.
-Default model root:
-```text
-/data/voicegate_models
-```
-`scripts/bootstrap_comfy.py` creates symlinks from ComfyUI's expected paths to
-that persistent root:
-```text
-ComfyUI/models/voxcpm/VoxCPM2
-  -> /data/voicegate_models/voxcpm/VoxCPM2
 ComfyUI/models/diffusion_models/MelBandRoFormer_comfy
   -> /data/voicegate_models/diffusion_models/MelBandRoFormer_comfy
 ComfyUI/models/Qwen3-ASR
   -> /data/voicegate_models/Qwen3-ASR
-```
-Override the root with:
-```text
-VOICEGATE_MODEL_ROOT
-```
-On 2026-06-05, the first two explicit ComfyUI-path models were downloaded to
-persistent storage:
-```text
 /data/voicegate_models/voxcpm/VoxCPM2/model.safetensors
 /data/voicegate_models/voxcpm/VoxCPM2/audiovae.pth
 /data/voicegate_models/diffusion_models/MelBandRoFormer_comfy/MelBandRoformer_fp32.safetensors
 /data/voicegate_models/Qwen3-ASR/Qwen3-ASR-1.7B
 /data/voicegate_models/Qwen3-ASR/Qwen3-ForcedAligner-0.6B
-```
-Verified symlinks:
-```text
-/home/user/app/ComfyUI/models/voxcpm/VoxCPM2
-  -> /data/voicegate_models/voxcpm/VoxCPM2
 /home/user/app/ComfyUI/models/diffusion_models/MelBandRoFormer_comfy
   -> /data/voicegate_models/diffusion_models/MelBandRoFormer_comfy
 /home/user/app/ComfyUI/models/Qwen3-ASR
   -> /data/voicegate_models/Qwen3-ASR
-```
-`DEEPSEEK_API_KEY` was also verified as present in the Space environment without
-printing its value.
-Model download pitfall:
-- `huggingface-cli download` is deprecated and failed in the Space.
-- `hf download` also failed because of a CLI dependency compatibility issue.
-- `scripts/bootstrap_comfy.py` now uses the `huggingface_hub` Python API
-  directly for model downloads.
 ## Current Known Good Commits
 - `683b147` Add ComfyUI runtime bootstrap scripts
@@ -905,3 +905,23 @@ Next recommended steps:
 2. Polish the first Gradio user interface and validate the automatic model
    preparation path after Space rebuilds/hardware changes.
 3. Reduce runtime dependency installation and model reload overhead.

@@ -1,38 +1,38 @@
+# VoiceGate HF Space Work Log
+This document records the effective work completed while preparing the
+`build-small-hackathon/VoiceGate` Hugging Face Space, plus the pitfalls found
+and how they were resolved.
+## Current Snapshot
+- Space: `https://huggingface.co/spaces/build-small-hackathon/VoiceGate`
+- Space git remote: `https://huggingface.co/spaces/build-small-hackathon/VoiceGate`
+- Runtime hardware: ZeroGPU / `zero-a10g`
+- Space SDK: Gradio
+- Local Space wrapper repo: `VoiceGate-hf`
+- Local upstream reference checkout: `VoiceGate/`
 - Latest confirmed normal runtime commit: `316b35db739d74d05543d6c8c9dd9c16e0580b17`
+- Current expected Space secret: `DEEPSEEK_API_KEY`
+- Default persistent model root: `/data/voicegate_models`
+Do not commit API keys, model weights, uploaded media, generated outputs, or the
+local `VoiceGate/` upstream checkout.
+## Executive Summary
+The Space is no longer just a blank scaffold. It can now run Gradio, invoke
+ZeroGPU, prepare a ComfyUI runtime, start ComfyUI from a GPU-backed Gradio
+function, and submit several segmented ComfyUI workflows.
+Confirmed working:
+- Hugging Face Space git push and normal rebuild flow.
+- Dev Mode SSH for CPU/container diagnostics.
+- ZeroGPU invocation from Gradio through `@spaces.GPU`.
+- ComfyUI startup from inside a `@spaces.GPU` function.
+- ComfyUI API calls from the Gradio process.
+- DeepSeek-compatible LLM node with the Space secret.
 - MelBand RoFormer smoke tests in CPU mode and ZeroGPU mode.
 - VoxCPM2 TTS-only smoke test in ZeroGPU mode.
 - VoiceBridge ASR-only smoke test in ZeroGPU mode.
@@ -44,223 +44,223 @@ Not yet confirmed at the start of 2026-06-06:
 - SRT split -> VoxCPM -> SRT merge.
 - Full short-audio VoiceGate workflow.
 - Final user-facing Gradio upload/download UI.
+## Repository Setup Completed
+- Created and pushed the Space wrapper repository.
+- Kept `VoiceGate/` as a local-only upstream reference and ignored it in git.
+- Preserved Hugging Face LFS rules.
+- Copied deployment workflows:
+  - `workflows/voicegate_api.json`
+  - `workflows/voicegate_ui.json`
+- Confirmed the API workflow JSON is valid.
+- Confirmed workflow files contain no committed API key.
+## Dependency Inventory Completed
+Required workflow node providers were identified and pinned:
+- ComfyUI core:
+  `comfyanonymous/ComfyUI`
+- VoiceBridge:
+  `YanTianlong-01/comfyui_voicebridge`
+- RunningHub VoxCPM:
+  `RH-RunningHub/ComfyUI_RH_VoxCPM`
+- MelBand RoFormer:
+  `kijai/ComfyUI-MelBandRoFormer`
+- RunningHub LLM API:
+  `HM-RunningHub/ComfyUI_RH_LLM_API`
+- rgthree:
+  `rgthree/rgthree-comfy`
+- Easy Use:
+  `yolain/ComfyUI-Easy-Use`
+- Comfyroll:
+  `Suzie1/ComfyUI_Comfyroll_CustomNodes`
+- MW AudioTools:
+  `billwuhao/ComfyUI_AudioTools`
+Important node source confirmations:
+- `ReplaceText` is provided by ComfyUI core extra nodes.
+- `MergeAudioMW` is provided by `ComfyUI_AudioTools`.
+- `RH_LLMAPI_NODE` is provided by `ComfyUI_RH_LLM_API`.
+## Runtime Bootstrap Added
+The following scripts were added:
+- `scripts/bootstrap_comfy.py`
+  - Clones ComfyUI.
+  - Checks out pinned commits.
+  - Clones required custom node repositories.
+  - Installs ComfyUI and custom node Python requirements.
+  - Prepares expected model directories.
+  - Optionally downloads large model assets with `--with-models`.
+- `scripts/run_comfy.py`
+  - Starts ComfyUI.
+  - Waits for `/system_stats`.
+  - Supports `--cpu` for SSH diagnostics.
+- `scripts/workflow_client.py`
+  - Loads `workflows/voicegate_api.json`.
+  - Uploads audio through the ComfyUI API.
+  - Patches workflow inputs.
+  - Submits `/prompt`.
+  - Waits for `/history/{prompt_id}`.
+Workflow patching currently covers:
+- Node `16`: uploaded audio filename.
+- Node `105`: `DEEPSEEK_API_KEY`.
+- Node `105`: API base URL.
+- Node `105`: LLM model name.
+- Node `110`: target language.
+- Node `180`: job-specific audio output prefix.
+- Node `214`: job-specific SRT output prefix.
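The node patching listed above can be sketched against ComfyUI's API-format workflow, where each node ID maps to a dict with `class_type` and `inputs`. This is a minimal sketch, not the code in `scripts/workflow_client.py`: `patch_workflow_inputs` is a hypothetical helper, and the class types and input names in the toy workflow are assumptions made for illustration.

```python
import copy
from typing import Any, Dict

Workflow = Dict[str, Dict[str, Any]]


def patch_workflow_inputs(workflow: Workflow, patches: Dict[str, Dict[str, Any]]) -> Workflow:
    """Return a copy of an API-format workflow with selected node inputs replaced."""
    patched = copy.deepcopy(workflow)  # never mutate the template loaded from disk
    for node_id, new_inputs in patches.items():
        if node_id not in patched:
            raise KeyError(f"workflow has no node {node_id!r}")
        patched[node_id].setdefault("inputs", {}).update(new_inputs)
    return patched


# Toy workflow: node IDs match the list above; class types and input names are assumed.
template: Workflow = {
    "16": {"class_type": "LoadAudio", "inputs": {"audio": "placeholder.mp3"}},
    "180": {"class_type": "SaveAudioMP3", "inputs": {"filename_prefix": "audio/out"}},
}
job = patch_workflow_inputs(
    template,
    {"16": {"audio": "upload_1234.mp3"}, "180": {"filename_prefix": "audio/job_1234"}},
)
```

Copying the template before patching keeps per-job values such as the API key and output prefixes out of the shared workflow object.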
+## Hugging Face Space Runtime Findings
+### Dev Mode and SSH
+SSH target:
+```text
+build-small-hackathon-voicegate@ssh.hf.space
+```
+Local private key:
+```text
+C:\Users\yantianlong\.ssh\codex_space_voicegate
+```
+SSH is only available while the Space is in Dev Mode. Normal running Spaces do
+not accept SSH and return:
+```text
+Bad request: SSH in only allowed in Dev mode
+```
+Dev Mode can be toggled through the Hugging Face API endpoint:
+```text
+POST /api/spaces/build-small-hackathon/VoiceGate/dev-mode
+```
+Use Dev Mode for diagnostics only. Persistent fixes must be committed locally
+and pushed.
+### Dev Mode Stale Commit Pitfall
+The running container initially stayed on the original template commit:
+```text
+a94117f35a42cb17f654ae70cbe619c15345d057
+```
+even after newer commits were pushed. `restart_space` alone did not move it to
+the latest repository state while Dev Mode was enabled.
+Fix:
+- Disable Dev Mode.
+- Use `factory_reboot=True` or push a new commit to trigger a normal rebuild.
+- Confirm runtime metadata reports the latest commit.
+### ZeroGPU Startup Requirement
+When Dev Mode was disabled, the Space entered `RUNTIME_ERROR` with:
+```text
+No @spaces.GPU function detected during startup
+```
+Fix:
+- Import `spaces`.
+- Add at least one `@spaces.GPU(duration=...)` function in `app.py`.
+Initial placeholder fix:
+```python
+@spaces.GPU(duration=30)
+def placeholder():
+    ...
+```
+Later this placeholder was replaced by real diagnostic functions:
+```python
+@spaces.GPU(duration=60)
+def gpu_smoke_test():
+    ...
+@spaces.GPU(duration=900)
+def comfy_runtime_test():
+    ...
+```
+### SSH Does Not Expose ZeroGPU CUDA
+Starting ComfyUI normally through SSH failed with:
+```text
+RuntimeError: No CUDA GPUs are available
+```
+Conclusion:
+- SSH is useful for CPU-mode diagnostics.
+- Real GPU work must run from the Gradio process inside a `@spaces.GPU`
+  function.
+CPU diagnostic command:
+```bash
+python scripts/run_comfy.py --cpu
+```
+### Gradio Request Timeout During Bootstrap
+Long bootstrap work should not run synchronously inside a Gradio request. The
+first attempt did this:
+```text
+Gradio click -> bootstrap_comfy.py -> clone repos -> pip install -> start ComfyUI
+```
+The request was interrupted by Gradio/ZeroGPU's outer queue after roughly 2.5
+minutes and returned:
+```text
+event: error
+data: {"error": null}
+```
+Fix:
+- Add a non-GPU `Prepare` action that starts `scripts/bootstrap_comfy.py` as a
+  background process.
+- Add `Prepare Status` to poll `/tmp/voicegate_bootstrap.log`.
+- Keep GPU actions focused on starting ComfyUI and running actual CUDA work.
+This avoids wasting ZeroGPU time on clone/install steps and prevents the request
+from being killed before diagnostics can return useful logs.
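The background-process-plus-polling fix described above can be sketched with the standard library alone. This is an illustrative sketch rather than the code in `app.py`; `start_background` and `status` are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List


def start_background(cmd: List[str], log_path: Path) -> subprocess.Popen:
    """Launch cmd without blocking; stdout and stderr go to log_path."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    log_file = log_path.open("ab")
    try:
        return subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)
    finally:
        log_file.close()  # the child keeps its own copy of the descriptor


def status(proc: subprocess.Popen, log_path: Path, tail_lines: int = 20) -> str:
    """Summarise process state plus the last lines of its log."""
    code = proc.poll()  # None while the process is still running
    state = "running" if code is None else f"exited with code {code}"
    text = log_path.read_text(errors="replace") if log_path.exists() else ""
    tail = "\n".join(text.splitlines()[-tail_lines:])
    return f"{state}\n{tail}"
```

A `Prepare` handler would call `start_background` once and return immediately, and a `Prepare Status` handler would call `status` on each poll, so no single Gradio request has to outlive the clone and install steps.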
 ### Runtime Pip Install Pitfall
+The background bootstrap installed a large dependency set and upgraded the
+on-disk Torch package. The already-running Gradio process continued to report:
+```text
+torch=2.11.0+cu130
+```
+while the ComfyUI subprocess started afterwards reported:
+```text
+pytorch_version=2.12.0+cu130
+```
 This is workable for diagnostics, but final production should avoid heavy
 runtime `pip install` where possible. Prefer moving stable dependencies into
 Space build-time requirements or explicitly controlling pins.
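The version split described above can be detected by comparing what the running process imported with what package metadata on disk currently reports. The sketch below assumes the installed distribution exposes standard metadata; `imported_vs_installed` is a hypothetical helper.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple


def imported_vs_installed(
    module_name: str, dist_name: Optional[str] = None
) -> Tuple[Optional[str], Optional[str]]:
    """Return (version imported in this process, version installed on disk)."""
    module = sys.modules.get(module_name)  # only looks at already-imported modules
    imported = getattr(module, "__version__", None) if module else None
    try:
        installed = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        installed = None
    return imported, installed


imported, installed = imported_vs_installed("torch")
if imported and installed and imported != installed:
    print(f"torch drift: process has {imported}, disk has {installed}")
```

A mismatch means the process must be restarted before it can use the upgraded package, which is one more reason to pin heavy dependencies at build time.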
@@ -295,295 +295,295 @@ The working diagnostic used:
 For future tests, keep diagnostic durations conservative and increase only when
 the workflow has already proven it needs more time.
+## Dependency Pitfalls and Fixes
+`ComfyUI_AudioTools` initially failed to import.
+First failure:
+```text
+SoX could not be found
+ModuleNotFoundError: No module named 'sounddevice'
+```
+Second failure after adding `sounddevice`:
+```text
+OSError: PortAudio library not found
+```
+Third failure:
+```text
+ModuleNotFoundError: No module named 'easydict'
+```
+Fourth failure:
+```text
+ModuleNotFoundError: No module named 'pytorch_lightning'
+```
+Fixes added:
+- `packages.txt`
+  - `sox`
+  - `libportaudio2`
+  - `portaudio19-dev`
+- `requirements.txt`
+  - `sounddevice`
+  - `easydict`
+  - `pytorch-lightning`
+Final verification:
+```text
+0.4 seconds: /home/user/app/ComfyUI/custom_nodes/ComfyUI_AudioTools
+```
+with no `IMPORT FAILED` entry.
+## ComfyUI API Smoke Test
+Test audio source:
+```text
+D:\voicebridge-test-audio\test_audio\2-坤哥.MP3
+```
+The first upload attempt used a plain PowerShell byte pipeline and corrupted the
+binary file. The remote file was identified as text instead of MP3, and
+`LoadAudio` failed with:
+```text
+Invalid data found when processing input: 'avcodec_send_packet()'
+```
+Fix:
+- Upload binary test media through a binary-safe method.
+- Verify remote `sha256sum` before using the file.
+Successful upload result:
+```text
+/tmp/voicegate_test_audio.mp3: Audio file with ID3 version 2.3.0
+```
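Before comparing checksums, a quick header check can catch the text-instead-of-MP3 corruption described above. The sketch below is a heuristic only and does not replace the `sha256sum` comparison; `looks_like_mp3` and `check_upload` are hypothetical helpers.

```python
def looks_like_mp3(header: bytes) -> bool:
    """Heuristic: ID3v2 tag, or an MPEG Layer III frame sync, at the start of the data."""
    if header[:3] == b"ID3":
        return True
    if len(header) < 2:
        return False
    # A frame starts with 11 set bits: 0xFF, then the top 3 bits of the next byte.
    has_sync = header[0] == 0xFF and (header[1] & 0xE0) == 0xE0
    # Layer bits 01 mean Layer III; this also rejects a UTF-16 byte-order mark (FF FE).
    is_layer3 = (header[1] & 0x06) == 0x02
    return has_sync and is_layer3


def check_upload(path: str) -> bool:
    """Read the first bytes in binary mode, with no newline or encoding translation."""
    with open(path, "rb") as handle:
        return looks_like_mp3(handle.read(4))
```

Rejecting the `FF FE` prefix matters here because a text-mode pipeline on Windows can re-encode bytes as UTF-16, which a sync-only check would mistake for an MPEG frame.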
+ComfyUI API endpoints verified in Dev Mode:
+- `/system_stats`
+- `/upload/image`
+- `/prompt`
+- `/history/{prompt_id}`
+Minimal test workflow:
+```text
+LoadAudio -> SaveAudioMP3
+```
+Successful `/history/{prompt_id}` result:
+```text
+status_str: success
+completed: true
+```
+Output reported by ComfyUI:
+```text
+audio/api_smoke_voicegate_00001.mp3
+```
+## Segmented Workflow Smoke Tests
+### ComfyUI From Gradio ZeroGPU
+On 2026-06-05, `app.py` was expanded with diagnostic Gradio actions:
+- `prepare_runtime`: starts `scripts/bootstrap_comfy.py` in the background and
+  writes progress to `/tmp/voicegate_bootstrap.log`.
+- `prepare_status`: reports the background bootstrap status and log tail.
+- `comfy_runtime_test`: runs inside `@spaces.GPU`, starts ComfyUI, and calls
+  `/system_stats`.
+- `melband_gpu_test`: runs a tiny MelBand workflow inside `@spaces.GPU`.
+- `voxcpm_tts_gpu_test`: runs a tiny VoxCPM2 TTS-only workflow inside
+  `@spaces.GPU`.
+The first attempt ran the full bootstrap synchronously inside a Gradio request
+and the request was interrupted by the outer queue with `event: error` and no
+function payload after roughly 2.5 minutes. The fix was to start bootstrap as a
+background process and poll a status endpoint.
+The background prepare completed successfully. It installed a large dependency
+set and upgraded the on-disk Torch package from `2.11.0` to `2.12.0`. The
+already-running Gradio process still reported its originally imported
+`torch=2.11.0+cu130`, while the newly started ComfyUI subprocess reported:
+```text
+pytorch_version=2.12.0+cu130
+```
+This is acceptable for the smoke test, but runtime pip installs are not ideal
+for the final app. A later pass should move heavy Python dependencies into the
+Space build/install phase or pin the root requirements more deliberately.
+`comfy_runtime_test` result:
+```text
+cuda_available=True
+comfy_ready=true
+comfy_elapsed_sec=16.0
+ComfyUI version=0.24.0
+device=cuda:0 NVIDIA RTX PRO 6000 Blackwell Server Edition MIG 2g.48gb
+vram_total=50868518912
+```
+Observed behavior: separate `@spaces.GPU` calls may run in separate worker
+processes, so the ComfyUI subprocess should not be assumed to persist across
+different button/API calls.
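Because the ComfyUI subprocess may not survive between calls, each GPU-backed function can probe the server before reusing it. A minimal readiness probe, assuming the server answers HTTP 200 on `/system_stats` once it is up, might look like this; `wait_for_http` is a hypothetical helper and not the `wait_for_comfy` function in `app.py`.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll url until it answers with HTTP 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet (connection refused, reset, or timed out)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A handler could call `wait_for_http("http://127.0.0.1:8188/system_stats", timeout=5)` first and start ComfyUI only when the probe returns `False`.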
+### ZeroGPU Gradio Invocation
+On 2026-06-05, the Space was tested in normal runtime, with Dev Mode off, using
+a Gradio button backed by:
+```python
+@spaces.GPU(duration=60)
+def gpu_smoke_test():
+    ...
+```
+The private Space API was called with the local Hugging Face token through:
+```text
+POST /gradio_api/call/gpu_smoke_test
+GET /gradio_api/call/gpu_smoke_test/{event_id}
+```
+Result:
+```text
+torch=2.11.0+cu130
+cuda_available=True
+cuda_device_count=1
+device_name=NVIDIA RTX PRO 6000 Blackwell Server Edition MIG 2g.48gb
+total_memory_gb=47.38
+tensor_result=240.0
+memory_reserved_mb=2.00
+```
+This confirms ZeroGPU CUDA is available from the normal Gradio runtime when the
+work is executed inside a `@spaces.GPU` function. SSH still should be treated as
+CPU-only diagnostic access.
+### DeepSeek LLM Node
+On 2026-06-05, `RH_LLMAPI_NODE` was tested through ComfyUI in Dev Mode using
+the Space `DEEPSEEK_API_KEY` secret. The key was not printed.
+Minimal workflow:
+```text
+RH_LLMAPI_NODE -> easy showAnything
+```
+Prompt:
+```text
+Translate to Simplified Chinese: VoiceGate smoke test.
+```
+Result:
+```text
+status_str: success
+output: VoiceGate 冒烟测试。
+```
+This confirms the RunningHub LLM node can read the Space secret and call the
+DeepSeek-compatible API endpoint.
+### MelBand RoFormer
+On 2026-06-05, `MelBandRoFormerModelLoader` and `MelBandRoFormerSampler` were
+tested through ComfyUI in CPU mode.
+Input:
+```text
+1 second synthetic 440 Hz WAV generated with ffmpeg
+```
+Minimal workflow:
+```text
+LoadAudio -> MelBandRoFormerModelLoader -> MelBandRoFormerSampler
+  -> SaveAudioMP3(vocals)
+  -> SaveAudioMP3(instruments)
+```
+Result:
+```text
+status_str: success
+audio/melband_smoke_vocals_00001.mp3
+audio/melband_smoke_instruments_00001.mp3
+```
+CPU-mode runtime for the 1 second smoke input was about 51 seconds. Real runs
+should execute inside a `@spaces.GPU` function.
+Later on 2026-06-05, the same kind of tiny MelBand smoke test was run from the
+normal Gradio runtime inside `@spaces.GPU`.
+Input:
+```text
+1 second synthetic 440 Hz WAV written to ComfyUI/input
+```
+Result:
+```text
+status_str=success
+completed=True
+audio/melband_gpu_32459bea_instruments_00001.mp3
+audio/melband_gpu_32459bea_vocals_00001.mp3
+elapsed_sec=78.3
+```
+This confirms the MelBand custom node and model can execute from the Space
+ZeroGPU path.
 ### VoxCPM2 TTS-only
+On 2026-06-05, a minimal VoxCPM2 TTS-only workflow was run from the normal
+Gradio runtime inside `@spaces.GPU`.
+Minimal workflow:
+```text
+RunningHub_VoxCPM_LoadModel -> RunningHub_VoxCPM_Generate -> SaveAudioMP3
+```
+Prompt text:
+```text
+你好，VoiceGate GPU 语音合成测试。
+```
+Result:
+```text
+status_str=success
+completed=True
+audio/voxcpm_tts_gpu_cda209ec_00001.mp3
+elapsed_sec=766.2
+```
 This confirms VoxCPM2 fits and executes in ZeroGPU, but the first cold TTS-only
 run was very slow. The final app should minimize cold starts, avoid repeated
 ComfyUI/model reloads where possible, and use shorter diagnostic prompts while
 SRT generation can run in the Space ZeroGPU path. The smoke test intentionally
 used `attention=sdpa` instead of `flash_attention_2`; `flash_attention_2`
 availability remains unverified.
+## Secrets and API Keys
+`DEEPSEEK_API_KEY` should be stored only as a Hugging Face Space Secret.
+Current expected secret:
+```text
+DEEPSEEK_API_KEY
+```
+Optional variables:
+```text
+DEEPSEEK_BASE_URL=https://api.deepseek.com
+DEEPSEEK_MODEL=deepseek-v4-flash
+```
+Never store these values in:
+- `app.py`
+- workflow JSON files
+- README files
+- docs
+- `.env` files committed to git
+`scripts/workflow_client.py` reads these from environment variables.
+`scripts/check_space_env.py` verifies whether these environment variables are
+present without printing their values.
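A presence check that never echoes secret values could look like the sketch below. It is an illustration only: `env_presence` and `format_report` are hypothetical names, and the actual `scripts/check_space_env.py` may work differently.

```python
import os
from collections.abc import Mapping

REQUIRED = ("DEEPSEEK_API_KEY",)
OPTIONAL = ("DEEPSEEK_BASE_URL", "DEEPSEEK_MODEL")


def env_presence(names, env: Mapping = os.environ) -> dict:
    """Map each variable name to True when it is set to a non-blank value."""
    return {name: bool(env.get(name, "").strip()) for name in names}


def format_report(presence: dict) -> str:
    """Render only names and a set/missing flag, never the values."""
    return "\n".join(
        f"{name}: {'set' if present else 'missing'}"
        for name, present in presence.items()
    )


if __name__ == "__main__":
    print(format_report(env_presence(REQUIRED + OPTIONAL)))
```

Reporting a boolean per name keeps the output safe to paste into logs or a work log like this one.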
+## Model Storage
+Large model files should live on the Space persistent storage volume instead of
+inside `/home/user/app`, because `/home/user/app` can be replaced during Space
+rebuilds.
+Default model root:
+```text
+/data/voicegate_models
+```
+`scripts/bootstrap_comfy.py` creates symlinks from ComfyUI's expected paths to
+that persistent root:
+```text
+ComfyUI/models/voxcpm/VoxCPM2
+  -> /data/voicegate_models/voxcpm/VoxCPM2
 ComfyUI/models/diffusion_models/MelBandRoFormer_comfy
   -> /data/voicegate_models/diffusion_models/MelBandRoFormer_comfy
 ComfyUI/models/Qwen3-ASR
   -> /data/voicegate_models/Qwen3-ASR
+```
+Override the root with:
+```text
+VOICEGATE_MODEL_ROOT
+```
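The link layout and the root override can be sketched as follows. This is a simplified illustration under stated assumptions: `model_root` and `link_model_dir` are hypothetical helpers, not the functions in `scripts/bootstrap_comfy.py`, and the demo runs against a temporary directory instead of `/data`.

```python
import os
import tempfile
from pathlib import Path

DEFAULT_MODEL_ROOT = Path("/data/voicegate_models")


def model_root(env=os.environ) -> Path:
    """Use VOICEGATE_MODEL_ROOT when it is set, else the default root."""
    return Path(env.get("VOICEGATE_MODEL_ROOT") or DEFAULT_MODEL_ROOT)


def link_model_dir(comfy_models: Path, relative: str, root: Path) -> Path:
    """Point comfy_models/relative at root/relative with a symlink."""
    target = root / relative
    target.mkdir(parents=True, exist_ok=True)
    link = comfy_models / relative
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        if link.resolve() == target.resolve():
            return link  # already correct, nothing to do
        link.unlink()  # stale link that points somewhere else
    elif link.exists():
        # Never delete a real directory that may hold model files.
        raise RuntimeError(f"Refusing to replace real path: {link}")
    link.symlink_to(target, target_is_directory=True)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = model_root({"VOICEGATE_MODEL_ROOT": str(Path(tmp) / "persist")})
        link = link_model_dir(Path(tmp) / "ComfyUI" / "models", "voxcpm/VoxCPM2", root)
        print(link, "->", link.resolve())
```

Making the helper idempotent matters here because the bootstrap may run again after every Space rebuild.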
+On 2026-06-05, the following explicit ComfyUI-path models were downloaded to
+persistent storage:
+```text
 /data/voicegate_models/voxcpm/VoxCPM2/model.safetensors
 /data/voicegate_models/voxcpm/VoxCPM2/audiovae.pth
 /data/voicegate_models/diffusion_models/MelBandRoFormer_comfy/MelBandRoformer_fp32.safetensors
 /data/voicegate_models/Qwen3-ASR/Qwen3-ASR-1.7B
 /data/voicegate_models/Qwen3-ASR/Qwen3-ForcedAligner-0.6B
+```
+Verified symlinks:
+```text
+/home/user/app/ComfyUI/models/voxcpm/VoxCPM2
+  -> /data/voicegate_models/voxcpm/VoxCPM2
 /home/user/app/ComfyUI/models/diffusion_models/MelBandRoFormer_comfy
   -> /data/voicegate_models/diffusion_models/MelBandRoFormer_comfy
 /home/user/app/ComfyUI/models/Qwen3-ASR
   -> /data/voicegate_models/Qwen3-ASR
+```
+`DEEPSEEK_API_KEY` was also verified as present in the Space environment without
+printing its value.
+Model download pitfall:
+- `huggingface-cli download` is deprecated and failed in the Space.
+- `hf download` also failed because of a CLI dependency compatibility issue.
+- `scripts/bootstrap_comfy.py` now uses the `huggingface_hub` Python API
+  directly for model downloads.
 ## Current Known Good Commits
 - `683b147` Add ComfyUI runtime bootstrap scripts
 2. Polish the first Gradio user interface and validate the automatic model
    preparation path after Space rebuilds/hardware changes.
 3. Reduce runtime dependency installation and model reload overhead.
+## 2026-06-22: ZeroGPU MelBand SIGBUS recovery
+- Symptom: the user workflow returned
+  `WebSocketConnectionClosedException: Connection to remote host was lost`.
+- Root cause: the ComfyUI child process terminated with `Fatal Python error:
+  Bus error` while `comfy.utils.load_safetensors` memory-mapped
+  `MelBandRoformer_fp32.safetensors` from persistent `/data` storage.
+- The WebSocket error was secondary; it happened because the ComfyUI process
+  had already crashed.
+- Added strict validation for the MelBand model:
+  - expected size: `912885656` bytes
+  - expected SHA-256:
+    `450caec8e8e261ff79426f17ccf16d43490ba4b790ff84d573083cf94e111258`
+- Invalid files are removed and force-downloaded again from
+  `Kijai/MelBandRoFormer_comfy`.
+- The bootstrap now patches the pinned MelBand custom node to load safetensors
+  from regular file bytes instead of mmap. This prevents a persistent-storage
+  mmap failure from terminating the Python interpreter with SIGBUS.
+- The Space runtime validates the model once per container before accepting a
+  full workflow request.
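The "validate once per container" step can be sketched as a cheap size check, a chunked SHA-256, and a process-local cache. The names `file_matches` and `validate_once` are hypothetical and the demo uses a small temporary file, so this is an outline of the idea and not the Space's actual gating code.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical process-local cache of paths that already passed validation.
_VALIDATED: set = set()


def file_matches(path: Path, expected_size: int, expected_sha256: str) -> bool:
    """Check size first (cheap), then hash the file in 1 MiB chunks."""
    if not path.is_file() or path.stat().st_size != expected_size:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256


def validate_once(path: Path, expected_size: int, expected_sha256: str) -> bool:
    """Hash the file at most once per process after a successful check."""
    key = path.resolve()
    if key in _VALIDATED:
        return True
    if file_matches(path, expected_size, expected_sha256):
        _VALIDATED.add(key)
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "model.bin"
        payload = b"voicegate" * 1000
        sample.write_bytes(payload)
        expected = hashlib.sha256(payload).hexdigest()
        print(validate_once(sample, len(payload), expected))
```

The trade-off is that a file corrupted after the first successful check is not re-detected until the process restarts, which is acceptable only if the file is not rewritten at runtime.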

scripts/bootstrap_comfy.py CHANGED

@@ -8,6 +8,7 @@ explicitly requested.
 from __future__ import annotations
 import argparse
 import os
 import shutil
 import subprocess
@@ -20,6 +21,9 @@ ROOT = Path(__file__).resolve().parents[1]
 COMFY_DIR = ROOT / "ComfyUI"
 CUSTOM_NODES_DIR = COMFY_DIR / "custom_nodes"
 DEFAULT_PERSISTENT_MODEL_ROOT = Path("/data/voicegate_models")
 @dataclass(frozen=True)
@@ -184,6 +188,70 @@ def prepare_model_dirs(dry_run: bool = False) -> None:
         ensure_model_link(name, dry_run=dry_run)
 def download_models(dry_run: bool = False) -> None:
     """Download large model assets.
@@ -213,12 +281,22 @@ def download_models(dry_run: bool = False) -> None:
         local_dir=model_target("voxcpm2"),
         token=token,
     )
     hf_hub_download(
         repo_id="Kijai/MelBandRoFormer_comfy",
-        filename="MelBandRoformer_fp32.safetensors",
         local_dir=model_target("melband"),
         token=token,
     )
     snapshot_download(
         repo_id="Qwen/Qwen3-ASR-1.7B",
         local_dir=model_target("qwen3_asr") / "Qwen3-ASR-1.7B",
@@ -262,6 +340,7 @@ def main() -> None:
         CUSTOM_NODES_DIR.mkdir(parents=True, exist_ok=True)
     for repo in CUSTOM_NODE_REPOS:
         ensure_git_repo(repo, dry_run=args.dry_run)
     if not args.skip_pip:
         install_requirements(COMFYUI, dry_run=args.dry_run)

 from __future__ import annotations
 import argparse
+import hashlib
 import os
 import shutil
 import subprocess
 COMFY_DIR = ROOT / "ComfyUI"
 CUSTOM_NODES_DIR = COMFY_DIR / "custom_nodes"
 DEFAULT_PERSISTENT_MODEL_ROOT = Path("/data/voicegate_models")
+MELBAND_FILENAME = "MelBandRoformer_fp32.safetensors"
+MELBAND_SIZE = 912_885_656
+MELBAND_SHA256 = "450caec8e8e261ff79426f17ccf16d43490ba4b790ff84d573083cf94e111258"
 @dataclass(frozen=True)
         ensure_model_link(name, dry_run=dry_run)
+def file_sha256(path: Path) -> str:
+    digest = hashlib.sha256()
+    with path.open("rb") as file:
+        for chunk in iter(lambda: file.read(8 * 1024 * 1024), b""):
+            digest.update(chunk)
+    return digest.hexdigest()
+def melband_model_path() -> Path:
+    return model_target("melband") / MELBAND_FILENAME
+def validate_melband_model(*, verify_hash: bool = True) -> tuple[bool, str]:
+    path = melband_model_path()
+    if not path.is_file():
+        return False, "missing"
+    size = path.stat().st_size
+    if size != MELBAND_SIZE:
+        return False, f"size_mismatch expected={MELBAND_SIZE} actual={size}"
+    if verify_hash:
+        try:
+            digest = file_sha256(path)
+        except OSError as exc:
+            return False, f"read_error {type(exc).__name__}: {exc}"
+        if digest != MELBAND_SHA256:
+            return False, f"sha256_mismatch expected={MELBAND_SHA256} actual={digest}"
+    return True, "ok"
+def patch_melband_loader(dry_run: bool = False) -> None:
+    """Avoid safetensors mmap on persistent Space storage.
+    ComfyUI's generic loader uses safetensors.safe_open(), which memory maps the
+    model file. A damaged file or an unstable mmap on /data can terminate the
+    interpreter with SIGBUS before Python can report a normal exception.
+    Loading from bytes uses regular reads and turns corruption into a catchable
+    safetensors error instead.
+    """
+    nodes_path = CUSTOM_NODES_DIR / "ComfyUI-MelBandRoFormer" / "nodes.py"
+    print(f"+ patch non-mmap MelBand loader: {nodes_path}", flush=True)
+    if dry_run:
+        return
+    if not nodes_path.is_file():
+        raise RuntimeError(f"MelBand node file is missing: {nodes_path}")
+    text = nodes_path.read_text(encoding="utf-8")
+    if "load_safetensors_bytes" not in text:
+        text = text.replace(
+            "import torchaudio.functional as TAF\n",
+            "import torchaudio.functional as TAF\n"
+            "from safetensors.torch import load as load_safetensors_bytes\n",
+        )
+        text = text.replace(
+            "model.load_state_dict(load_torch_file(model_path), strict=True)",
+            "with open(model_path, \"rb\") as model_file:\n"
+            "            state_dict = load_safetensors_bytes(model_file.read())\n"
+            "        model.load_state_dict(state_dict, strict=True)",
+        )
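+    # Check the import and the call site separately: the call-site string
+    # also contains "load_safetensors_bytes", so it cannot prove the import.
+    if (
+        "from safetensors.torch import load as load_safetensors_bytes" not in text
+        or "state_dict = load_safetensors_bytes" not in text
+    ):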
+        raise RuntimeError("Could not apply the non-mmap MelBand loader patch")
+    nodes_path.write_text(text, encoding="utf-8")
 def download_models(dry_run: bool = False) -> None:
     """Download large model assets.
         local_dir=model_target("voxcpm2"),
         token=token,
     )
+    melband_valid, melband_reason = validate_melband_model(verify_hash=True)
+    print(f"+ validate MelBand model: {melband_reason}", flush=True)
+    if not melband_valid and melband_model_path().exists():
+        print(f"+ remove invalid MelBand model: {melband_model_path()}", flush=True)
+        melband_model_path().unlink()
     hf_hub_download(
         repo_id="Kijai/MelBandRoFormer_comfy",
+        filename=MELBAND_FILENAME,
         local_dir=model_target("melband"),
         token=token,
+        force_download=not melband_valid,
     )
+    melband_valid, melband_reason = validate_melband_model(verify_hash=True)
+    print(f"+ verify downloaded MelBand model: {melband_reason}", flush=True)
+    if not melband_valid:
+        raise RuntimeError(f"MelBand model validation failed: {melband_reason}")
     snapshot_download(
         repo_id="Qwen/Qwen3-ASR-1.7B",
         local_dir=model_target("qwen3_asr") / "Qwen3-ASR-1.7B",
         CUSTOM_NODES_DIR.mkdir(parents=True, exist_ok=True)
     for repo in CUSTOM_NODE_REPOS:
         ensure_git_repo(repo, dry_run=args.dry_run)
+    patch_melband_loader(dry_run=args.dry_run)
     if not args.skip_pip:
         install_requirements(COMFYUI, dry_run=args.dry_run)