Spaces: JackIsNotInTheBox/Generate_Audio_for_Video

Running on Zero

BoxOfColors and Claude Opus 4.6 committed 3 days ago

Commit 813c771 (1 parent: fe18eeb)

Fix xregen_mmaudio abort: raise duration floor and load overhead

MMAudio regen was aborting (GPU task aborted, duration=10) because
the ZeroGPU allocation of 30s was too tight when the open_clip model
(3.95GB) needed to download on cold start inside the GPU window.

- MMAUDIO_LOAD_OVERHEAD: 15 → 30s (covers cold-start download)
- Regen duration floor: 30 → 60s (matches initial gen minimum,
prevents abort when any model needs a cold-start download)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
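The two changes listed in the commit message above can be checked with a little arithmetic. This is a minimal sketch, not code from app.py: it assumes 25 sampling steps (the step count quoted in the MMAUDIO_SECS_PER_STEP measurement comment) and ignores GPU_DURATION_CAP, whose value does not appear in this diff. `regen_budget` and `NUM_STEPS` are hypothetical names used only for illustration.

```python
# Worked arithmetic for one MMAudio regen call (1 segment), old vs. new settings.
# Assumption: 25 steps, as in the MMAUDIO_SECS_PER_STEP measurement comment.
# Assumption: the GPU_DURATION_CAP upper bound is ignored (its value is not in the diff).
MMAUDIO_SECS_PER_STEP = 0.25
NUM_STEPS = 25  # hypothetical step count for this example


def regen_budget(load_overhead: float, floor: int) -> tuple[float, int]:
    """Return (raw estimate in seconds, requested duration after applying the floor)."""
    secs = NUM_STEPS * MMAUDIO_SECS_PER_STEP + load_overhead
    return secs, max(floor, int(secs))


# Before this commit: 15s load overhead, 30s floor.
old_raw, old_budget = regen_budget(load_overhead=15, floor=30)
# After this commit: 30s load overhead, 60s floor.
new_raw, new_budget = regen_budget(load_overhead=30, floor=60)

print(f"old: {old_raw:.2f}s estimate -> {old_budget}s requested")
print(f"new: {new_raw:.2f}s estimate -> {new_budget}s requested")
```

At 25 steps the floor sets the request in both cases, so it is the 30 → 60 floor change that doubles the MMAudio budget. The larger load overhead only lifts the request above the floor once 0.25 × num_steps + 30 reaches 61s, i.e. from 124 steps upward.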

Files changed (1)

app.py +2 -2

app.py CHANGED

@@ -345,7 +345,7 @@ TARO_SECS_PER_STEP = 0.05  # measured 0.043s/step on H200 (8.2s video, 2 segs ×
 TARO_LOAD_OVERHEAD     = 15    # seconds: model load + CAVP feature extraction
 MMAUDIO_WINDOW         = 8.0   # seconds — MMAudio's fixed generation window
 MMAUDIO_SECS_PER_STEP  = 0.25  # measured 0.230s/step on H200 (8.3s video, 2 segs × 25 steps = 11.5s wall)
-MMAUDIO_LOAD_OVERHEAD  = 15
+MMAUDIO_LOAD_OVERHEAD  = 30    # 15s warm + up to 30s cold-start model download
 HUNYUAN_MAX_DUR        = 15.0  # seconds — HunyuanFoley max video duration
 HUNYUAN_SECS_PER_STEP  = 0.35  # measured 0.328s/step on H200 (8.3s video, 1 seg × 50 steps = 16.4s wall)
 HUNYUAN_LOAD_OVERHEAD  = 55    # ~55s to load the 10GB XXL model weights into GPU
@@ -414,7 +414,7 @@ def _estimate_regen_duration(model_key: str, num_steps: int) -> int:
     one segment — saves 30s of wasted ZeroGPU quota per regen call."""
     cfg  = MODEL_CONFIGS[model_key]
     secs = int(num_steps) * cfg["secs_per_step"] + cfg["load_overhead"]
-    result = min(GPU_DURATION_CAP, max(30, int(secs)))
+    result = min(GPU_DURATION_CAP, max(60, int(secs)))
     print(f"[duration] {cfg['label']} regen: 1 seg × {int(num_steps)} steps → {secs:.0f}s → capped {result}s")
     return result
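The patched `_estimate_regen_duration` can be exercised outside the Space with a standalone sketch that mirrors its formula. The per-step and load-overhead constants come from the diff; the MODEL_CONFIGS keys and labels and the GPU_DURATION_CAP value of 300 are hypothetical placeholders, since none of them appear in this commit. The step counts (25 for MMAudio, 50 for HunyuanFoley) are taken from the measurement comments.

```python
# Standalone sketch of the patched regen estimator (not the app.py code itself).
# Constants below are copied from the diff.
MMAUDIO_SECS_PER_STEP = 0.25
MMAUDIO_LOAD_OVERHEAD = 30
HUNYUAN_SECS_PER_STEP = 0.35
HUNYUAN_LOAD_OVERHEAD = 55

# Hypothetical: the real GPU_DURATION_CAP value is not shown in this diff.
GPU_DURATION_CAP = 300
# The diff inlines this floor as max(60, ...); it is named here for readability.
REGEN_FLOOR = 60

# Hypothetical keys and labels; only the numeric fields mirror the diff.
MODEL_CONFIGS = {
    "mmaudio": {"label": "MMAudio",
                "secs_per_step": MMAUDIO_SECS_PER_STEP,
                "load_overhead": MMAUDIO_LOAD_OVERHEAD},
    "hunyuan": {"label": "HunyuanFoley",
                "secs_per_step": HUNYUAN_SECS_PER_STEP,
                "load_overhead": HUNYUAN_LOAD_OVERHEAD},
}


def estimate_regen_duration(model_key: str, num_steps: int) -> int:
    """ZeroGPU seconds to request for a one-segment regen, clamped to [floor, cap]."""
    cfg = MODEL_CONFIGS[model_key]
    secs = int(num_steps) * cfg["secs_per_step"] + cfg["load_overhead"]
    return min(GPU_DURATION_CAP, max(REGEN_FLOOR, int(secs)))


for key, steps in [("mmaudio", 25), ("hunyuan", 50)]:
    label = MODEL_CONFIGS[key]["label"]
    print(f"{label}: {steps} steps -> {estimate_regen_duration(key, steps)}s")
```

At 50 steps HunyuanFoley already estimates 72s, above either floor, so raising the floor from 30 to 60 only changes the request for configurations whose estimate falls below 60s, such as MMAudio at 25 steps (36s estimate, 60s requested).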