Spaces: JackIsNotInTheBox/Generate_Audio_for_Video (Running on Zero)

BoxOfColors and Claude Sonnet 4.6 committed about 7 hours ago

Commit cc23c05 (1 parent: e3d955b)

Fix xregen ZeroGPU TypeError: duration fns must accept all positional args

When xregen_mmaudio/xregen_hunyuan call _mmaudio_gpu_infer / _hunyuan_gpu_infer
with clip_start_s and clip_dur_s as positional args (positions 12/13 and 15/16
respectively), ZeroGPU forwards all args to the duration fn before running the
GPU fn. The duration fns had only 9/10 explicit params plus **_kwargs, but
**_kwargs absorbs only *keyword* args, not extra positionals. This caused:

TypeError: _mmaudio_duration() takes from 9 to 10 positional arguments but 13 given

which aborted the GPU task before inference even started, causing the Video slot
to show the red Gradio 'Error' component.
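A minimal, self-contained reproduction of this failure mode. It uses small stand-in functions instead of the real app.py ones, and `fake_zerogpu_call` is a hypothetical model of ZeroGPU's dispatch (duration fn called first with the GPU fn's exact args), not the `spaces` API. It shows that `**_kwargs` swallows extra keyword args but not extra positional ones:

```python
# Hypothetical stand-ins; the real fns in app.py take many more params.
def gpu_infer(video_file, prompt, num_samples,
              silent_video=None, clip_start_s=0.0):
    """Stand-in GPU fn with two optional trailing params."""
    return "ok"


def old_duration(video_file, prompt, num_samples, **_kwargs):
    """Stand-in for the pre-fix duration fn: explicit params + **_kwargs only."""
    return 60


def fake_zerogpu_call(duration_fn, gpu_fn, *args, **kwargs):
    """Hypothetical model of ZeroGPU dispatch: the duration fn gets exactly
    the args the GPU fn will get, and is called first."""
    seconds = duration_fn(*args, **kwargs)
    return gpu_fn(*args, **kwargs), seconds


# Extra args passed by keyword land in **_kwargs, so this works.
print(fake_zerogpu_call(old_duration, gpu_infer, "v.mp4", "rain", 1,
                        silent_video="s.mp4"))  # ('ok', 60)

# The same extras passed positionally cannot bind to old_duration, so the
# TypeError fires before gpu_infer ever runs.
message = None
try:
    fake_zerogpu_call(old_duration, gpu_infer, "v.mp4", "rain", 1, "s.mp4", 12.5)
except TypeError as exc:
    message = str(exc)
print(message)  # old_duration() takes 3 positional arguments but 5 were given
```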

Fix: add silent_video, segments_json, clip_start_s, clip_dur_s (and total_dur_s
for hunyuan) to the duration fn signatures with safe defaults. These params are
ignored in the duration calculation; they exist only to match the GPU fn's full
positional signature so ZeroGPU's internal dispatch doesn't blow up.
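A sketch of the shape of this fix, again with hypothetical stand-ins rather than the real app.py functions: once the duration fn repeats the GPU fn's trailing params with defaults, every call shape that binds to the GPU fn also binds to the duration fn.

```python
# Hypothetical stand-ins mirroring the shape of the fix, not app.py itself.
def gpu_infer(video_file, prompt, num_samples, silent_video=None,
              segments_json=None, clip_start_s=0.0, clip_dur_s=None):
    """Stand-in GPU fn: xregen fills the trailing params positionally."""
    return "ok"


def new_duration(video_file, prompt, num_samples, silent_video=None,
                 segments_json=None, clip_start_s=0.0, clip_dur_s=None,
                 **_kwargs):
    """Stand-in for the fixed duration fn: same positional slots as gpu_infer,
    with defaults; the trailing params are accepted but not used."""
    return 60


call_shapes = [
    ("v.mp4", "rain", 1),                            # regular path
    ("v.mp4", "rain", 1, "s.mp4", "[]"),             # + silent_video, segments_json
    ("v.mp4", "rain", 1, "s.mp4", "[]", 12.5, 4.0),  # xregen: + clip window
]
for args in call_shapes:
    # Every shape binds to both fns, so the duration fn no longer aborts dispatch.
    print(len(args), new_duration(*args), gpu_infer(*args))
```

A catch-all `*_args` on the duration fn would also silence the TypeError. Mirroring the named params instead keeps each positional slot aligned with the GPU fn, so a later change to the estimate could read, say, `clip_dur_s` by name.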

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (1)

app.py +12 -7

app.py CHANGED

@@ -1004,10 +1004,12 @@ def generate_taro(video_file, seed_val, cfg_scale, num_steps, mode,
 def _mmaudio_duration(video_file, prompt, negative_prompt, seed_val,
                       cfg_strength, num_steps, crossfade_s, crossfade_db, num_samples,
-                      **_kwargs):
-    """Pre-GPU callable — must match _mmaudio_gpu_infer's input order exactly.
-    Extra kwargs (silent_video, segments_json) are absorbed by **_kwargs so the
-    duration fn signature stays in sync with the 9-input Gradio registration."""
+                      silent_video=None, segments_json=None,
+                      clip_start_s=0.0, clip_dur_s=None, **_kwargs):
+    """Pre-GPU callable — must match _mmaudio_gpu_infer's input signature exactly.
+    silent_video, segments_json, clip_start_s, clip_dur_s are extra positional args
+    that xregen passes; they must appear here so ZeroGPU doesn't raise TypeError
+    when forwarding all args to this duration fn before the GPU fn runs."""
     return _estimate_gpu_duration("mmaudio", int(num_samples), int(num_steps),
                                   video_file=video_file, crossfade_s=crossfade_s)
@@ -1167,9 +1169,12 @@ def generate_mmaudio(video_file, prompt, negative_prompt, seed_val,
 def _hunyuan_duration(video_file, prompt, negative_prompt, seed_val,
                       guidance_scale, num_steps, model_size, crossfade_s, crossfade_db,
-                      num_samples, **_kwargs):
-    """Pre-GPU callable — must match _hunyuan_gpu_infer's input order exactly.
-    Extra kwargs (silent_video, segments_json, total_dur_s) absorbed by **_kwargs."""
+                      num_samples, silent_video=None, segments_json=None, total_dur_s=None,
+                      clip_start_s=0.0, clip_dur_s=None, **_kwargs):
+    """Pre-GPU callable — must match _hunyuan_gpu_infer's input signature exactly.
+    silent_video, segments_json, total_dur_s, clip_start_s, clip_dur_s are extra
+    positional args that xregen passes; they must appear here so ZeroGPU doesn't
+    raise TypeError when forwarding all args to this duration fn."""
     return _estimate_gpu_duration("hunyuan", int(num_samples), int(num_steps),
                                   video_file=video_file, crossfade_s=crossfade_s)