Spaces: JackIsNotInTheBox/Generate_Audio_for_Video

BoxOfColors committed 4 days ago

Commit: c97fd8e
Parent: dc0df75

fix: derive total_dur_s from ffprobe not CAVP frame count to prevent audio truncation

The CAVP feature extractor can drop the last partial window, so a
total_dur_s derived from its frame count can be shorter than the actual
video (e.g. 10s instead of 25s). generate_taro() then built too few
segments and produced audio that cut off early.
Fix: use get_video_duration(video_file) as the canonical source for total_dur_s.
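
To make the failure mode concrete, here is a rough sketch of the arithmetic. The real `_build_segments` is not shown in this commit, so `count_segments` below is a hypothetical stand-in, and the frame rate, model window, and crossfade values are illustrative assumptions rather than the app's actual constants.

```python
import math

# Illustrative values only; the app's real constants may differ.
ASSUMED_FPS = 4.0          # feature frames per second (assumption)
ASSUMED_MODEL_DUR = 8.0    # seconds of audio generated per segment (assumption)
ASSUMED_CROSSFADE = 2.0    # seconds of overlap between segments (assumption)


def count_segments(total_dur_s, model_dur_s, crossfade_s):
    """Hypothetical stand-in: how many overlapping windows cover total_dur_s."""
    if total_dur_s <= model_dur_s:
        return 1
    stride = model_dur_s - crossfade_s  # each extra window adds this much new audio
    return 1 + math.ceil((total_dur_s - model_dur_s) / stride)


# Duration derived from an under-counted feature tensor (40 frames at 4 fps).
dur_from_features = 40 / ASSUMED_FPS
# Duration reported by the container for the same file.
dur_from_probe = 25.0

short_plan = count_segments(dur_from_features, ASSUMED_MODEL_DUR, ASSUMED_CROSSFADE)
full_plan = count_segments(dur_from_probe, ASSUMED_MODEL_DUR, ASSUMED_CROSSFADE)
print(short_plan, full_plan)
```

Under these assumed numbers the under-counted duration plans fewer segments than the probed one, which is the early cut-off the commit message describes.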

Files changed (1)

app.py +3 -1

@@ -356,7 +356,9 @@ def generate_taro(video_file, seed_val, cfg_scale, num_steps, mode,
     strip_audio_from_video(video_file, silent_video)
     cavp_feats  = extract_cavp(silent_video, tmp_path=tmp_dir)
-    total_dur_s = cavp_feats.shape[0] / TARO_FPS
+    # Use actual video duration from ffprobe; CAVP frame count can under-count
+    # if the extractor drops the last partial window, leading to truncated audio.
+    total_dur_s = get_video_duration(video_file)
     segments    = _build_segments(total_dur_s, TARO_MODEL_DUR, crossfade_s)
     outputs = []
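
The body of `get_video_duration` is not part of this diff. As a minimal sketch of what an ffprobe-based duration lookup could look like, the hypothetical `probe_duration_s` below shells out to `ffprobe` and parses the container duration; the helper names and error handling are assumptions, not the app's actual code.

```python
import subprocess
from typing import List


def build_ffprobe_cmd(path: str) -> List[str]:
    """Build an ffprobe call that prints only the container duration in seconds."""
    return [
        "ffprobe",
        "-v", "error",                      # suppress banner and info logs
        "-show_entries", "format=duration",  # ask for the format-level duration only
        "-of", "default=noprint_wrappers=1:nokey=1",  # print the bare value
        path,
    ]


def parse_duration(stdout: str) -> float:
    """Convert ffprobe's stdout to a float, rejecting empty or N/A output."""
    text = stdout.strip()
    if not text or text == "N/A":
        raise ValueError("ffprobe did not report a duration")
    return float(text)


def probe_duration_s(path: str) -> float:
    """Hypothetical stand-in for get_video_duration(); requires ffprobe on PATH."""
    result = subprocess.run(
        build_ffprobe_cmd(path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ffprobe exits non-zero
    )
    return parse_duration(result.stdout)
```

Splitting command construction and parsing from the subprocess call keeps the parsing logic checkable without a video file or an ffprobe install.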