BoxOfColors committed on
Commit c97fd8e · 1 Parent(s): dc0df75

fix: derive total_dur_s from ffprobe not CAVP frame count to prevent audio truncation

CAVP feature extractor can drop the last partial window, causing total_dur_s
to be shorter than the actual video (e.g. 10s instead of 25s). This caused
generate_taro() to build too few segments and produce audio that cuts off early.
Fix: use get_video_duration(video_file) as the canonical source for total_dur_s.
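
The under-count described above can be illustrated with a small arithmetic sketch. Every constant here is an assumption chosen for illustration (the real values of TARO_FPS, TARO_MODEL_DUR, and the extractor's window size are not shown in this commit), and `count_segments` is a hypothetical stand-in, not the repository's `_build_segments`.

```python
import math

# All values below are illustrative assumptions, not the repository's constants.
FPS = 4               # feature frames per second (stand-in for TARO_FPS)
WINDOW_FRAMES = 40    # frames per extractor window (hypothetical)
MODEL_DUR_S = 8.0     # seconds generated per segment (stand-in for TARO_MODEL_DUR)
CROSSFADE_S = 1.0     # overlap between neighbouring segments (hypothetical)


def frames_after_dropping_partial_window(video_dur_s: float) -> int:
    """Frame count when the extractor keeps only complete windows."""
    total_frames = int(video_dur_s * FPS)
    return (total_frames // WINDOW_FRAMES) * WINDOW_FRAMES


def count_segments(total_dur_s: float) -> int:
    """Hypothetical segment count: overlapping segments needed to cover the duration."""
    if total_dur_s <= MODEL_DUR_S:
        return 1
    step = MODEL_DUR_S - CROSSFADE_S
    return 1 + math.ceil((total_dur_s - MODEL_DUR_S) / step)


video_dur_s = 25.0
derived_dur_s = frames_after_dropping_partial_window(video_dur_s) / FPS

print(derived_dur_s)                  # 20.0, shorter than the 25.0 s video
print(count_segments(derived_dur_s))  # 3 segments from the frame-derived duration
print(count_segments(video_dur_s))    # 4 segments from the true duration
```

Under these assumed numbers, the frame-derived duration loses five seconds and one whole segment, which is the kind of early cut-off the commit message reports.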

Files changed (1)
  1. app.py +3 -1
app.py CHANGED
@@ -356,7 +356,9 @@ def generate_taro(video_file, seed_val, cfg_scale, num_steps, mode,
     strip_audio_from_video(video_file, silent_video)
 
     cavp_feats = extract_cavp(silent_video, tmp_path=tmp_dir)
-    total_dur_s = cavp_feats.shape[0] / TARO_FPS
+    # Use actual video duration from ffprobe — CAVP frame count can under-count
+    # if the extractor drops the last partial window, leading to truncated audio.
+    total_dur_s = get_video_duration(video_file)
     segments = _build_segments(total_dur_s, TARO_MODEL_DUR, crossfade_s)
 
     outputs = []
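
The body of `get_video_duration` is not part of this diff. As a rough sketch of how a duration lookup via ffprobe might work, assuming `ffprobe` is on the PATH, the helpers below query the container's `format=duration` field and parse the result. The names `probe_duration_s` and `parse_ffprobe_duration` are hypothetical and are not taken from app.py.

```python
import subprocess


def parse_ffprobe_duration(raw: str) -> float:
    """Convert ffprobe's plain-text duration output to seconds."""
    text = raw.strip()
    if not text or text == "N/A":
        raise ValueError("ffprobe did not report a duration")
    return float(text)


def probe_duration_s(path: str) -> float:
    """Ask ffprobe for the container duration of `path`, in seconds.

    Hypothetical helper; the repository's get_video_duration may differ.
    """
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ffprobe fails
    )
    return parse_ffprobe_duration(result.stdout)
```

Keeping the parsing separate from the subprocess call lets the parsing be checked without a video file or an ffprobe install.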