BoxOfColors Claude Opus 4.6 committed on
Commit
64f71ea
·
1 Parent(s): 26e8bca

Fix HunyuanFoley CLAP tokenizer overflow for long prompts

The CLAP text encoder has a 512-token maximum, but encode_text_feat() called
the tokenizer without truncation=True/max_length, causing a tensor shape
mismatch (e.g. 523 > 512) when a prompt exceeds 512 tokens.

Added truncation=True, max_length=512 to the tokenizer call.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

HunyuanVideo-Foley/hunyuanvideo_foley/utils/feature_utils.py CHANGED
@@ -129,7 +129,7 @@ def encode_video_features(video_path, model_dict):
 @torch.inference_mode()
 def encode_text_feat(text: List[str], model_dict):
     # x: (B, L)
-    inputs = model_dict.clap_tokenizer(text, padding=True, return_tensors="pt").to(model_dict.device)
+    inputs = model_dict.clap_tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt").to(model_dict.device)
     outputs = model_dict.clap_model(**inputs, output_hidden_states=True, return_dict=True)
     return outputs.last_hidden_state, outputs.attentions
 
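For context, `tokenizer(..., padding=True, truncation=True, max_length=512)` in Hugging Face transformers first truncates each sequence to at most max_length tokens, then pads the batch to its longest (truncated) sequence. A minimal pure-Python sketch of that combined behavior, using a hypothetical helper `pad_and_truncate` (not part of this repo):

```python
def pad_and_truncate(batch_ids, max_length=512, pad_id=0):
    """Sketch of tokenizer(padding=True, truncation=True, max_length=...).

    Each sequence is truncated to max_length, then the batch is padded
    to the length of the longest remaining sequence, so every row of
    input_ids has the same length (at most max_length).
    """
    truncated = [ids[:max_length] for ids in batch_ids]
    target = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (target - len(ids)) for ids in truncated]
    attention_mask = [[1] * len(ids) + [0] * (target - len(ids)) for ids in truncated]
    return input_ids, attention_mask
```

With truncation enabled, a 523-token prompt is cut to 512 tokens, so the resulting tensor never exceeds the CLAP encoder's 512-position limit.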