Long video with prompt per section using LTX native audio instead of custom audio?
I really like the Long video workflow with the prompt per section, but I am struggling to make it work with LTX's built-in audio. I've tried the switch in your workflow but it gives an error, and I've tried hardwiring everything and completely removing the custom audio section, but I guess this is above my current ComfyUI level. How can I achieve this? Currently I've managed to get the audio working for the first video, but it goes silent immediately upon reaching the 2nd generation.
I've already started on something like that, with prompt "travel" for each loop or group.
As well as optional image input for scene change.
Will see if I can finish it soon and share ;-)
Awesome, really love your workflows, can't wait!
So while doing this workflow again, I remembered why I didn't complete it easily last time.
The challenge is that each section is an independent video, in that it uses the reference start image and the last frames of the previous video to continue the video ...
But if there is no voice in the last 1-2 seconds of the preceding video part (used as reference), the next section will have a completely different voice .. no consistency.
So that's why custom audio was the easier choice as a first variant (it can be solved by making the reference longer, say 50% of the previous video or even more, but then it's a slow, long wait to complete ..)
And there are other workarounds, such as using a latent audio part with voice at the start of each group and then chopping that off at the end; it's a little complicated to get the timing right.
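To make the timing part concrete, here is a minimal sketch (plain Python, not ComfyUI node code) of the frame/sample math for that workaround: each section is generated with a short reference overlap from the previous section prepended, and that overlap must be chopped off both the video and the audio before concatenation. The frame rate, sample rate, function name, and numbers are all illustrative assumptions, not values from the workflow.

```python
# Hedged sketch of the "prepend reference, then chop it off" timing math.
# Assumed, illustrative constants -- adjust to the actual generation settings.
FPS = 25             # assumed video frame rate
SAMPLE_RATE = 48000  # assumed audio sample rate

def section_plan(section_frames: int, overlap_frames: int):
    """Return (video frames to keep, audio samples to drop at the start).

    The section is generated with `overlap_frames` reference frames (plus
    their audio) taken from the end of the previous section; both streams
    must be trimmed by the same duration so audio stays in sync.
    """
    overlap_seconds = overlap_frames / FPS
    drop_samples = round(overlap_seconds * SAMPLE_RATE)
    keep_frames = section_frames - overlap_frames
    return keep_frames, drop_samples

# e.g. a 121-frame section with a 24-frame (~1 s) reference overlap
keep_frames, drop_samples = section_plan(121, 24)
```

The point is simply that the trim on the audio side has to be derived from the frame overlap via the frame rate, otherwise the voice drifts out of sync at every section boundary.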
But by pure luck a new lora is coming out, that solves this issue: https://huggingface.co/AviadDahan/LTX-2.3-ID-LoRA-TalkVid-3K ( https://id-lora.github.io/ )
This lora uses a short reference audio clip much the same way you use image input, and can be used over and over at all the extended sections.
This lora should make it work easily, with consistent voice ;-) It's not yet added to Comfy though, but it's on the roadmap.
(I'll try to finish a workflow regardless, though; it can be nice for all sorts of other long shots based on image or text input. I'll add support for that lora as soon as it's fully supported. It will still have audio consistency, but might lose voice consistency sometimes.)
I appreciate you working on it. The voice consistency is not a problem for me, because I train custom loras with voice for my characters, so their voice is always the same across different videos anyway. Again, I look forward to whatever you come up with, and that new lora sounds awesome; it will save having to train a character lora for every new character.
How about this one? It's for ComfyUI: https://github.com/ID-LoRA/ID-LoRA-LTX2.3-ComfyUI
Yes, that's the one that looks very promising. So far there's no support for it in ComfyUI native mode. The one above is a wrapper, so it can't be integrated into regular ComfyUI workflows; it works more as a "standalone" on its own (it has its own sampler, its own model loader, etc.). But hopefully a more native integration of that lora comes soon.