Workflow - I2V & T2V with Custom Audio
https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main
Use your own audio files, with lip sync and synced motion.
(I was quite surprised how well the drummer syncs to the drum solo input .. ;-))
Thanks for the quick support.
I need your help. I just tested the workflow with custom audio, but the lip-sync isn’t matching. Am I missing any settings?
Hmm, that's really strange.
Connect the LTX-audio output to the final video combine node (instead of the "original audio" get node)
That way you can check whether the workflow actually gets your audio as input.
You can also try connecting the audio directly to the LTXV audio embed, in case something went wrong with MelBand (the nodes below the audio input that extract vocals only).
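If you want to rule out the audio file itself before rewiring anything, a quick check outside ComfyUI can help. This is just a minimal sketch, assuming torchaudio is installed; "my_voice.wav" is a placeholder path:

```python
# Quick sanity check that the custom audio file loads cleanly
# before blaming the node graph. Assumes torchaudio is installed;
# "my_voice.wav" is a placeholder path.
import torchaudio

waveform, sample_rate = torchaudio.load("my_voice.wav")

print(f"sample rate : {sample_rate} Hz")
print(f"channels    : {waveform.shape[0]}")
print(f"duration    : {waveform.shape[1] / sample_rate:.2f} s")
print(f"peak level  : {waveform.abs().max().item():.3f}")

# A near-silent file (peak ~0) or an unexpected sample rate points
# to a problem upstream of the MelBand / audio-embed nodes rather
# than the node wiring itself.
```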
(I'll double-check whether I made an error when I saved the workflow for upload.)
Thanks again for the quick support, will try.
Hmm, I got the same result as you; I'll try to figure it out. Perhaps it's the prompt, which isn't super clear to the model about who is talking, and whether it should be a voice-over narrator.
(or there is some error in the node connections.. checking)
I had this challenge one time with LTX-2.0 as well.
It's the input image (or the combo of a "narrator-style" voice and the image). Why, I don't know. But curiously, the image I had issues with before also had pale, blue-toned colors.
(Or it could just be the first frame, where they are a bit "too far away" and the model gets confused..)
Will do some testing.
But it definitely works fine with other images.
Thanks for your input. You are correct: LTX needs the image and prompt to make clear who is speaking.
Yes, on rare occasions it struggles to make the characters talk. Instead, it's a narrator speaking over the video.
Why this happens, I'm not sure. Since it happens so rarely, it's not easy to see any specific pattern.
Speculating, it could be that the input voice sounds like a narrator (the voice being "centered" and too clean, unlike dialogue in a movie), or that the image/prompt confused the model as to who should talk and when. But that's speculation ;-)
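One way to at least test the "narrator voice" half of that guess is to measure how centered the track actually is. A rough sketch, assuming soundfile and numpy are installed; the path is a placeholder and the interpretation is just a rule of thumb:

```python
# Rough check of how "centered" a voice track is. A studio narrator
# is often effectively mono (both channels nearly identical), while
# movie-style dialogue tends to have more stereo spread and room tone.
# Sketch only; assumes soundfile + numpy, and the threshold is a guess.
import numpy as np
import soundfile as sf

audio, sr = sf.read("my_voice.wav")  # placeholder path

if audio.ndim == 1:
    print("mono file: fully centered by definition")
else:
    left, right = audio[:, 0], audio[:, 1]
    corr = np.corrcoef(left, right)[0, 1]
    print(f"L/R correlation: {corr:.3f}")
    # corr near 1.0   -> essentially mono/centered ("narrator-like")
    # noticeably lower -> real stereo spread ("dialogue-like")
```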
The model is likely trained on all sorts of videos, and I bet many of them have narrators. The rare few times I ran into this, I did manage to make it work in the end, with multiple tweaks to the prompt to "force" it. But I've only hit this a couple of times, so I don't know if it's easy or hard, or even sometimes "impossible" ;-)
I've found I have better luck if I prompt the video exactly how I would if I were doing a T2V video, and just ignore that I'm providing the audio and image.
However, I've also found that it's very much a roll of the dice with respect to the seed.
Sometimes you'll land on a seed that nails exactly what you want... and if you run it 7 more times you'll just get crap.
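Given that seed lottery, queueing a batch of seeds and cherry-picking beats re-running by hand. ComfyUI does expose an HTTP /prompt endpoint for queueing work; the sketch below assumes the default local server at 127.0.0.1:8188, a workflow exported in API format as "workflow_api.json", and a hypothetical node id "3" holding the sampler seed (the real id is workflow-specific):

```python
# Queue the same workflow with several seeds so you can cherry-pick
# the run that nails it. Sketch only: assumes a local ComfyUI server
# on the default port, a workflow saved via "Export (API)", and that
# node "3" is the sampler holding the seed (the id is workflow-specific).
import json
import requests

COMFY_URL = "http://127.0.0.1:8188/prompt"
SEED_NODE_ID = "3"  # hypothetical; look up your sampler's id in the JSON

with open("workflow_api.json") as f:
    workflow = json.load(f)

for seed in [1, 42, 1234, 99999]:
    workflow[SEED_NODE_ID]["inputs"]["seed"] = seed
    resp = requests.post(COMFY_URL, json={"prompt": workflow})
    resp.raise_for_status()
    print(f"queued seed {seed}: {resp.json().get('prompt_id')}")
```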
Yes, I'm really curious whether there is one particular cause, because once you run into a "stubborn image" it's almost impossible to "fix it".
I did see someone on YouTube speculate it was due to particular resolutions, but I can't really find any logic to that.
And yes, prompting as if it were T2V might be a good approach.
