Workflow - I2V & T2V with Custom Audio

#4
by RuneXX - opened

I2V & T2V - With Custom Audio
https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main

Use your own audio files with lip sync, and synced motion

(I was quite surprised how well the drummer syncs to the solo drum input ;-))


Thanks for the quick support.

I need your help. I just tested the workflow with custom audio, but the lip-sync isn’t matching. Am I missing any settings?
[Screenshot 2026-03-07 220053]

Prompt - The woman who is standing beside the chair is the only person speaking in this scene. She holds a thin digital tablet and clearly moves her lips while talking, and her mouth movements match the spoken words. The woman sitting in the chair listens quietly and does not speak. The scene takes place inside a bright futuristic space agency selection chamber with white walls, soft overhead lighting, and polished floors. Several astronaut candidates wearing identical grey mission suits sit quietly in rows in the background. In the foreground, the seated woman sits upright in a modern chair with calm posture, looking up attentively at the standing officer. The standing woman makes small natural hand gestures while speaking. The camera holds a steady cinematic medium shot and slowly pushes closer toward the two characters with subtle natural body movement and realistic facial expressions. The standing officer clearly speaks the following sentence aloud: "Dr. Voss, your atmospheric modeling results were the highest we've recorded in twelve years of testing."

Hmm, that's really strange.

Connect the LTX audio output to the final Video Combine node (instead of the "original audio" Get node) and see whether the workflow actually receives your audio as input.

You can also try connecting the audio directly to the LTXV audio embed, in case something went wrong with MelBand (the nodes below the audio input that extract only the vocals).

[Image]

(I will double-check whether I made an error when I saved the workflow for upload.)

Thanks again for the quick support, I will try that.

Hmm, I got the same result as you; I will try to figure it out. Perhaps it's the prompt: it isn't very clear to the model who is talking, or whether it should be a voice-over narrator.
(Or there is some error in the node connections... checking.)

I ran into this once with LTX-2.0 as well.
It's the input image (or the combination of a "narrator-style" voice and the image). Why, I don't know. Curiously, though, the image I had issues with before also had somewhat "pale blue" colors.
(Or it could just be that in the first frame the characters are a bit "too far away" and the model gets confused.)
I will do some testing.

But it definitely works fine with other images.


Thanks for your input. You are correct: LTX needs the image and the prompt to make it clear who is speaking.


Yes, on rare occasions it struggles to make the characters talk, and instead a narrator speaks over the video.
I'm not sure why this happens. Since it happens so rarely, it isn't easy to see any specific pattern.

My speculation is that either the input voice sounds like a narrator (the voice is "centered" and too clean, unlike dialogue in a movie), or the image/prompt confused the model about who should talk and when. But that's speculation ;-)

The model is likely trained on all sorts of videos, and I bet many of them do have narrators. The few times I did run into this, I managed to make it work in the end by tweaking the prompt several times to "force" it. But I have only run into this a couple of times, so I don't know whether it's easy, hard, or sometimes even "impossible" ;-)

I have found I have better luck if I prompt the video exactly as I would if I were doing a T2V video, and just ignore that I'm providing the audio and image.
However, I've also found that it's very much a roll of the dice with respect to the seed.
Sometimes you'll land on a seed that nails exactly what you want... and if you run it 7 more times you'll just get crap.

Yes, I am really curious whether there is one particular cause, because when you first run into a "stubborn image" it's almost impossible to "fix" it.
I did see someone on YouTube speculate that it was due to particular sizes, but I can't really find any logic to that.

And yes, prompting as if it were T2V might be a good approach.
