I2V custom audio + lora = auto lipsync
By adding a camera control LoRA to your I2V custom audio workflow, you can make it lipsync the audio just by telling it to, e.g. "The man says the attached audio, lips perfectly in sync". Audio was from an IndexTTS2 workflow. Image generation is a little faster on my 5060 Ti 16GB, just 67 sec for this 3 sec clip. I have a feeling I'll be using your workflows going forward. Thanks!
That looks really good ;-)
Yes it should be quite capable of doing "talking avatar" sort of results
LTX-2 is a bit of a jack-of-all-trades, and I'm still in the process of discovery ;-)
And quite interesting to add a TTS node in the workflow.
I might also try one ;-)
As well as doing on-the-fly "music videos" with something like https://github.com/HeartMuLa/heartlib or similar prompt-to-music nodes
I uploaded a modification that includes the LoRAs and calculates the frame count from the audio length plus how many seconds you want to pad before/after the audio. It's available here if you want to take any ideas for your own workflows: https://gist.github.com/forkineye/4d9ea730c5d9c5f086251e69c8b243b2 Handy for one-shotting stuff for my friends :) Here's a sample generated with it, thanks again!
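For anyone curious what the frame calculation amounts to, here is a rough sketch in Python. It is not taken from the gist: the function name, the 24 fps default and the "frame count must be a multiple of 8 plus 1" rounding are all assumptions, so check them against your own workflow's settings.

```python
import math


def frames_for_audio(audio_seconds, pre_pad=0.0, post_pad=0.0, fps=24, multiple=8):
    """Return a frame count that covers the audio plus padding.

    Hypothetical helper, not part of any ComfyUI node.
    Assumptions: fps=24, and the sampler wants frame counts of the
    form multiple * n + 1 (e.g. 81, 89, 97 for multiple=8).
    """
    total_seconds = pre_pad + audio_seconds + post_pad
    # Round before ceil so float noise (e.g. 95.99999999) does not add a frame.
    raw_frames = math.ceil(round(total_seconds * fps, 6))
    # Round up to the next count of the form multiple * n + 1.
    n = math.ceil(max(raw_frames - 1, 0) / multiple)
    return n * multiple + 1


# 3 s of audio with 0.5 s of padding on each side at 24 fps
print(frames_for_audio(3.0, pre_pad=0.5, post_pad=0.5))
```

Rounding up rather than down means the clip never ends before the audio does; the cost is at most a few extra frames at the end.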
nice one ;-) will definitely give it a run
RuneXX... you think there's a way to have an audio input of two voices and then an input image with a text prompt where LTX-2 is able to generate something beyond a talking avatar? That would be gamechanging...
Might be possible, will see if I can think of some way. Could be as easy as prompting for it and the model would understand, but perhaps we aren't that lucky ;-)
Will try it and see how it works
RuneXX, I just used the workflow from the site you recommended and it worked... it's only a 3 second video (so was the audio), but it worked!
https://www.reddit.com/r/StableDiffusion/comments/1qjfi5b/ltx2_audioimage_to_video_impressive/
Ah very nice. Seems to work with 2 talkers out of the box ;-) nice nice. I'll defo try some here too, that's quite nice if it works with dialogue
and yes i like their workflows. They are clean and simple. Easy to understand ;-)
I'd like to use this WF, but I always get this error
Did you update ComfyUI to the latest version, as well as update KJNodes and city96's GGUF nodes?
https://github.com/kijai/ComfyUI-KJNodes/
https://github.com/city96/ComfyUI-GGUF


