I2V custom audio + lora = auto lipsync

by f0rkineye - opened Jan 21

By adding a camera control lora to your I2V custom audio workflow, you can make it lipsync the audio just by telling it to in the prompt: "The man says the attached audio, lips perfectly in sync". Audio was from an IndexTTS2 workflow. Image generation is a little faster on my 5060ti 16GB, just 67sec for this 3sec clip. I have a feeling I'll be using your workflows going forward. Thanks!

RuneXX

Owner Jan 21

edited Jan 21

That looks really good ;-)
Yes, it should be quite capable of "talking avatar" sorts of results.

LTX-2 is a bit of a jack-of-all-trades, and I'm still in the process of discovering what it can do ;-)

RuneXX

Owner Jan 21

And quite interesting to add a TTS node in the workflow.
I might also try one ;-)

As well as doing on-the-fly "music videos" with something like https://github.com/HeartMuLa/heartlib or similar prompt-to-music nodes

f0rkineye

Jan 21

I uploaded a modification that includes the loras and calculates the frame count from the audio length plus however many seconds you want to pre/post pad the audio. It's available here if you want to take any ideas for your own workflows - https://gist.github.com/forkineye/4d9ea730c5d9c5f086251e69c8b243b2. Handy for one-shotting stuff for my friends :) Here's a sample generated with it, thanks again!
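
For anyone curious how a frame count can be derived from audio length and padding, here is a minimal Python sketch of the idea. It is not taken from the gist above. The function name, the 24 fps default, and the rounding to a frame count of the form 8n + 1 (which LTX-style video models commonly expect) are all assumptions for illustration, so check them against your own workflow.

```python
import math


def frames_for_audio(audio_seconds, pre_pad=0.0, post_pad=0.0, fps=24.0, multiple=8):
    """Hypothetical helper: frame count covering the padded audio.

    Assumptions (not from the original workflow): 24 fps by default, and
    a frame count of the form multiple * n + 1, rounded up so the video
    is never shorter than the padded audio.
    """
    total_seconds = pre_pad + audio_seconds + post_pad
    raw_frames = math.ceil(total_seconds * fps)
    # Round up to the next count of the form multiple * n + 1.
    n = math.ceil(max(raw_frames - 1, 0) / multiple)
    return n * multiple + 1


if __name__ == "__main__":
    # 3 s of audio with 0.5 s of padding on each side.
    print(frames_for_audio(3.0, pre_pad=0.5, post_pad=0.5))
```

Rounding up rather than down means the clip may run a few frames past the end of the audio, which is usually less noticeable than cutting the last syllable off.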

RuneXX

Owner Jan 21

Nice one ;-) Will definitely give it a run.

gtufaro

Jan 21

RuneXX... do you think there's a way to take an audio input with two voices, plus an input image and a text prompt, and have LTX-2 generate something beyond a talking avatar? That would be game-changing...

RuneXX

Owner Jan 21

Might be possible, I'll see if I can think of some way. It could be as easy as prompting for it and the model understanding, but perhaps we aren't that lucky ;-)
I'll try it and see how it works.

gtufaro

Jan 22

RuneXX, I just used the workflow from the site you recommended and it worked... it's only a 3 second video (same as the audio), but it worked!

https://www.reddit.com/r/StableDiffusion/comments/1qjfi5b/ltx2_audioimage_to_video_impressive/

RuneXX

Owner Jan 22

edited Jan 22

Ah, very nice. Seems to work with 2 talkers out of the box ;-) I'll defo try some here too. That's quite nice if it works with dialogue.

And yes, I like their workflows. They are clean, simple, and easy to understand ;-)

ArkaDio81

Jan 23

I want to use this WF, but I always get this error

RuneXX

Owner Jan 23

Did you update ComfyUI to the latest version, as well as KJNodes and city96's GGUF nodes?

https://github.com/kijai/ComfyUI-KJNodes/
https://github.com/city96/ComfyUI-GGUF
