Workflow - I2V & T2V with ID-LoRA for consistent voice across video generations

#59
by RuneXX - opened

I2V & T2V Basic - consistent voice with ID-LoRA and reference audio

The workflow adds ID-LoRA, which lets you use a 5-second reference audio clip to get a consistent voice in every video you make:
https://id-lora.github.io/

Unlike custom spoken audio input, which strips the ambient sound away, with ID-LoRA you can prompt what the person should say, the background sound, etc.
All it needs is a 5-second reference clip, and you can then prompt any dialog based on that reference audio, giving you full flexibility.
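If you need to prepare a reference clip, here is a minimal sketch (plain Python stdlib only; the function name and file paths are hypothetical, not part of the workflow) that trims a WAV file down to its first 5 seconds:

```python
import wave

def trim_reference(in_path: str, out_path: str, seconds: float = 5.0) -> float:
    """Copy the first `seconds` of a WAV file to out_path; return output duration."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Don't read past the end of the file if it's shorter than `seconds`.
        n = min(int(seconds * rate), src.getnframes())
        frames = src.readframes(n)
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)  # wave fixes the frame count in the header on close
        dst.writeframes(frames)
    return n / rate
```

For other formats (mp3, flac) you'd need something like ffmpeg or torchaudio instead; this is only meant to show how short the reference needs to be.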

(the above video was run at the lowest strength, so you can increase the strength for even higher consistency)

Make sure ComfyUI is up to date; support for ID-LoRA was added recently.
Thanks to AviadDahan and the ID-LoRA team for the models, and thanks to Kijai for doing his magic ;-)

Download the LoRAs here:
https://huggingface.co/AviadDahan/LTX-2.3-ID-LoRA-CelebVHQ-3K
https://huggingface.co/AviadDahan/LTX-2.3-ID-LoRA-TalkVid-3K

(there might be some updates to the workflow; this was a first attempt, so hopefully it's mostly correct)

Thank you for your great efforts and amazing work... but can you explain the differences between the two LoRAs?

Not entirely sure, but I think the only difference is the dataset used. CelebV-HQ is a dataset (https://celebv-hq.github.io/), and so is TalkVid (https://github.com/FreedomIntelligence/TalkVid).
Both LoRAs work great though ;-)

Thanks for the workflow, but I do have a question: in the upsampling phase, you are NOT using the ID-LoRA, right? Only in the first pass? Is there a reason?

I tried it myself, but I am really struggling to get good audio consistency, and I cannot for the life of me decide whether it's better with or without the upsampler, or even without the LoRA; all versions have differing voices, to be honest. How different they are seems to depend more on the seed than on the LoRA strength or where it's applied, sadly.

Let alone face/identity consistency: any identity_guidance_scale > 0 basically destroys the scene, distorts faces, etc. :-/

The LoRA is only used in the first phase, yes.
And I'm not having any issues here; no distortions at all.
I'll take a look to see if it could be anything...

My reference audio is actually the audio generated by a previous LTX generation. It contains some background noise, and sometimes a little music playing, so it's not just the voice; maybe that's it. I will try with a clean voice-only clip. Thanks!

Ah yes, that might influence things; I haven't tried with anything other than clean vocal input. I'll add the MelBandRoformer nodes to the workflow.
These nodes extract the vocals only and remove everything else.
(Or you can try it yourself if you want: https://github.com/kijai/ComfyUI-MelBandRoFormer )
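MelBandRoFormer itself is a learned band-split model, so it does far more than this, but the underlying idea of keeping only the frequency content you care about and discarding the rest can be sketched crudely with an FFT band-pass (a toy illustration only, not what the node actually does):

```python
import numpy as np

def bandpass(signal: np.ndarray, rate: int, lo: float = 80.0, hi: float = 8000.0) -> np.ndarray:
    """Zero out frequency components outside [lo, hi] Hz (a crude voice band)."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(signal))
```

A real vocal separator masks learned time-frequency bands per frame rather than applying one static filter, which is why it can remove music that overlaps the voice band; the sketch only shows the masking principle.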

Ah, using the vocals-only output of MelBandRoFormer as the reference_audio input to the ID-LoRA node definitely makes a difference. Many thanks for pointing that out!
