Workflow - I2V & T2V with ID-LoRA for consistent voice across video generations
I2V & T2V Basic - consistent voice with ID-LoRA and reference audio
The workflow adds ID-LoRA, which lets you use a 5-second reference audio clip to get a consistent voice in every video you make
https://id-lora.github.io/
Unlike a custom spoken-audio input, which strips the ambient sound away, with ID-LoRA you can prompt what the person should say, the background sound, and so on
And all it needs is a 5-second reference audio clip; you can then prompt any dialog in that reference voice, giving you full flexibility
(the above video was run at the lowest strength, so you can increase the strength for even higher consistency)
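For orientation, in an API-format export of a ComfyUI workflow the LoRA strength lives on the LoRA loader node. A minimal sketch of what that entry looks like — the node class and field names are based on ComfyUI's stock LoRA loader and the filenames above, so check your own export rather than taking this verbatim:

```python
# Hypothetical ComfyUI API-format node entry for the ID-LoRA loader.
# "strength_model" is the knob discussed above: raise it for higher
# voice consistency, lower it for more flexibility.
id_lora_node = {
    "class_type": "LoraLoaderModelOnly",
    "inputs": {
        "lora_name": "LTX-2.3-ID-LoRA-TalkVid-3K.safetensors",
        "strength_model": 1.0,  # workflow update bumped this up from a lower default
        "model": ["1", 0],      # link to the upstream model node (id, output index)
    },
}

print(id_lora_node["inputs"]["strength_model"])
```

Editing the number in the node (or the JSON export) is all the "strength" change amounts to; everything else in the graph stays the same.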
Make sure ComfyUI is up to date; support for ID-LoRA was added recently.
Thanks to @AviadDahan and the ID-LoRA team for the models, and thanks to Kijai for doing his magic ;-)
Download the LoRAs here:
https://huggingface.co/AviadDahan/LTX-2.3-ID-LoRA-CelebVHQ-3K
https://huggingface.co/AviadDahan/LTX-2.3-ID-LoRA-TalkVid-3K
(the workflow might get some updates; this was a first attempt, so hopefully it's mostly correct)
Thanks for using our models, looks like an awesome workflow!
Thanks for the models. 👍
Btw, what are the use cases for choosing between the CelebV-HQ and TalkVid ID-LoRAs? 🤔
The TalkVid dataset (2025) seems to be newer than CelebV-HQ (2022). Is TalkVid the recommended one for the general case?
We train ID-LoRA on CelebV-HQ Zhu et al. (2022) and TalkVid Chen et al. (2025), maintaining separate checkpoints for each dataset.
Generally speaking, CelebV-HQ tends to include a higher variety of scene changes and more scenes with background music or noise (crowds, etc.), so it should generalize better.
TalkVid has a higher speaker count, so it should theoretically support more speaking styles/voices.
We are planning to release a checkpoint trained on both, but that's later in the roadmap and might take a while.
Hello, thanks for the lora, is there a way to control the voice speed?
The ID-LoRA workflow is updated with a few small changes for higher voice consistency (higher strength at the ID-LoRA node).
Plus added a node to record your own voice, based on an idea from ComfyUI native (might be fun ;-))
As well as added MelBandRoFormer to produce a clean vocal input for best results.
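Since the model only needs about 5 seconds of reference audio, it can help to trim your recording before feeding it in. A self-contained sketch using only Python's stdlib `wave` module — the function name and the 5-second default are just illustrations, not part of the workflow:

```python
import io
import wave

def trim_reference(wav_bytes: bytes, seconds: float = 5.0) -> bytes:
    """Keep only the first `seconds` of a PCM WAV reference clip."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames = src.readframes(int(seconds * params.framerate))
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)  # nframes is patched automatically on close
        dst.writeframes(frames)
    return out.getvalue()

# Demo: build a synthetic 8-second mono 16-bit clip, then trim it.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000 * 8)

trimmed = trim_reference(buf.getvalue())
with wave.open(io.BytesIO(trimmed), "rb") as w:
    assert w.getnframes() == 16000 * 5
```

Vocal separation (what MelBandRoFormer does) is a separate step; this only handles clip length.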
I tried with some dots ("....") but there might be another way. Good question, actually ;-) I'll try with a multi-[speech] input, with some context to pause between, and see if that works
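The dots trick above is just prompt-text manipulation, so it can be scripted. A tiny hypothetical helper (the function and the "..." marker are the workaround being discussed, not a documented LTX feature, so results may vary):

```python
def with_pauses(segments, pause="..."):
    """Join dialog segments with a pause marker between them,
    mimicking the '....' trick for slowing down delivery."""
    return f" {pause} ".join(s.strip() for s in segments)

print(with_pauses(["Hello there", "nice to meet you"]))
# Hello there ... nice to meet you
```

Longer runs of dots (or more segments) should, in principle, stretch the pacing further, but as noted below the pauses land between words rather than truly slowing the speech.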
I also tried MelBandRoFormer because I still have a consistency issue on some generations; I'm not sure what the cause is.
Yes, I also tried dots, but the speaker makes a short pause between words; it seems it needs a slower input audio.