Workflow - Talking Avatar (voice cloning with Qwen-TTS)

#44
by RuneXX - opened

LTX - Talking Avatar with Qwen TTS (optionally with a steady-camera LoRA if you want a "true" talking avatar)

A workflow with Qwen TTS connected directly to LTX-2, enabling voice cloning for a consistent voice across video generations.
Enter the dialogue directly into the Qwen TTS node and use a reference audio clip for voice cloning.

Image to video (I2V) : https://huggingface.co/RuneXX/LTX-2-Workflows/blob/main/LTX-2%20-%20I2V%20Talking%20Avatar%20(voice%20clone%20Qwen-TTS).json
Text to video (T2V) : https://huggingface.co/RuneXX/LTX-2-Workflows/blob/main/LTX-2%20-%20T2V%20Talking%20Avatar%20(voice%20clone%20Qwen-TTS).json
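If you want to drive these workflows headlessly, ComfyUI's HTTP API accepts a workflow saved via "Save (API Format)". A minimal sketch, assuming a local ComfyUI server on the default port; the node ID `"12"` and the `"text"` field in the usage example are placeholders, not the actual IDs in these workflows (check your own export):

```python
import json
import urllib.request

def load_workflow(path):
    """Load a workflow exported via 'Save (API Format)' in ComfyUI."""
    with open(path) as f:
        return json.load(f)

def set_input(workflow, node_id, field, value):
    """Override one input of one node, e.g. the dialogue text of the TTS node."""
    workflow[node_id]["inputs"][field] = value
    return workflow

def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the workflow to a running ComfyUI instance's /prompt endpoint."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (IDs are placeholders -- look them up in your exported JSON):
# wf = load_workflow("LTX-2 - I2V Talking Avatar (voice clone Qwen-TTS).json")
# set_input(wf, "12", "text", "Hello, this is my cloned voice speaking.")
# queue_prompt(wf)
```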

Needed nodes

Credit to @f0rkineye for the idea.

By the way, there are many Qwen TTS repos out now; they all work and are broadly similar. The one I picked was somewhat arbitrary, chosen because it has a simple node view (as well as a complex one).
You can easily swap out the Qwen TTS node I used for another one if you already have Qwen TTS installed. Some alternative Qwen TTS nodes:
https://github.com/HAIGC/Comfyui-HAIGC-QwenTTS
https://github.com/DarioFT/ComfyUI-Qwen3-TTS
https://github.com/flybirdxx/ComfyUI-Qwen-TTS

Qwen-TTS currently offers a small, fast 0.6B model that is light on the computer, and a more accurate 1.7B model: https://qwen.ai/blog?id=qwen3tts-0115
(You can of course use other TTS models, such as Microsoft's VibeVoice, which is very good, IndexTTS, etc.)

DarioFT's node allows setting a local model path and has nodes for training (fine-tuning), but no seed option.

Your Image to video (I2V) workflow link seems to be broken 🤔


Thanks for the notice ;-) Fixed the link.

If I raise the steps, what do I have to change?


To raise the steps, you need to use the DEV model as the main model (for the first part of the workflow) without the distilled LoRA.
Then enable that LoRA only in the second-sampler upscale part. I think I made it easy to bypass this LoRA for the main part, with a simple enable in the upscale group.
(If not, I'll add that, but feel free to try it yourself.)
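If you are scripting the workflow via the API-format JSON rather than clicking in the UI, one way to express "LoRA off in the main pass, on in the upscale pass" is to zero out the LoRA strengths on the first LoraLoader node and set them on the second. The node IDs here are hypothetical; `strength_model`/`strength_clip` are the standard inputs of ComfyUI's LoraLoader node:

```python
def set_lora_strengths(workflow, node_id, strength):
    """Set both model and CLIP strengths on a LoraLoader-style node."""
    inputs = workflow[node_id]["inputs"]
    inputs["strength_model"] = strength
    inputs["strength_clip"] = strength
    return workflow

def use_dev_plus_distilled_upscale(workflow, main_lora_id, upscale_lora_id):
    """Disable the distilled LoRA in the main pass, enable it for upscaling."""
    set_lora_strengths(workflow, main_lora_id, 0.0)
    set_lora_strengths(workflow, upscale_lora_id, 1.0)
    return workflow
```

Setting strength to 0.0 is not identical to a true bypass (the node still runs), but the LoRA then has no effect on the result.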

> DarioFT's node allows setting a local model path and has nodes for training (fine-tuning), but no seed option.

I tried that one first because of the local model path option. However, with 8 GB VRAM (64 GB RAM), the node had problems generating a second round. I had to restart Comfy, which did not help, and then restart the computer. This node proved to be the most useful for me: https://github.com/flybirdxx/ComfyUI-Qwen-TTS.

No problems at all, and it comes with other options (multi-character chat, aka a role bank). Like DarioFT's, it has a seed option for clone voice, custom voice, and voice design, but not for training. I assume the seed is automatically the same as long as the speaker name is the same?
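On the seed question: one way a node could keep output deterministic per speaker is to derive the seed from the speaker name itself. Whether flybirdxx's node actually does this internally is an assumption on my part; a sketch of the general idea:

```python
import hashlib

def seed_from_speaker(name: str) -> int:
    """Derive a stable 32-bit seed from a speaker name.

    The same name always yields the same seed, so generations for one
    speaker stay reproducible without storing a seed anywhere.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")
```

With a scheme like this, renaming the speaker would silently change the seed, which would explain varying output even with otherwise identical settings.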
