Workflow - Talking Avatar (voice cloning with Qwen-TTS)

#44
by RuneXX - opened

LTX - Talking Avatar with Qwen TTS (optionally with a steady-camera LoRA if you want a "true" talking avatar)

A workflow with Qwen TTS connected directly to LTX-2, enabling voice cloning for a consistent voice across video generations.
Enter the dialogue directly into the Qwen TTS node and use a reference audio clip for voice cloning.

Image to video (I2V) : https://huggingface.co/RuneXX/LTX-2-Workflows/blob/main/LTX-2%20-%20I2V%20Talking%20Avatar%20(voice%20clone%20Qwen-TTS).json
Text to video (T2V) : https://huggingface.co/RuneXX/LTX-2-Workflows/blob/main/LTX-2%20-%20T2V%20Talking%20Avatar%20(voice%20clone%20Qwen-TTS).json
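If you want to drive these workflows headlessly, ComfyUI's HTTP API accepts a workflow saved via "Save (API Format)". A minimal sketch, assuming a local ComfyUI server on the default port; the node ID `"12"` and the `"text"` field in the usage example are placeholders, not the actual IDs in these workflows (check your own export):

```python
import json
import urllib.request

def load_workflow(path):
    """Load a workflow exported via 'Save (API Format)' in ComfyUI."""
    with open(path) as f:
        return json.load(f)

def set_input(workflow, node_id, field, value):
    """Override one input of one node, e.g. the dialogue text of the TTS node."""
    workflow[node_id]["inputs"][field] = value
    return workflow

def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the workflow to a running ComfyUI instance's /prompt endpoint."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (IDs are placeholders -- look them up in your exported JSON):
# wf = load_workflow("LTX-2 - I2V Talking Avatar (voice clone Qwen-TTS).json")
# set_input(wf, "12", "text", "Hello, this is my cloned voice speaking.")
# queue_prompt(wf)
```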

Needed nodes

Credit to @f0rkineye for the idea.

By the way, there are many Qwen TTS repos out now; they all work and are broadly similar. The one I picked was somewhat arbitrary, chosen because it has a simple node view (as well as a complex one).
You can easily swap out the Qwen TTS node I used for another one if you already have Qwen TTS installed. Some alternative Qwen TTS nodes:
https://github.com/HAIGC/Comfyui-HAIGC-QwenTTS
https://github.com/DarioFT/ComfyUI-Qwen3-TTS
https://github.com/flybirdxx/ComfyUI-Qwen-TTS

Qwen-TTS currently offers a small, fast 0.6B model that is light on the computer, and a more accurate 1.7B model: https://qwen.ai/blog?id=qwen3tts-0115
(You can of course use other TTS models, such as Microsoft's VibeVoice, which is very good, IndexTTS, etc.)

DarioFT's node allows setting a local model path and has nodes for training (fine-tuning), but no seed option.

Your Image to video (I2V) workflow link seems to be broken 🤔


Thanks for the notice ;-) Fixed the link.

If I raise the steps, what do I have to change?


To raise the steps, you need to use the DEV model as the main model (for the first part of the workflow) without the distilled LoRA.
Then enable that LoRA only in the second-sampler upscale part. I think I made it easy to bypass this LoRA for the main part, with a simple enable in the upscale group.
(If not, I'll add that, but feel free to try it yourself.)
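If you are scripting the workflow via the API-format JSON rather than clicking in the UI, one way to express "LoRA off in the main pass, on in the upscale pass" is to zero out the LoRA strengths on the first LoraLoader node and set them on the second. The node IDs here are hypothetical; `strength_model`/`strength_clip` are the standard inputs of ComfyUI's LoraLoader node:

```python
def set_lora_strengths(workflow, node_id, strength):
    """Set both model and CLIP strengths on a LoraLoader-style node."""
    inputs = workflow[node_id]["inputs"]
    inputs["strength_model"] = strength
    inputs["strength_clip"] = strength
    return workflow

def use_dev_plus_distilled_upscale(workflow, main_lora_id, upscale_lora_id):
    """Disable the distilled LoRA in the main pass, enable it for upscaling."""
    set_lora_strengths(workflow, main_lora_id, 0.0)
    set_lora_strengths(workflow, upscale_lora_id, 1.0)
    return workflow
```

Setting strength to 0.0 is not identical to a true bypass (the node still runs), but the LoRA then has no effect on the result.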

> DarioFT's node allows setting a local model path and has nodes for training (fine-tuning), but no seed option.

I tried that one first because of the local model path option. However, with 8 GB VRAM (64 GB RAM), the node had problems generating a second round. I had to restart Comfy, which did not help, and then restart the computer. This node proved to be the most useful for me: https://github.com/flybirdxx/ComfyUI-Qwen-TTS.

No problems at all, and it comes with other options (multi-character chat, aka a role bank). Like DarioFT's, it has a seed option for clone voice, custom voice, and voice design, but not for training. I assume the seed is automatically the same as long as the speaker name is the same?
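On the seed question: one way a node could keep output deterministic per speaker is to derive the seed from the speaker name itself. Whether flybirdxx's node actually does this internally is an assumption on my part; a sketch of the general idea:

```python
import hashlib

def seed_from_speaker(name: str) -> int:
    """Derive a stable 32-bit seed from a speaker name.

    The same name always yields the same seed, so generations for one
    speaker stay reproducible without storing a seed anywhere.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")
```

With a scheme like this, renaming the speaker would silently change the seed, which would explain varying output even with otherwise identical settings.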
