Freaking great
I check this website almost every day to see whether a new TTS model with voice cloning has come out that runs on my CPU, and this was exactly what I needed. I'm using it with a Ryzen 5 5600; it's not exactly realtime, but it is faster and really good at voice cloning. It's almost scary that this runs locally and that the technology has come this far.
Not my experience. The language was English. Their biggest model also underperformed. Results were distorted and not intelligible at all.
Take all the compliments you hear with a grain of salt.
My current TTS rankings are:
Higgs > Qwen > VoxCPM > EchoTTS
Higgs is the GOAT but has too much variability. Qwen is the best all-rounder.
VoxCPM is good if you do not mind that it does not sound natural.
EchoTTS has a weird license. Its outputs are not that great, but if you use their training text you can create better reference audio.
The best path I found: EchoTTS to improve the reference audio, Higgs to generate a lot of examples, then use those to fine-tune your own Qwen -> chef's kiss.
Unfortunately I have an RX 590, so I can't run those in realtime even with GGUF, or can't run them at all. Moss TTS is the only one with voice cloning that I can use on my CPU, and it's currently the best one.