Quality

by xfrz - opened Oct 31, 2025

Oct 31, 2025

Renaming the model to “high” doesn’t help; the quality is terrible. The original Jirka-medium model was far better.

There is a problem in the eSpeak-ng engine that renders it unusable (at least for me): phonemes for the Czech language are missing.

I have a high-quality dataset for personal use, but it won’t help “bypass” this issue. I don’t want to only criticize: if you know how to fix the phoneme problem, I am willing to cooperate further.

Thomcles

Owner Oct 31, 2025

I made this model for someone who speaks Czech (which I don't, so I wasn't aware of the quality until you told me).

I tried training it with the configuration from the high-end models (which I found by searching the old GitHub repository):

--model.resblock 1 \
  --model.resblock_kernel_sizes "(3, 7, 11)" \
  --model.resblock_dilation_sizes "((1, 3, 5), (1, 3, 5), (1, 3, 5))" \
  --model.upsample_rates "(8, 8, 2, 2)" \
  --model.upsample_initial_channel 512 \
  --model.upsample_kernel_sizes "(16, 16, 4, 4)" \
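
One sanity check on the flags above, as a minimal sketch: in HiFi-GAN-style decoders, the product of the upsample rates is the number of audio samples generated per input frame, so it normally has to match the hop length used when extracting spectrogram frames. The hop length of 256 below is an assumption, not something stated in this thread, and the helper names are made up for illustration; check the value against your own training config.

```python
from math import prod

# Values copied from the flags above.
upsample_rates = (8, 8, 2, 2)
upsample_kernel_sizes = (16, 16, 4, 4)

# Assumption: hop length of the spectrogram frames (not stated in the thread).
ASSUMED_HOP_LENGTH = 256


def total_upsampling(rates):
    """Number of audio samples the decoder generates per input frame."""
    return prod(rates)


def check_decoder_config(rates, kernel_sizes, hop_length):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if len(rates) != len(kernel_sizes):
        problems.append("upsample_rates and upsample_kernel_sizes differ in length")
    factor = total_upsampling(rates)
    if factor != hop_length:
        problems.append(
            f"total upsampling {factor} does not match hop length {hop_length}"
        )
    return problems


if __name__ == "__main__":
    print("total upsampling:", total_upsampling(upsample_rates))
    print(
        "problems:",
        check_decoder_config(upsample_rates, upsample_kernel_sizes, ASSUMED_HOP_LENGTH),
    )
```

If the check reports a mismatch, the generated audio length would not line up with the training targets, which is worth ruling out before blaming the epoch count.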

I also trained it from scratch, over 500 epochs (the docs indicate that it would need 2000, four times as many).

Do you think the model will perform better with more training?

Thomcles

Owner Oct 31, 2025

Have you also tried the Medium model from my repository? Apparently, it's much better than the High model.

xfrz

Nov 1, 2025

Oh, ok. I thought you had started by finetuning the Jirka-medium model (that is what I did before); I had success teaching it a more pleasant voice. Still, the missing phonemes caused inconsistent pronunciation.

I doubt there is a fix unless someone is willing to modify eSpeak itself. There are specific phonemes in the Czech language which are simply absent from eSpeak. You were probably warned about this when you started the training.
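
For anyone who wants to pin down exactly which symbols are affected, here is a minimal sketch of a coverage check. It only counts symbols that appear in already-phonemized text but are absent from the set the voice knows about. The `supported` set and the sample lines are made-up placeholders, not real Czech phonemes or real eSpeak output; in practice they would come from whatever symbol table your voice uses and from the phonemizer's output for your dataset.

```python
from collections import Counter


def missing_phonemes(phoneme_strings, supported):
    """Count symbols in phonemized text that are not in the supported set.

    Simplification: each Unicode code point is treated as one symbol, so
    combining diacritics are counted separately from their base character.
    """
    missing = Counter()
    for line in phoneme_strings:
        for symbol in line:
            if symbol.isspace():
                continue  # word separators are not phonemes
            if symbol not in supported:
                missing[symbol] += 1
    return missing


if __name__ == "__main__":
    # Hypothetical placeholder data, for illustration only.
    supported = set("abdefk")
    lines = ["abk def", "abx dez x"]
    for symbol, count in missing_phonemes(lines, supported).most_common():
        print(f"{symbol!r} is unsupported, seen {count} time(s)")
```

Running something like this over a whole dataset would at least show how often the unsupported symbols occur, which helps decide whether patching the phonemizer is worth the effort.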

Otherwise, Jirka-medium is somewhat fine, just a bit annoying. Finetuning it takes only a few hours.

Thomcles

Owner Nov 10, 2025

@xfrz

I have published a Chatterbox Czech model which, according to the feedback I have received, performs better than Piper TTS and XTTSv2 in Czech:
https://huggingface.co/Thomcles/Chatterbox-TTS-Czech

Thomcles changed discussion status to closed Nov 10, 2025
