Quality
Renaming the model to “high” doesn’t help; the quality is terrible. The original Jirka-medium model was far better.
There is a problem in the eSpeak-ng engine that makes the model unusable (at least for me): phonemes for the Czech language are missing.
I have a high-quality dataset for personal use, but it won’t help “bypass” this issue. I don’t mean only to criticize: if you know how to fix the phoneme problem, I am willing to cooperate further.
I made this model for someone who speaks Czech (which I don't, so I wasn't aware of the quality until you told me).
I tried training it with the configuration from the high-end models (which I found by searching the old GitHub repository):
--model.resblock 1 \
--model.resblock_kernel_sizes "(3, 7, 11)" \
--model.resblock_dilation_sizes "((1, 3, 5), (1, 3, 5), (1, 3, 5))" \
--model.upsample_rates "(8, 8, 2, 2)" \
--model.upsample_initial_channel 512 \
--model.upsample_kernel_sizes "(16, 16, 4, 4)"
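As a quick sanity check on flags like these, here is a minimal Python sketch that verifies the generator settings are at least internally consistent. It assumes a HiFi-GAN-style decoder where the product of the upsample rates has to equal the hop length of the spectrogram, and it assumes a hop length of 256 samples; neither the function name nor that hop length comes from this thread, so compare against your own audio config.

```python
from math import prod

# Generator settings copied from the flags quoted above.
upsample_rates = (8, 8, 2, 2)
upsample_kernel_sizes = (16, 16, 4, 4)
resblock_kernel_sizes = (3, 7, 11)
resblock_dilation_sizes = ((1, 3, 5), (1, 3, 5), (1, 3, 5))

# ASSUMPTION: 256 samples per spectrogram frame. Not stated in this thread.
ASSUMED_HOP_LENGTH = 256


def check_generator_config(rates, kernels, rb_kernels, rb_dilations, hop_length):
    """Hypothetical helper: return a list of human-readable consistency problems."""
    problems = []
    # Each upsampling stage needs exactly one kernel size.
    if len(rates) != len(kernels):
        problems.append("upsample_rates and upsample_kernel_sizes differ in length")
    # Each residual block kernel needs its own tuple of dilations.
    if len(rb_kernels) != len(rb_dilations):
        problems.append("resblock kernel sizes and dilation sizes differ in length")
    # The decoder must turn one frame into hop_length audio samples.
    if prod(rates) != hop_length:
        problems.append(
            f"product of upsample_rates is {prod(rates)}, expected {hop_length}"
        )
    return problems


problems = check_generator_config(
    upsample_rates,
    upsample_kernel_sizes,
    resblock_kernel_sizes,
    resblock_dilation_sizes,
    ASSUMED_HOP_LENGTH,
)
print(problems or "config looks self-consistent")
```

A check like this only catches shape mismatches; it says nothing about whether the resulting voice sounds good.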
Also, I trained it from scratch for over 500 epochs (the docs indicate that it would need 2000, four times as many).
Do you think the model will perform better with more training?
Have you also tried the Medium model from my repository? Apparently, it's much better than the High model.
Oh, ok. I thought you had started by finetuning the Jirka-medium model (that is what I did before), and I had success teaching it a more pleasant voice. Still, the missing phonemes caused inconsistent pronunciation.
I doubt there is a fix unless someone is willing to modify eSpeak itself. Czech has specific phonemes that are simply absent from eSpeak. You were probably warned about this when you started the training.
Otherwise, Jirka-medium is somewhat fine, just a bit annoying. Finetuning it takes only a few hours.
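On the missing-phoneme point, one way to narrow down where symbols get lost is to compare the phonemizer output for a sentence against the symbol table the voice was trained with. The sketch below is only an illustration under stated assumptions: it assumes the voice's JSON config exposes a `phoneme_id_map` dictionary and that symbols are looked up one Unicode codepoint at a time. The helper name, the toy map, and the toy phoneme string are hypothetical and are not real eSpeak output.

```python
def missing_phonemes(phonemes, phoneme_id_map):
    """Return symbols in `phonemes` that have no entry in the map, in first-seen order."""
    missing = []
    for symbol in phonemes:  # iterates codepoint by codepoint
        if symbol not in phoneme_id_map and symbol not in missing:
            missing.append(symbol)
    return missing


# HYPOTHETICAL toy symbol table; a real one would be loaded from the voice config,
# e.g. json.load(open(path))["phoneme_id_map"], if your config has that key.
toy_phoneme_id_map = {"_": [0], " ": [1], "a": [2], "r": [3], "t": [4]}

# HYPOTHETICAL phoneme string: "r" followed by the combining "raised" diacritic
# (U+031D), a common IPA way to write the Czech sound spelled "ř".
toy_phonemes = "tr\u031da"

print(missing_phonemes(toy_phonemes, toy_phoneme_id_map))
```

If the list is empty for real sentences, the symbol table is probably not the culprit and the loss happens earlier, in the phonemizer itself, which matches the eSpeak explanation given above.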
I have published a Chatterbox Czech model which, according to the feedback I have received, performs better than Piper TTS and XTTSv2 in Czech:
https://huggingface.co/Thomcles/Chatterbox-TTS-Czech