Feedback on Vietnamese support in Supertonic-3
Hello,
First of all, thank you for adding Vietnamese support to Supertonic-3. I really appreciate that your team is expanding multilingual coverage, especially for languages that are often overlooked.
I briefly tried the model on the Hugging Face Space using the default setting (8 steps). The Vietnamese output is understandable, but I noticed several issues that may be worth addressing in future versions:
- speech is sometimes not smooth or natural
- occasional stuttering or repeated syllables/words
- some words are skipped or cut off
- pacing can become unstable during longer sentences
Despite these issues, it is still exciting to see Vietnamese included, and I hope the language quality continues to improve in future updates.
Thank you again for your work and for supporting Vietnamese users.
Best regards
Hi thuongvv,
Thank you very much for trying Supertonic 3 and for sharing detailed feedback on Vietnamese. We really appreciate it.
We did run quantitative evaluations for Vietnamese before release, but we have to acknowledge that we do not yet have enough native-speaker reviewers to listen to and evaluate Vietnamese output as carefully as we can for languages like Korean and English. Feedback like this is therefore very helpful for us.
For Korean and English, we observed that issues such as stuttering, repeated words, skipped words, and unstable pacing were significantly reduced compared with Supertonic 2. We believe this improvement came from a larger and more diverse dataset, as well as changes in the training scheme aimed at improving reading stability.
However, Vietnamese is one of the lower-resource languages in our current dataset, and we suspect this is one reason the improvement is not yet uniform across all supported languages. Your observations are consistent with the kind of language-specific quality gap we still need to address.
We are looking into additional training approaches, including supervised fine-tuning and further preference/reinforcement-based training, to improve consistency across languages. If we make improvements to Vietnamese quality in a future update, we will share them.
Thanks again for the thoughtful feedback and for supporting Vietnamese TTS users.