
Mongolian tokenizer mapping seems incorrect for ө and ш

#10
by Munyeong - opened

I found what looks like an incorrect character mapping in the tokenizer for facebook/mms-tts-mon.

The vocab does not contain these modern Mongolian Cyrillic letters:

  • ө (U+04E9, CYRILLIC SMALL LETTER BARRED O)
  • ш (U+0448, CYRILLIC SMALL LETTER SHA)

But it does contain visually/symbolically confusing alternatives:

  • ѳ (U+0473, CYRILLIC SMALL LETTER FITA)
  • щ (U+0449, CYRILLIC SMALL LETTER SHCHA)

Two simple examples:

  • өргөн drops ө unless I remap ө -> ѳ
  • нар шиг drops ш unless I remap ш -> щ
After applying these remaps locally, the pronunciation becomes correct in practice (though it sometimes pronounces ш as 'ye').
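
For anyone hitting the same issue, the local remap described above can be sketched roughly as follows. This is only a minimal illustration, not an official fix: the helper names `remap_for_mms_mon` and `dropped_chars` are made up for this post, and the uppercase entries are an assumption (the post above only verified the lowercase letters).

```python
# Workaround sketch: replace the letters the vocab lacks with the
# look-alike letters it does contain, as reported in this thread.
REMAP = str.maketrans({
    "\u04e9": "\u0473",  # ө (BARRED O) -> ѳ (FITA)
    "\u0448": "\u0449",  # ш (SHA)      -> щ (SHCHA)
    # Uppercase pairs are an assumption, added by analogy only.
    "\u04e8": "\u0472",  # Ө -> Ѳ
    "\u0428": "\u0429",  # Ш -> Щ
})


def remap_for_mms_mon(text: str) -> str:
    """Return text with ө/ш swapped for the characters the vocab contains."""
    return text.translate(REMAP)


def dropped_chars(text: str, vocab) -> list:
    """List non-whitespace characters of text that are absent from vocab."""
    return sorted({ch for ch in text if not ch.isspace() and ch not in vocab})


if __name__ == "__main__":
    for sample in ("өргөн", "нар шиг"):
        print(sample, "->", remap_for_mms_mon(sample))
```

`dropped_chars` takes any collection of vocab characters, so it can be used to check which letters of an input would be dropped before and after the remap. With the `transformers` library, the vocab can presumably be obtained from `AutoTokenizer.from_pretrained("facebook/mms-tts-mon").get_vocab()`, but that part is not included in the sketch.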

This makes it look like the tokenizer/artifact mapping is wrong rather than the acoustic model itself.

Also, in Mongolian, щ is generally not used except in Russian-derived words, so having щ in the vocab while missing basic Mongolian ш seems especially suspicious.

Could this be fixed upstream?

Probably not. If the model was trained with the wrong characters, changing them in the vocab now won't help. You can try converting the text to the wrong characters before sending it, so that the model doesn't reject your Cyrillic as an illegal symbol. If you're lucky, the model only accepts the wrong character but still pronounces it correctly. As I said, and as you have probably figured out, that is easy: just map Ш to Щ so that the model accepts the input.
