Skipping words very frequently

#17

by anujchopra - opened Dec 22, 2025

Dec 22, 2025

I am getting long stretches of silence during conversion. The model skips words quite often, sometimes even full sentences.
Example -

For F3 speaker and sentence = Finally, they discovered their sense of touch by exploring different textures - soft feathers, rough sandpaper, and squishy playdough.

It skips almost the whole sentence and makes mistakes. I have even tried with steps = 40.

datwelk

Dec 22, 2025

How did you run inference? On the Hugging Face Space? https://huggingface.co/spaces/Supertone/supertonic

anlgboy-cream

Dec 24, 2025

I tested the same sentence using the Supertonic Python package and did not observe the issue in the generated audio. Additional details about your inference setup would help us identify the root cause.

I synthesized the audio using the following command:
supertonic tts --voice F3 -o out.wav "Finally, they discovered their sense of touch by exploring different textures - soft feathers, rough sandpaper, and squishy playdough."

anujchopra

Dec 29, 2025

Can you try a slower speed, such as 0.8 or 0.7?
It skips words when I use a slower speed.
Speed 1.0 is too fast for non-native English speakers.

kybird

Mar 14

It happens with specific seeds and specific words. Seems like open-source bait or something. I will have to find another solution.

juheon2

Supertone org May 7

Thanks for the reports, and sorry for the delayed follow-up.

We recognize that earlier Supertonic releases could sometimes skip words, repeat words, or produce long silences for certain text / voice / seed / speed combinations. Increasing the number of steps does not always fix this, because the issue is related to reading stability rather than just inference quality.
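
For anyone who wants to check generated files for the long-silence symptom without listening to each one, a minimal sketch follows. It is not part of Supertonic: the function name, the amplitude threshold, and the window length are all hypothetical choices, and it assumes the output is a 16-bit PCM WAV file, which may not match what your setup produces.

```python
import array
import sys
import wave


def longest_silence_seconds(path, threshold=500, frame_ms=20):
    """Return the longest run of near-silent audio in a 16-bit PCM WAV, in seconds.

    threshold and frame_ms are illustrative defaults, not tuned values.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    # Number of interleaved samples in one analysis window.
    window = max(1, rate * frame_ms // 1000) * channels
    longest = current = 0
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        # A window counts as silent when no sample exceeds the threshold.
        if max(abs(s) for s in chunk) <= threshold:
            current += len(chunk)
            longest = max(longest, current)
        else:
            current = 0
    return longest / (rate * channels)
```

Calling `longest_silence_seconds("out.wav")` on a file produced at several speeds or seeds gives one number per file, which makes it easier to spot the combinations where a gap of a second or more appears.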

We focused on improving this in Supertonic 3. The updated model substantially reduces repeat/skip failures and long-silence cases, especially on short and medium-length utterances.

We also re-tested the example sentence you shared with the new model:

“Finally, they discovered their sense of touch by exploring different textures - soft feathers, rough sandpaper, and squishy playdough.”

With Supertonic 3, we were able to generate the sentence correctly in our test.

Please try the updated release:

Hugging Face demo: https://huggingface.co/spaces/Supertone/supertonic-3
Model: https://huggingface.co/Supertone/supertonic-3
GitHub: https://github.com/supertone-inc/supertonic

If you still encounter skipping or long-silence issues with Supertonic 3, please share the text, voice, language, speed, step count, and seed if available. Those details are very helpful for debugging and future improvements.
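
As an illustration of what such a report could look like, here is a small sketch that collects those fields into JSON that can be pasted into a reply. The `SkipReport` class and its field names are hypothetical, not a format Supertone has defined, and the example values are only borrowed from this thread (voice F3, speed 0.8, 40 steps); the language code `en` is an assumption.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SkipReport:
    """Hypothetical container for the details requested above."""
    text: str
    voice: str
    language: str
    speed: float
    steps: int
    seed: Optional[int] = None  # None when the seed was not recorded
    observed: str = ""          # free-form description of what went wrong


report = SkipReport(
    text=(
        "Finally, they discovered their sense of touch by exploring "
        "different textures - soft feathers, rough sandpaper, and "
        "squishy playdough."
    ),
    voice="F3",
    language="en",  # assumed; the thread does not state a language code
    speed=0.8,
    steps=40,
    observed="skips most of the sentence",
)

report_json = json.dumps(asdict(report), indent=2, ensure_ascii=False)
print(report_json)
```

Keeping the exact text in the report matters, since the failures described in this thread depend on specific words and seeds.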
