Skipping words very frequently
I am getting long stretches of silence in the converted audio. The model skips words quite often, sometimes even full sentences.
Example -
For the F3 speaker and the sentence "Finally, they discovered their sense of touch by exploring different textures - soft feathers, rough sandpaper, and squishy playdough."
It skips almost the whole sentence and makes mistakes. I have even tried with steps = 40.
How did you run inference? On the Hugging Face Space? https://huggingface.co/spaces/Supertone/supertonic
I tested the same sentence using the Supertonic Python package and did not observe the issue in the generated audio. Additional details about your inference setup would help us identify the root cause.
I synthesized the audio using the following command:
supertonic tts --voice F3 -o out.wav "Finally, they discovered their sense of touch by exploring different textures - soft feathers, rough sandpaper, and squishy playdough."
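If it helps to compare outputs across setups, one option is to measure the longest silent stretch in each generated file and include that number in the report. Below is a minimal sketch using only the Python standard library. It is not part of Supertonic: the function name and the threshold value are illustrative, and it assumes the output is a 16-bit PCM WAV file, which may not match every setup.

```python
import array
import sys
import wave


def longest_silence_seconds(path, threshold=500):
    """Return the longest run of near-silent audio in a WAV file, in seconds.

    Assumption: the file is 16-bit PCM. `threshold` is an illustrative
    amplitude cutoff (out of 32767), not a value taken from Supertonic.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))

    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()

    longest = current = 0
    # Walk frame by frame; a frame counts as silent only when every
    # channel is below the threshold.
    for i in range(0, len(samples) - channels + 1, channels):
        if all(abs(s) < threshold for s in samples[i:i + channels]):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest / rate
```

For example, calling longest_silence_seconds("out.wav") on the file produced by the command above, and on the file from the setup that skips words, would give two numbers that can be compared directly.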