Made a Gradio demo for Veena TTS
I made a Gradio demo for Veena TTS. It's simple and meant for quick testing, but it does have some limitations. You can type or paste text into a textbox, choose one of 4 speakers, and generate audio that plays directly on the Gradio page; each clip is also saved as an MP3 in a specific folder. I added a slider to control speed, but it only changes playback speed, so pitch shifts along with it: lower speeds give a deeper, male-like voice, and higher speeds give a fast, cartoonish or female-like voice. If anyone is interested in trying it out, let me know.
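To illustrate why a playback-speed slider also changes the voice, here is a minimal sketch of naive speed change by linear-interpolation resampling. It assumes the audio is a plain list of float samples played back at an unchanged sample rate; the function name `change_playback_speed` is hypothetical and is not the demo's actual code, which wasn't posted.

```python
def change_playback_speed(samples: list[float], speed: float) -> list[float]:
    """Resample so the clip plays `speed` times faster at the same sample rate.

    Hypothetical sketch of a naive speed slider: the output has
    len(samples) / speed samples, so when it is played at the original
    sample rate every frequency (including the voice's pitch) is scaled
    by `speed`. speed < 1 sounds deeper, speed > 1 sounds higher.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not samples:
        return []
    out_len = max(1, int(len(samples) / speed))
    out = []
    for i in range(out_len):
        pos = i * speed  # fractional read position in the original signal
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last pair of samples: hold the final value.
            out.append(samples[-1])
            continue
        frac = pos - left
        # Linear interpolation between the two neighbouring samples.
        out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out
```

Changing speed without changing pitch would instead need a time-stretching method (for example a phase vocoder), which is a different technique from the resampling shown here.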
Limitations:
1.) It can only generate up to 19 seconds of audio, most likely due to the token limit. I guess that wouldn't be the case with their commercial model.
2.) Hallucinations sometimes cause words or sentences to be skipped.
3.) Sometimes it generates very rapid speech, because at times it doesn't recognize the Hindi punctuation mark " । ". Using " . " instead of " । " sometimes helps.
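The " . " workaround from point 3 can be applied automatically before sending text to the model. A minimal sketch follows; the function name `normalize_danda` is hypothetical, and the choice to emit a period followed by a single space is an assumption, not something tested against Veena.

```python
import re

# Danda (U+0964) with any surrounding whitespace.
_DANDA = re.compile(r"\s*\u0964\s*")


def normalize_danda(text: str) -> str:
    """Replace the Hindi danda with a period (hypothetical preprocessing step).

    Assumption: "sentence. next" is an acceptable spacing for the model;
    trailing whitespace left by a final danda is stripped.
    """
    return _DANDA.sub(". ", text).strip()


print(normalize_danda("नमस्ते। आप कैसे हैं।"))  # नमस्ते. आप कैसे हैं.
```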
Besides all this, I think you guys are on the right path to making something big. Suno started the same way (initially it was called "Bark", I think), and later it improved a lot.
How can I increase the length of the audio file?
If I give it multiple sentences, it cuts the audio off. I only tried English text, though.
I played around with this model a lot, but there's something strange about it: the voice changes every time, no matter which speaker you've selected. I was able to generate longer audio by limiting each generation to one sentence and running them as a batch, but every sentence comes out in a different voice. I think this is intentional, to keep people from using it for long generation.
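The sentence-by-sentence batch approach can be sketched roughly like this. The `synthesize` callable stands in for whatever function actually calls Veena and returns float samples; it, `split_sentences`, `synthesize_long`, and the 0.2 s silence gap are all hypothetical, and `sample_rate` must match whatever rate your model really outputs.

```python
import re
from typing import Callable

# Split on whitespace that follows sentence-ending punctuation,
# including the Hindi danda (U+0964).
_SENTENCE_END = re.compile(r"(?<=[.!?\u0964])\s+")


def split_sentences(text: str) -> list[str]:
    """Break text into sentences, dropping empty pieces."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def synthesize_long(
    text: str,
    synthesize: Callable[[str], list[float]],  # hypothetical TTS call
    *,
    sample_rate: int,
    gap_seconds: float = 0.2,
) -> list[float]:
    """Generate one sentence at a time and join the clips with short silences."""
    silence = [0.0] * int(sample_rate * gap_seconds)
    out: list[float] = []
    for i, sentence in enumerate(split_sentences(text)):
        if i:
            out.extend(silence)  # pause between sentences
        out.extend(synthesize(sentence))
    return out
```

This only works around the per-generation length limit; it does nothing about each sentence coming out in a different voice.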