This is cool.
Waa arrin dhab ahaan xiiso leh in la arko dhallinyaro Soomaaliyeed oo dhisaya adeegyada ay bulshadoodu u baahan tahay, gaar ahaan mashiinadda hadalka (Text-to-Speech - TTS). In kasta oo aan horay u naqaannay moodallada Microsoft (sida MuuseNeural iyo UbaxNeural), haddana moodalladaas waa kuwo aad u isticmaal badan (overused) oo xaddidan. Sidaa darteed, aad bay u wanaagsan tahay in la arko moodallo cusub oo ay Soomaali leedahay.
Halkaas ka sii wada dadaalka! Waxaan idinka codsanayaa inaad nala wadaagtaan sida aad ugu guulaysateen dhisidda moodalladan, hardware-ka aad isticmaasheen, iyo qaabka aad u xalliseen dhibaatooyinka farsamo, waayo waxaa jira dad badan (aniga oo kale ah) oo u baahan caawinaad ku saabsan sidii ay u samaysan lahaayeen TTS Soomaali ah.
Dhibaatooyinka aan ilaa hadda la kulmay:
Waxaan isku dayay inaan moodal TTS Soomaali ah ku dhiso aaladaha kala ah Coqui TTS, Piper TTS, iyo XTTS, laakiin dhammaantood waxay u baahan yihiin nidaamka G2P (Grapheme-to-Phoneme) oo isticmaalaya espeak-ng, kaas oo aanan u malaynaynin inuu hadda support gareeyo af-Soomaaliga. Waxaan dhab ahaan u baahanahay gacan ka geysashada arrintan; waxaan isku dayay inaan AI (sida Gemini) ka kaashado, laakiin wey ku fashilantay inay xal waafi ah i siiso ilaa aan markii dambe ka quustay.
Sidoo kale, waxaan isku dayay inaan Fine-tune ku sameeyo Microsoft SpeechT5, anigoo isticmaalaya xogta: Somali-ASR-Subset-68H. In kasta oo xogtaas ay tahay midda ugu fiican ee aan helay, haddana moodalkii aan dhisay wuxuu u dhawaaqayaa sidii "Robot" oo kale, codkuna ma aha mid dabiici ah.
Codsi ku socda horumariyeyaasha (Developers):
Waxay noqon lahayd wax aad loogu farxo haddii aad nala wadaagtaan casharro kooban (tutorials), ha noqoto YouTube ama inaad ku qortaan faahfaahinta (README) moodallada aad soo gelisaan Hugging Face. Tan waxay naga caawinaysaa inaan raacno dhabbihii aad martay oo aan annaguna wax cusub dhisno.
Aan iska kaashanno sidii aan u horumarin lahayn tignoolajiyada luqaddeenna!
English Version
It is truly inspiring to see the Somali community building tools specifically tailored to our needs, especially in the realm of Text-to-Speech (TTS). While many of us are familiar with Microsoft’s offerings (like MuuseNeural and UbaxNeural), those models can feel overused and are limited to specific platforms. Seeing the rise of independent Somali TTS models is a massive step forward.
To those of you who have successfully deployed models here: please keep going! I would also encourage you to share more about your process: how you built these models, what hardware you used for training, and how you solved the technical problems along the way. There are many developers (like myself) who want to build Somali TTS but are hitting technical walls.
The Challenges I’ve Encountered:
I have attempted to build Somali TTS models using Coqui TTS, Piper, and XTTS. However, a major bottleneck is that these frameworks rely on G2P (Grapheme-to-Phoneme) conversion via espeak-ng, which, as far as I can tell, does not currently support Somali. I tried using AI tools like Gemini to bridge this gap, but they could not give me a workable solution, and I eventually gave up on that route.
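To make the G2P gap concrete, here is a minimal sketch of a rule-based converter, assuming that standard Somali Latin spelling is regular enough for a greedy longest-match table to give usable phoneme symbols. The function name `somali_g2p` and both mapping tables are my own illustrative assumptions, not part of espeak-ng, Coqui TTS, or Piper, and the IPA values are approximate and have not been checked by a linguist.

```python
# Approximate Somali grapheme-to-phoneme rules.
# ASSUMPTION: these tables are illustrative, not a vetted pronunciation lexicon.
DIGRAPHS = {
    "dh": "ɖ", "kh": "χ", "sh": "ʃ",                            # consonant digraphs
    "aa": "aː", "ee": "eː", "ii": "iː", "oo": "oː", "uu": "uː",  # long vowels
}
SINGLES = {
    "c": "ʕ",   # voiced pharyngeal fricative
    "x": "ħ",   # voiceless pharyngeal fricative
    "j": "dʒ",
    "y": "j",
    "'": "ʔ",   # glottal stop
}


def somali_g2p(text: str) -> list[str]:
    """Greedy longest-match conversion of Somali spelling to rough IPA symbols."""
    s = text.lower().strip()
    phonemes: list[str] = []
    i = 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in DIGRAPHS:
            # Prefer two-letter units so "dh" is not read as "d" + "h".
            phonemes.append(DIGRAPHS[pair])
            i += 2
            continue
        ch = s[i]
        if ch.isalpha() or ch == "'":
            # Letters without a special rule map to themselves.
            phonemes.append(SINGLES.get(ch, ch))
        elif ch.isspace() and phonemes and phonemes[-1] != " ":
            # Keep a single word-boundary marker between words.
            phonemes.append(" ")
        # Digits and punctuation are dropped in this sketch.
        i += 1
    return phonemes


print(somali_g2p("Soomaali"))
print(somali_g2p("dhallinyaro"))
```

A known weakness of the greedy match is that it will merge letters that only happen to be adjacent across a syllable boundary (for example an "s" followed by an "h"), so a real front end would need an exception list. Whether a table like this can be plugged into a given framework in place of espeak-ng depends on that framework's configuration, which I have not verified.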
I also experimented with fine-tuning Microsoft SpeechT5 on the Somali-ASR-Subset-68H dataset. It is the best dataset I have found, and I filtered it to a single speaker for consistency, but the resulting model still sounded "robotic" and lacked natural prosody.
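For anyone who wants to reproduce the single-speaker filtering step, this is roughly the idea, as a minimal sketch. It assumes each utterance has a speaker label and a duration in seconds; the field names `speaker_id` and `duration`, the helper `pick_dominant_speaker`, and the sample rows are all hypothetical and are not taken from the actual Somali-ASR-Subset-68H schema.

```python
from collections import defaultdict


def pick_dominant_speaker(records):
    """Return (speaker_id, kept_records) for the speaker with the most total audio."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["speaker_id"]] += rec["duration"]
    # The speaker with the largest summed duration gives the most training data.
    best = max(totals, key=totals.get)
    kept = [rec for rec in records if rec["speaker_id"] == best]
    return best, kept


# Hypothetical metadata rows; a real dataset would supply these fields (or not).
records = [
    {"speaker_id": "spk_a", "duration": 4.0, "text": "utterance one"},
    {"speaker_id": "spk_b", "duration": 6.5, "text": "utterance two"},
    {"speaker_id": "spk_a", "duration": 5.5, "text": "utterance three"},
]

best, kept = pick_dominant_speaker(records)
print(best, len(kept), sum(rec["duration"] for rec in kept))
```

Choosing by total duration rather than utterance count is a design choice: a speaker with many very short clips may still provide less usable audio than one with fewer, longer recordings.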
A Request to the Community:
It would be incredibly helpful if those who have succeeded could post brief tutorials, share their training scripts, or even link to YouTube walkthroughs in their model READMEs. Providing a roadmap or a "recipe" for how you handled Somali phonology and training would allow more of us to follow in your footsteps and build even better tools for our community.
Let’s keep building!