Sofelia TTS 82M

A Palestinian Arabic text-to-speech model — Kokoro-82M fine-tuned for the Levantine/Palestinian dialect. Ships a single natural female voice, Eliaa.

Architecture: Kokoro-82M (StyleTTS2 backbone, ISTFTNet vocoder), 82M params
Sample rate: 24 kHz
Speaker: Eliaa (voices/eliaa.pt)
Runs on CPU in real time (RTF ≈ 0.15–0.34 on 4 threads)
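
For readers unfamiliar with the metric: the real-time factor (RTF) is synthesis time divided by the duration of the audio produced, so any value below 1 is faster than real time. The snippet below is a minimal sketch of that arithmetic at the model's 24 kHz sample rate. The helper name `real_time_factor` is hypothetical, and the card does not say how its own figures were measured.

```python
SAMPLE_RATE = 24_000  # Hz, the model's output sample rate


def real_time_factor(synthesis_seconds: float, num_samples: int,
                     sample_rate: int = SAMPLE_RATE) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    audio_seconds = num_samples / sample_rate
    if audio_seconds <= 0:
        raise ValueError("audio must contain at least one sample")
    return synthesis_seconds / audio_seconds


# Example: 72,000 samples (3 s of audio at 24 kHz) synthesized in 0.75 s.
rtf = real_time_factor(0.75, 72_000)
print(f"RTF = {rtf:.2f}")
```

In practice, `synthesis_seconds` would come from timing the model call (for example with `time.perf_counter()`), and `num_samples` from the length of the returned waveform.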

Why a custom frontend

espeak-ng's Arabic G2P assumes Modern Standard Arabic and mis-handles dialect text. sofelia_frontend.py corrects the systematic problems, mostly before phonemization and with one remap step after it:

Arabic punctuation (،؛؟) → Latin equivalents, so the model receives pause/question cues
word-final ة, which espeak wrongly reads as /t/ in out-of-lexicon words → bare fatha (a)
a Palestinian pronunciation lexicon (ar_lexicon.json) for words espeak mis-reads or strips vowels from (هيك, وين, تنين, قلت, معلش, …)
post-G2P remap of phonemes outside Kokoro's 178-token vocab (ħ→ʰ, ʕ→ʁ, ˤ→ᵊ)
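
Three of the steps above can be sketched with the standard library alone. This is an illustration of the idea, not the contents of sofelia_frontend.py: the function names, the order in which the steps run, and the toy lexicon entry are assumptions, and the ة rule is omitted because its exact implementation is not described here. Only the punctuation pairs and the phoneme remap pairs are taken from the list above.

```python
import re

# Arabic punctuation -> Latin equivalents (pause/question cues).
PUNCT_MAP = str.maketrans({"،": ",", "؛": ";", "؟": "?"})

# Post-G2P remap for phonemes outside Kokoro's 178-token vocab.
PHONEME_REMAP = str.maketrans({"ħ": "ʰ", "ʕ": "ʁ", "ˤ": "ᵊ"})

# Runs of undiacritized Arabic letters (U+0621..U+064A); an assumption
# about how words are delimited, chosen so trailing punctuation is ignored.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def normalize_punctuation(text: str) -> str:
    """Replace Arabic comma, semicolon and question mark with Latin ones."""
    return text.translate(PUNCT_MAP)


def apply_lexicon(text: str, lexicon: dict) -> str:
    """Swap each word found in the lexicon for its diacritized form."""
    return ARABIC_WORD.sub(lambda m: lexicon.get(m.group(0), m.group(0)), text)


def remap_phonemes(phonemes: str) -> str:
    """Map out-of-vocab phonemes to in-vocab substitutes after G2P."""
    return phonemes.translate(PHONEME_REMAP)


# Toy entry for illustration only; not copied from ar_lexicon.json.
TOY_LEXICON = {"وين": "وِين"}

prepared = normalize_punctuation(apply_lexicon("وين؟", TOY_LEXICON))
print(prepared)
print(remap_phonemes("ħaʕ"))
```

The lexicon lookup and punctuation step run on raw text before espeak-ng sees it, while `remap_phonemes` runs on the phoneme string that espeak-ng returns.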

Usage

pip install kokoro misaki espeakng-loader phonemizer-fork soundfile torch
python inference.py "بدي أروح ع السوق أشتري خضرة وفواكه للبيت."

Kokoro has no lang_code for Arabic, so phonemize with misaki's espeak-ng backend (language ar) plus the Sofelia frontend, then call KModel directly; see inference.py.

Files

| File | Purpose |
|---|---|
| `kokoro_sofelia_82M.pth` | model weights (Kokoro KModel format) |
| `config.json` | Kokoro config (178-token vocab) |
| `voices/eliaa.pt` | Eliaa speaker voicepack `[510, 1, 256]` |
| `sofelia_frontend.py` | Palestinian Arabic text frontend |
| `ar_lexicon.json` | dialect pronunciation lexicon (extensible) |
| `inference.py` | minimal end-to-end example |

Limitations

Numbers/dates/times are read via espeak's MSA expansion; a dialect number normalizer is planned. Spell numbers out for best results.
Single voice (Eliaa).
The lexicon is small but community-extensible; add `surface_word: diacritized_form` entries to ar_lexicon.json to fix new words.
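
Adding lexicon entries could look like the sketch below. It assumes ar_lexicon.json is a flat JSON object mapping each surface word to its diacritized form, which matches the entry format described above but has not been checked against the file itself. The helper `add_lexicon_entries` is hypothetical, and the demo writes to a temporary file rather than the real lexicon.

```python
import json
import os
import tempfile
from pathlib import Path


def add_lexicon_entries(path, new_entries: dict) -> dict:
    """Merge new surface_word -> diacritized_form pairs into a JSON lexicon."""
    p = Path(path)
    # Start from the existing lexicon if the file is already there.
    lexicon = json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}
    lexicon.update(new_entries)
    # ensure_ascii=False keeps Arabic readable instead of \uXXXX escapes.
    p.write_text(
        json.dumps(lexicon, ensure_ascii=False, indent=2, sort_keys=True) + "\n",
        encoding="utf-8",
    )
    return lexicon


# Demo against a throwaway file; the entry is illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "ar_lexicon.json")
    add_lexicon_entries(demo_path, {"وين": "وِين"})
    reloaded = json.loads(Path(demo_path).read_text(encoding="utf-8"))
    print(reloaded)
```

Existing keys are overwritten by `update`, so re-adding a word with a corrected diacritization replaces the old form.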

Credits

Fine-tuned from hexgrad/Kokoro-82M with a patched StyleTTS2 recipe. Dialect voice data: Sofelia Palestinian TTS.
