Sofelia TTS 82M
A Palestinian Arabic text-to-speech model: Kokoro-82M fine-tuned for the Levantine/Palestinian dialect. Ships a single natural female voice, Eliaa.
- Architecture: Kokoro-82M (StyleTTS2 backbone, ISTFTNet vocoder), 82M params
- Sample rate: 24 kHz
- Speaker: Eliaa (voices/eliaa.pt)
- Runs on CPU in real time (RTF ≈ 0.15–0.34 on 4 threads)
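RTF (real-time factor) is conventionally synthesis time divided by the duration of the audio produced, so values below 1 mean faster than real time. The sketch below shows one way to measure it at the 24 kHz sample rate listed above. `synthesize` is a hypothetical callable that returns a sequence of samples; it is not part of this repository.

```python
import time

SAMPLE_RATE = 24_000  # Hz, as listed in the specs above


def real_time_factor(synthesize, text, sample_rate=SAMPLE_RATE):
    """Return wall-clock synthesis time divided by audio duration.

    `synthesize` is a stand-in (assumption): any callable that takes text
    and returns a sequence of audio samples.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds
```

Measured values depend on thread count, text length, and hardware, so expect some spread around the quoted range.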
Why a custom frontend
espeak-ng's Arabic G2P assumes Modern Standard Arabic and mis-handles dialect
text. sofelia_frontend.py fixes the systematic problems before phonemization:
- Arabic punctuation (، ؛ ؟) → Latin so the model receives pause/question cues
- word-final ة wrongly read as /t/ for out-of-lexicon words → bare fatha (a)
- a Palestinian pronunciation lexicon (ar_lexicon.json) for words espeak mis-reads or strips vowels from (معلش, …)
- post-G2P remap of phonemes outside Kokoro's 178-token vocab (for example ħ→ʰ; ˤ is also remapped)
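The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of sofelia_frontend.py: the function names, the tokenization, and the choice to write the fatha as the diacritic U+064E are all hypothetical, and the remap table contains only the ħ→ʰ pair.

```python
import re

# Arabic comma, semicolon, and question mark mapped to Latin counterparts.
PUNCT_MAP = str.maketrans({"\u060C": ",", "\u061B": ";", "\u061F": "?"})

TA_MARBUTA = "\u0629"  # word-final letter that espeak may read as /t/
FATHA = "\u064E"       # short /a/ diacritic (assumed output form)

# Hypothetical post-G2P table; a real table would hold more pairs.
PHONEME_REMAP = {"ħ": "ʰ"}


def normalize_text(text, lexicon):
    """Pre-G2P pass: punctuation, lexicon lookup, then word-final ta marbuta."""
    text = text.translate(PUNCT_MAP)
    # Split on whitespace and Latin punctuation, keeping the separators.
    pieces = re.split(r"(\s+|[,;?.!])", text)
    out = []
    for piece in pieces:
        if piece in lexicon:
            # Lexicon entries win over the generic ta marbuta rule.
            out.append(lexicon[piece])
        elif piece.endswith(TA_MARBUTA):
            out.append(piece[:-1] + FATHA)
        else:
            out.append(piece)
    return "".join(out)


def remap_phonemes(phonemes):
    """Post-G2P pass: replace symbols the model vocabulary lacks."""
    return "".join(PHONEME_REMAP.get(ch, ch) for ch in phonemes)
```

Checking the lexicon before the ta marbuta rule mirrors the "out-of-lexicon words" wording above: a word with a lexicon entry keeps its curated form.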
Usage
pip install kokoro misaki espeakng-loader phonemizer-fork soundfile torch
python inference.py "بدي أروح ع السوق أشتري خضرة وفواكه للبيت."
Arabic has no Kokoro lang_code, so phonemize with misaki espeak-ng ar
(+ the Sofelia frontend) and call KModel directly; see inference.py.
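A minimal sketch of that flow, assuming the files listed in this repository sit in the working directory. It is not a copy of inference.py, which may differ. `frontend` is a hypothetical stand-in for the Sofelia pre-G2P step, and `style_index` is a helper introduced here to pick a style vector by phoneme-string length, the way Kokoro's own pipeline indexes voicepacks.

```python
def style_index(num_phonemes, pack_len=510):
    """Pick the voicepack row for a phoneme string of the given length."""
    return max(0, min(num_phonemes, pack_len) - 1)


def synthesize(text, frontend=lambda s: s, out_path="out.wav"):
    # Heavy imports stay local so style_index works without them installed.
    import soundfile as sf
    import torch
    from kokoro import KModel
    from misaki import espeak

    # espeak-ng Arabic G2P via misaki; `frontend` (assumption) normalizes
    # the text first. A post-G2P phoneme remap would go right after this.
    g2p = espeak.EspeakG2P(language="ar")
    phonemes, _ = g2p(frontend(text))
    phonemes = phonemes[:510]  # stay within the model's context length

    model = KModel(config="config.json", model="kokoro_sofelia_82M.pth").eval()
    pack = torch.load("voices/eliaa.pt", weights_only=True)  # [510, 1, 256]
    ref_s = pack[style_index(len(phonemes), pack.shape[0])]

    audio = model(phonemes, ref_s, speed=1.0)  # 1-D tensor at 24 kHz
    sf.write(out_path, audio.numpy(), 24000)
    return out_path
```

Kokoro skips phoneme characters that are missing from its vocabulary, which is why remapping them before the model call matters.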
Files
| File | Purpose |
|---|---|
| kokoro_sofelia_82M.pth | model weights (Kokoro KModel format) |
| config.json | Kokoro config (178-token vocab) |
| voices/eliaa.pt | Eliaa speaker voicepack [510, 1, 256] |
| sofelia_frontend.py | Palestinian Arabic text frontend |
| ar_lexicon.json | dialect pronunciation lexicon (extensible) |
| inference.py | minimal end-to-end example |
Limitations
- Numbers/dates/times are read via espeak's MSA expansion; a dialect number normalizer is planned. Spell numbers out for best results.
- Single voice (Eliaa). The lexicon is small and community-extensible; add surface_word: diacritized_form entries to fix new words.
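Adding entries could look like the sketch below. It assumes ar_lexicon.json is a flat JSON object mapping each surface word to its diacritized form, as the surface_word: diacritized_form description suggests; the helper name `add_lexicon_entries` is hypothetical.

```python
import json
from pathlib import Path


def add_lexicon_entries(path, entries):
    """Merge new surface_word -> diacritized_form pairs into a JSON lexicon."""
    path = Path(path)
    lexicon = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    lexicon.update(entries)
    # ensure_ascii=False keeps the Arabic text readable in the file.
    path.write_text(
        json.dumps(lexicon, ensure_ascii=False, indent=2, sort_keys=True),
        encoding="utf-8",
    )
    return lexicon
```

For example, `add_lexicon_entries("ar_lexicon.json", {"surface_word": "diacritized_form"})`, with the placeholders replaced by a real word and its fully diacritized spelling.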
Credits
Fine-tuned from hexgrad/Kokoro-82M with a patched StyleTTS2 recipe. Dialect voice data: Sofelia Palestinian TTS.