Overview
KasbahTTS is the first open-source Text-to-Speech model purpose-built for Algerian Dardja (الدارجة الجزائرية).
Most Arabic TTS systems only speak Modern Standard Arabic — a language nobody uses on the street. KasbahTTS speaks like real Algerians do: the Dardja of Algiers' alleyways, the warmth of Oran's markets, the rhythm of Constantine's conversations.
Built on the F5-TTS architecture (DiT-based flow matching) and fine-tuned from Habibi-TTS, KasbahTTS brings Algerian speech synthesis into the open-source world.
أول موديل TTS مفتوح المصدر يهدر بالدارجة الجزائرية. (The first open-source TTS model that speaks Algerian Dardja.)
Audio Samples
Three samples (Sample 1, Sample 2, Sample 3) are provided. Each pairs the reference voice the model clones from with the generated speech.
Model Details
| Field | Value |
|---|---|
| Model | KasbahTTS V0 |
| Task | Text-to-Speech (Zero-Shot Voice Cloning) |
| Architecture | F5-TTS (DiT-based flow matching) |
| Base Model | Habibi-TTS |
| Dialect | Algerian Dardja (الدارجة الجزائرية) |
| License | MIT |
Features
- Zero-shot voice cloning — Give it a few seconds of any voice, and it speaks Dardja in that voice
- Native Algerian Dardja — Trained on real Algerian conversational speech, not textbook Arabic
- F5-TTS backbone — State-of-the-art DiT architecture with flow matching for natural, high-fidelity synthesis
- Open source — Fully open weights under MIT license, use it however you want
Quick Start
Installation
pip install habibi-tts
Note: On first run, the Vocos vocoder (~40 MB) will be automatically downloaded from HuggingFace. After that, everything runs fully offline.
Python Inference
import torch
import soundfile as sf
from f5_tts.infer.utils_infer import load_model, load_vocoder, preprocess_ref_audio_text
from f5_tts.model import DiT
from habibi_tts.infer.utils_infer import infer_process

# Load vocoder (is_local=False downloads Vocos on first run)
vocoder = load_vocoder(vocoder_name="vocos", is_local=False)

# Load KasbahTTS
model = load_model(
    DiT,
    dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4),
    ckpt_path="ALGERIA.safetensors",
    vocab_file="vocab.txt",
)

# Prepare reference audio; the second argument is the transcript of reference.wav
ref_audio, ref_text = preprocess_ref_audio_text(
    "reference.wav",
    "النص المرجعي هنا",  # placeholder meaning "the reference text here"
)

# Generate speech in the reference voice
audio, sr, _ = infer_process(
    ref_audio=ref_audio,
    ref_text=ref_text,
    gen_text="واش راك خويا، لاباس عليك؟",
    model_obj=model,
    vocoder=vocoder,
    speed=1.0,
)

# Save the generated waveform
sf.write("output.wav", audio, sr)
CLI Usage
habibi-tts_infer-cli \
    --model_cls DiT \
    --ckpt_file ALGERIA.safetensors \
    --vocab_file vocab.txt \
    --ref_audio reference.wav \
    --ref_text "النص المرجعي" \
    --gen_text "صباح الخير، واش راك اليوم؟"
Known Limitations
Arabic Only — No French Code-Switching
This version handles Arabic Dardja text only. Algerian Dardja naturally mixes Arabic and French in daily conversation, but KasbahTTS V0 does not support French words or Latin script. Mixing in French text will produce unpredictable results. Write your input in Arabic script only.
- ❌ واش راك؟ ça va bien → Unpredictable
- ✅ واش راك؟ لاباس عليك → Works great
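Since Latin-script words give unpredictable output, it can help to check the input before synthesis. The following is a minimal sketch, not part of KasbahTTS or Habibi-TTS: `find_latin_words` is a hypothetical helper that flags any whitespace-separated token containing a Latin letter, using only the Python standard library.

```python
import unicodedata


def find_latin_words(text: str) -> list[str]:
    """Return the whitespace-separated tokens that contain a Latin letter."""
    latin_tokens = []
    for token in text.split():
        # unicodedata.name() starts with "LATIN" for letters such as "a" or "ç"
        if any(
            ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")
            for ch in token
        ):
            latin_tokens.append(token)
    return latin_tokens


if __name__ == "__main__":
    print(find_latin_words("واش راك؟ ça va bien"))
    print(find_latin_words("واش راك؟ لاباس عليك"))
```

An empty result means the text contains no Latin letters. Flagged tokens would need to be rewritten in Arabic script by hand before being passed as `gen_text`.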
No Number Handling
The model cannot process numerical digits. Numbers in the input text (like "5" or "2024") will not be spoken correctly. Write numbers out as words instead.
- ❌ عندي 3 خاوتي → Won't work
- ✅ عندي ثلاثة خاوتي → Works
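One way to catch digits before they reach the model is a small pre-check. The sketch below is an illustration under stated assumptions: `find_digit_runs` and `spell_out_numbers` are hypothetical helpers, and the `number_words` mapping must be supplied by the caller because this card does not define how numbers are spelled in Dardja. Only the entry for 3 comes from the example above.

```python
import re

# In Python str patterns, \d matches any Unicode decimal digit,
# so both "3" and Arabic-Indic digits such as "٣" are caught.
DIGIT_RUN = re.compile(r"\d+")


def find_digit_runs(text: str) -> list[str]:
    """Return every run of digits found in the text."""
    return DIGIT_RUN.findall(text)


def spell_out_numbers(text: str, number_words: dict[str, str]) -> str:
    """Replace each digit run using a caller-supplied mapping.

    Keys are ASCII digit strings ("3"); values are the written-out words.
    Raises ValueError for any number missing from the mapping.
    """
    def substitute(match: re.Match) -> str:
        # int() accepts Unicode decimal digits, so "٣" normalizes to "3"
        key = str(int(match.group(0)))
        if key not in number_words:
            raise ValueError(f"no written form supplied for {key}")
        return number_words[key]

    return DIGIT_RUN.sub(substitute, text)


if __name__ == "__main__":
    words = {"3": "ثلاثة"}  # taken from the example above
    print(spell_out_numbers("عندي 3 خاوتي", words))
```

Raising on an unknown number is a deliberate choice here: silently passing digits through would reproduce the failure described above.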
No Diacritics (تشكيل)
This version does not support Arabic diacritics (harakat). Input text should be plain, unvocalized Arabic. Diacritic support is planned for future versions.
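If the source text is vocalized, the diacritics can be removed before synthesis. The sketch below assumes that stripping the marks is an acceptable way to obtain plain input. `strip_harakat` is a hypothetical helper, and the character range (U+064B to U+0652, plus the superscript alef U+0670) is one common choice that also removes shadda.

```python
import re

# Fathatan (U+064B) through sukun (U+0652), plus superscript alef (U+0670).
# This range includes shadda (U+0651).
HARAKAT = re.compile("[\u064B-\u0652\u0670]")


def strip_harakat(text: str) -> str:
    """Remove Arabic diacritic marks, leaving the base letters untouched."""
    return HARAKAT.sub("", text)


if __name__ == "__main__":
    vocalized = "\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627"  # مَرْحَبًا
    print(strip_harakat(vocalized))
```

Text that is already unvocalized passes through unchanged, so the helper is safe to apply to every input.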
Repetition
The model may occasionally repeat words or phrases. To reduce this:
- Try different `nfe_step` values (32 or 64)
- Adjust `cfg_strength` (default: 2.0)
- Break long text into shorter segments
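Breaking long text into shorter segments can be sketched as follows. This is an assumption-laden illustration rather than part of the library: `split_into_segments` is a hypothetical helper, the punctuation set is a guess at sensible sentence boundaries, and the `max_chars=120` default is arbitrary.

```python
import re

# Split after sentence-ending punctuation (Latin and Arabic) followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?؟؛])\s+")


def split_into_segments(text: str, max_chars: int = 120) -> list[str]:
    """Group sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    sentences = [s.strip() for s in SENTENCE_END.split(text) if s.strip()]
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current segment
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    for segment in split_into_segments("واش راك؟ لاباس عليك. صباح الخير!", max_chars=12):
        print(segment)
```

Each segment could then be passed as `gen_text` in a separate `infer_process` call and the resulting audio arrays concatenated. Whether `infer_process` in Habibi-TTS accepts `nfe_step` and `cfg_strength` as keyword arguments is not confirmed by this card, so check the library before relying on that.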
Dialect Scope
KasbahTTS is trained specifically on Algerian Dardja. Other Arabic dialects or MSA text may produce lower quality or unexpected output.
What's Next
KasbahTTS V0 is just the beginning. Planned improvements include:
- French code-switching support (Dardja-French mixing)
- Number and digit handling
- Diacritics (تشكيل) support
- Longer and more stable generation
- Additional Algerian regional accents
Citation
If you use KasbahTTS in your research or projects, please cite the underlying Habibi-TTS work:
@article{habibi2025,
  title={Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis},
  author={...},
  journal={arXiv preprint arXiv:2601.13802},
  year={2026}
}
Built with ❤️ for Algeria by MenaVoice
من القصبة للعالم — From the Kasbah to the world