KasbahTTS — صوت القصبة





Overview

KasbahTTS is the first open-source Text-to-Speech model purpose-built for Algerian Dardja (الدارجة الجزائرية).

Most Arabic TTS systems speak only Modern Standard Arabic, a formal register that is rarely used in everyday conversation. KasbahTTS speaks the way Algerians actually do: the Dardja of Algiers' alleyways, the warmth of Oran's markets, the rhythm of Constantine's conversations.

Built on the F5-TTS architecture (DiT-based flow matching) and fine-tuned from Habibi-TTS, KasbahTTS brings Algerian speech synthesis into the open-source world.

أول موديل TTS مفتوح المصدر يهدر بالدارجة الجزائرية. (The first open-source TTS model that speaks Algerian Dardja.)


Audio Samples

Listen to KasbahTTS in action — each sample shows the reference voice the model clones from, followed by the generated speech:

[Audio: Sample 1, Sample 2, and Sample 3, each with a Reference clip and a Generated clip]

Model Details

Model: KasbahTTS V0
Task: Text-to-Speech (Zero-Shot Voice Cloning)
Architecture: F5-TTS (DiT-based flow matching)
Base Model: Habibi-TTS
Dialect: Algerian Dardja (الدارجة الجزائرية)
License: MIT

Features

  • Zero-shot voice cloning — Give it a few seconds of any voice, and it speaks Dardja in that voice
  • Native Algerian Dardja — Trained on real Algerian conversational speech, not textbook Arabic
  • F5-TTS backbone — State-of-the-art DiT architecture with flow matching for natural, high-fidelity synthesis
  • Open source — Fully open weights under MIT license, use it however you want

Quick Start

Installation

pip install habibi-tts

Note: On first run, the Vocos vocoder (~40 MB) is downloaded automatically from Hugging Face. After that, inference runs fully offline.

Python Inference

import torch
import soundfile as sf
from f5_tts.infer.utils_infer import load_model, load_vocoder, preprocess_ref_audio_text
from f5_tts.model import DiT
from habibi_tts.infer.utils_infer import infer_process

# Load vocoder
vocoder = load_vocoder(vocoder_name="vocos", is_local=False)

# Load KasbahTTS
model = load_model(
    DiT,
    dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4),
    ckpt_path="ALGERIA.safetensors",
    vocab_file="vocab.txt",
)

# Prepare reference audio
ref_audio, ref_text = preprocess_ref_audio_text(
    "reference.wav",
    "النص المرجعي هنا"
)

# Generate speech
audio, sr, _ = infer_process(
    ref_audio=ref_audio,
    ref_text=ref_text,
    gen_text="واش راك خويا، لاباس عليك؟",
    model_obj=model,
    vocoder=vocoder,
    speed=1.0,
)

sf.write("output.wav", audio, sr)

CLI Usage

habibi-tts_infer-cli \
  --model_cls DiT \
  --ckpt_file ALGERIA.safetensors \
  --vocab_file vocab.txt \
  --ref_audio reference.wav \
  --ref_text "النص المرجعي" \
  --gen_text "صباح الخير، واش راك اليوم؟"

Known Limitations

Arabic Only — No French Code-Switching

This version handles Arabic Dardja text only. Algerian Dardja naturally mixes Arabic and French in daily conversation, but KasbahTTS V0 does not support French words or Latin script. Mixing in French text will produce unpredictable results. Write your input in Arabic script only.

واش راك؟ ça va bien → Unpredictable

واش راك؟ لاباس عليك → Works great
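Since Latin-script words give unpredictable output, it can help to check the input before synthesis. The following is a minimal sketch, not part of KasbahTTS or Habibi-TTS; the helper name find_latin_words is hypothetical, and it relies only on the Python standard library.

```python
import unicodedata


def find_latin_words(text: str) -> list[str]:
    """Return whitespace-separated tokens that contain at least one Latin letter.

    Hypothetical helper for illustration; not part of KasbahTTS.
    """
    def has_latin(token: str) -> bool:
        # Unicode character names for Latin letters start with "LATIN ...".
        return any(
            ch.isalpha() and "LATIN" in unicodedata.name(ch, "")
            for ch in token
        )

    return [tok for tok in text.split() if has_latin(tok)]


print(find_latin_words("واش راك؟ ça va bien"))  # ['ça', 'va', 'bien']
print(find_latin_words("واش راك؟ لاباس عليك"))  # []
```

An empty result means the text contains no Latin-script tokens and can be passed on as gen_text.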

No Number Handling

The model cannot process numerical digits. Numbers in the input text (like "5" or "2024") will not be spoken correctly. Write numbers out as words instead.

عندي 3 خاوتي → Won't work

عندي ثلاثة خاوتي → Works
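One way to apply this before inference is to replace digits with words supplied by the caller. The sketch below is an assumption about how such preprocessing could look, not something shipped with the model. The function name spell_out_numbers is hypothetical, and the mapping contains only the word used in the example above; a real mapping would need to be written for the numbers and the spelling you expect.

```python
import re


def spell_out_numbers(text: str, number_words: dict[int, str]) -> str:
    """Replace each run of digits with its word form from number_words.

    Hypothetical helper for illustration. Raises ValueError for any number
    missing from the mapping, so no digit reaches the model unnoticed.
    """
    def replace(match: re.Match) -> str:
        value = int(match.group())  # int() also accepts Arabic-Indic digits
        if value not in number_words:
            raise ValueError(f"no word form provided for {value}")
        return number_words[value]

    # \d matches ASCII digits and other Unicode decimal digits.
    return re.sub(r"\d+", replace, text)


words = {3: "ثلاثة"}  # only the entry taken from the example above
print(spell_out_numbers("عندي 3 خاوتي", words))  # عندي ثلاثة خاوتي
```

Raising an error for unmapped numbers is a deliberate choice here: silently passing "2024" through would reproduce the failure this section describes.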

No Diacritics (تشكيل)

This version does not support Arabic diacritics (harakat). Input text should be plain, unvocalized Arabic. Diacritic support is planned for future versions.
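If your source text is vocalized, the marks can be stripped before synthesis. This is a minimal sketch using only the standard library; the name strip_harakat is hypothetical, and the choice of which code points to remove (U+064B to U+0652 plus the dagger alif U+0670) is an assumption about what counts as diacritics for this model.

```python
import re

# Tanwin, short vowels, shadda, and sukun (U+064B to U+0652),
# plus the dagger alif (U+0670). Assumed set; adjust if needed.
HARAKAT = re.compile("[\u064B-\u0652\u0670]")


def strip_harakat(text: str) -> str:
    """Remove Arabic diacritic marks, leaving the base letters untouched."""
    return HARAKAT.sub("", text)


print(strip_harakat("مَرْحَبًا"))  # مرحبا
```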

Repetition

The model may occasionally repeat words or phrases. To reduce this:

  • Try different nfe_step values (32 or 64)
  • Adjust cfg_strength (default: 2.0)
  • Break long text into shorter segments
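The last suggestion, breaking long text into shorter segments, can be sketched as follows. This is an illustrative helper, not part of KasbahTTS; the name split_into_segments, the punctuation set, and the default max_chars are assumptions to tune for your own text.

```python
import re


def split_into_segments(text: str, max_chars: int = 100) -> list[str]:
    """Split text after sentence punctuation, then pack sentences into
    segments of at most max_chars characters.

    A single sentence longer than max_chars is kept whole.
    """
    # Split after Latin or Arabic sentence punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?؟،])\s+", text) if s.strip()]
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


for segment in split_into_segments("واش راك؟ لاباس عليك. صباح الخير", max_chars=12):
    print(segment)
```

Each segment can then be synthesized separately as gen_text and the resulting audio concatenated. If the habibi_tts infer_process accepts nfe_step and cfg_strength as keyword arguments, as the F5-TTS function of the same name does, those can be varied per call; that is an assumption worth checking against the installed version.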

Dialect Scope

KasbahTTS is trained specifically on Algerian Dardja. Other Arabic dialects or MSA text may produce lower quality or unexpected output.


What's Next

KasbahTTS V0 is just the beginning. Planned improvements include:

  • French code-switching support (Dardja-French mixing)
  • Number and digit handling
  • Diacritics (تشكيل) support
  • Longer and more stable generation
  • Additional Algerian regional accents

Citation

If you use KasbahTTS in your research or projects, please cite the underlying Habibi-TTS work:

@article{habibi2025,
  title={Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis},
  author={...},
  journal={arXiv preprint arXiv:2601.13802},
  year={2026}
}

Built with ❤️ for Algeria by MenaVoice

من القصبة للعالم — From the Kasbah to the world
