---
license: mit
language:
- ar
base_model:
- ResembleAI/chatterbox
pipeline_tag: text-to-speech
tags:
- Saudi
- Arabic
- Saudi-Dialect
- Chatterbox
- TTS
- voice-cloning
- multilingual-tts
library_name: chatterbox
---

# 🇸🇦 NAMAA-Saudi-TTS

**NAMAA-Saudi-TTS** is a Saudi Arabic text-to-speech (TTS) model built on the **Chatterbox Multilingual TTS** architecture.
It is configured and refined to generate **natural Saudi dialect speech** for everyday conversational use rather than Modern Standard Arabic (MSA).

The model is developed and released by the **NAMAA Community (Network for Advancing Modern Arabic AI)** as part of its efforts to advance high-quality Arabic speech and language technologies.

---

## 🔊 Live Demo (Hugging Face Space)

👉 **Try the model here:**
https://huggingface.co/spaces/omarelshehy/NAMAA-Saudi-Voice

---

## ✨ Model Capabilities

The model supports:

- **Saudi Arabic text input** (`language_id = "ar"`)
- Natural conversational prosody
- Saudi dialect phrasing and rhythm
- Optional **reference audio prompting** for:
  - Speaker similarity
  - Style and tone transfer
- GPU-accelerated inference

This repository contains all required **model checkpoints and assets** for local or hosted inference.

---

## 🗣️ Example Text (Saudi Dialect)

```text
آبي أروح البقالة أشتري كم غرض وأرجع بسرعة.
```

## ⚠️ Limitations

Please be aware of the following current limitations:

- Lack of tashkeel (Arabic diacritics) may affect pronunciation accuracy.
- Numeric normalization is limited and will be improved in future releases.
- This is a known limitation of the current flow-based generation.

These limitations are actively being addressed in upcoming versions.
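
Since missing tashkeel and unnormalized numerals are both listed as limitations, it can help to inspect the input text before synthesis. The following is a minimal sketch, not part of this repository: `precheck_text` is a hypothetical helper, and the assumption is that you would add diacritics or spell out numbers yourself once it flags them.

```python
import re

# Tashkeel marks: U+064B (fathatan) through U+0652 (sukun), including shadda
TASHKEEL = re.compile(r"[\u064B-\u0652]")
# In Python 3 str patterns, \d matches ASCII digits and Arabic-Indic digits alike
DIGITS = re.compile(r"\d+")


def precheck_text(text: str) -> dict:
    """Hypothetical helper: report whether text has tashkeel and list its digit runs."""
    return {
        "has_tashkeel": bool(TASHKEEL.search(text)),
        "digit_runs": DIGITS.findall(text),
    }


# Example input with one ASCII digit and one Arabic-Indic digit, no diacritics
report = precheck_text("عندي 3 مواعيد الساعة ٥")
```

A non-empty `digit_runs` list suggests rewriting those numbers as words before calling `model.generate`, and `has_tashkeel == False` suggests adding diacritics to any word whose pronunciation matters.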

## 🧪 Example Usage (Inference)

```python
import torchaudio as ta
from huggingface_hub import snapshot_download
from safetensors.torch import load_file as load_safetensors
from chatterbox import mtl_tts

device = "cuda"  # or "cpu" / "mps"

# Download the checkpoints of this repository from the Hugging Face Hub
ckpt_dir = snapshot_download(
    repo_id="NAMAA-Space/NAMAA-Saudi-TTS",
    repo_type="model",
    revision="main",
)

# Load the base Chatterbox multilingual model
model = mtl_tts.ChatterboxMultilingualTTS.from_pretrained(device=device)

# Load the T3 weights shipped in this repository into the base model
t3_state = load_safetensors(
    f"{ckpt_dir}/t3_mtl23ls_v2.safetensors",
    device=device,
)
model.t3.load_state_dict(t3_state)
model.t3.to(device).eval()

# Saudi Arabic text
text = "أنا الحين بروح الشغل وإذا رجعت بمرّ البقالة"

# Synthesize and save at the model's sample rate
wav = model.generate(text, language_id="ar")
ta.save("namaa_saudi.wav", wav, model.sr)
```
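
The example hard-codes `device = "cuda"` and mentions `"cpu"` and `"mps"` as alternatives. If the same script has to run on different machines, the device can be chosen at runtime. This is a minimal sketch assuming a standard PyTorch install; `pick_device` is a hypothetical helper name, not something provided by Chatterbox or this repository.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" based on what PyTorch reports as available."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so no accelerator can be used
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is absent on older PyTorch builds, hence the getattr guard
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
```

The resulting string can be passed wherever the example uses `device`.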

### 🔹 Inference with Reference Audio (Voice / Style Transfer)

```python
# Continues from the previous example (reuses `model` and `ta`)
text = "آبي أخلص الشغل اليوم وأرتاح بكرة"

wav = model.generate(
    text,
    language_id="ar",
    audio_prompt_path="/content/reference_saudi.wav",  # path to your reference clip
)

ta.save("namaa_saudi_ref.wav", wav, model.sr)
```
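
Before passing a clip as `audio_prompt_path`, it can be useful to confirm that the file opens and to see its basic properties. This model card does not state any requirements for reference clips, so the sketch below only reports what it finds. `describe_wav` is a hypothetical helper built on the standard-library `wave` module, which reads PCM WAV files only, and the silent demo clip exists purely so the sketch runs without a real recording.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Hypothetical helper: read channel count, sample rate, and duration of a PCM WAV."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


# Write a 1-second, 16 kHz, mono, 16-bit silent clip to a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 16000)

info = describe_wav(demo_path)
```

In practice you would call `describe_wav` on your own reference file instead of the generated clip.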

## 🧠 Base Model

This model is built on top of:

- **ResembleAI/chatterbox**
- **Chatterbox Multilingual TTS architecture**

The Saudi dialect behavior is achieved through **specialized configuration, prompting, and curated usage patterns**, rather than training focused on Modern Standard Arabic (MSA).

---

## 📜 License

This model is released under the **MIT License**, allowing both **research and commercial usage** with proper attribution.

---

## 🤝 Community & Contributions

Developed and maintained by **NAMAA Community**
*(Network for Advancing Modern Arabic NLP & AI)*

We welcome:

- Feedback and evaluations
- Dialect-specific test cases
- Contributions toward improving Arabic Text-to-Speech systems

---

## 📌 Citation

If you use this model in research or production, please cite:

```bibtex
@misc{namaa_saudi_tts,
  title  = {NAMAA-Saudi-TTS: Saudi Dialect Text-to-Speech},
  author = {{NAMAA Community}},
  year   = {2026},
  url    = {https://huggingface.co/NAMAA-Space/NAMAA-Saudi-TTS}
}
```