---
license: mit
language:
- ar
base_model:
- ResembleAI/chatterbox
pipeline_tag: text-to-speech
tags:
- Saudi
- Arabic
- Saudi-Dialect
- Chatterbox
- TTS
- voice-cloning
- multilingual-tts
library_name: chatterbox
---

![NAMAA Saudi TTS Banner](https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/2d4VIgVYji-CS2w8n_3tS.png)

# 🇸🇦 NAMAA-Saudi-TTS

**NAMAA-Saudi-TTS** is a Saudi Arabic Text-to-Speech (TTS) model built on top of the **Chatterbox Multilingual TTS** architecture.  
The model is configured and refined to generate **natural Saudi dialect speech**, targeting everyday conversational usage rather than Modern Standard Arabic (MSA).

This model is developed and released by **NAMAA Community (Network for Advancing Modern Arabic AI)** as part of its efforts to advance high-quality Arabic speech and language technologies.

---

## 🔊 Live Demo (Hugging Face Space)

👉 **Try the model here:**  
https://huggingface.co/spaces/omarelshehy/NAMAA-Saudi-Voice

---

## ✨ Model Capabilities

The model supports:

- **Saudi Arabic text input** (`language_id = "ar"`)
- Natural conversational prosody
- Saudi dialect phrasing and rhythm
- Optional **reference audio prompting** for:
  - Speaker similarity
  - Style and tone transfer
- GPU-accelerated inference

This repository contains all required **model checkpoints and assets** for local or hosted inference.
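
The inference examples in this card hard-code `device = "cuda"`. A minimal sketch for choosing a device automatically (CUDA first, then Apple MPS, then CPU) could look like this. `pick_device` is a hypothetical helper and is not part of the `chatterbox` package.

```python
def pick_device() -> str:
    """Hypothetical helper: return "cuda", "mps", or "cpu", in that order of preference."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed, fall back to the CPU device string.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on some older PyTorch builds, so guard with getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```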

---

## 🗣️ Example Text (Saudi Dialect)

```text
آبي أروح البقالة أشتري كم غرض وأرجع بسرعة.
```

*Translation: "I want to go to the grocery store, buy a few things, and come back quickly."*

## ⚠️ Limitations

Please be aware of the following current limitations:

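- Input text without tashkeel (Arabic diacritics) may be mispronounced, because many undiacritized words have more than one valid reading.
- Numeric normalization (reading digits as spoken words) is still limited and will be improved in future releases.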
- This is a known limitation of the current flow-based generation.


These limitations are actively being addressed in upcoming versions.
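
Until numeric normalization improves, one possible workaround is to rewrite digits as Arabic words before calling `model.generate`. The sketch below is a deliberately crude, hypothetical preprocessing step (`spell_out_digits` is not part of this repository). It reads every digit individually, which suits phone numbers or codes but not quantities, and it uses simple standalone digit names.

```python
import re

# Hypothetical helper: a crude stand-in for proper numeric normalization.
# Each digit is read on its own, so "123" becomes "one two three",
# NOT "one hundred twenty-three". Suitable for phone numbers or codes only.
DIGIT_WORDS = {
    "0": "صفر", "1": "واحد", "2": "اثنين", "3": "ثلاثة", "4": "أربعة",
    "5": "خمسة", "6": "ستة", "7": "سبعة", "8": "ثمانية", "9": "تسعة",
}

# Map Arabic-Indic digits (U+0660..U+0669) onto ASCII digits first.
ARABIC_INDIC_TO_ASCII = str.maketrans("٠١٢٣٤٥٦٧٨٩", "0123456789")


def spell_out_digits(text: str) -> str:
    """Replace every run of digits with space-separated Arabic digit words."""
    text = text.translate(ARABIC_INDIC_TO_ASCII)
    return re.sub(r"[0-9]+", lambda m: " ".join(DIGIT_WORDS[d] for d in m.group()), text)


print(spell_out_digits("رقمي 0501"))
# Usage with the model loaded below:
# wav = model.generate(spell_out_digits(text), language_id="ar")
```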

## 🧪 Example Usage (Inference)

```python
import torchaudio as ta
from huggingface_hub import snapshot_download
from safetensors.torch import load_file as load_safetensors
from chatterbox import mtl_tts

device = "cuda"  # or "cpu" / "mps"

# Download the NAMAA-Saudi-TTS checkpoints from the Hugging Face Hub
ckpt_dir = snapshot_download(
    repo_id="NAMAA-Space/NAMAA-Saudi-TTS",
    repo_type="model",
    revision="main",
)

# Load the base Chatterbox Multilingual model (weights from ResembleAI/chatterbox)
model = mtl_tts.ChatterboxMultilingualTTS.from_pretrained(device=device)

# Swap in the Saudi-dialect weights for the T3 module
t3_state = load_safetensors(
    f"{ckpt_dir}/t3_mtl23ls_v2.safetensors",
    device=device,
)
model.t3.load_state_dict(t3_state)
model.t3.to(device).eval()

# Saudi Arabic text: "I'm going to work now, and on my way back I'll stop by the grocery store."
text = "أنا الحين بروح الشغل وإذا رجعت بمرّ البقالة"

wav = model.generate(text, language_id="ar")
ta.save("namaa_saudi.wav", wav, model.sr)
```
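
For longer passages, one option is to synthesize sentence by sentence and join the results. This is a sketch under two assumptions not stated in this card: that shorter inputs are synthesized more reliably, and that `model.generate` returns a `(1, num_samples)` tensor, as the `ta.save` call above implies. `split_sentences`, `generate_long`, and the `pause_s` parameter are hypothetical names introduced for illustration.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text after sentence-ending punctuation (. ! ? and the Arabic ؟)."""
    parts = re.split(r"(?<=[.!?؟])\s+", text.strip())
    return [p for p in parts if p]


def generate_long(model, text: str, language_id: str = "ar", pause_s: float = 0.2, **gen_kwargs):
    """Hypothetical helper: synthesize each sentence separately, joined by short silences.

    Assumes model.generate(...) returns a (1, num_samples) tensor sampled at model.sr Hz.
    """
    import torch  # imported here so split_sentences works without PyTorch installed

    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("no text to synthesize")
    silence = torch.zeros(1, int(pause_s * model.sr))
    pieces = []
    for i, sentence in enumerate(sentences):
        if i > 0:
            pieces.append(silence)  # short pause between consecutive sentences
        pieces.append(model.generate(sentence, language_id=language_id, **gen_kwargs).cpu())
    return torch.cat(pieces, dim=-1)


# Example (reuses `model` and `ta` from the snippet above):
# long_text = "آبي أروح البقالة أشتري كم غرض. بعدين برجع البيت."
# wav = generate_long(model, long_text)
# ta.save("namaa_saudi_long.wav", wav, model.sr)
```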

### 🔹 Inference with Reference Audio (Voice / Style Transfer)

```python
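# Reuses `model` and `ta` from the previous example.
# "I want to finish work today and rest tomorrow."
text = "آبي أخلص الشغل اليوم وأرتاح بكرة"

wav = model.generate(
    text,
    language_id="ar",
    audio_prompt_path="reference_saudi.wav",  # path to your own reference recording
)

ta.save("namaa_saudi_ref.wav", wav, model.sr)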
```
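
If your reference recording is long, a standard-library-only sketch like the one below can keep just its first few seconds before you pass it as `audio_prompt_path`. `trim_wav` is a hypothetical helper; it handles uncompressed PCM WAV files only (Python's `wave` module does not read compressed formats), and the 10-second default is an illustrative choice, not a documented Chatterbox requirement.

```python
import wave


def trim_wav(src: str, dst: str, max_seconds: float = 10.0) -> float:
    """Hypothetical helper: copy at most `max_seconds` of PCM audio from src to dst.

    Returns the duration actually kept, in seconds.
    """
    with wave.open(src, "rb") as fin:
        params = fin.getparams()
        n_keep = min(fin.getnframes(), int(max_seconds * fin.getframerate()))
        frames = fin.readframes(n_keep)
    with wave.open(dst, "wb") as fout:
        # The frame count in the header is patched automatically when the file closes.
        fout.setparams(params)
        fout.writeframes(frames)
    return n_keep / params.framerate


# Example:
# trim_wav("reference_saudi.wav", "reference_saudi_trimmed.wav")
# wav = model.generate(text, language_id="ar", audio_prompt_path="reference_saudi_trimmed.wav")
```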

## 🧠 Base Model

This model is built on top of:

- **ResembleAI/chatterbox**
- **Chatterbox Multilingual TTS architecture**

The Saudi dialect behavior is achieved through **specialized configuration, prompting, and curated usage patterns**, rather than training focused on Modern Standard Arabic (MSA).

---

## 📜 License

This model is released under the **MIT License**, allowing both **research and commercial usage** with proper attribution.

---

## 🤝 Community & Contributions

Developed and maintained by **NAMAA Community**  
*(Network for Advancing Modern Arabic NLP & AI)*

We welcome:

- Feedback and evaluations  
- Dialect-specific test cases  
- Contributions toward improving Arabic Text-to-Speech systems  

---

## 📌 Citation

If you use this model in research or production, please cite:

```bibtex
@misc{namaa_saudi_tts,
  title = {NAMAA-Saudi-TTS: Saudi Dialect Text-to-Speech},
  author = {{NAMAA Community}},
  year = {2026},
  url = {https://huggingface.co/NAMAA-Space/NAMAA-Saudi-TTS}
}
```