---
license: mit
language:
- ar
base_model:
- ResembleAI/chatterbox
pipeline_tag: text-to-speech
tags:
- Saudi
- Arabic
- Saudi-Dialect
- Chatterbox
- TTS
- voice-cloning
- multilingual-tts
library_name: chatterbox
---
![NAMAA Saudi TTS Banner](https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/2d4VIgVYji-CS2w8n_3tS.png)
# 🇸🇦 NAMAA-Saudi-TTS
**NAMAA-Saudi-TTS** is a Saudi Arabic Text-to-Speech (TTS) model built on top of the **Chatterbox Multilingual TTS** architecture.
The model is configured and refined to generate **natural Saudi dialect speech**, targeting everyday conversational usage rather than Modern Standard Arabic (MSA).
This model is developed and released by **NAMAA Community (Network for Advancing Modern Arabic AI)** as part of its efforts to advance high-quality Arabic speech and language technologies.
---
## 🔊 Live Demo (Hugging Face Space)
👉 **Try the model here:**
https://huggingface.co/spaces/omarelshehy/NAMAA-Saudi-Voice
---
## ✨ Model Capabilities
The model supports:
- **Saudi Arabic text input** (`language_id = "ar"`)
- Natural conversational prosody
- Saudi dialect phrasing and rhythm
- Optional **reference audio prompting** for:
  - Speaker similarity
  - Style and tone transfer
- GPU-accelerated inference
This repository contains all required **model checkpoints and assets** for local or hosted inference.
---
## 🗣️ Example Text (Saudi Dialect)
```text
آبي أروح البقالة أشتري كم غرض وأرجع بسرعة.
```
## ⚠️ Limitations
Please be aware of the following current limitations:
- Lack of tashkeel may affect pronunciation accuracy.
- Numeric normalization is limited and will be improved in future releases.
- This is a known limitation of the current flow-based generation.
These limitations are actively being addressed in upcoming versions.
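Until those improvements land, it may help to check input text for the first two limitations before synthesis. The following is a minimal sketch, not part of this repository: `check_text` is a hypothetical helper that only reports whether the text has no tashkeel marks or contains digits. It does not add diacritics or rewrite numbers.

```python
import re

# Arabic diacritics (tashkeel): fathatan U+064B through sukun U+0652.
TASHKEEL = re.compile(r"[\u064B-\u0652]")
# ASCII digits and Arabic-Indic digits (U+0660 through U+0669).
DIGITS = re.compile(r"[0-9\u0660-\u0669]")


def check_text(text: str) -> list[str]:
    """Return warnings for input that may hit the known limitations."""
    warnings = []
    if not TASHKEEL.search(text):
        warnings.append("no tashkeel found; pronunciation may be less accurate")
    if DIGITS.search(text):
        warnings.append("digits found; consider spelling numbers out as words")
    return warnings
```

An empty list means neither condition was detected. The helper is hypothetical and does not guarantee correct pronunciation.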
## 🧪 Example Usage (Inference)
```python
import torchaudio as ta
from huggingface_hub import snapshot_download
from safetensors.torch import load_file as load_safetensors
from chatterbox import mtl_tts

device = "cuda"  # or "cpu" / "mps"

# Download the checkpoint files for this model from the Hub
ckpt_dir = snapshot_download(
    repo_id="NAMAA-Space/NAMAA-Saudi-TTS",
    repo_type="model",
    revision="main",
)

# Load the base multilingual model, then load the T3 weights from this repository
model = mtl_tts.ChatterboxMultilingualTTS.from_pretrained(device=device)
t3_state = load_safetensors(
    f"{ckpt_dir}/t3_mtl23ls_v2.safetensors",
    device=device,
)
model.t3.load_state_dict(t3_state)
model.t3.to(device).eval()

# Saudi Arabic text
text = "أنا الحين بروح الشغل وإذا رجعت بمرّ البقالة"
wav = model.generate(text, language_id="ar")
ta.save("namaa_saudi.wav", wav, model.sr)
```
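The example above hard-codes `device = "cuda"`. On machines without a CUDA GPU you can choose the device at runtime instead. This is a minimal sketch under stated assumptions: `pick_device` is a hypothetical helper, not part of Chatterbox, and the fallback order (CUDA, then MPS, then CPU) is an assumption, not a recommendation from the model authors.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then CPU (assumed preference order)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


try:
    import torch

    # torch.backends.mps is missing on older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    device = pick_device(
        torch.cuda.is_available(),
        mps is not None and mps.is_available(),
    )
except ImportError:
    # Without PyTorch the model cannot run; "cpu" is only a placeholder here.
    device = "cpu"
```

The resulting `device` string can replace the hard-coded value in the inference example.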
### 🔹 Inference with Reference Audio (Voice / Style Transfer)
```python
# Reuses `model` and `ta` from the previous example
text = "آبي أخلص الشغل اليوم وأرتاح بكرة"
wav = model.generate(
    text,
    language_id="ar",
    audio_prompt_path="/content/reference_saudi.wav",
)
ta.save("namaa_saudi_ref.wav", wav, model.sr)
```
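Before passing a reference clip, it can be useful to confirm that the file exists and to check its basic properties. This is a minimal sketch using only the Python standard library: `describe_wav` is a hypothetical helper, it reads uncompressed PCM WAV files only, and this README does not state any required length or sample rate for reference audio.

```python
import wave
from pathlib import Path


def describe_wav(path: str) -> dict:
    """Return sample rate, channel count, and duration of a PCM WAV file."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"reference audio not found: {p}")
    with wave.open(str(p), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "sample_rate": rate,
            "channels": wf.getnchannels(),
            "duration_s": frames / rate,
        }
```

For example, `describe_wav("/content/reference_saudi.wav")` fails early with a clear error if the path is wrong.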
## 🧠 Base Model
This model is built on top of:
- **ResembleAI/chatterbox**
- **Chatterbox Multilingual TTS architecture**
The Saudi dialect behavior is achieved through **specialized configuration, prompting, and curated usage patterns**, rather than training focused on Modern Standard Arabic (MSA).
---
## 📜 License
This model is released under the **MIT License**, allowing both **research and commercial usage** with proper attribution.
---
## 🤝 Community & Contributions
Developed and maintained by **NAMAA Community**
*(Network for Advancing Modern Arabic NLP & AI)*
We welcome:
- Feedback and evaluations
- Dialect-specific test cases
- Contributions toward improving Arabic Text-to-Speech systems
---
## 📌 Citation
If you use this model in research or production, please cite:
```bibtex
@misc{namaa_saudi_tts,
title = {NAMAA-Saudi-TTS: Saudi Dialect Text-to-Speech},
author = {{NAMAA Community}},
year = {2026},
url = {https://huggingface.co/NAMAA-Space/NAMAA-Saudi-TTS}
}
```