🇸🇦 NAMAA-Saudi-TTS
NAMAA-Saudi-TTS is a Saudi Arabic Text-to-Speech (TTS) model built on top of the Chatterbox Multilingual TTS architecture.
The model is configured and refined to generate natural Saudi dialect speech, targeting everyday conversational usage rather than Modern Standard Arabic (MSA).
This model is developed and released by NAMAA Community (Network for Advancing Modern Arabic AI) as part of its efforts to advance high-quality Arabic speech and language technologies.
Live Demo (Hugging Face Space)
Try the model here:
https://huggingface.co/spaces/omarelshehy/NAMAA-Saudi-Voice
✨ Model Capabilities
The model supports:
- Saudi Arabic text input (language_id = "ar")
- Natural conversational prosody
- Saudi dialect phrasing and rhythm
- Optional reference audio prompting for:
  - Speaker similarity
  - Style and tone transfer
- GPU-accelerated inference
This repository contains all required model checkpoints and assets for local or hosted inference.
🗣️ Example Text (Saudi Dialect)
آبي أروح البقالة أشتري كم غرض وأرجع بسرعة.
("I want to go to the grocery store, buy a few things, and come back quickly.")
⚠️ Limitations
Please be aware of the following current limitations:
- Input without tashkeel (Arabic diacritics) may be pronounced less accurately.
- Numeric normalization (reading numbers written as digits) is limited and will be improved in future releases.
- This is a known limitation of the current flow-based generation.
These limitations are actively being addressed in upcoming versions.
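Until numeric normalization improves, one possible workaround is to spell out digits yourself before calling model.generate. The sketch below is a hypothetical pre-processing helper: spell_digits and DIGIT_WORDS are illustrative names, not part of this repository or of Chatterbox. It reads each digit on its own (using the colloquial اثنين for 2), which suits phone numbers or codes; quantities such as 25 would need proper Arabic number grammar, which this sketch does not attempt.

```python
import re
import unicodedata

# Hypothetical lookup table: spoken names for the digits 0-9,
# using the colloquial form "اثنين" for 2.
DIGIT_WORDS = ["صفر", "واحد", "اثنين", "ثلاثة", "أربعة",
               "خمسة", "ستة", "سبعة", "ثمانية", "تسعة"]

# In a str pattern, \d matches any Unicode decimal digit, so Western (0-9)
# and Arabic-Indic (٠-٩) digits are both matched.
DIGIT_RUN = re.compile(r"\d+")


def spell_digits(text: str) -> str:
    """Replace every run of digits with its digits read out one by one."""
    def repl(match: re.Match) -> str:
        # unicodedata.decimal maps both "3" and "٣" to the integer 3.
        return " ".join(DIGIT_WORDS[unicodedata.decimal(ch)] for ch in match.group())
    return DIGIT_RUN.sub(repl, text)
```

For example, spell_digits("عندي 3 أغراض") returns "عندي ثلاثة أغراض", and the result can be passed to model.generate in place of the raw text.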
🧪 Example Usage (Inference)
import torchaudio as ta
from huggingface_hub import snapshot_download
from safetensors.torch import load_file as load_safetensors
from chatterbox import mtl_tts

device = "cuda"  # or "cpu" / "mps"

# Download the NAMAA-Saudi-TTS checkpoints from the Hugging Face Hub
ckpt_dir = snapshot_download(
    repo_id="NAMAA-Space/NAMAA-Saudi-TTS",
    repo_type="model",
    revision="main",
)

# Load the base Chatterbox Multilingual model, then replace its T3 weights
# with the checkpoint downloaded from this repository
model = mtl_tts.ChatterboxMultilingualTTS.from_pretrained(device=device)
t3_state = load_safetensors(
    f"{ckpt_dir}/t3_mtl23ls_v2.safetensors",
    device=device,
)
model.t3.load_state_dict(t3_state)
model.t3.to(device).eval()

# Saudi Arabic text: "I'm heading to work now, and when I get back I'll stop by the grocery store."
text = "أنا الحين بروح الشغل وإذا رجعت بمرّ البقالة"
wav = model.generate(text, language_id="ar")
ta.save("namaa_saudi.wav", wav, model.sr)
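The example above hard-codes device = "cuda". If you want the same script to fall back to "mps" or "cpu" when no CUDA GPU is visible, a small helper along these lines could be used; pick_device is a hypothetical name, and the checks rely on the standard PyTorch calls torch.cuda.is_available and torch.backends.mps.is_available.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        # Without PyTorch the model cannot run at all; "cpu" is only a placeholder.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is the Apple Silicon GPU backend (PyTorch >= 1.12).
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The resulting string can replace the hard-coded value before from_pretrained and load_safetensors are called.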
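🔹 Inference with Reference Audio (Voice / Style Transfer)
# Reuses `model` and `ta` from the example above
# Saudi Arabic text: "I want to finish the work today and rest tomorrow."
text = "آبي أخلص الشغل اليوم وأرتاح بكرة"
wav = model.generate(
    text,
    language_id="ar",
    audio_prompt_path="/content/reference_saudi.wav",  # path to your reference clip
)
ta.save("namaa_saudi_ref.wav", wav, model.sr)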
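This card does not state requirements for the reference clip. If you want to sanity-check a clip before passing it as audio_prompt_path, a minimal sketch using Python's standard-library wave module is shown here; describe_reference is a hypothetical helper, and it only handles uncompressed PCM WAV files.

```python
import wave


def describe_reference(path: str) -> dict:
    """Return sample rate, channel count, and duration of a PCM WAV file."""
    # wave.open raises wave.Error for formats it cannot parse (e.g. float WAV).
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "sample_rate": rate,
            "channels": wf.getnchannels(),
            "duration_s": wf.getnframes() / rate,
        }
```

For example, print(describe_reference("/content/reference_saudi.wav")) shows whether the clip is mono or stereo and how long it is before you use it for prompting.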
Base Model
This model is built on top of:
- ResembleAI/chatterbox
- Chatterbox Multilingual TTS architecture
The Saudi dialect behavior is achieved through specialized configuration, prompting, and curated usage patterns, rather than training focused on Modern Standard Arabic (MSA).
License
This model is released under the MIT License, allowing both research and commercial usage with proper attribution.
🤝 Community & Contributions
Developed and maintained by NAMAA Community (Network for Advancing Modern Arabic NLP & AI).
We welcome:
- Feedback and evaluations
- Dialect-specific test cases
- Contributions toward improving Arabic Text-to-Speech systems
Citation
If you use this model in research or production, please cite:
@misc{namaa_saudi_tts,
title = {NAMAA-Saudi-TTS: Saudi Dialect Text-to-Speech},
author = {{NAMAA Community}},
year = {2026},
url = {https://huggingface.co/NAMAA-Space/NAMAA-Saudi-TTS}
}