	# Saudi Arabic (MSA) TTS Model - Piper

	This repository contains a high-quality Piper TTS model trained for 455 epochs on a Saudi Arabic (Modern Standard Arabic) dataset.

	## Model Details

	- Language: Arabic (Saudi dialect)
	- Framework: Piper TTS
	- Sample Rate: 22050 Hz
	- Training Epochs: 455
	- Dataset Size: 11,592 audio samples
	- Speakers: 5 speakers (SPK1-SPK5)
	- Model Quality: Professional grade

	## Model Files

	- `checkpoints/epoch=455-step=1189248.ckpt` - PyTorch Lightning checkpoint (807 MB)
	- `config.json` - Model configuration file
	- `training_data.csv` - Training dataset metadata
	- `scripts/export_jit.py` - ONNX export script
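
The checkpoint name follows PyTorch Lightning's default `epoch={epoch}-step={step}` filename pattern, so the epoch and global step can be read from the name without loading the 807 MB file. The following is a minimal sketch; `parse_checkpoint_name` is a hypothetical helper and is not part of this repository.

```python
import re
from pathlib import Path

# Matches Lightning's default checkpoint filename, e.g. "epoch=455-step=1189248.ckpt"
_CKPT_PATTERN = re.compile(r"^epoch=(\d+)-step=(\d+)\.ckpt$")


def parse_checkpoint_name(path: str) -> tuple[int, int]:
    """Return (epoch, global_step) encoded in a Lightning checkpoint filename."""
    name = Path(path).name
    match = _CKPT_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a default Lightning checkpoint name: {name}")
    return int(match.group(1)), int(match.group(2))


# Usage:
# parse_checkpoint_name("checkpoints/epoch=455-step=1189248.ckpt")  # (455, 1189248)
```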

	## Quick Start

	### Export to ONNX

	```bash
	python3 scripts/export_jit.py
	```

	This creates the ONNX model file `saudi_msa_epoch455.onnx`, which Piper can load for inference.

	### Usage with Piper

	```bash
	# Install Piper TTS
	pip install piper-tts

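	# After exporting to ONNX (saudi_msa_epoch455.onnx.json must sit next to the model)
	# Input text: "Welcome to the text-to-speech system"
	echo 'مرحبا بك في نظام التحويل النصي إلى كلام' | \
	  piper --model saudi_msa_epoch455.onnx --output_file output.wav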
	```
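
To drive the same CLI from Python (for example, from a service backend), the text can be passed on stdin just as `echo` does above. This sketch assumes that the `piper` executable installed by `pip install piper-tts` is on `PATH` and uses only the `--model` and `--output_file` flags shown above; `build_piper_command` and `synthesize_with_cli` are hypothetical helper names.

```python
import subprocess


def build_piper_command(model_path: str, output_path: str) -> list[str]:
    """Build the same piper invocation used in the shell example above."""
    return ["piper", "--model", model_path, "--output_file", output_path]


def synthesize_with_cli(text: str, model_path: str, output_path: str) -> None:
    """Pipe UTF-8 Arabic text to the piper CLI on stdin and write a WAV file."""
    subprocess.run(
        build_piper_command(model_path, output_path),
        input=(text + "\n").encode("utf-8"),  # trailing newline mirrors `echo`
        check=True,  # raise CalledProcessError if piper exits non-zero
    )


# Usage (requires the exported model and its .onnx.json config):
# synthesize_with_cli("مرحبا بك", "saudi_msa_epoch455.onnx", "output.wav")
```

Encoding the text explicitly as UTF-8 avoids depending on the platform's default locale, which matters for Arabic input.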

	### Python Usage

	```python
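	import wave

	from piper import PiperVoice

	# Loads saudi_msa_epoch455.onnx.json from the same directory by default
	voice = PiperVoice.load("saudi_msa_epoch455.onnx")

	# Synthesize speech ("Welcome") into a WAV file.
	# piper-tts >= 1.3 API; on 1.2.x use voice.synthesize(text, wav_file) instead.
	with wave.open("output.wav", "wb") as wav_file:
	    voice.synthesize_wav("مرحبا بك", wav_file)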
	```
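
Because the model was trained at 22050 Hz, a quick sanity check on any generated file is to read its header with Python's standard `wave` module. This is a minimal sketch: `describe_wav` is a hypothetical helper, and it reports whatever the file contains instead of assuming a particular format.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic header fields from a WAV file produced by Piper."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        frames = wav_file.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_s": frames / rate,
        }


EXPECTED_SAMPLE_RATE = 22050  # sample rate listed under Model Details

# Usage:
# info = describe_wav("output.wav")
# assert info["sample_rate"] == EXPECTED_SAMPLE_RATE, info
```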

	## Training Details

	### Dataset Statistics

	| Speaker | Samples |
	|---------|--------:|
	| SPK1 | 3,000 |
	| SPK2 | 714 |
	| SPK3 | 1,656 |
	| SPK4 | 2,057 |
	| SPK5 | 4,193 |
	| Total | 11,592 |
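
The per-speaker counts can be recomputed from `training_data.csv`. That file's layout is not documented here, so this sketch assumes Piper's multi-speaker LJSpeech-style layout: one `audio_id|speaker|text` row per utterance, with no header. If the file differs, adjust the hypothetical `delimiter` and `speaker_column` parameters. A recount is also a quick check on the table itself, since the five per-speaker figures above sum to 11,620 while the total row reports 11,592.

```python
import csv
from collections import Counter
from typing import TextIO


def count_by_speaker(rows: TextIO, delimiter: str = "|", speaker_column: int = 1) -> Counter:
    """Count utterances per speaker in a delimiter-separated metadata file.

    Assumes no header row and one utterance per line (e.g. audio_id|speaker|text).
    """
    counts: Counter = Counter()
    # QUOTE_NONE: treat quote characters in the transcript text as ordinary text
    reader = csv.reader(rows, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) > speaker_column:  # skip blank or malformed lines
            counts[row[speaker_column]] += 1
    return counts


# Usage against the real file (layout assumed as above):
# with open("training_data.csv", encoding="utf-8", newline="") as f:
#     counts = count_by_speaker(f)
# print(sorted(counts.items()), sum(counts.values()))
```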

	### Training Configuration

	```yaml
	voice_name: saudi_msa
	sample_rate: 22050
	espeak_voice: ar
	batch_size: 8
	epochs: 455
	optimizer: Adam
	```

	### Training Environment

	- Python 3.11
	- PyTorch 2.x with CUDA
	- Lightning 2.x
	- Total training time: more than 85 hours

	## Model Performance

	After 455 epochs of training, the model delivers:

	- ✅ Excellent audio quality with minimal background noise
	- ✅ Clear pronunciation of Arabic words
	- ✅ Natural prosody and intonation
	- ✅ Professional-grade output suitable for production use

	The model performs exceptionally well on:
	- Customer service dialogues
	- Banking and financial terminology
	- General conversational Arabic
	- Saudi dialect expressions

	## Export Instructions

	To export the checkpoint to ONNX format:

	```bash
	# Run from the repository root (same command as in Quick Start)
	python3 scripts/export_jit.py
	```

	The script will:
	1. Load the checkpoint from `checkpoints/epoch=455-step=1189248.ckpt`
	2. Export to ONNX format with optimizations
	3. Create `saudi_msa_epoch455.onnx` file

	Piper loads the voice configuration from `<model>.onnx.json` next to the model file, so copy `config.json` alongside the ONNX model under that name:

	```bash
	cp config.json saudi_msa_epoch455.onnx.json
	```
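
Piper looks for the configuration at the model path with `.json` appended, which is why the copy above is needed. For a multi-speaker voice, this JSON also carries the mapping from speaker names (SPK1 to SPK5) to the integer IDs used at synthesis time. The sketch below assumes that the exported config keeps Piper's usual `speaker_id_map` key, so verify it against the actual `config.json`. Both `config_path_for` and `load_speaker_ids` are hypothetical helpers.

```python
import json
from pathlib import Path


def config_path_for(model_path: str) -> Path:
    """Piper's default config location: the model path with '.json' appended."""
    return Path(f"{model_path}.json")


def load_speaker_ids(model_path: str) -> dict[str, int]:
    """Return the speaker-name -> speaker-ID map from the model's config.

    Assumes Piper's usual 'speaker_id_map' key; an empty dict is returned
    if the key is missing or null (e.g. for single-speaker configs).
    """
    config_file = config_path_for(model_path)
    if not config_file.is_file():
        raise FileNotFoundError(
            f"{config_file} not found; copy config.json next to the ONNX model"
        )
    config = json.loads(config_file.read_text(encoding="utf-8"))
    return dict(config.get("speaker_id_map") or {})


# Usage (after export and the cp step above):
# print(load_speaker_ids("saudi_msa_epoch455.onnx"))
```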

	## File Structure

	```
	.
	├── README.md
	├── config.json                        # Model configuration
	├── training_data.csv                  # Dataset metadata
	├── checkpoints/
	│   └── epoch=455-step=1189248.ckpt    # Latest checkpoint (807 MB)
	└── scripts/
	    ├── export_jit.py                  # ONNX export script
	    ├── train_piper.sh                 # Training script
	    └── create_training_file.py        # Data preparation script
	```

	## License

	This model was trained with the Piper TTS framework, which is licensed under GPL-3.0.

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{saudi_msa_piper_2026,
	title={Saudi Arabic TTS Model for Piper - Epoch 455},
	author={Piper MSA Project},
	year={2026},
	publisher={Hugging Face},
	howpublished={\url{https://huggingface.co/YOUR_USERNAME/saudi-msa-piper}}
	}
	```

	## Acknowledgments

	- Piper TTS: https://github.com/rhasspy/piper
	- eSpeak-ng for Arabic phonemization
	- Original dataset contributors

	## Sample Usage

	```bash
	# Example: generate a customer-service greeting
	# ("Welcome, dear customer. How can I help you today?")
	text="حياك الله عميلنا العزيز، كيف اقدر اساعدك اليوم؟"
	echo "$text" | piper --model saudi_msa_epoch455.onnx --output_file greeting.wav
	```

	## Model Comparison

	| Epoch | Quality | Noise Level | Clarity |
	|-------|---------|-------------|---------|
	| 65 | Good | Moderate | Fair |
	| 176 | Very Good | Low | Good |
	| 438 | Excellent | Very Low | Excellent |
	| 455 | Professional | Minimal | Excellent |

	---

	For questions or issues, please open an issue on the repository.