# Saudi MSA Piper TTS - Inference Guide
Complete guide for running the Saudi Arabic TTS model on any computer.
## Quick Start
### 1. Download the Model
```bash
# Clone the repository
git clone https://huggingface.co/ISTNetworks/saudi-msa-piper
cd saudi-msa-piper
# Or download specific files
wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx
wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json
```
### 2. Install Dependencies
```bash
# Install piper-tts
pip install piper-tts
# Or install all dependencies
pip install -r requirements.txt
```
### 3. Run Inference
**Option A: Using the provided Python script**
```bash
# "Welcome to the text-to-speech system"
python3 inference.py -t "مرحبا بك في نظام التحويل النصي إلى كلام" -o output.wav
```
**Option B: Using the bash script**
```bash
chmod +x inference.sh
./inference.sh "مرحبا بك" output.wav
```
**Option C: Using piper directly**
```bash
echo "مرحبا بك" | piper --model saudi_msa_epoch455.onnx --output_file output.wav
```
## Detailed Usage
### Python Script (inference.py)
The Python script provides the most flexibility and error handling.
**Basic usage:**
```bash
python3 inference.py -t "Arabic text here" -o output.wav
```
**Read from stdin:**
```bash
echo "مرحبا بك" | python3 inference.py -o output.wav
```
**Read from file:**
```bash
cat arabic_text.txt | python3 inference.py -o output.wav
```
**Specify custom model path:**
```bash
python3 inference.py -t "مرحبا بك" -m /path/to/model.onnx -o output.wav
```
**Full options:**
```bash
$ python3 inference.py --help

Options:
  -t, --text TEXT     Arabic text to synthesize
  -m, --model PATH    Path to ONNX model file
  -o, --output PATH   Output WAV file path (required)
  -c, --config PATH   Path to config JSON file (auto-detected)
```
### Bash Script (inference.sh)
Simple shell script for quick inference.
**Basic usage:**
```bash
./inference.sh "مرحبا بك" output.wav
```
**Read from stdin:**
```bash
echo "مرحبا بك" | ./inference.sh - output.wav
```
**Custom model path:**
```bash
MODEL_FILE=/path/to/model.onnx ./inference.sh "مرحبا بك" output.wav
```
### Direct Piper Usage
For advanced users who want direct control.
**Basic:**
```bash
echo "مرحبا بك" | piper --model saudi_msa_epoch455.onnx --output_file output.wav
```
**With custom config:**
```bash
echo "مرحبا بك" | piper \
  --model saudi_msa_epoch455.onnx \
  --config saudi_msa_epoch455.onnx.json \
  --output_file output.wav
```
**Output to stdout (for piping):**
```bash
echo "مرحبا بك" | piper --model saudi_msa_epoch455.onnx --output-raw | \
  aplay -r 22050 -f S16_LE -t raw -
```
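If `aplay` is not available (e.g. on macOS or Windows), the raw stream can be redirected to a file and wrapped in a WAV container with Python's standard `wave` module. This is a minimal sketch assuming the playback format shown above (16-bit mono PCM at 22050 Hz); check the `.onnx.json` config for your model's actual sample rate:

```python
import wave

def raw_to_wav(raw_path, wav_path, rate=22050, channels=1, sampwidth=2):
    """Wrap headerless PCM (as written by piper --output-raw) in a WAV container."""
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)  # 2 bytes per sample = 16-bit (S16_LE)
        w.setframerate(rate)
        w.writeframes(pcm)

# Example: piper ... --output-raw > speech.raw, then:
# raw_to_wav("speech.raw", "speech.wav")
```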
## Python API Usage
For integration into Python applications (the calls below match the piper-tts 1.2.x API; newer releases may differ):
```python
from piper import PiperVoice
import wave

# Load the model (the .onnx.json config next to it is found automatically)
voice = PiperVoice.load("saudi_msa_epoch455.onnx")

# Synthesize to a WAV file
with wave.open("output.wav", "wb") as wav_file:
    voice.synthesize("مرحبا بك في نظام التحويل النصي إلى كلام", wav_file)

# Or stream raw 16-bit PCM audio chunk by chunk (no file argument)
for chunk in voice.synthesize_stream_raw("مرحبا بك"):
    pass  # send chunk to a player, socket, etc.
```
**Advanced usage:**
```python
from piper import PiperVoice
import wave

# Load model
voice = PiperVoice.load("saudi_msa_epoch455.onnx")

text = "مرحبا بك"  # "Welcome"

# Synthesize with custom parameters (keyword names from piper-tts 1.2.x)
with wave.open("output.wav", "wb") as wav_file:
    voice.synthesize(text, wav_file, length_scale=1.0, noise_scale=0.667)

print("Audio generated successfully!")
```
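To sanity-check a generated file, the standard `wave` module can read the header back. The expected values in the commented assertion (22050 Hz, mono, 16-bit) match the playback settings used elsewhere in this guide; confirm them against your model's `.onnx.json`:

```python
import wave

def wav_params(path):
    """Return (sample_rate, channels, sample_width_bytes) of a WAV file."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()

# rate, channels, width = wav_params("output.wav")
# assert (rate, channels, width) == (22050, 1, 2)
```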
## System Requirements
### Minimum Requirements
- **OS:** Linux, macOS, or Windows
- **Python:** 3.8 or higher
- **RAM:** 2 GB
- **Storage:** 100 MB for model files
### Recommended Requirements
- **OS:** Linux or macOS
- **Python:** 3.10 or higher
- **RAM:** 4 GB
- **Storage:** 1 GB
## Installation on Different Systems
### Ubuntu/Debian
```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install -y python3 python3-pip
# Install piper-tts
pip3 install piper-tts
# Download model
wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx
wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json
```
### macOS
```bash
# Install Python (if not installed)
brew install python3
# Install piper-tts
pip3 install piper-tts
# Download model
curl -L -o saudi_msa_epoch455.onnx \
  https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx
curl -L -o saudi_msa_epoch455.onnx.json \
  https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json
```
### Windows
```powershell
# Install Python from python.org
# Install piper-tts
pip install piper-tts
# Download model (using PowerShell)
Invoke-WebRequest -Uri "https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx" -OutFile "saudi_msa_epoch455.onnx"
Invoke-WebRequest -Uri "https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json" -OutFile "saudi_msa_epoch455.onnx.json"
```
## Example Use Cases
### Customer Service Greeting
```bash
# "Welcome, dear customer, how can I help you today?"
python3 inference.py -t "حياك الله عميلنا العزيز، كيف اقدر اساعدك اليوم؟" -o greeting.wav
```
### Banking Message
```bash
# "I messaged the main branch early this morning; God willing, they will get back to us before noon."
python3 inference.py -t "تراني راسلت الفرع الرئيسي باكر الصبح، وان شا الله بيردون علينا قبل الظهر" -o banking.wav
```
### Batch Processing
```bash
# Process multiple texts, one per line
while IFS= read -r line; do
  filename=$(echo "$line" | md5sum | cut -d' ' -f1).wav
  python3 inference.py -t "$line" -o "$filename"
done < texts.txt
```
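The same filename derivation can be done in Python with `hashlib`. Note that bash `echo` appends a trailing newline before hashing; this sketch reproduces that so the names match the loop above:

```python
import hashlib

def text_to_filename(text: str) -> str:
    """Mirror `echo "$line" | md5sum`: hash text plus trailing newline, append .wav."""
    digest = hashlib.md5((text + "\n").encode("utf-8")).hexdigest()
    return digest + ".wav"
```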
### Web Service Integration
```python
from flask import Flask, request, send_file
from piper import PiperVoice
import tempfile
import wave

app = Flask(__name__)
voice = PiperVoice.load("saudi_msa_epoch455.onnx")

@app.route('/synthesize', methods=['POST'])
def synthesize():
    text = request.json.get('text')

    # Reserve a temporary path, then write the synthesized WAV to it
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        temp_path = f.name
    with wave.open(temp_path, 'wb') as wav_file:
        voice.synthesize(text, wav_file)

    return send_file(temp_path, mimetype='audio/wav')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
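A matching client can be written with the standard library's `urllib`, with no extra dependencies. The host, port, and helper names here are illustrative assumptions that mirror the `app.run` call above:

```python
import json
import urllib.request

def build_request(text: str, host: str = "localhost", port: int = 5000) -> urllib.request.Request:
    """Build the POST request for the /synthesize endpoint sketched above."""
    return urllib.request.Request(
        f"http://{host}:{port}/synthesize",
        data=json.dumps({"text": text}, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def synthesize_remote(text: str, out_path: str = "out.wav") -> str:
    """Call the running service and save the returned WAV (hypothetical helper)."""
    with urllib.request.urlopen(build_request(text)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return out_path
```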
## Troubleshooting
### Model file not found
```bash
# Make sure you're in the correct directory
ls -lh saudi_msa_epoch455.onnx
# Or specify full path
python3 inference.py -m /full/path/to/saudi_msa_epoch455.onnx -t "مرحبا" -o output.wav
```
### Config file not found
```bash
# The config file should have the same name as the model with .json extension
# saudi_msa_epoch455.onnx -> saudi_msa_epoch455.onnx.json
# Or specify manually
python3 inference.py -t "مرحبا" -c config.json -o output.wav
```
### piper-tts not installed
```bash
pip install piper-tts
# If that fails, try:
pip install --upgrade pip
pip install piper-tts
```
### Permission denied
```bash
chmod +x inference.sh
chmod +x inference.py
```
## Performance Tips
1. **First run is slower:** The model loads into memory on first use
2. **Batch processing:** Load the model once and reuse for multiple texts
3. **Memory usage:** The model uses ~500 MB RAM when loaded
4. **CPU vs GPU:** This model runs on CPU; no GPU required
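Tip 2 can be sketched as follows: `batch_synthesize` accepts any already-loaded voice object, so the model is loaded once no matter how many texts are processed (the `voice.synthesize(text, wav_file)` call assumes the piper-tts 1.2.x API):

```python
import wave

def output_name(index: int) -> str:
    """Stable, ordered output filename for each input text."""
    return f"line_{index:03d}.wav"

def batch_synthesize(voice, texts):
    """Reuse one loaded voice across many texts instead of reloading per call."""
    paths = []
    for i, text in enumerate(texts):
        path = output_name(i)
        with wave.open(path, "wb") as wav_file:
            voice.synthesize(text, wav_file)  # piper-tts 1.2.x API (assumption)
        paths.append(path)
    return paths

# Usage (requires piper-tts and the model files):
# from piper import PiperVoice
# voice = PiperVoice.load("saudi_msa_epoch455.onnx")
# batch_synthesize(voice, ["مرحبا بك", "شكرا لك"])
```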
## File Structure
After downloading, you should have:
```
saudi-msa-piper/
├── saudi_msa_epoch455.onnx # Main model file (61 MB)
├── saudi_msa_epoch455.onnx.json # Config file (5 KB)
├── inference.py # Python inference script
├── inference.sh # Bash inference script
├── INFERENCE_GUIDE.md # This guide
└── requirements.txt # Python dependencies
```
## Support
For issues or questions:
- Repository: https://huggingface.co/ISTNetworks/saudi-msa-piper
- Piper TTS: https://github.com/rhasspy/piper
## License
This model is based on Piper TTS (GPL-3.0 license).