Saudi MSA Piper TTS - Inference Guide

A complete guide to running the Saudi MSA Piper text-to-speech model on Linux, macOS, or Windows.

Quick Start

1. Download the Model

# Clone the repository (install git-lfs first so the .onnx model is fetched, not just a pointer file)
git clone https://huggingface.co/ISTNetworks/saudi-msa-piper
cd saudi-msa-piper

# Or download specific files
wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx
wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json

2. Install Dependencies

# Install piper-tts
pip install piper-tts

# Or install all dependencies
pip install -r requirements.txt

3. Run Inference

Option A: Using the provided Python script

python3 inference.py -t "مرحبا بك في نظام التحويل النصي إلى كلام" -o output.wav

Option B: Using the bash script

chmod +x inference.sh
./inference.sh "مرحبا بك" output.wav

Option C: Using piper directly

echo "مرحبا بك" | piper --model saudi_msa_epoch455.onnx --output_file output.wav

Detailed Usage

Python Script (inference.py)

The Python script provides the most flexibility and error handling.

Basic usage:

python3 inference.py -t "Arabic text here" -o output.wav

Read from stdin:

echo "مرحبا بك" | python3 inference.py -o output.wav

Read from file:

python3 inference.py -o output.wav < arabic_text.txt

Specify custom model path:

python3 inference.py -t "مرحبا بك" -m /path/to/model.onnx -o output.wav

Full options:

python3 inference.py --help

Options:
  -t, --text TEXT       Arabic text to synthesize
  -m, --model PATH      Path to ONNX model file
  -o, --output PATH     Output WAV file path (required)
  -c, --config PATH     Path to config JSON file (auto-detected)

Bash Script (inference.sh)

Simple shell script for quick inference.

Basic usage:

./inference.sh "مرحبا بك" output.wav

Read from stdin:

echo "مرحبا بك" | ./inference.sh - output.wav

Custom model path:

MODEL_FILE=/path/to/model.onnx ./inference.sh "مرحبا بك" output.wav

Direct Piper Usage

For advanced users who want direct control.

Basic:

echo "مرحبا بك" | piper --model saudi_msa_epoch455.onnx --output_file output.wav

With custom config:

echo "مرحبا بك" | piper \
  --model saudi_msa_epoch455.onnx \
  --config saudi_msa_epoch455.onnx.json \
  --output_file output.wav

Stream raw audio to stdout (for piping; this example plays it with ALSA's aplay on Linux):

echo "مرحبا بك" | piper --model saudi_msa_epoch455.onnx --output-raw | \
  aplay -r 22050 -f S16_LE -t raw -
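
The aplay flags above describe the raw stream: signed 16-bit little-endian samples at 22050 Hz. As a minimal sketch, assuming the stream is mono and that 22050 Hz matches the sample rate in the model's .onnx.json config, the same bytes can be wrapped in a WAV header with Python's standard wave module. The helper name raw_to_wav is hypothetical and not part of piper.

```python
import wave


def raw_to_wav(raw_pcm: bytes, wav_path: str, sample_rate: int = 22050) -> None:
    """Wrap raw signed 16-bit little-endian PCM in a WAV container."""
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono (assumption)
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)  # must match the model's sample rate
        wav_file.writeframes(raw_pcm)
```

For example, after redirecting piper's --output-raw stream to a file such as speech.raw, calling raw_to_wav(open("speech.raw", "rb").read(), "speech.wav") would produce a playable WAV file.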

Python API Usage

For integration into Python applications:

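import wave
from piper import PiperVoice

# Load the model (the .onnx.json config is expected next to it)
voice = PiperVoice.load("saudi_msa_epoch455.onnx")

# Synthesize to a WAV file. This is the piper-tts 1.2.x call;
# newer releases (1.3+) use voice.synthesize_wav(text, wav_file) instead.
with wave.open("output.wav", "wb") as wav_file:
    voice.synthesize("مرحبا بك في نظام التحويل النصي إلى كلام", wav_file)

# Or collect the raw 16-bit mono PCM bytes (1.2.x API)
audio_data = b"".join(voice.synthesize_stream_raw("مرحبا بك"))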

Advanced usage:

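import wave
from piper import PiperVoice

# Load model
voice = PiperVoice.load("saudi_msa_epoch455.onnx")

text = "مرحبا بك"

# Synthesize with custom parameters. These keyword arguments belong to
# the piper-tts 1.2.x API; newer releases configure them differently.
with wave.open("output.wav", "wb") as wav_file:
    voice.synthesize(
        text,
        wav_file,
        length_scale=1.1,      # values above 1.0 slow the speech down
        sentence_silence=0.2,  # seconds of silence after each sentence
    )

print("Audio generated successfully!")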

System Requirements

Minimum Requirements

  • OS: Linux, macOS, or Windows
  • Python: 3.8 or higher
  • RAM: 2 GB
  • Storage: 100 MB for model files

Recommended Requirements

  • OS: Linux or macOS
  • Python: 3.10 or higher
  • RAM: 4 GB
  • Storage: 1 GB

Installation on Different Systems

Ubuntu/Debian

# Install system dependencies
sudo apt-get update
sudo apt-get install -y python3 python3-pip

# Install piper-tts
pip3 install piper-tts

# Download model
wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx
wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json

macOS

# Install Python (if not installed)
brew install python3

# Install piper-tts
pip3 install piper-tts

# Download model
curl -L -o saudi_msa_epoch455.onnx \
  https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx
curl -L -o saudi_msa_epoch455.onnx.json \
  https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json

Windows

# Install Python from python.org

# Install piper-tts
pip install piper-tts

# Download model (using PowerShell)
Invoke-WebRequest -Uri "https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx" -OutFile "saudi_msa_epoch455.onnx"
Invoke-WebRequest -Uri "https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json" -OutFile "saudi_msa_epoch455.onnx.json"

Example Use Cases

Customer Service Greeting

python3 inference.py -t "حياك الله عميلنا العزيز، كيف اقدر اساعدك اليوم؟" -o greeting.wav

Banking Message

python3 inference.py -t "تراني راسلت الفرع الرئيسي باكر الصبح، وان شا الله بيردون علينا قبل الظهر" -o banking.wav

Batch Processing

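# Process multiple texts, one per line; blank lines are skipped.
# The extra test keeps a final line that lacks a trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    filename="$(echo "$line" | md5sum | cut -d' ' -f1).wav"
    python3 inference.py -t "$line" -o "$filename"
done < texts.txt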
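
The shell loop starts a new Python process, and so reloads the model, for every line. A minimal sketch of the same batch job in Python that lets the caller load the model once is shown here. The function batch_synthesize is a hypothetical helper, and the synthesize argument is any callable that writes audio for a text into an open wave file. Output names are derived from the text only and are not guaranteed to match those produced by the shell loop.

```python
import hashlib
import wave
from pathlib import Path
from typing import Callable, Iterable, List


def batch_synthesize(
    lines: Iterable[str],
    out_dir: str,
    synthesize: Callable[[str, wave.Wave_write], None],
) -> List[Path]:
    """Write one WAV per non-empty line, named by the MD5 of the text."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        name = hashlib.md5(text.encode("utf-8")).hexdigest() + ".wav"
        path = out / name
        with wave.open(str(path), "wb") as wav_file:
            synthesize(text, wav_file)
        written.append(path)
    return written


# Usage sketch (requires piper-tts; the 1.2.x API is an assumption,
# on 1.3+ pass voice.synthesize_wav instead of voice.synthesize):
#   from piper import PiperVoice
#   voice = PiperVoice.load("saudi_msa_epoch455.onnx")  # loaded once
#   with open("texts.txt", encoding="utf-8") as f:
#       batch_synthesize(f, "out", voice.synthesize)
```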

Web Service Integration

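import io
import wave

from flask import Flask, request, send_file
from piper import PiperVoice

app = Flask(__name__)
# Load the model once at startup and reuse it for every request
voice = PiperVoice.load("saudi_msa_epoch455.onnx")

@app.route('/synthesize', methods=['POST'])
def synthesize():
    data = request.get_json(silent=True) or {}
    text = data.get('text')
    if not text:
        return {'error': 'missing "text" field'}, 400

    # Build the WAV in memory so no temporary files are left behind.
    # voice.synthesize(text, wav_file) is the piper-tts 1.2.x call;
    # newer releases (1.3+) use voice.synthesize_wav instead.
    buffer = io.BytesIO()
    with wave.open(buffer, 'wb') as wav_file:
        voice.synthesize(text, wav_file)
    buffer.seek(0)

    return send_file(buffer, mimetype='audio/wav')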

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
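
A client for the /synthesize endpoint above might look like the following sketch. It assumes the service is reachable at localhost on port 5000, as in the example, and uses only the standard library. The names build_request and fetch_wav are hypothetical helpers.

```python
import json
import urllib.request

DEFAULT_URL = "http://localhost:5000/synthesize"  # assumed local deployment


def build_request(text: str, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the text to synthesize."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_wav(text: str, out_path: str, url: str = DEFAULT_URL) -> None:
    """Send the request and save the returned WAV bytes to out_path."""
    with urllib.request.urlopen(build_request(text, url)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With the server running, fetch_wav("مرحبا بك", "reply.wav") would save the synthesized audio next to the script.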

Troubleshooting

Model file not found

# Make sure you're in the correct directory
ls -lh saudi_msa_epoch455.onnx

# Or specify full path
python3 inference.py -m /full/path/to/saudi_msa_epoch455.onnx -t "مرحبا" -o output.wav

Config file not found

# The config file should have the same name as the model with .json extension
# saudi_msa_epoch455.onnx -> saudi_msa_epoch455.onnx.json

# Or specify manually
python3 inference.py -t "مرحبا" -c config.json -o output.wav
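
The naming rule above (model path plus .json) can be checked before running inference. The following is a minimal sketch of that rule only; resolve_config is a hypothetical helper and is not taken from inference.py or piper.

```python
from pathlib import Path
from typing import Optional


def resolve_config(model_path: str, config_path: Optional[str] = None) -> Path:
    """Return the config path, defaulting to '<model>.json' next to the model."""
    model = Path(model_path)
    if not model.is_file():
        raise FileNotFoundError(f"Model file not found: {model}")
    # An explicit config path wins; otherwise append ".json" to the model name
    config = Path(config_path) if config_path else Path(str(model) + ".json")
    if not config.is_file():
        raise FileNotFoundError(f"Config file not found: {config}")
    return config
```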

piper-tts not installed

pip install piper-tts

# If that fails, try:
pip install --upgrade pip
pip install piper-tts

Permission denied

chmod +x inference.sh
chmod +x inference.py

Performance Tips

  1. First run is slower: The model loads into memory on first use
  2. Batch processing: Load the model once and reuse for multiple texts
  3. Memory usage: The model uses ~500 MB RAM when loaded
  4. CPU vs GPU: This model runs on CPU; no GPU required

File Structure

After downloading, you should have:

saudi-msa-piper/
├── saudi_msa_epoch455.onnx          # Main model file (61 MB)
├── saudi_msa_epoch455.onnx.json     # Config file (5 KB)
├── inference.py                      # Python inference script
├── inference.sh                      # Bash inference script
├── INFERENCE_GUIDE.md               # This guide
└── requirements.txt                  # Python dependencies
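
If the repository was cloned without Git LFS, the .onnx file may be a small text pointer rather than the 61 MB model listed above. A minimal sketch for spotting that case follows, assuming the standard Git LFS pointer format, which is a text file beginning with a fixed version line. The name is_lfs_pointer is a hypothetical helper.

```python
# First line of every Git LFS pointer file
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: str) -> bool:
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER
```

If is_lfs_pointer("saudi_msa_epoch455.onnx") returns True, downloading the file directly with wget or curl as shown in the Quick Start replaces the pointer with the real model.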

Support

For issues or questions:

License

This model is based on Piper TTS (GPL-3.0 license).