# Saudi MSA Piper TTS - Inference Guide
Complete guide for running the Saudi Arabic TTS model on any computer.
## Quick Start
### 1. Download the Model
```bash
# Clone the repository
git clone https://huggingface.co/ISTNetworks/saudi-msa-piper
cd saudi-msa-piper
# Or download specific files
wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx
wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json
```
### 2. Install Dependencies
```bash
# Install piper-tts
pip install piper-tts
# Or install all dependencies
pip install -r requirements.txt
```
### 3. Run Inference
**Option A: Using the provided Python script**
```bash
# "Welcome to the text-to-speech system"
python3 inference.py -t "مرحبا بك في نظام التحويل النصي إلى كلام" -o output.wav
```
**Option B: Using the bash script**
```bash
chmod +x inference.sh
./inference.sh "مرحبا بك" output.wav
```
**Option C: Using piper directly**
```bash
echo "مرحبا بك" | piper --model saudi_msa_epoch455.onnx --output_file output.wav
```
## Detailed Usage
### Python Script (inference.py)
The Python script provides the most flexibility and error handling.
**Basic usage:**
```bash
python3 inference.py -t "Arabic text here" -o output.wav
```
**Read from stdin:**
```bash
echo "مرحبا بك" | python3 inference.py -o output.wav
```
**Read from file:**
```bash
cat arabic_text.txt | python3 inference.py -o output.wav
```
**Specify custom model path:**
```bash
python3 inference.py -t "مرحبا بك" -m /path/to/model.onnx -o output.wav
```
**Full options:**
```bash
$ python3 inference.py --help

Options:
  -t, --text TEXT     Arabic text to synthesize
  -m, --model PATH    Path to ONNX model file
  -o, --output PATH   Output WAV file path (required)
  -c, --config PATH   Path to config JSON file (auto-detected)
```
### Bash Script (inference.sh)
Simple shell script for quick inference.
**Basic usage:**
```bash
./inference.sh "مرحبا بك" output.wav
```
**Read from stdin:**
```bash
echo "مرحبا بك" | ./inference.sh - output.wav
```
**Custom model path:**
```bash
MODEL_FILE=/path/to/model.onnx ./inference.sh "مرحبا بك" output.wav
```
### Direct Piper Usage
For advanced users who want direct control.
**Basic:**
```bash
echo "مرحبا بك" | piper --model saudi_msa_epoch455.onnx --output_file output.wav
```
**With custom config:**
```bash
echo "مرحبا بك" | piper \
  --model saudi_msa_epoch455.onnx \
  --config saudi_msa_epoch455.onnx.json \
  --output_file output.wav
```
**Output to stdout (for piping):**
```bash
echo "مرحبا بك" | piper --model saudi_msa_epoch455.onnx --output-raw | \
  aplay -r 22050 -f S16_LE -t raw -
```
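If `aplay` is not available (e.g. on macOS or Windows), the raw stream can be redirected to a file and wrapped in a WAV container with Python's standard `wave` module. This is a minimal sketch assuming the playback format shown above (16-bit mono PCM at 22050 Hz); check the `.onnx.json` config for your model's actual sample rate:

```python
import wave

def raw_to_wav(raw_path, wav_path, rate=22050, channels=1, sampwidth=2):
    """Wrap headerless PCM (as written by piper --output-raw) in a WAV container."""
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)  # 2 bytes per sample = 16-bit (S16_LE)
        w.setframerate(rate)
        w.writeframes(pcm)

# Example: piper ... --output-raw > speech.raw, then:
# raw_to_wav("speech.raw", "speech.wav")
```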
## Python API Usage
For integration into Python applications (the calls below match the piper-tts 1.2.x API; newer releases may differ):
```python
from piper import PiperVoice
import wave

# Load the model (the .onnx.json config next to it is found automatically)
voice = PiperVoice.load("saudi_msa_epoch455.onnx")

# Synthesize to a WAV file
with wave.open("output.wav", "wb") as wav_file:
    voice.synthesize("مرحبا بك في نظام التحويل النصي إلى كلام", wav_file)

# Or stream raw 16-bit PCM audio chunk by chunk (no file argument)
for chunk in voice.synthesize_stream_raw("مرحبا بك"):
    pass  # send chunk to a player, socket, etc.
```
**Advanced usage:**
```python
from piper import PiperVoice
import wave

# Load model
voice = PiperVoice.load("saudi_msa_epoch455.onnx")

text = "مرحبا بك"  # "Welcome"

# Synthesize with custom parameters (keyword names from piper-tts 1.2.x)
with wave.open("output.wav", "wb") as wav_file:
    voice.synthesize(text, wav_file, length_scale=1.0, noise_scale=0.667)

print("Audio generated successfully!")
```
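To sanity-check a generated file, the standard `wave` module can read the header back. The expected values in the commented assertion (22050 Hz, mono, 16-bit) match the playback settings used elsewhere in this guide; confirm them against your model's `.onnx.json`:

```python
import wave

def wav_params(path):
    """Return (sample_rate, channels, sample_width_bytes) of a WAV file."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()

# rate, channels, width = wav_params("output.wav")
# assert (rate, channels, width) == (22050, 1, 2)
```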
## System Requirements
### Minimum Requirements
- **OS:** Linux, macOS, or Windows
- **Python:** 3.8 or higher
- **RAM:** 2 GB
- **Storage:** 100 MB for model files
### Recommended Requirements
- **OS:** Linux or macOS
- **Python:** 3.10 or higher
- **RAM:** 4 GB
- **Storage:** 1 GB
## Installation on Different Systems
### Ubuntu/Debian
```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install -y python3 python3-pip
# Install piper-tts
pip3 install piper-tts
# Download model
wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx
wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json
```
### macOS
```bash
# Install Python (if not installed)
brew install python3
# Install piper-tts
pip3 install piper-tts
# Download model
curl -L -o saudi_msa_epoch455.onnx \
  https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx
curl -L -o saudi_msa_epoch455.onnx.json \
  https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json
```
### Windows
```powershell
# Install Python from python.org
# Install piper-tts
pip install piper-tts
# Download model (using PowerShell)
Invoke-WebRequest -Uri "https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx" -OutFile "saudi_msa_epoch455.onnx"
Invoke-WebRequest -Uri "https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json" -OutFile "saudi_msa_epoch455.onnx.json"
```
## Example Use Cases
### Customer Service Greeting
```bash
# "Welcome, dear customer, how can I help you today?"
python3 inference.py -t "حياك الله عميلنا العزيز، كيف اقدر اساعدك اليوم؟" -o greeting.wav
```
### Banking Message
```bash
# "I messaged the main branch early this morning; God willing, they will get back to us before noon."
python3 inference.py -t "تراني راسلت الفرع الرئيسي باكر الصبح، وان شا الله بيردون علينا قبل الظهر" -o banking.wav
```
### Batch Processing
```bash
# Process multiple texts, one per line
while IFS= read -r line; do
  filename=$(echo "$line" | md5sum | cut -d' ' -f1).wav
  python3 inference.py -t "$line" -o "$filename"
done < texts.txt
```
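The same filename derivation can be done in Python with `hashlib`. Note that bash `echo` appends a trailing newline before hashing; this sketch reproduces that so the names match the loop above:

```python
import hashlib

def text_to_filename(text: str) -> str:
    """Mirror `echo "$line" | md5sum`: hash text plus trailing newline, append .wav."""
    digest = hashlib.md5((text + "\n").encode("utf-8")).hexdigest()
    return digest + ".wav"
```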
### Web Service Integration
```python
from flask import Flask, request, send_file
from piper import PiperVoice
import tempfile
import wave

app = Flask(__name__)
voice = PiperVoice.load("saudi_msa_epoch455.onnx")

@app.route('/synthesize', methods=['POST'])
def synthesize():
    text = request.json.get('text')

    # Reserve a temporary path, then write the synthesized WAV to it
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        temp_path = f.name
    with wave.open(temp_path, 'wb') as wav_file:
        voice.synthesize(text, wav_file)

    return send_file(temp_path, mimetype='audio/wav')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
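A matching client can be written with the standard library's `urllib`, with no extra dependencies. The host, port, and helper names here are illustrative assumptions that mirror the `app.run` call above:

```python
import json
import urllib.request

def build_request(text: str, host: str = "localhost", port: int = 5000) -> urllib.request.Request:
    """Build the POST request for the /synthesize endpoint sketched above."""
    return urllib.request.Request(
        f"http://{host}:{port}/synthesize",
        data=json.dumps({"text": text}, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def synthesize_remote(text: str, out_path: str = "out.wav") -> str:
    """Call the running service and save the returned WAV (hypothetical helper)."""
    with urllib.request.urlopen(build_request(text)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return out_path
```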
## Troubleshooting
### Model file not found
```bash
# Make sure you're in the correct directory
ls -lh saudi_msa_epoch455.onnx
# Or specify full path
python3 inference.py -m /full/path/to/saudi_msa_epoch455.onnx -t "مرحبا" -o output.wav
```
### Config file not found
```bash
# The config file should have the same name as the model with .json extension
# saudi_msa_epoch455.onnx -> saudi_msa_epoch455.onnx.json
# Or specify manually
python3 inference.py -t "مرحبا" -c config.json -o output.wav
```
### piper-tts not installed
```bash
pip install piper-tts
# If that fails, try:
pip install --upgrade pip
pip install piper-tts
```
### Permission denied
```bash
chmod +x inference.sh
chmod +x inference.py
```
## Performance Tips
1. **First run is slower:** The model loads into memory on first use
2. **Batch processing:** Load the model once and reuse for multiple texts
3. **Memory usage:** The model uses ~500 MB RAM when loaded
4. **CPU vs GPU:** This model runs on CPU; no GPU required
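Tip 2 can be sketched as follows: `batch_synthesize` accepts any already-loaded voice object, so the model is loaded once no matter how many texts are processed (the `voice.synthesize(text, wav_file)` call assumes the piper-tts 1.2.x API):

```python
import wave

def output_name(index: int) -> str:
    """Stable, ordered output filename for each input text."""
    return f"line_{index:03d}.wav"

def batch_synthesize(voice, texts):
    """Reuse one loaded voice across many texts instead of reloading per call."""
    paths = []
    for i, text in enumerate(texts):
        path = output_name(i)
        with wave.open(path, "wb") as wav_file:
            voice.synthesize(text, wav_file)  # piper-tts 1.2.x API (assumption)
        paths.append(path)
    return paths

# Usage (requires piper-tts and the model files):
# from piper import PiperVoice
# voice = PiperVoice.load("saudi_msa_epoch455.onnx")
# batch_synthesize(voice, ["مرحبا بك", "شكرا لك"])
```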
## File Structure
After downloading, you should have:
```
saudi-msa-piper/
├── saudi_msa_epoch455.onnx # Main model file (61 MB)
├── saudi_msa_epoch455.onnx.json # Config file (5 KB)
├── inference.py # Python inference script
├── inference.sh # Bash inference script
├── INFERENCE_GUIDE.md # This guide
└── requirements.txt # Python dependencies
```
## Support
For issues or questions:
- Repository: https://huggingface.co/ISTNetworks/saudi-msa-piper
- Piper TTS: https://github.com/rhasspy/piper
## License
This model is based on Piper TTS (GPL-3.0 license).