ISTNetworks committed on
Commit
6bc88af
·
verified ·
1 Parent(s): b67625e

Add comprehensive inference guide

Files changed (1)
  1. INFERENCE_GUIDE.md +328 -0
INFERENCE_GUIDE.md ADDED
@@ -0,0 +1,328 @@
1
+ # Saudi MSA Piper TTS - Inference Guide
2
+
3
+ Complete guide for running the Saudi Arabic TTS model on Linux, macOS, or Windows.
4
+
5
+ ## Quick Start
6
+
7
+ ### 1. Download the Model
8
+
9
+ ```bash
10
+ # Clone the repository
11
+ git clone https://huggingface.co/ISTNetworks/saudi-msa-piper
12
+ cd saudi-msa-piper
13
+
14
+ # Or download specific files
15
+ wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx
16
+ wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json
17
+ ```
18
+
19
+ ### 2. Install Dependencies
20
+
21
+ ```bash
22
+ # Install piper-tts
23
+ pip install piper-tts
24
+
25
+ # Or install all dependencies
26
+ pip install -r requirements.txt
27
+ ```
28
+
29
+ ### 3. Run Inference
30
+
31
+ **Option A: Using the provided Python script**
32
+ ```bash
33
+ python3 inference.py -t "مرحبا بك في نظام التحويل النصي إلى كلام" -o output.wav
34
+ ```
35
+
36
+ **Option B: Using the bash script**
37
+ ```bash
38
+ chmod +x inference.sh
39
+ ./inference.sh "مرحبا بك" output.wav
40
+ ```
41
+
42
+ **Option C: Using piper directly**
43
+ ```bash
44
+ echo "مرحبا بك" | piper --model saudi_msa_epoch455.onnx --output_file output.wav
45
+ ```
46
+
47
+ ## Detailed Usage
48
+
49
+ ### Python Script (inference.py)
50
+
51
+ The Python script provides the most flexibility and error handling.
52
+
53
+ **Basic usage:**
54
+ ```bash
55
+ python3 inference.py -t "Arabic text here" -o output.wav
56
+ ```
57
+
58
+ **Read from stdin:**
59
+ ```bash
60
+ echo "مرحبا بك" | python3 inference.py -o output.wav
61
+ ```
62
+
63
+ **Read from file:**
64
+ ```bash
65
+ python3 inference.py -o output.wav < arabic_text.txt
66
+ ```
67
+
68
+ **Specify custom model path:**
69
+ ```bash
70
+ python3 inference.py -t "مرحبا بك" -m /path/to/model.onnx -o output.wav
71
+ ```
72
+
73
+ **Full options:**
74
+ ```bash
75
+ python3 inference.py --help
76
+
77
+ Options:
78
+ -t, --text TEXT Arabic text to synthesize
79
+ -m, --model PATH Path to ONNX model file
80
+ -o, --output PATH Output WAV file path (required)
81
+ -c, --config PATH Path to config JSON file (auto-detected)
82
+ ```
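The option list above could be produced by a parser along these lines. This is only a sketch, not the actual source of `inference.py`, which is not shown in this guide; the default model path and the `resolve_text` helper are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in this guide."""
    parser = argparse.ArgumentParser(description="Saudi MSA Piper TTS inference")
    parser.add_argument("-t", "--text",
                        help="Arabic text to synthesize (stdin is used if omitted)")
    # Assumed default: the model file name used throughout this guide.
    parser.add_argument("-m", "--model", default="saudi_msa_epoch455.onnx",
                        help="Path to ONNX model file")
    parser.add_argument("-o", "--output", required=True,
                        help="Output WAV file path")
    parser.add_argument("-c", "--config",
                        help="Path to config JSON file (auto-detected if omitted)")
    return parser


def resolve_text(args, stdin):
    """Return the text from -t if given, otherwise read it from stdin."""
    text = args.text if args.text is not None else stdin.read()
    text = text.strip()
    if not text:
        raise ValueError("no input text provided")
    return text
```

With this layout, `-t` takes priority and stdin is only read when `-t` is absent, which matches the stdin examples earlier in this section.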
83
+
84
+ ### Bash Script (inference.sh)
85
+
86
+ Simple shell script for quick inference.
87
+
88
+ **Basic usage:**
89
+ ```bash
90
+ ./inference.sh "مرحبا بك" output.wav
91
+ ```
92
+
93
+ **Read from stdin:**
94
+ ```bash
95
+ echo "مرحبا بك" | ./inference.sh - output.wav
96
+ ```
97
+
98
+ **Custom model path:**
99
+ ```bash
100
+ MODEL_FILE=/path/to/model.onnx ./inference.sh "مرحبا بك" output.wav
101
+ ```
102
+
103
+ ### Direct Piper Usage
104
+
105
+ For advanced users who want direct control.
106
+
107
+ **Basic:**
108
+ ```bash
109
+ echo "مرحبا بك" | piper --model saudi_msa_epoch455.onnx --output_file output.wav
110
+ ```
111
+
112
+ **With custom config:**
113
+ ```bash
114
+ echo "مرحبا بك" | piper \
115
+     --model saudi_msa_epoch455.onnx \
116
+     --config saudi_msa_epoch455.onnx.json \
117
+     --output_file output.wav
118
+ ```
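To drive the same command from a Python program, one option is to call the `piper` executable through `subprocess`. The sketch below uses only the flags shown in this guide (`--model`, `--config`, `--output_file`); the helper names `build_piper_command` and `run_piper` are hypothetical.

```python
import subprocess


def build_piper_command(model, output_file, config=None):
    """Assemble the piper CLI call using the flags shown in this guide."""
    cmd = ["piper", "--model", str(model), "--output_file", str(output_file)]
    if config is not None:
        cmd += ["--config", str(config)]
    return cmd


def run_piper(text, model, output_file, config=None):
    """Send UTF-8 text to piper on stdin; raises CalledProcessError on failure."""
    subprocess.run(
        build_piper_command(model, output_file, config),
        input=text.encode("utf-8"),
        check=True,
    )
```

Passing the command as a list avoids shell quoting problems with Arabic text or paths that contain spaces.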
119
+
120
+ **Output to stdout (for piping):**
121
+ ```bash
122
+ echo "مرحبا بك" | piper --model saudi_msa_epoch455.onnx --output-raw | \
123
+     aplay -r 22050 -f S16_LE -t raw -
124
+ ```
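The raw stream carries no WAV header, which is why `aplay` has to be told the sample rate and sample format. If the raw bytes need to be saved or served as a WAV file instead, they can be wrapped with the standard-library `wave` module. A minimal sketch, assuming 16-bit little-endian mono audio at 22050 Hz as in the `aplay` flags above (mono is an assumption; check `saudi_msa_epoch455.onnx.json` for the actual sample rate). `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 22050) -> bytes:
    """Wrap raw 16-bit little-endian mono PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(1)      # assumed mono
        wav_file.setsampwidth(2)      # 16-bit samples (S16_LE)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buf.getvalue()
```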
125
+
126
+ ## Python API Usage
127
+
128
+ For integration into Python applications:
129
+
130
+ ```python
131
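+ import wave
132
+ from piper import PiperVoice
133
+ # Load the model (expects saudi_msa_epoch455.onnx.json alongside it)
134
+ voice = PiperVoice.load("saudi_msa_epoch455.onnx")
135
+
136
+ # Synthesize to a WAV file (piper-tts 1.2.x API; newer releases use synthesize_wav)
137
+ with wave.open("output.wav", "wb") as wav_file:
138
+     voice.synthesize("مرحبا بك في نظام التحويل النصي إلى كلام", wav_file)
139
+
140
+ # Or collect the raw 16-bit PCM audio bytes (also 1.2.x API)
141
+ audio_data = b"".join(voice.synthesize_stream_raw("مرحبا بك"))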
142
+ ```
143
+
144
+ **Advanced usage:**
145
+ ```python
146
+ from piper import PiperVoice
147
+ import wave
148
+
149
+ # Load model
150
+ voice = PiperVoice.load("saudi_msa_epoch455.onnx")
151
+
152
+ # Text to synthesize
153
+ text = "مرحبا بك"
154
+
155
+ # wave.open writes the WAV header that a raw byte stream lacks
156
+ with wave.open("output.wav", "wb") as wav_file:
157
+     # piper-tts 1.2.x signature; newer releases use synthesize_wav instead
158
+     voice.synthesize(text, wav_file)
159
+
160
+ print("Audio generated successfully!")
161
+ ```
162
+
163
+ ## System Requirements
164
+
165
+ ### Minimum Requirements
166
+ - **OS:** Linux, macOS, or Windows
167
+ - **Python:** 3.8 or higher
168
+ - **RAM:** 2 GB
169
+ - **Storage:** 100 MB for model files
170
+
171
+ ### Recommended Requirements
172
+ - **OS:** Linux or macOS
173
+ - **Python:** 3.10 or higher
174
+ - **RAM:** 4 GB
175
+ - **Storage:** 1 GB
176
+
177
+ ## Installation on Different Systems
178
+
179
+ ### Ubuntu/Debian
180
+ ```bash
181
+ # Install system dependencies
182
+ sudo apt-get update
183
+ sudo apt-get install -y python3 python3-pip
184
+
185
+ # Install piper-tts
186
+ pip3 install piper-tts
187
+
188
+ # Download model
189
+ wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx
190
+ wget https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json
191
+ ```
192
+
193
+ ### macOS
194
+ ```bash
195
+ # Install Python (if not installed)
196
+ brew install python3
197
+
198
+ # Install piper-tts
199
+ pip3 install piper-tts
200
+
201
+ # Download model
202
+ curl -L -o saudi_msa_epoch455.onnx \
203
+ https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx
204
+ curl -L -o saudi_msa_epoch455.onnx.json \
205
+ https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json
206
+ ```
207
+
208
+ ### Windows
209
+ ```powershell
210
+ # Install Python from python.org
211
+
212
+ # Install piper-tts
213
+ pip install piper-tts
214
+
215
+ # Download model (using PowerShell)
216
+ Invoke-WebRequest -Uri "https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx" -OutFile "saudi_msa_epoch455.onnx"
217
+ Invoke-WebRequest -Uri "https://huggingface.co/ISTNetworks/saudi-msa-piper/resolve/main/saudi_msa_epoch455.onnx.json" -OutFile "saudi_msa_epoch455.onnx.json"
218
+ ```
219
+
220
+ ## Example Use Cases
221
+
222
+ ### Customer Service Greeting
223
+ ```bash
224
+ python3 inference.py -t "حياك الله عميلنا العزيز، كيف اقدر اساعدك اليوم؟" -o greeting.wav
225
+ ```
226
+
227
+ ### Banking Message
228
+ ```bash
229
+ python3 inference.py -t "تراني راسلت الفرع الرئيسي باكر الصبح، وان شا الله بيردون علينا قبل الظهر" -o banking.wav
230
+ ```
231
+
232
+ ### Batch Processing
233
+ ```bash
234
+ # Process multiple texts
235
+ while IFS= read -r line; do
236
+     filename=$(echo "$line" | md5sum | cut -d' ' -f1).wav
237
+     python3 inference.py -t "$line" -o "$filename"
238
+ done < texts.txt
239
+ ```
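The same content-hash naming can be reproduced in Python, for example when the batch is driven from an application instead of a shell loop. Note that `echo "$line" | md5sum` hashes the line plus a trailing newline, so the sketch below appends one to get matching names. `output_name` and `plan_batch` are hypothetical helpers; skipping blank and duplicate lines is an added assumption, not something the shell loop does.

```python
import hashlib


def output_name(line: str) -> str:
    """Return the name that `echo "$line" | md5sum` would produce, plus .wav."""
    digest = hashlib.md5((line + "\n").encode("utf-8")).hexdigest()
    return f"{digest}.wav"


def plan_batch(lines):
    """Map each distinct, non-empty line to its output file name."""
    plan = {}
    for raw in lines:
        text = raw.strip()
        if text and text not in plan:
            plan[text] = output_name(text)
    return plan
```

Because the name depends only on the text, rerunning a batch overwrites existing files for unchanged lines instead of creating new ones.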
240
+
241
+ ### Web Service Integration
242
+ ```python
243
+ from flask import Flask, request, send_file
244
+ from piper import PiperVoice
245
+ import tempfile
246
+ import wave
247
+ app = Flask(__name__)
248
+ voice = PiperVoice.load("saudi_msa_epoch455.onnx")
249
+
250
+ @app.route('/synthesize', methods=['POST'])
251
+ def synthesize():
252
+     text = request.json.get('text')
253
+
254
+     # Reserve a temporary path, then write a complete WAV file (header + audio)
255
+     with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
256
+         temp_path = f.name
257
+     with wave.open(temp_path, 'wb') as wav_file:
258
+         voice.synthesize(text, wav_file)  # piper-tts 1.2.x; newer releases use synthesize_wav
259
+     return send_file(temp_path, mimetype='audio/wav')
260
+
261
+ if __name__ == '__main__':
262
+ app.run(host='0.0.0.0', port=5000)
263
+ ```
264
+
265
+ ## Troubleshooting
266
+
267
+ ### Model file not found
268
+ ```bash
269
+ # Make sure you're in the correct directory
270
+ ls -lh saudi_msa_epoch455.onnx
271
+
272
+ # Or specify full path
273
+ python3 inference.py -m /full/path/to/saudi_msa_epoch455.onnx -t "مرحبا" -o output.wav
274
+ ```
275
+
276
+ ### Config file not found
277
+ ```bash
278
+ # The config file should have the same name as the model with .json extension
279
+ # saudi_msa_epoch455.onnx -> saudi_msa_epoch455.onnx.json
280
+
281
+ # Or specify manually
282
+ python3 inference.py -t "مرحبا" -c config.json -o output.wav
283
+ ```
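The naming rule above can be checked programmatically before loading the model. A small sketch using only the standard library; `config_path_for` and `missing_files` are hypothetical helper names.

```python
from pathlib import Path


def config_path_for(model_path):
    """Return the expected config path: the model file name plus '.json'."""
    model = Path(model_path)
    return model.with_name(model.name + ".json")


def missing_files(model_path):
    """List which of the model and config files are absent on disk."""
    candidates = [Path(model_path), config_path_for(model_path)]
    return [p for p in candidates if not p.is_file()]
```

An empty list from `missing_files("saudi_msa_epoch455.onnx")` means both files are where the auto-detection expects them.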
284
+
285
+ ### piper-tts not installed
286
+ ```bash
287
+ pip install piper-tts
288
+
289
+ # If that fails, try:
290
+ pip install --upgrade pip
291
+ pip install piper-tts
292
+ ```
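A common cause of this error is that `pip` installed the package for a different Python interpreter than the one running the script. The sketch below checks whether the `piper` module (the import name used in this guide) is visible to the current interpreter and, if not, suggests an install command tied to that interpreter. `is_installed` and `diagnose` are hypothetical helper names.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the module can be found by the running interpreter."""
    return importlib.util.find_spec(module_name) is not None


def diagnose(module_name: str = "piper", package_name: str = "piper-tts") -> str:
    """Describe whether the module is importable and how to install it if not."""
    if is_installed(module_name):
        return f"{module_name} is importable from {sys.executable}"
    return (f"{module_name} not found; try: "
            f"{sys.executable} -m pip install {package_name}")
```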
293
+
294
+ ### Permission denied
295
+ ```bash
296
+ chmod +x inference.sh
297
+ chmod +x inference.py
298
+ ```
299
+
300
+ ## Performance Tips
301
+
302
+ 1. **First run is slower:** The model loads into memory on first use
303
+ 2. **Batch processing:** Load the model once and reuse for multiple texts
304
+ 3. **Memory usage:** The model uses ~500 MB RAM when loaded
305
+ 4. **CPU vs GPU:** This model runs on CPU; no GPU required
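Tip 2 can be sketched as a small cache around the loader, so that repeated requests for the same model path reuse one loaded instance. The loader is passed in as a parameter here so the pattern stays independent of the installed piper-tts version; `make_cached_loader` is a hypothetical helper name.

```python
from functools import lru_cache


def make_cached_loader(load_fn):
    """Return a loader that calls load_fn at most once per model path."""
    @lru_cache(maxsize=None)
    def get_voice(model_path: str):
        return load_fn(model_path)
    return get_voice


# Possible wiring, using the load call shown earlier in this guide:
#   from piper import PiperVoice
#   get_voice = make_cached_loader(PiperVoice.load)
#   voice = get_voice("saudi_msa_epoch455.onnx")
```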
306
+
307
+ ## File Structure
308
+
309
+ After downloading, you should have:
310
+ ```
311
+ saudi-msa-piper/
312
+ ├── saudi_msa_epoch455.onnx # Main model file (61 MB)
313
+ ├── saudi_msa_epoch455.onnx.json # Config file (5 KB)
314
+ ├── inference.py # Python inference script
315
+ ├── inference.sh # Bash inference script
316
+ ├── INFERENCE_GUIDE.md # This guide
317
+ └── requirements.txt # Python dependencies
318
+ ```
319
+
320
+ ## Support
321
+
322
+ For issues or questions:
323
+ - Repository: https://huggingface.co/ISTNetworks/saudi-msa-piper
324
+ - Piper TTS: https://github.com/rhasspy/piper
325
+
326
+ ## License
327
+
328
+ This model is based on Piper TTS (GPL-3.0 license).