hamdallah committed dc2f36e · verified · 1 parent: 03876bf

Update README.md

Files changed (1): README.md +365 -10
README.md CHANGED
@@ -1,21 +1,376 @@
- # hamdallah/Sofelia-TTS

- Fine-tuned MiraTTS checkpoint.

- ## Usage (CLI)
  ```bash
- python training/test_miratts.py \
- --checkpoint hamdallah/Sofelia-TTS \
- --audio-file ref.wav \
- --text "Hello from my MiraTTS model."
  ```

- ## Usage (Python)
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
  from ncodec.codec import TTSCodec

- model = AutoModelForCausalLM.from_pretrained("hamdallah/Sofelia-TTS", trust_remote_code=True)
- tokenizer = AutoTokenizer.from_pretrained("hamdallah/Sofelia-TTS", trust_remote_code=True)
  codec = TTSCodec()
  ```

---
language:
- ar
license: apache-2.0
tags:
- text-to-speech
- tts
- audio
- speech
- palestinian-arabic
- arabic
- voice-cloning
- miratts
- sofelia
base_model: YatharthS/MiraTTS
datasets:
- hamdallah/ar-gemini
library_name: transformers
pipeline_tag: text-to-speech
---

<div style="text-align: center;">
<h1>🇵🇸 Sofelia-TTS 🇵🇸</h1>
<p><strong>Palestinian Arabic Text-to-Speech Model</strong></p>
<p><em>From the river to the sea, Palestine will be free</em> 🕊️</p>
</div>

---

## 🌟 Model Description

**Sofelia-TTS** is a text-to-speech (TTS) model fine-tuned specifically for the **Palestinian Arabic dialect**. This model brings the beautiful sounds of Palestinian speech to AI, preserving and celebrating the linguistic heritage of Palestine.

Built on top of [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS), Sofelia-TTS captures the distinctive phonetic characteristics, intonation patterns, and prosody of Palestinian Arabic, making it well suited for:

- 🎙️ **Voice cloning** with Palestinian Arabic speech
- 📚 **Audiobook generation** in the Palestinian dialect
- 🗣️ **Virtual assistants** that speak authentic Palestinian Arabic
- 🎓 **Educational tools** for learning and preserving the Palestinian dialect
- 🎬 **Content creation** for Palestinian media and storytelling

> **Dedicated to Palestine**: This model is a tribute to the resilience, culture, and spirit of the Palestinian people. May their voices be heard loud and clear across the world. 🇵🇸

---

## 🎯 Key Features

- ✅ **High-quality voice cloning**: Clone any voice with just a few seconds of reference audio
- ✅ **Palestinian Arabic dialect**: Authentic pronunciation and intonation
- ✅ **Fast inference**: Optimized for real-time generation
- ✅ **Flexible context**: Supports variable-length reference audio
- ✅ **Open source**: Free to use and improve

---

## 📊 Model Details

| **Attribute** | **Value** |
|---------------|-----------|
| **Model Type** | Text-to-Speech (TTS) |
| **Base Model** | YatharthS/MiraTTS |
| **Architecture** | Transformer-based language model + audio codec |
| **Training Language** | Palestinian Arabic (ar-PS) |
| **Dataset** | [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini) |
| **Sample Rate** | 16,000 Hz |
| **License** | Apache 2.0 |
| **Model Size** | ~1.3B parameters |
| **Precision** | BF16/FP32 |
| **Framework** | PyTorch + Transformers |

---

## 🚀 Quick Start

### Installation

```bash
# Install required packages (torchaudio is needed to load and save audio)
pip install torch torchaudio transformers datasets
pip install git+https://github.com/YatharthS/ncodec.git
```

### Usage (Python)

```python
import torch
import torchaudio
from transformers import AutoTokenizer, AutoModelForCausalLM
from ncodec.codec import TTSCodec

# Load model and tokenizer
model_id = "hamdallah/Sofelia-TTS"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Initialize audio codec
codec = TTSCodec()

# Prepare your text (Palestinian Arabic):
# "Hello, how are you? This is a model of the Palestinian dialect."
text = "مرحبا، كيف الحال؟ هذا نموذج للهجة الفلسطينية."

# Reference audio: 3-10 seconds of clean speech from the voice to clone
reference_audio_path = "path/to/reference_voice.wav"

# Load the reference audio (shape: channels x samples), mix it down to mono,
# and resample it to 16 kHz
waveform, sample_rate = torchaudio.load(reference_audio_path)
waveform = waveform.mean(dim=0)
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)

# Encode the reference audio to get the context tokens
audio_array = waveform.numpy()
semantic_tokens, context_tokens = codec.audio_encoder.encode(audio_array, True, duration=10)

# Create the prompt
prompt = (
    f"<|task_tts|><|start_text|>{text}<|end_text|>"
    f"<|context_audio_start|>{context_tokens}<|context_audio_end|>"
    f"<|prompt_speech_start|>{semantic_tokens}"
)

# Tokenize and generate
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=2048,  # total length, prompt tokens included
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

# Decode the generated tokens to audio
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
audio_output = codec.decode(generated_text)

# Save output
torchaudio.save("output.wav", torch.from_numpy(audio_output).unsqueeze(0), 16000)
print("✅ Audio saved to output.wav")
```

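The reference clip should contain 3-10 seconds of speech. A check like the one below can catch clips that are too short, still multichannel, or at the wrong sample rate before they reach the encoder. It is a hypothetical helper (`prepare_reference` is not part of `ncodec` or this repository); it assumes a mono NumPy array already at 16 kHz and simply keeps the first 10 seconds of longer clips.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, per the Model Details table
MIN_SECONDS, MAX_SECONDS = 3.0, 10.0  # recommended reference length


def prepare_reference(audio: np.ndarray, sample_rate: int = SAMPLE_RATE) -> np.ndarray:
    """Validate a mono reference clip and trim it to at most MAX_SECONDS.

    Hypothetical helper: raises ValueError for multichannel audio, a sample
    rate other than 16 kHz, or clips shorter than MIN_SECONDS.
    """
    if audio.ndim != 1:
        raise ValueError(f"expected mono audio with shape (samples,), got {audio.shape}")
    if sample_rate != SAMPLE_RATE:
        raise ValueError(f"expected {SAMPLE_RATE} Hz audio, got {sample_rate} Hz")
    duration = audio.shape[0] / sample_rate
    if duration < MIN_SECONDS:
        raise ValueError(f"reference is {duration:.1f} s; use at least {MIN_SECONDS:.0f} s")
    max_samples = int(MAX_SECONDS * sample_rate)  # 160,000 samples at 16 kHz
    return audio[:max_samples]


# Example: 12 s of silence is trimmed to the first 10 s.
clip = np.zeros(12 * SAMPLE_RATE, dtype=np.float32)
trimmed = prepare_reference(clip)
print(trimmed.shape[0] / SAMPLE_RATE)  # 10.0
```

In the Quick Start code, it would slot in as `audio_array = prepare_reference(audio_array)` just before the call to `codec.audio_encoder.encode(...)`.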
### Usage (CLI)

If you have the training scripts, which include `test_miratts.py`, you can also run inference from the command line:

```bash
# Clone the repository with inference scripts
git clone https://huggingface.co/hamdallah/Sofelia-TTS
cd Sofelia-TTS

# Run inference (the text means "Hello from free Palestine")
python test_miratts.py \
    --model-id hamdallah/Sofelia-TTS \
    --audio-file reference_voice.wav \
    --text "مرحباً من فلسطين الحرة" \
    --output-file output.wav
```

---

## 🎤 Example Prompts

Try these Palestinian Arabic phrases:

```python
example_prompts = [
    # Greetings
    "مرحبا، كيف حالك؟",  # Hello, how are you?
    "أهلا وسهلا فيك",  # Welcome
    # Common expressions
    "يا سلام، هذا رائع",  # Wow, this is amazing
    "ما شاء الله",  # Mashallah ("what God has willed")
    "الله يعطيك العافية",  # May God give you wellness (a common way to thank someone)
    # About Palestine
    "فلسطين حرة من النهر إلى البحر",  # Palestine is free from the river to the sea
    "القدس عاصمة فلسطين الأبدية",  # Jerusalem is the eternal capital of Palestine
    "سنعود يوماً إلى ديارنا",  # We will return one day to our homes
]
```

---

## 🎓 Training Details

### Training Data

- **Dataset**: [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini)
- **Language**: Palestinian Arabic dialect
- **Audio**: High-quality Palestinian speech recordings
- **Preprocessing**: Audio normalized and resampled to 16 kHz

### Training Configuration

| **Hyperparameter** | **Value** |
|--------------------|-----------|
| **Learning Rate** | 2e-4 (foundation phase), 1e-5 (refinement phase) |
| **Batch Size** | 8 effective (2 per device × 4 gradient-accumulation steps) |
| **Training Steps** | 2000+ |
| **Warmup Steps** | 100 |
| **Max Audio Length** | 20-30 seconds |
| **Optimizer** | AdamW |
| **LR Scheduler** | Cosine with warmup |
| **Gradient Clipping** | 1.0 |
| **Precision** | BF16 (H100) / FP32 |
| **Hardware** | NVIDIA H100 / A100 GPU |

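The table maps naturally onto Hugging Face `transformers.TrainingArguments`. The sketch below is an assumption about how such a configuration could be written down, not the author's training script: the keys are real `TrainingArguments` field names, but the NEFTune alpha of 5.0 is an assumed value, the 100 warmup steps are assumed to apply to both phases, and `max_steps` is left out because the per-phase split of the "2000+" steps is not documented. It builds plain dictionaries so it runs without a GPU.

```python
from typing import Any, Dict

# Settings shared by both phases, taken from the Training Configuration table.
COMMON_ARGS: Dict[str, Any] = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "warmup_steps": 100,            # assumed to apply to both phases
    "optim": "adamw_torch",         # AdamW
    "lr_scheduler_type": "cosine",  # linear warmup, then cosine decay
    "max_grad_norm": 1.0,           # gradient clipping
    "bf16": True,                   # set to False to train in FP32
}


def phase_args(phase: str) -> Dict[str, Any]:
    """Return TrainingArguments keyword arguments for one training phase."""
    args = dict(COMMON_ARGS)  # copy so COMMON_ARGS is never mutated
    if phase == "foundation":
        args.update(learning_rate=2e-4)
    elif phase == "refinement":
        # neftune_noise_alpha=5.0 is an assumed value, not stated in this README.
        args.update(learning_rate=1e-5, neftune_noise_alpha=5.0)
    else:
        raise ValueError(f"unknown phase: {phase!r}")
    return args


def effective_batch_size(args: Dict[str, Any], num_devices: int = 1) -> int:
    """Number of samples that contribute to each optimizer step."""
    return (
        args["per_device_train_batch_size"]
        * args["gradient_accumulation_steps"]
        * num_devices
    )
```

With a recent `transformers` release, a phase could then be configured as `TrainingArguments(output_dir="sofelia-refinement", **phase_args("refinement"))`, where the output directory name is hypothetical. `bf16=True` requires hardware with BF16 support, so switch it off for FP32 runs. The effective batch size of 8 in the table corresponds to a single device.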
### Training Process

The model was trained using a two-phase approach:

1. **Foundation Phase**: High learning rate (2e-4) for initial adaptation to Palestinian Arabic
2. **Refinement Phase**: Lower learning rate (1e-5) with NEFTune noise (random noise added to the input embeddings during training) for stability and quality

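NEFTune perturbs the token embeddings during training only. For a sequence of length L with embedding dimension d, it adds noise drawn uniformly from [-1, 1] and scaled by α/√(L·d), where α sets the noise strength. The NumPy sketch below illustrates that rule on a single embedding matrix; the README does not state the α used for Sofelia-TTS, so the default of 5.0 here is an assumption. In practice, `transformers` applies this for you when `neftune_noise_alpha` is set in `TrainingArguments`.

```python
from typing import Optional

import numpy as np


def neftune_noise(
    embeddings: np.ndarray,
    alpha: float = 5.0,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Add NEFTune noise to a (seq_len, dim) embedding matrix.

    Noise is sampled from Uniform(-1, 1) and scaled by alpha / sqrt(seq_len * dim).
    alpha=5.0 is an assumed default, not a value documented for this model.
    """
    if rng is None:
        rng = np.random.default_rng()
    seq_len, dim = embeddings.shape
    scale = alpha / np.sqrt(seq_len * dim)
    noise = rng.uniform(-1.0, 1.0, size=embeddings.shape) * scale
    return embeddings + noise


# Example: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
clean = np.zeros((4, 8))
noisy = neftune_noise(clean, alpha=5.0, rng=rng)
print(np.abs(noisy).max() <= 5.0 / np.sqrt(4 * 8))  # True: noise stays within the scale
```

Because the scale shrinks as sequences get longer, the total noise energy per sequence stays roughly constant; at inference time no noise is added.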
---

## 📈 Model Performance

The model achieves:

- ✅ **Natural prosody** matching Palestinian Arabic speech patterns
- ✅ **Clear pronunciation** of Arabic phonemes
- ✅ **Voice similarity** to reference audio
- ✅ **Stable generation** without artifacts or repetitions
- ✅ **Fast inference** suitable for real-time applications

---

## 🛠️ Advanced Usage

### Adjusting Generation Parameters

```python
# `model` and `inputs` are defined as in the Quick Start example.

# More creative/variable output
outputs = model.generate(
    **inputs,
    max_length=2048,
    do_sample=True,
    temperature=0.9,  # Higher = more variation
    top_p=0.95,
    top_k=50,
)

# More deterministic/stable output
outputs = model.generate(
    **inputs,
    max_length=2048,
    do_sample=True,
    temperature=0.5,  # Lower = more stable
    top_p=0.9,
)
```

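If you switch between these settings often, keeping them in one place avoids copy-paste drift. A minimal sketch, assuming you pass the result to `model.generate` together with `inputs` from the Quick Start; the preset names and the `generation_kwargs` helper are hypothetical, and the values are the ones used elsewhere in this README.

```python
from typing import Any, Dict

# "default" matches the Quick Start; "expressive" and "stable" match the two
# calls above. The names are illustrative labels, not options the model knows.
GENERATION_PRESETS: Dict[str, Dict[str, Any]] = {
    "default": {"do_sample": True, "temperature": 0.7, "top_p": 0.95},
    "expressive": {"do_sample": True, "temperature": 0.9, "top_p": 0.95, "top_k": 50},
    "stable": {"do_sample": True, "temperature": 0.5, "top_p": 0.9},
}


def generation_kwargs(preset: str = "default", max_length: int = 2048) -> Dict[str, Any]:
    """Return keyword arguments for model.generate() for a named preset."""
    if preset not in GENERATION_PRESETS:
        raise KeyError(f"unknown preset {preset!r}; choose from {sorted(GENERATION_PRESETS)}")
    return {"max_length": max_length, **GENERATION_PRESETS[preset]}
```

For example: `outputs = model.generate(**inputs, **generation_kwargs("stable"))`.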
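### Batch Processing

```python
# Process multiple texts with the same reference voice.
# Reuses model, tokenizer, codec, context_tokens, and semantic_tokens from the
# Quick Start example, so the reference audio is encoded only once.
texts = [
    "مرحباً",  # Hello
    "كيف حالك؟",  # How are you?
    "فلسطين حرة",  # Palestine is free
]

for i, text in enumerate(texts):
    prompt = (
        f"<|task_tts|><|start_text|>{text}<|end_text|>"
        f"<|context_audio_start|>{context_tokens}<|context_audio_end|>"
        f"<|prompt_speech_start|>{semantic_tokens}"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(**inputs, max_length=2048, do_sample=True, temperature=0.7, top_p=0.95)
    audio_output = codec.decode(tokenizer.decode(outputs[0], skip_special_tokens=False))
    torchaudio.save(f"output_{i}.wav", torch.from_numpy(audio_output).unsqueeze(0), 16000)
```

---
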
## 💡 Tips for Best Results

1. **Reference Audio Quality**:
   - Use clean audio without background noise
   - 3-10 seconds of speech is ideal
   - Use a 16 kHz sample rate (or resample, as in the Quick Start example)

2. **Text Input**:
   - Use proper Arabic script (not Arabizi/transliteration)
   - Text in the Palestinian dialect works best
   - Avoid very long sentences (split them into shorter segments)

3. **Generation Parameters**:
   - `temperature=0.7`: Good default for natural speech
   - `temperature=0.5`: More stable, less variation
   - `temperature=0.9`: More expressive, more variation

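The advice to split long input can be automated. The sketch below is a hypothetical utility, not part of the model or `ncodec`: it breaks text after sentence-ending punctuation, including the Arabic question mark (؟), comma (،), and semicolon (؛), and then packs the pieces into chunks of at most `max_chars` characters. The 150-character default is an arbitrary assumption to tune for your use; each chunk can then be synthesized separately, for instance in a loop like the batch-processing example.

```python
import re
from typing import List

# Break after ".", "!", "?", the Arabic question mark "؟", the Arabic comma "،",
# or the Arabic semicolon "؛", keeping the punctuation with its clause.
_BREAK = re.compile(r"(?<=[.!?؟،؛])\s+")


def split_for_tts(text: str, max_chars: int = 150) -> List[str]:
    """Split text into clause-based chunks of at most max_chars characters.

    max_chars=150 is an assumed default. A single clause longer than
    max_chars is kept whole rather than cut mid-word.
    """
    chunks: List[str] = []
    current = ""
    for clause in _BREAK.split(text.strip()):
        if not clause:
            continue
        candidate = f"{current} {clause}" if current else clause
        if len(candidate) <= max_chars:
            current = candidate  # the clause still fits in the current chunk
        else:
            if current:
                chunks.append(current)
            current = clause  # start a new chunk with this clause
    if current:
        chunks.append(current)
    return chunks


text = "مرحبا، كيف الحال؟ هذا نموذج للهجة الفلسطينية."
print(split_for_tts(text, max_chars=20))
```

Splitting on punctuation rather than a fixed character count keeps each chunk prosodically complete, which tends to matter more for TTS than exact chunk length.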
---

## 🌍 About Palestinian Arabic

Palestinian Arabic is a Levantine Arabic dialect spoken by the Palestinian people. It has distinctive characteristics:

- **Phonology**: Urban varieties realize Classical Arabic /q/ as a glottal stop [ʔ]
- **Vocabulary**: Rich in Levantine and distinctly Palestinian terms
- **Intonation**: Distinctive melodic patterns
- **Regional Variants**: Urban (e.g., Jerusalem, Hebron), rural, and Bedouin varieties

This model captures these linguistic features, making it authentic and representative of Palestinian speech.

---

## 🇵🇸 Message of Solidarity

This model is dedicated to the Palestinian people and their enduring struggle for freedom, dignity, and justice. Through technology, we preserve and celebrate Palestinian culture, language, and identity.

**Free Palestine** 🇵🇸 **From the River to the Sea**

> *"We will not be erased. Our voices will echo through time, in every language model, every algorithm, every line of code. Palestine lives, and so does its voice."*

---

## 📜 License

This model is released under the **Apache 2.0 License**, making it free for:

- ✅ Commercial use
- ✅ Modification and distribution
- ✅ Private use
- ✅ Patent use

---

## 🙏 Acknowledgments

- **Base Model**: [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS) - Thank you for the excellent foundation
- **Dataset**: Palestinian Arabic speakers who contributed their voices
- **Community**: The open-source AI community for tools and support
- **Palestine**: For being the inspiration and purpose behind this work

---

## 📞 Contact & Support

- **Model Repository**: [hamdallah/Sofelia-TTS](https://huggingface.co/hamdallah/Sofelia-TTS)
- **Issues & Questions**: Use the Community tab or open an issue
- **Dataset**: [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini)

---

## 🔗 Related Resources

- [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS) - Base model
- [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini) - Training dataset
- [ncodec](https://github.com/YatharthS/ncodec) - Audio codec library

---

## 📚 Citation

If you use this model in your research or projects, please cite:

```bibtex
@misc{sofelia-tts-2026,
  author       = {Hamdallah},
  title        = {Sofelia-TTS: Palestinian Arabic Text-to-Speech Model},
  year         = {2026},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/hamdallah/Sofelia-TTS}},
}
```

---

<div style="text-align: center; padding: 20px;">
<h2>🇵🇸 FREE PALESTINE 🇵🇸</h2>
<p><strong>تحيا فلسطين حرة أبية</strong></p>
<p><em>Long Live Free Palestine</em></p>
<p>🕊️ ✊ 🇵🇸</p>
</div>

---

**Made with ❤️ for Palestine**