hamdallah committed on
Commit
e2bf16b
·
verified ·
1 Parent(s): dc2f36e

Update README.md

Files changed (1)
  1. README.md +35 -111
README.md CHANGED
@@ -1,6 +1,7 @@
1
  ---
2
  language:
3
  - ar
 
4
  license: apache-2.0
5
  tags:
6
  - text-to-speech
@@ -13,8 +14,6 @@ tags:
13
  - miratts
14
  - sofelia
15
  base_model: YatharthS/MiraTTS
16
- datasets:
17
- - hamdallah/ar-gemini
18
  library_name: transformers
19
  pipeline_tag: text-to-speech
20
  ---
@@ -22,7 +21,7 @@ pipeline_tag: text-to-speech
22
  <div style="text-align: center;">
23
  <h1>🇵🇸 Sofelia-TTS 🇵🇸</h1>
24
  <p><strong>Palestinian Arabic Text-to-Speech Model</strong></p>
25
- <p><em>From the river to the sea, Palestine will be free</em> 🕊️</p>
26
  </div>
27
 
28
  ---
@@ -61,7 +60,7 @@ Built on top of [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS), S
61
  | **Base Model** | YatharthS/MiraTTS |
62
  | **Architecture** | Transformer-based Language Model + Audio Codec |
63
  | **Training Language** | Palestinian Arabic (ar-PS) |
64
- | **Dataset** | [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini) |
65
  | **Sample Rate** | 16,000 Hz |
66
  | **License** | Apache 2.0 |
67
  | **Model Size** | ~1.3B parameters |
@@ -76,86 +75,23 @@ Built on top of [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS), S
76
 
77
  ```bash
78
  # Install required packages
79
- pip install torch transformers datasets
80
- pip install git+https://github.com/YatharthS/ncodec.git
81
  ```
82
 
83
  ### Usage (Python)
84
 
85
  ```python
86
- import torch
87
- from transformers import AutoTokenizer, AutoModelForCausalLM
88
- from ncodec.codec import TTSCodec
89
 
90
- # Load model and tokenizer
91
- model_id = "hamdallah/Sofelia-TTS"
92
- model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
93
- tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
94
-
95
- # Initialize audio codec
96
- codec = TTSCodec()
97
-
98
- # Prepare your text (Palestinian Arabic)
99
  text = "مرحبا، كيف الحال؟ هذا نموذج للهجة الفلسطينية."
100
 
101
- # Load reference audio (3-10 seconds of speech)
102
- reference_audio_path = "path/to/reference_voice.wav"
103
-
104
- # Generate speech
105
- import torchaudio
106
-
107
- # Load and resample reference audio to 16kHz
108
- waveform, sample_rate = torchaudio.load(reference_audio_path)
109
- if sample_rate != 16000:
110
- resampler = torchaudio.transforms.Resample(sample_rate, 16000)
111
- waveform = resampler(waveform)
112
-
113
- # Encode reference audio to get context tokens
114
- audio_array = waveform.squeeze().numpy()
115
- semantic_tokens, context_tokens = codec.audio_encoder.encode(audio_array, True, duration=10)
116
-
117
- # Create prompt
118
- prompt = (
119
- f"<|task_tts|><|start_text|>{text}<|end_text|>"
120
- f"<|context_audio_start|>{context_tokens}<|context_audio_end|>"
121
- f"<|prompt_speech_start|>{semantic_tokens}"
122
- )
123
-
124
- # Tokenize and generate
125
- inputs = tokenizer(prompt, return_tensors="pt")
126
- with torch.no_grad():
127
- outputs = model.generate(
128
- **inputs,
129
- max_length=2048,
130
- do_sample=True,
131
- temperature=0.7,
132
- top_p=0.95,
133
- )
134
-
135
- # Decode to audio
136
- generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
137
- audio_output = codec.decode(generated_text)
138
-
139
- # Save output
140
- torchaudio.save("output.wav", torch.from_numpy(audio_output).unsqueeze(0), 16000)
141
- print("✅ Audio saved to output.wav")
142
- ```
143
 
144
- ### Usage (CLI)
145
-
146
- If you have the training scripts:
147
-
148
- ```bash
149
- # Clone the repository with inference scripts
150
- git clone https://huggingface.co/hamdallah/Sofelia-TTS
151
- cd Sofelia-TTS
152
-
153
- # Run inference
154
- python test_miratts.py \
155
- --model-id hamdallah/Sofelia-TTS \
156
- --audio-file reference_voice.wav \
157
- --text "مرحباً من فلسطين الحرة" \
158
- --output-file output.wav
159
  ```
160
 
161
  ---
@@ -175,7 +111,7 @@ Try these Palestinian Arabic phrases:
175
  "الله يعطيك العافية" # God give you wellness
176
 
177
  # About Palestine
178
- "فلسطين حرة من النهر إلى البحر" # Palestine is free from the river to the sea
179
  "القدس عاصمة فلسطين الأبدية" # Jerusalem is the eternal capital of Palestine
180
  "سنعود يوماً إلى ديارنا" # We will return one day to our homes
181
  ```
@@ -186,7 +122,7 @@ Try these Palestinian Arabic phrases:
186
 
187
  ### Training Data
188
 
189
- - **Dataset**: [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini)
190
  - **Language**: Palestinian Arabic dialect
191
  - **Hours of audio**: High-quality Palestinian speech recordings
192
  - **Preprocessing**: Audio normalized and resampled to 16kHz
@@ -229,43 +165,33 @@ The model achieves:
229
 
230
  ## 🛠️ Advanced Usage
231
 
232
- ### Adjusting Generation Parameters
233
 
234
  ```python
235
- # More creative/variable output
236
- outputs = model.generate(
237
- **inputs,
238
- max_length=2048,
239
- do_sample=True,
240
- temperature=0.9, # Higher = more variation
241
- top_p=0.95,
242
- top_k=50,
243
- )
244
 
245
- # More deterministic/stable output
246
- outputs = model.generate(
247
- **inputs,
248
- max_length=2048,
249
- do_sample=True,
250
- temperature=0.5, # Lower = more stable
251
- top_p=0.9,
252
- )
253
  ```
254
 
255
- ### Batch Processing
256
 
257
  ```python
258
- # Process multiple texts with the same reference voice
259
- texts = [
260
- "مرحباً",
261
- "كيف حالك؟",
262
- "فلسطين حرة"
263
- ]
264
-
265
- for i, text in enumerate(texts):
266
- prompt = create_prompt(text, reference_audio) # Your prompt creation function
267
- outputs = model.generate(...)
268
- save_audio(f"output_{i}.wav", outputs)
269
  ```
270
 
271
  ---
@@ -306,7 +232,7 @@ This model captures these linguistic features, making it authentic and represent
306
 
307
  This model is dedicated to the Palestinian people and their enduring struggle for freedom, dignity, and justice. Through technology, we preserve and celebrate Palestinian culture, language, and identity.
308
 
309
- **Free Palestine** 🇵🇸 **From the River to the Sea**
310
 
311
  > *"We will not be erased. Our voices will echo through time, in every language model, every algorithm, every line of code. Palestine lives, and so does its voice."*
312
 
@@ -335,14 +261,12 @@ This model is released under the **Apache 2.0 License**, making it free for:
335
 
336
  - **Model Repository**: [hamdallah/Sofelia-TTS](https://huggingface.co/hamdallah/Sofelia-TTS)
337
  - **Issues & Questions**: Use the Community tab or open an issue
338
- - **Dataset**: [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini)
339
 
340
  ---
341
 
342
  ## 🔗 Related Resources
343
 
344
  - [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS) - Base model
345
- - [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini) - Training dataset
346
  - [ncodec](https://github.com/YatharthS/ncodec) - Audio codec library
347
 
348
  ---
@@ -373,4 +297,4 @@ If you use this model in your research or projects, please cite:
373
 
374
  ---
375
 
376
- **Made with ❤️ for Palestine**
 
1
  ---
2
  language:
3
  - ar
4
+ - en
5
  license: apache-2.0
6
  tags:
7
  - text-to-speech
 
14
  - miratts
15
  - sofelia
16
  base_model: YatharthS/MiraTTS
 
 
17
  library_name: transformers
18
  pipeline_tag: text-to-speech
19
  ---
 
21
  <div style="text-align: center;">
22
  <h1>🇵🇸 Sofelia-TTS 🇵🇸</h1>
23
  <p><strong>Palestinian Arabic Text-to-Speech Model</strong></p>
24
+ <p><em>Palestine will be free</em> 🕊️</p>
25
  </div>
26
 
27
  ---
 
60
  | **Base Model** | YatharthS/MiraTTS |
61
  | **Architecture** | Transformer-based Language Model + Audio Codec |
62
  | **Training Language** | Palestinian Arabic (ar-PS) |
63
+ | **Dataset** | Private Dataset |
64
  | **Sample Rate** | 16,000 Hz |
65
  | **License** | Apache 2.0 |
66
  | **Model Size** | ~1.3B parameters |
 
75
 
76
  ```bash
77
  # Install required packages
78
+ uv pip install git+https://github.com/ysharma3501/MiraTTS.git
 
79
  ```
80
 
81
  ### Usage (Python)
82
 
83
  ```python
84
+ from mira.model import MiraTTS
85
+ from IPython.display import Audio
86
+ mira_tts = MiraTTS('hamdallah/Sofelia-TTS') ## downloads model from huggingface
87
 
88
+ file = "reference_file.wav" ## can be mp3/wav/ogg or anything that librosa supports
89
  text = "مرحبا، كيف الحال؟ هذا نموذج للهجة الفلسطينية."
90
 
91
+ context_tokens = mira_tts.encode_audio(file)
92
+ audio = mira_tts.generate(text, context_tokens)
93
 
94
+ Audio(audio, rate=48000)
95
  ```
96
 
97
  ---
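
The new usage snippet plays the result through `IPython.display.Audio`, which only helps inside a notebook. Below is a minimal sketch of writing samples to disk instead, using only the Python standard library. It assumes the value returned by `mira_tts.generate` can be flattened to a one-dimensional sequence of floats in [-1.0, 1.0] (for example via `.tolist()` on an array); that return type is an assumption, not something this commit states. The helper name `save_wav` is hypothetical, and a synthetic tone stands in for model output. The 48000 Hz default mirrors the `rate=48000` used in the snippet.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, path, sample_rate=48000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper, not part of MiraTTS.
    """
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range, then scale to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in for model output: 0.1 s of a 440 Hz tone at 48 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
out_path = os.path.join(tempfile.gettempdir(), "sofelia_demo.wav")
save_wav(tone, out_path)
```

With real model output, the call would look like `save_wav(flattened_audio, "output.wav")`, where `flattened_audio` is whatever one-dimensional float sequence the generated audio converts to.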
 
111
  "الله يعطيك العافية" # God give you wellness
112
 
113
  # About Palestine
114
+ "فلسطين حرة على طول" # Palestine is free for ever
115
  "القدس عاصمة فلسطين الأبدية" # Jerusalem is the eternal capital of Palestine
116
  "سنعود يوماً إلى ديارنا" # We will return one day to our homes
117
  ```
 
122
 
123
  ### Training Data
124
 
125
+ - **Dataset**: 400 Hours Palestinian Speech
126
  - **Language**: Palestinian Arabic dialect
127
  - **Hours of audio**: High-quality Palestinian speech recordings
128
  - **Preprocessing**: Audio normalized and resampled to 16kHz
 
165
 
166
  ## 🛠️ Advanced Usage
167
 
168
+ ### Running the model using batching
169
 
170
  ```python
171
+ file = "reference_file.wav" ## can be mp3/wav/ogg or anything that librosa supports
172
+ text = ["مرحبا، كيف حالك؟", "بتعرف إنه انا بقدر احكي فلسطيني و English مع بعض Without Errors."]
173
 
174
+ context_tokens = [mira_tts.encode_audio(file)]
175
+
176
+ audio = mira_tts.batch_generate(text, context_tokens)
177
+
178
+ Audio(audio, rate=48000)
179
  ```
180
 
181
+ ### Adjusting Generation Parameters
182
 
183
  ```python
184
+ # Sampling parameters (raise temperature for more creative/variable output)
185
+
186
+ mira_tts.set_params(
187
+ top_p=0.95,
188
+ top_k=20,
189
+ temperature=0.01, # Higher = more variation; 0.01 is close to deterministic
190
+ max_new_tokens=1024,
191
+ repetition_penalty=2.2,
192
+ min_p=0.05
193
+ )
194
+
195
  ```
196
 
197
  ---
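
To show why the `temperature` value matters, here is a small sketch of the standard temperature-scaled softmax used in sampling. It is a generic illustration, not MiraTTS code: how `set_params` applies these values internally is not shown in this commit. The function name and the example logits are made up for the demonstration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.5, 0.5]  # illustrative scores for three candidate tokens

# Very low temperature: almost all probability goes to the top candidate.
sharp = softmax_with_temperature(logits, 0.01)

# Temperature 1.0: probability is spread across the candidates.
flat = softmax_with_temperature(logits, 1.0)
```

At a temperature near zero the top-scoring token is chosen almost every time, so output is stable. Larger values leave more probability on the alternatives, which gives more variation between runs.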
 
232
 
233
  This model is dedicated to the Palestinian people and their enduring struggle for freedom, dignity, and justice. Through technology, we preserve and celebrate Palestinian culture, language, and identity.
234
 
235
+ **Free Palestine** 🇵🇸
236
 
237
  > *"We will not be erased. Our voices will echo through time, in every language model, every algorithm, every line of code. Palestine lives, and so does its voice."*
238
 
 
261
 
262
  - **Model Repository**: [hamdallah/Sofelia-TTS](https://huggingface.co/hamdallah/Sofelia-TTS)
263
  - **Issues & Questions**: Use the Community tab or open an issue
 
264
 
265
  ---
266
 
267
  ## 🔗 Related Resources
268
 
269
  - [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS) - Base model
 
270
  - [ncodec](https://github.com/YatharthS/ncodec) - Audio codec library
271
 
272
  ---
 
297
 
298
  ---
299
 
300
+ **Made with ❤️ for Palestine**