Update README.md

README.md +35 -111

@@ -1,6 +1,7 @@
 ---
 language:
 - ar
 license: apache-2.0
 tags:
 - text-to-speech
@@ -13,8 +14,6 @@ tags:
 - miratts
 - sofelia
 base_model: YatharthS/MiraTTS
-datasets:
-- hamdallah/ar-gemini
 library_name: transformers
 pipeline_tag: text-to-speech
 ---
@@ -22,7 +21,7 @@ pipeline_tag: text-to-speech
 <div style="text-align: center;">
   <h1>🇵🇸 Sofelia-TTS 🇵🇸</h1>
   <p><strong>Palestinian Arabic Text-to-Speech Model</strong></p>
-  <p><em>From the river to the sea, Palestine will be free</em> 🕊️</p>
 </div>
 ---
@@ -61,7 +60,7 @@ Built on top of [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS), S
 | **Base Model** | YatharthS/MiraTTS |
 | **Architecture** | Transformer-based Language Model + Audio Codec |
 | **Training Language** | Palestinian Arabic (ar-PS) |
-| **Dataset** | [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini) |
 | **Sample Rate** | 16,000 Hz |
 | **License** | Apache 2.0 |
 | **Model Size** | ~1.3B parameters |
@@ -76,86 +75,23 @@ Built on top of [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS), S
 ```bash
 # Install required packages
-pip install torch transformers datasets
-pip install git+https://github.com/YatharthS/ncodec.git
 ```
 ### Usage (Python)
 ```python
-import torch
-from transformers import AutoTokenizer, AutoModelForCausalLM
-from ncodec.codec import TTSCodec
-# Load model and tokenizer
-model_id = "hamdallah/Sofelia-TTS"
-model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
-tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-# Initialize audio codec
-codec = TTSCodec()
-# Prepare your text (Palestinian Arabic)
 text = "مرحبا، كيف الحال؟ هذا نموذج للهجة الفلسطينية."
-# Load reference audio (3-10 seconds of speech)
-reference_audio_path = "path/to/reference_voice.wav"
-# Generate speech
-import torchaudio
-# Load and resample reference audio to 16kHz
-waveform, sample_rate = torchaudio.load(reference_audio_path)
-if sample_rate != 16000:
-    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
-    waveform = resampler(waveform)
-# Encode reference audio to get context tokens
-audio_array = waveform.squeeze().numpy()
-semantic_tokens, context_tokens = codec.audio_encoder.encode(audio_array, True, duration=10)
-# Create prompt
-prompt = (
-    f"<|task_tts|><|start_text|>{text}<|end_text|>"
-    f"<|context_audio_start|>{context_tokens}<|context_audio_end|>"
-    f"<|prompt_speech_start|>{semantic_tokens}"
-)
-# Tokenize and generate
-inputs = tokenizer(prompt, return_tensors="pt")
-with torch.no_grad():
-    outputs = model.generate(
-        **inputs,
-        max_length=2048,
-        do_sample=True,
-        temperature=0.7,
-        top_p=0.95,
-    )
-# Decode to audio
-generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
-audio_output = codec.decode(generated_text)
-# Save output
-torchaudio.save("output.wav", torch.from_numpy(audio_output).unsqueeze(0), 16000)
-print("✅ Audio saved to output.wav")
-```
-### Usage (CLI)
-If you have the training scripts:
-```bash
-# Clone the repository with inference scripts
-git clone https://huggingface.co/hamdallah/Sofelia-TTS
-cd Sofelia-TTS
-# Run inference
-python test_miratts.py \
-  --model-id hamdallah/Sofelia-TTS \
-  --audio-file reference_voice.wav \
-  --text "مرحباً من فلسطين الحرة" \
-  --output-file output.wav
 ```
 ---
@@ -175,7 +111,7 @@ Try these Palestinian Arabic phrases:
 "الله يعطيك العافية"  # God give you wellness
 # About Palestine
-"فلسطين حرة من النهر إلى البحر"  # Palestine is free from the river to the sea
 "القدس عاصمة فلسطين الأبدية"     # Jerusalem is the eternal capital of Palestine
 "سنعود يوماً إلى ديارنا"         # We will return one day to our homes
 ```
@@ -186,7 +122,7 @@ Try these Palestinian Arabic phrases:
 ### Training Data
-- **Dataset**: [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini)
 - **Language**: Palestinian Arabic dialect
 - **Hours of audio**: High-quality Palestinian speech recordings
 - **Preprocessing**: Audio normalized and resampled to 16kHz
@@ -229,43 +165,33 @@ The model achieves:
 ## 🛠️ Advanced Usage
-### Adjusting Generation Parameters
 ```python
-# More creative/variable output
-outputs = model.generate(
-    **inputs,
-    max_length=2048,
-    do_sample=True,
-    temperature=0.9,  # Higher = more variation
-    top_p=0.95,
-    top_k=50,
-)
-# More deterministic/stable output
-outputs = model.generate(
-    **inputs,
-    max_length=2048,
-    do_sample=True,
-    temperature=0.5,  # Lower = more stable
-    top_p=0.9,
-)
 ```
-### Batch Processing
 ```python
-# Process multiple texts with the same reference voice
-texts = [
-    "مرحباً",
-    "كيف حالك؟",
-    "فلسطين حرة"
-]
-for i, text in enumerate(texts):
-    prompt = create_prompt(text, reference_audio)  # Your prompt creation function
-    outputs = model.generate(...)
-    save_audio(f"output_{i}.wav", outputs)
 ```
 ---
@@ -306,7 +232,7 @@ This model captures these linguistic features, making it authentic and represent
 This model is dedicated to the Palestinian people and their enduring struggle for freedom, dignity, and justice. Through technology, we preserve and celebrate Palestinian culture, language, and identity.
-**Free Palestine** 🇵🇸 **From the River to the Sea**
 > *"We will not be erased. Our voices will echo through time, in every language model, every algorithm, every line of code. Palestine lives, and so does its voice."*
@@ -335,14 +261,12 @@ This model is released under the **Apache 2.0 License**, making it free for:
 - **Model Repository**: [hamdallah/Sofelia-TTS](https://huggingface.co/hamdallah/Sofelia-TTS)
 - **Issues & Questions**: Use the Community tab or open an issue
-- **Dataset**: [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini)
 ---
 ## 🔗 Related Resources
 - [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS) - Base model
-- [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini) - Training dataset
 - [ncodec](https://github.com/YatharthS/ncodec) - Audio codec library
 ---
@@ -373,4 +297,4 @@ If you use this model in your research or projects, please cite:
 ---
-**Made with ❤️ for Palestine**

 ---
 language:
 - ar
+- en
 license: apache-2.0
 tags:
 - text-to-speech
 - miratts
 - sofelia
 base_model: YatharthS/MiraTTS
 library_name: transformers
 pipeline_tag: text-to-speech
 ---
 <div style="text-align: center;">
   <h1>🇵🇸 Sofelia-TTS 🇵🇸</h1>
   <p><strong>Palestinian Arabic Text-to-Speech Model</strong></p>
+  <p><em>Palestine will be free</em> 🕊️</p>
 </div>
 ---
 | **Base Model** | YatharthS/MiraTTS |
 | **Architecture** | Transformer-based Language Model + Audio Codec |
 | **Training Language** | Palestinian Arabic (ar-PS) |
+| **Dataset** | Private Dataset |
 | **Sample Rate** | 16,000 Hz |
 | **License** | Apache 2.0 |
 | **Model Size** | ~1.3B parameters |
 ```bash
 # Install required packages
+uv pip install git+https://github.com/ysharma3501/MiraTTS.git
 ```
 ### Usage (Python)
 ```python
+from mira.model import MiraTTS
+from IPython.display import Audio
+mira_tts = MiraTTS('hamdallah/Sofelia-TTS')  # downloads the model from Hugging Face
+file = "reference_file.wav"  # can be mp3/wav/ogg or any other format that librosa supports
 text = "مرحبا، كيف الحال؟ هذا نموذج للهجة الفلسطينية."
+context_tokens = mira_tts.encode_audio(file)
+audio = mira_tts.generate(text, context_tokens)
+Audio(audio, rate=48000)
 ```
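Outside a notebook, `IPython.display.Audio` has nothing to render into, so the generated audio usually needs to be written to disk. The sketch below is one way to do that with only the standard library. It assumes the audio can be flattened into a plain sequence of mono float samples in the range [-1.0, 1.0] at 48 kHz; the helper name `save_wav` and the `.flatten().tolist()` conversion in the usage comment are illustrative assumptions, not part of the MiraTTS API.

```python
import struct
import wave


def save_wav(samples, path, rate=48000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range so out-of-range values do not overflow int16.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(rate)
        wav_file.writeframes(bytes(frames))


# Hypothetical usage, assuming `audio` is a tensor or array of float samples:
# save_wav(audio.flatten().tolist(), "output.wav", rate=48000)
```

If the model returns samples in a different range or layout, adjust the conversion before calling the helper.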
 ---
 "الله يعطيك العافية"  # God give you wellness
 # About Palestine
+"فلسطين حرة على طول"  # Palestine is free forever
 "القدس عاصمة فلسطين الأبدية"     # Jerusalem is the eternal capital of Palestine
 "سنعود يوماً إلى ديارنا"         # We will return one day to our homes
 ```
 ### Training Data
+- **Dataset**: 400 hours of Palestinian speech
 - **Language**: Palestinian Arabic dialect
 - **Hours of audio**: High-quality Palestinian speech recordings
 - **Preprocessing**: Audio normalized and resampled to 16kHz
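The preprocessing bullet can be illustrated with a small, dependency-free sketch of peak normalization followed by resampling to 16 kHz. This is not the pipeline used to train the model, which the README does not describe. The function names, the `target_peak` value, and the use of linear interpolation are all assumptions made for illustration; linear interpolation has no anti-aliasing filter, so real pipelines typically rely on a library resampler such as `librosa.resample` or `torchaudio.transforms.Resample`.

```python
def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]


def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample by linear interpolation (illustrative only, no anti-aliasing)."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # fractional index in the source
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# Example: normalize, then bring 48 kHz audio down to 16 kHz.
clip_48k = [0.0, 0.1, 0.2, 0.1, 0.0, -0.1]
clip_16k = resample_linear(peak_normalize(clip_48k), 48000, 16000)
```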
 ## 🛠️ Advanced Usage
+### Running the model using batching
 ```python
+file = "reference_file.wav"  # can be mp3/wav/ogg or any other format that librosa supports
+text = ["مرحبا، كيف حالك؟", "بتعرف إنه انا بقدر احكي فلسطيني و English مع بعض Without Errors."]
+context_tokens = [mira_tts.encode_audio(file)]
+audio = mira_tts.batch_generate(text, context_tokens)
+Audio(audio, rate=48000)
 ```
+### Adjusting Generation Parameters
 ```python
+# A very low temperature gives stable, near-deterministic output; raise it for more variation
+mira_tts.set_params(
+    top_p=0.95,
+    top_k=20,
+    temperature=0.01, # Higher = more variation
+    max_new_tokens=1024,
+    repetition_penalty=2.2,
+    min_p=0.05
+)
 ```
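To build intuition for what these parameters do, here is a generic sketch of how temperature, top-k, and min-p filtering reshape a next-token distribution. It is a textbook-style illustration rather than MiraTTS's internal sampling code, the function name `filter_distribution` is hypothetical, and top-p and repetition penalty are left out for brevity.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=0, min_p=0.0):
    """Return renormalized probabilities after temperature, top-k, and min-p."""
    # Temperature: dividing logits by a small value sharpens the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    keep = set(range(len(probs)))
    if top_k > 0:
        # Top-k: keep only the k most probable tokens.
        ranked = sorted(keep, key=lambda i: probs[i], reverse=True)
        keep = set(ranked[:top_k])
    if min_p > 0.0:
        # Min-p: drop tokens whose probability is below min_p * max probability.
        threshold = min_p * max(probs)
        keep = {i for i in keep if probs[i] >= threshold}

    kept_total = sum(probs[i] for i in keep)
    return [probs[i] / kept_total if i in keep else 0.0 for i in range(len(probs))]


# With temperature=0.01, almost all probability mass lands on the top token.
sharp = filter_distribution([2.0, 1.0, 0.0], temperature=0.01, top_k=20, min_p=0.05)
```

In this sketch, a temperature close to zero makes the other filters nearly irrelevant, because the most likely token already holds almost all of the probability mass.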
 ---
 This model is dedicated to the Palestinian people and their enduring struggle for freedom, dignity, and justice. Through technology, we preserve and celebrate Palestinian culture, language, and identity.
+**Free Palestine** 🇵🇸
 > *"We will not be erased. Our voices will echo through time, in every language model, every algorithm, every line of code. Palestine lives, and so does its voice."*
 - **Model Repository**: [hamdallah/Sofelia-TTS](https://huggingface.co/hamdallah/Sofelia-TTS)
 - **Issues & Questions**: Use the Community tab or open an issue
 ---
 ## 🔗 Related Resources
 - [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS) - Base model
 - [ncodec](https://github.com/YatharthS/ncodec) - Audio codec library
 ---
 ---
+**Made with ❤️ for Palestine**