Update README.md

README.md +365 -10

@@ -1,21 +1,376 @@
-# hamdallah/Sofelia-TTS
-Fine-tuned MiraTTS checkpoint.
-## Usage (CLI)
 ```bash
-python training/test_miratts.py \
-  --checkpoint hamdallah/Sofelia-TTS \
-  --audio-file ref.wav \
-  --text "Hello from my MiraTTS model."
 ```
-## Usage (Python)
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 from ncodec.codec import TTSCodec
-model = AutoModelForCausalLM.from_pretrained("hamdallah/Sofelia-TTS", trust_remote_code=True)
-tokenizer = AutoTokenizer.from_pretrained("hamdallah/Sofelia-TTS", trust_remote_code=True)
 codec = TTSCodec()
 ```

+---
+language:
+- ar
+license: apache-2.0
+tags:
+- text-to-speech
+- tts
+- audio
+- speech
+- palestinian-arabic
+- arabic
+- voice-cloning
+- miratts
+- sofelia
+base_model: YatharthS/MiraTTS
+datasets:
+- hamdallah/ar-gemini
+library_name: transformers
+pipeline_tag: text-to-speech
+---
+<div style="text-align: center;">
+  <h1>🇵🇸 Sofelia-TTS 🇵🇸</h1>
+  <p><strong>Palestinian Arabic Text-to-Speech Model</strong></p>
+  <p><em>From the river to the sea, Palestine will be free</em> 🕊️</p>
+</div>
+---
+## 🌟 Model Description
+**Sofelia-TTS** is a Text-to-Speech (TTS) model fine-tuned specifically for the **Palestinian Arabic dialect**. This model brings the beautiful sounds of Palestinian speech to AI, preserving and celebrating the linguistic heritage of Palestine.
+Built on top of [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS), Sofelia-TTS captures the unique phonetic characteristics, intonation patterns, and prosody of Palestinian Arabic, making it ideal for:
+- 🎙️ **Voice cloning** with Palestinian Arabic speech
+- 📚 **Audiobook generation** in the Palestinian dialect
+- 🗣️ **Virtual assistants** that speak authentic Palestinian Arabic
+- 🎓 **Educational tools** for learning and preserving the Palestinian dialect
+- 🎬 **Content creation** for Palestinian media and storytelling
+> **Dedicated to Palestine**: This model is a tribute to the resilience, culture, and spirit of the Palestinian people. May their voices be heard loud and clear across the world. 🇵🇸
+---
+## 🎯 Key Features
+- ✅ **High-quality voice cloning**: Clone a voice from 3-10 seconds of reference audio
+- ✅ **Palestinian Arabic dialect**: Authentic pronunciation and intonation
+- ✅ **Fast inference**: Optimized for real-time generation
+- ✅ **Flexible context**: Supports variable-length reference audio
+- ✅ **Open source**: Free to use and improve
+---
+## 📊 Model Details
+| **Attribute** | **Value** |
+|---------------|-----------|
+| **Model Type** | Text-to-Speech (TTS) |
+| **Base Model** | YatharthS/MiraTTS |
+| **Architecture** | Transformer-based Language Model + Audio Codec |
+| **Training Language** | Palestinian Arabic (ar-PS) |
+| **Dataset** | [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini) |
+| **Sample Rate** | 16,000 Hz |
+| **License** | Apache 2.0 |
+| **Model Size** | ~1.3B parameters |
+| **Precision** | BF16/FP32 |
+| **Framework** | PyTorch + Transformers |
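
As a rough guide to what the size and precision rows imply, the weights-only memory footprint can be estimated from the parameter count. This is a back-of-the-envelope sketch, not a measurement: it assumes exactly 1.3 billion parameters (the table says approximately), it ignores activations, the KV cache, and the audio codec, and `weight_memory_gb` is a hypothetical helper name.

```python
# Rough weights-only memory estimate; assumes 1.3e9 parameters (the table says ~1.3B).
BYTES_PER_PARAM = {"bf16": 2, "fp32": 4}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return weight storage in gigabytes (10**9 bytes) for the given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for precision in ("bf16", "fp32"):
    print(f"{precision}: {weight_memory_gb(1.3e9, precision):.1f} GB")
```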
+---
+## 🚀 Quick Start
+### Installation
 ```bash
+# Install required packages
+pip install torch transformers datasets
+pip install git+https://github.com/YatharthS/ncodec.git
 ```
+### Usage (Python)
 ```python
+import torch
 from transformers import AutoTokenizer, AutoModelForCausalLM
 from ncodec.codec import TTSCodec
+# Load model and tokenizer
+model_id = "hamdallah/Sofelia-TTS"
+model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+# Initialize audio codec
 codec = TTSCodec()
+# Prepare your text (Palestinian Arabic)
+text = "مرحبا، كيف الحال؟ هذا نموذج للهجة الفلسطينية."
+# Reference audio: 3-10 seconds of clean speech
+reference_audio_path = "path/to/reference_voice.wav"
+# Load the reference audio
+import torchaudio
+waveform, sample_rate = torchaudio.load(reference_audio_path)
+# Mix down to mono in case the file has several channels
+# (assumption: the encoder expects a 1-D signal)
+waveform = waveform.mean(dim=0, keepdim=True)
+# Resample to 16 kHz if needed
+if sample_rate != 16000:
+    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
+    waveform = resampler(waveform)
+# Encode reference audio to get semantic and context tokens
+audio_array = waveform.squeeze().numpy()
+semantic_tokens, context_tokens = codec.audio_encoder.encode(audio_array, True, duration=10)
+# Create prompt
+prompt = (
+    f"<|task_tts|><|start_text|>{text}<|end_text|>"
+    f"<|context_audio_start|>{context_tokens}<|context_audio_end|>"
+    f"<|prompt_speech_start|>{semantic_tokens}"
+)
+# Tokenize and generate
+inputs = tokenizer(prompt, return_tensors="pt")
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_length=2048,
+        do_sample=True,
+        temperature=0.7,
+        top_p=0.95,
+    )
+# Decode to audio
+generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
+audio_output = codec.decode(generated_text)
+# Save output
+torchaudio.save("output.wav", torch.from_numpy(audio_output).unsqueeze(0), 16000)
+print("✅ Audio saved to output.wav")
 ```
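
The prompt layout used above can be factored into a small helper so the same reference tokens are reused across texts. This is a minimal sketch: `build_tts_prompt` is a hypothetical name, and the placeholder token strings below are made up for illustration; real values come from `codec.audio_encoder.encode(...)` as in the example above.

```python
def build_tts_prompt(text: str, context_tokens: str, semantic_tokens: str) -> str:
    """Assemble the prompt string with the special tokens shown in the Quick Start example."""
    return (
        f"<|task_tts|><|start_text|>{text}<|end_text|>"
        f"<|context_audio_start|>{context_tokens}<|context_audio_end|>"
        f"<|prompt_speech_start|>{semantic_tokens}"
    )

# Placeholder token strings for illustration only (not real codec output)
prompt = build_tts_prompt("مرحبا", "<ctx_0><ctx_1>", "<sem_0><sem_1>")
print(prompt)
```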
+### Usage (CLI)
+If you have the inference script `test_miratts.py`:
+```bash
+# Clone the repository with inference scripts
+git clone https://huggingface.co/hamdallah/Sofelia-TTS
+cd Sofelia-TTS
+# Run inference
+python test_miratts.py \
+  --model-id hamdallah/Sofelia-TTS \
+  --audio-file reference_voice.wav \
+  --text "مرحباً من فلسطين الحرة" \
+  --output-file output.wav
+```
+---
+## 🎤 Example Prompts
+Try these Palestinian Arabic phrases:
+```python
+# Greetings
+"مرحبا، كيف حالك؟"  # Hello, how are you?
+"أهلا وسهلا فيك"      # Welcome
+# Common expressions
+"يا سلام، هذا رائع"   # Wow, this is amazing
+"ما شاء الله"         # Mashallah
+"الله يعطيك العافية"  # God give you wellness
+# About Palestine
+"فلسطين حرة من النهر إلى البحر"  # Palestine is free from the river to the sea
+"القدس عاصمة فلسطين الأبدية"     # Jerusalem is the eternal capital of Palestine
+"سنعود يوماً إلى ديارنا"         # We will return one day to our homes
+```
+---
+## 🎓 Training Details
+### Training Data
+- **Dataset**: [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini)
+- **Language**: Palestinian Arabic dialect
+- **Audio**: High-quality Palestinian speech recordings (total duration not specified)
+- **Preprocessing**: Audio normalized and resampled to 16kHz
+### Training Configuration
+| **Hyperparameter** | **Value** |
+|--------------------|-----------|
+| **Learning Rate** | 2e-4 (initial), 1e-5 (refinement) |
+| **Batch Size** | 8 effective (2 per device × 4 gradient accumulation steps) |
+| **Training Steps** | 2000+ |
+| **Warmup Steps** | 100 |
+| **Max Audio Length** | 20-30 seconds |
+| **Optimizer** | AdamW |
+| **LR Scheduler** | Cosine with warmup |
+| **Gradient Clipping** | 1.0 |
+| **Precision** | BF16 (H100) / FP32 |
+| **Hardware** | NVIDIA H100 / A100 GPU |
+### Training Process
+The model was trained using a two-phase approach:
+1. **Foundation Phase**: High learning rate (2e-4) for initial adaptation to Palestinian Arabic
+2. **Refinement Phase**: Lower learning rate (1e-5) with NEFTune noise for stability and quality
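
The learning-rate settings listed above can be sketched as a plain function. This is an illustration under stated assumptions, not the actual training code: it assumes linear warmup over 100 steps, cosine decay to zero, and 2000 steps per phase (the table only says "2000+" and does not give the split between phases). `cosine_lr` is a hypothetical helper name.

```python
import math

def cosine_lr(step: int, peak_lr: float, warmup_steps: int = 100, total_steps: int = 2000) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

# Effective batch size from the table: per-device batch × accumulation steps
effective_batch = 2 * 4

# Sample the schedule at a few steps for both phases (peak rates from the table)
for name, peak in (("foundation", 2e-4), ("refinement", 1e-5)):
    samples = [cosine_lr(s, peak) for s in (0, 50, 100, 1050, 2000)]
    print(name, [f"{lr:.2e}" for lr in samples])
```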
+---
+## 📈 Model Performance
+Qualitatively, the model is reported to produce:
+- ✅ **Natural prosody** matching Palestinian Arabic speech patterns
+- ✅ **Clear pronunciation** of Arabic phonemes
+- ✅ **Voice similarity** to reference audio
+- ✅ **Stable generation** without artifacts or repetitions
+- ✅ **Fast inference** suitable for real-time applications
+---
+## 🛠️ Advanced Usage
+### Adjusting Generation Parameters
+```python
+# More creative/variable output
+outputs = model.generate(
+    **inputs,
+    max_length=2048,
+    do_sample=True,
+    temperature=0.9,  # Higher = more variation
+    top_p=0.95,
+    top_k=50,
+)
+# More deterministic/stable output
+outputs = model.generate(
+    **inputs,
+    max_length=2048,
+    do_sample=True,
+    temperature=0.5,  # Lower = more stable
+    top_p=0.9,
+)
+```
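
To see why a lower temperature gives more stable output, the toy sketch below applies temperature scaling to a softmax over four made-up logits. It does not involve the model; `softmax_with_temperature` is a hypothetical helper and the logit values are illustrative only. Lower temperatures concentrate probability on the top candidate, higher ones spread it out.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for four candidate tokens (illustrative only)
logits = [2.0, 1.0, 0.5, -1.0]
for t in (0.5, 0.7, 0.9):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```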
+### Batch Processing
+```python
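+# Process multiple texts with the same reference voice.
+# Reuses model, tokenizer, codec, semantic_tokens and context_tokens
+# (plus the torch and torchaudio imports) from the Quick Start example.
+texts = [
+    "مرحباً",
+    "كيف حالك؟",
+    "فلسطين حرة"
+]
+for i, text in enumerate(texts):
+    # Same prompt layout as in the Quick Start example
+    prompt = (
+        f"<|task_tts|><|start_text|>{text}<|end_text|>"
+        f"<|context_audio_start|>{context_tokens}<|context_audio_end|>"
+        f"<|prompt_speech_start|>{semantic_tokens}"
+    )
+    inputs = tokenizer(prompt, return_tensors="pt")
+    with torch.no_grad():
+        outputs = model.generate(**inputs, max_length=2048, do_sample=True, temperature=0.7, top_p=0.95)
+    audio_output = codec.decode(tokenizer.decode(outputs[0], skip_special_tokens=False))
+    torchaudio.save(f"output_{i}.wav", torch.from_numpy(audio_output).unsqueeze(0), 16000)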
+```
+---
+## 💡 Tips for Best Results
+1. **Reference Audio Quality**:
+   - Use clean audio without background noise
+   - 3-10 seconds of speech is ideal
+   - Ensure the audio has a 16 kHz sample rate (resample it first if it does not)
+2. **Text Input**:
+   - Use proper Arabic script (not Arabizi/transliteration)
+   - Palestinian dialect works best
+   - Avoid very long sentences (split into shorter segments)
+3. **Generation Parameters**:
+   - `temperature=0.7`: Good default for natural speech
+   - `temperature=0.5`: More stable, less variation
+   - `temperature=0.9`: More expressive, more variation
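
The advice to split long input into shorter segments can be sketched as follows. This is one possible approach, not part of the model's tooling: `split_text` is a hypothetical helper, the 80-character default is an arbitrary assumption, and the splitter only looks at sentence-ending punctuation (including the Arabic question mark and semicolon). A single sentence longer than the limit is kept whole.

```python
import re
from typing import List

# Sentence-ending punctuation, including the Arabic question mark (؟) and semicolon (؛)
_SENTENCE_END = re.compile(r"(?<=[.!?؟؛])\s+")

def split_text(text: str, max_chars: int = 80) -> List[str]:
    """Split at sentence boundaries, then greedily pack sentences into segments
    of roughly max_chars (a single longer sentence is kept whole)."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    segments, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the current segment
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments

for segment in split_text("مرحبا، كيف الحال؟ هذا نموذج للهجة الفلسطينية. شكرا!", max_chars=20):
    print(segment)
```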
+---
+## 🌍 About Palestinian Arabic
+Palestinian Arabic is a Levantine Arabic dialect spoken by the Palestinian people. It has unique characteristics:
+- **Phonology**: In urban varieties, Classical Arabic /q/ is typically realized as a glottal stop [ʔ]
+- **Vocabulary**: Rich in Levantine and unique Palestinian terms
+- **Intonation**: Distinctive melodic patterns
+- **Regional Variants**: Urban (Jerusalem, Hebron) vs. Rural vs. Bedouin varieties
+This model aims to capture these linguistic features so that its output is authentic and representative of Palestinian speech.
+---
+## 🇵🇸 Message of Solidarity
+This model is dedicated to the Palestinian people and their enduring struggle for freedom, dignity, and justice. Through technology, we preserve and celebrate Palestinian culture, language, and identity.
+**Free Palestine** 🇵🇸 **From the River to the Sea**
+> *"We will not be erased. Our voices will echo through time, in every language model, every algorithm, every line of code. Palestine lives, and so does its voice."*
+---
+## 📜 License
+This model is released under the **Apache 2.0 License**, making it free for:
+- ✅ Commercial use
+- ✅ Modification and distribution
+- ✅ Private use
+- ✅ Patent use
+---
+## 🙏 Acknowledgments
+- **Base Model**: [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS) - Thank you for the excellent foundation
+- **Dataset**: Palestinian Arabic speakers who contributed their voices
+- **Community**: The open-source AI community for tools and support
+- **Palestine**: For being the inspiration and purpose behind this work
+---
+## 📞 Contact & Support
+- **Model Repository**: [hamdallah/Sofelia-TTS](https://huggingface.co/hamdallah/Sofelia-TTS)
+- **Issues & Questions**: Use the Community tab or open an issue
+- **Dataset**: [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini)
+---
+## 🔗 Related Resources
+- [YatharthS/MiraTTS](https://huggingface.co/YatharthS/MiraTTS) - Base model
+- [hamdallah/ar-gemini](https://huggingface.co/datasets/hamdallah/ar-gemini) - Training dataset
+- [ncodec](https://github.com/YatharthS/ncodec) - Audio codec library
+---
+## 📚 Citation
+If you use this model in your research or projects, please cite:
+```bibtex
+@misc{sofelia-tts-2026,
+  author = {Hamdallah},
+  title = {Sofelia-TTS: Palestinian Arabic Text-to-Speech Model},
+  year = {2026},
+  publisher = {Hugging Face},
+  journal = {Hugging Face Model Hub},
+  howpublished = {\url{https://huggingface.co/hamdallah/Sofelia-TTS}},
+}
+```
+---
+<div style="text-align: center; padding: 20px;">
+  <h2>🇵🇸 FREE PALESTINE 🇵🇸</h2>
+  <p><strong>تحيا فلسطين حرة أبية</strong></p>
+  <p><em>Long Live Free Palestine</em></p>
+  <p>🕊️ ✊ 🇵🇸</p>
+</div>
+---
+**Made with ❤️ for Palestine**