Ramendan/BayanSynthTTS-checkpoints

@@ -1,138 +1,124 @@
 ---
 language:
-- ar
 tags:
-- text-to-speech
-- arabic
-- cosyvoice
-- lora
-license: apache-2.0
 ---
-# BayanSynthTTS Checkpoints
-Arabic TTS LoRA checkpoint fine-tuned on CosyVoice3.
-For the full library and usage instructions see the [BayanSynthTTS GitHub repo](
-https://github.com/Ramendan/BayanSynthTTS).
-## Checkpoint
-- `epoch_28_whole.pt` — LLM LoRA, epoch 28 (~1.9 GB)
-Place it at `checkpoints/llm/epoch_28_whole.pt` inside the BayanSynthTTS directory, then run `python scripts/setup_models.py` to download the base CosyVoice3 weights automatically.
 ## Audio Demos
-All samples were generated with this checkpoint. No post-processing applied.
-### 1. Basic synthesis (auto-tashkeel on)
-Input: `مرحباً أنا بيانسينث، نظام لتوليد الكلام العربي`
-*(Hello, I am BayanSynth, an Arabic speech synthesis system)*
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/01_basic.wav" type="audio/wav"></audio>
 ---
 ### 2. Pre-diacritized text (mishkal off)
-Input: `إِنَّ اللُّغَةَ الْعَرَبِيَّةَ كَنْزٌ مِنَ الثَّقَافَةِ وَالتُّرَاثِ.`
-*(The Arabic language is a treasure of culture and heritage.)*
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/02_prediacritized.wav" type="audio/wav"></audio>
 ---
-### 3. Voice cloning
-Reference voice (muffled-talking.wav trimmed to 10 s):
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/ref_voice_muffled.wav" type="audio/wav"></audio>
-Input: `هَذَا الصَّوْتُ مُسْتَنْسَخٌ مِنْ مَقْطَعٍ صَوْتِيٍّ قَصِيرٍ.`
-*(This voice is cloned from a short audio clip.)*
-Cloned output:
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/03_voice_cloning.wav" type="audio/wav"></audio>
 ---
-### 4. Longer passage (AI topic, 3 sentences, speed=0.88)
-Input: `الذكاء الاصطناعي هو أحد أبرز التطورات التكنولوجية في عصرنا الحديث. يعتمد على تحليل كميات ضخمة من البيانات لاستخلاص أنماط معقدة. ومن أبرز تطبيقاته نظم التعرف على الصوت وترجمة اللغات وتوليد النصوص.`
-*(Artificial intelligence is one of the most prominent technological advances of our era. It relies on analyzing massive amounts of data to extract complex patterns. Among its most notable applications: speech recognition, language translation, and text generation.)*
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/04_long_text.wav" type="audio/wav"></audio>
 ---
-### 5. Speed control
-Slow (0.80x) — `مَرْحَباً بِكُمْ فِي بَيَانْسِينْثِ. هَذَا تَوْلِيدٌ بِسُرْعَةٍ مُخَفَّضَةٍ لِلتَّوْضِيحِ.`
-*(Welcome to BayanSynth. This is synthesis at reduced speed for demonstration.)*
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/05_slow_speed.wav" type="audio/wav"></audio>
-Fast (1.20x) — `مَرْحَباً بِكُمْ فِي بَيَانْسِينْثِ. هَذَا تَوْلِيدٌ بِسُرْعَةٍ مُرْتَفَعَةٍ لِلتَّوْضِيحِ.`
-*(Welcome to BayanSynth. This is synthesis at elevated speed for demonstration.)*
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/06_fast_speed.wav" type="audio/wav"></audio>
 ---
-### 6. Instruct prompt: warm newsreader style
-Input: `مَرْحَباً بِكُمْ. هَذَا مِثَالٌ عَلَى اسْتِخْدَامِ التَّوْجِيهِ لِضَبْطِ أُسْلُوبِ الصَّوْتِ.`
-*(Welcome. This is an example of using an instruct prompt to control voice style.)*
-Instruct: *"Speak in a warm, clear newsreader style with careful diction."*
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/12_instruct.wav" type="audio/wav"></audio>
 ---
-### 7. Phonetics test: halqiyyat, tanwin, shaddah
-Input: `الْجَوْدَةُ الْعَالِيَةُ لِتَقْنِيَّاتِ الذَّكَاءِ الاصْطِنَاعِيِّ تُسَاهِمُ فِي بِنَاءِ مُسْتَقْبَلٍ بَاهِرٍ لِلْأَجْيَالِ.`
-*(The high quality of AI technologies contributes to building a brilliant future for generations.)*
-seed=42:
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/07_phonetics.wav" type="audio/wav"></audio>
-seed=17 (different prosody):
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/10_phonetics_s2.wav" type="audio/wav"></audio>
 ---
-### 8. Flow and rhythm test
-Input: `إِنَّ نِظَامَ بَيَانِسِينْث يَهْدِفُ إِلَى تَقْدِيمِ تَجْرِبَةٍ صَوْتِيَّةٍ فَرِيدَةٍ، تَجْمَعُ بَيْنَ دِقَّةِ النُّطْقِ وَجَمَالِ الْأَدَاءِ.`
-*(BayanSynth aims to deliver a unique voice experience that combines precise pronunciation with beauty of delivery.)*
-seed=42:
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/08_flow.wav" type="audio/wav"></audio>
-seed=99 (different prosody):
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/11_flow_s2.wav" type="audio/wav"></audio>
 ---
-### 9. Tashkeel disambiguation challenge
-Words `عَلِمَ / عَالِم / عَلَم / عِلْم` in a single sentence:
-*(he knew / scholar / flag / knowledge)*
-`عَلِمَ الْعَالِمُ أَنَّ الْعَلَمَ يَعْلُو بِالْعِلْمِ، فَاسْتَعْلَمَ عَنْ عُلُومِ الْأَوَّلِينَ.`
-*(The scholar knew that the flag rises with knowledge, so he inquired about the sciences of the ancients.)*
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/09_challenge.wav" type="audio/wav"></audio>
----
-## License
-Apache 2.0. LoRA checkpoint trained on Common Voice Arabic data is released under CC-BY 4.0.

 ---
+license: apache-2.0
 language:
+  - ar
 tags:
+  - tts
+  - arabic
+  - cosyvoice
+  - lora
+  - speech-synthesis
 ---
+# BayanSynthTTS — Arabic TTS Checkpoints
+Fine-tuned LoRA weights for **CosyVoice 3** (Arabic).
+Trained on ~4 h of diacritized Arabic speech.
+**GitHub:** [Ramendan/BayanSynthTTS](https://github.com/Ramendan/BayanSynthTTS)
+---
 ## Audio Demos
+### 1. Basic synthesis (pre-diacritized)
+> مَرْحَبًا، أَنَا بَيَانْسِينْث، نِظَامٌ لِتَوْلِيدِ الْكَلَامِ الْعَرَبِيِّ.
+>
+> *Hello, I am BayanSynth, a system for generating Arabic speech.*
+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/01_basic.wav"></audio>
 ---
 ### 2. Pre-diacritized text (mishkal off)
+> إِنَّ اللُّغَةَ الْعَرَبِيَّةَ كَنْزٌ مِنَ الثَّقَافَةِ وَالتُّرَاثِ.
+>
+> *The Arabic language is a treasure of culture and heritage.*
+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/02_prediacritized.wav"></audio>
 ---
+### 3. Longer passage (auto-tashkeel, speed 0.88)
+> الذكاء الاصطناعي هو أحد أبرز التطورات التكنولوجية في عصرنا الحديث. يعتمد على تحليل كميات ضخمة من البيانات لاستخلاص أنماط معقدة. ومن أبرز تطبيقاته نظم التعرف على الصوت وترجمة اللغات وتوليد النصوص.
+>
+> *Artificial intelligence is one of the most prominent technological advances of our era. It relies on analyzing massive amounts of data to extract complex patterns. Among its most notable applications: speech recognition, language translation, and text generation.*
+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/04_long_text.wav"></audio>
 ---
+### 4. Phonetics test (seed=17)
+> الْجَوْدَةُ الْعَالِيَةُ لِتَقْنِيَّاتِ الذَّكَاءِ الاصْطِنَاعِيِّ تُسَاهِمُ فِي بِنَاءِ مُسْتَقْبَلٍ بَاهِرٍ لِلْأَجْيَالِ.
+>
+> *The high quality of AI technologies contributes to building a brilliant future for generations to come.*
+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/10_phonetics_s2.wav"></audio>
 ---
+### 5. Flow & rhythm (seed=42)
+> إِنَّ نِظَامَ بَيَانِسِينْث يَهْدِفُ إِلَى تَقْدِيمِ تَجْرِبَةٍ صَوْتِيَّةٍ فَرِيدَةٍ، تَجْمَعُ بَيْنَ دِقَّةِ النُّطْقِ وَجَمَالِ الْأَدَاءِ.
+>
+> *BayanSynth aims to deliver a unique voice experience that combines precise pronunciation with beauty of delivery.*
+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/08_flow.wav"></audio>
 ---
+### 6. Flow, alternate seed (seed=99)
+Same text, different prosody:
+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/11_flow_s2.wav"></audio>
 ---
+### 7. Challenge: tashkeel disambiguation
+> عَلِمَ الْعَالِمُ أَنَّ الْعَلَمَ يَعْلُو بِالْعِلْمِ، فَاسْتَعْلَمَ عَنْ عُلُومِ الْأَوَّلِينَ.
+>
+> *The scholar knew that the flag rises with knowledge, so he inquired about the sciences of the ancients.*
+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/09_challenge.wav"></audio>
 ---
+### 8. Instruct prompt: warm newsreader style
+> مَرْحَباً بِكُمْ. هَذَا مِثَالٌ عَلَى اسْتِخْدَامِ التَّوْجِيهِ لِضَبْطِ أُسْلُوبِ الصَّوْتِ.
+>
+> *Welcome. This is an example of using an instruct prompt to control voice style.*
+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/12_instruct.wav"></audio>
 ---
+## Files
+| File | Description |
+|------|-------------|
+| `epoch_28_whole.pt` | LoRA weights (LLM, 629 keys) — main checkpoint |
+| `samples/*.wav` | Pre-generated audio demos |
+## Usage
+```bash
+pip install bayansynthtts
+```
+```python
+from bayansynthtts import BayanSynthTTS
+tts = BayanSynthTTS()
+audio = tts.synthesize(
+    "مَرْحَبًا، أَنَا بَيَانْسِينْث، نِظَامٌ لِتَوْلِيدِ الْكَلَامِ الْعَرَبِيِّ.",
+    auto_tashkeel=False,
+)
+tts.save_wav(audio, "output.wav")
+```
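
The new card no longer says how to obtain `epoch_28_whole.pt` manually, so the following is a minimal sketch of one way to do it, not the project's documented procedure. It assumes the checkpoint sits at the repository root (as the Files table suggests), that the `resolve/main/` URL pattern used by the sample links also applies to it, and that the `checkpoints/llm/` layout from the previous README revision is still what the library expects. The helper names `resolve_url`, `checkpoint_target`, and `fetch_checkpoint` are hypothetical and exist only for this illustration.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Repository and file names taken from the model card.
REPO_ID = "Ramendan/BayanSynthTTS-checkpoints"
CHECKPOINT = "epoch_28_whole.pt"


def resolve_url(filename: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Build a direct-download URL in the same form as the sample links in the card."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def checkpoint_target(project_root: str) -> Path:
    """Path the previous README revision asked for: checkpoints/llm/<file>.

    Assumption: the current library still reads the checkpoint from here.
    """
    return Path(project_root) / "checkpoints" / "llm" / CHECKPOINT


def fetch_checkpoint(project_root: str) -> Path:
    """Download the checkpoint (about 1.9 GB) unless it is already in place."""
    target = checkpoint_target(project_root)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(resolve_url(CHECKPOINT), str(target))
    return target
```

Calling `fetch_checkpoint("BayanSynthTTS")` from the directory that contains the cloned repository would place the file at `BayanSynthTTS/checkpoints/llm/epoch_28_whole.pt`. If `pip install bayansynthtts` already fetches the weights on first use, this step is unnecessary.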