Upload README.md with huggingface_hub
README.md
@@ -1,138 +1,124 @@
 ---
 language:
-- ar
 tags:
--
-- arabic
-- cosyvoice
-- lora
-
 ---

-# BayanSynthTTS Checkpoints
-
-Arabic TTS LoRA checkpoint fine-tuned on CosyVoice3.
-For the full library and usage instructions see the [BayanSynthTTS GitHub repo](
-https://github.com/Ramendan/BayanSynthTTS).

-

-

-

 ## Audio Demos

-
-
-### 1. Basic synthesis (auto-tashkeel on)

-
-

-<audio controls

 ---

 ### 2. Pre-diacritized text (mishkal off)

-
-

-<audio controls

 ---

-### 3.

-

-<audio controls
-
-Input: `هَذَا الصَّوْتُ مُسْتَنْسَخٌ مِنْ مَقْطَعٍ صَوْتِيٍّ قَصِيرٍ.`
-*(This voice is cloned from a short audio clip.)*
-
-Cloned output:
-
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/03_voice_cloning.wav" type="audio/wav"></audio>

 ---

-### 4.

-
-

-<audio controls

 ---

-### 5.
-
-Slow (0.80x) - `مَرْحَباً بِكُمْ فِي بَيَانْسِينْثِ. هَذَا تَوْلِيدٌ بِسُرْعَةٍ مُخَفَّضَةٍ لِلتَّوْضِيحِ.`
-*(Welcome to BayanSynth. This is synthesis at reduced speed for demonstration.)*

-

-
-*(Welcome to BayanSynth. This is synthesis at elevated speed for demonstration.)*
-
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/06_fast_speed.wav" type="audio/wav"></audio>

 ---

-### 6.

-
-*(Welcome. This is an example of using an instruct prompt to control voice style.)*
-Instruct: *"Speak in a warm, clear newsreader style with careful diction."*

-<audio controls

 ---

-### 7.
-
-Input: `الْجَوْدَةُ الْعَالِيَةُ لِتِقْنِيَّاتِ الذَّكَاءِ الاصْطِنَاعِيِّ تُسَاهِمُ فِي بِنَاءِ مُسْتَقْبَلٍ بَاهِرٍ لِلْأَجْيَالِ.`
-*(The high quality of AI technologies contributes to building a brilliant future for generations.)*
-
-seed=42:

-

-
-
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/10_phonetics_s2.wav" type="audio/wav"></audio>

 ---

-### 8.
-
-Input: `إِنَّ نِظَامَ بَيَانْسِينْث يَهْدِفُ إِلَى تَقْدِيمِ تَجْرِبَةٍ صَوْتِيَّةٍ فَرِيدَةٍ، تَجْمَعُ بَيْنَ دِقَّةِ النُّطْقِ وَجَمَالِ الْأَدَاءِ.`
-*(BayanSynth aims to deliver a unique voice experience that combines precise pronunciation with beauty of delivery.)*

-

-<audio controls
-
-seed=99 (different prosody):
-
-<audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/11_flow_s2.wav" type="audio/wav"></audio>

 ---

-##

-
-

-
-*(The scholar knew that the flag rises with knowledge, so he inquired about the sciences of the ancients.)*

-
-
-

-

-
 ---
+license: apache-2.0
 language:
+- ar
 tags:
+- tts
+- arabic
+- cosyvoice
+- lora
+- speech-synthesis
 ---

+# BayanSynthTTS - Arabic TTS Checkpoints

+Fine-tuned LoRA weights for **CosyVoice 3** (Arabic).
+Trained on ~4 h of diacritized Arabic speech.

+**GitHub:** [Ramendan/BayanSynthTTS](https://github.com/Ramendan/BayanSynthTTS)

+---

 ## Audio Demos

+### 1. Basic synthesis (pre-diacritized)

+> مَرْحَبًا، أَنَا بَيَانْسِينْث، نِظَامٌ لِتَوْلِيدِ الْكَلَامِ الْعَرَبِيِّ.
+>
+> *Hello, I am BayanSynth, a system for generating Arabic speech.*

+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/01_basic.wav"></audio>

 ---

 ### 2. Pre-diacritized text (mishkal off)

+> إِنَّ اللُّغَةَ الْعَرَبِيَّةَ كَنْزٌ مِنَ الثَّقَافَةِ وَالتُّرَاثِ.
+>
+> *The Arabic language is a treasure of culture and heritage.*

+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/02_prediacritized.wav"></audio>

 ---

+### 3. Longer passage (auto-tashkeel, speed 0.88)

+> الذكاء الاصطناعي هو أحد أبرز التطورات التكنولوجية في عصرنا الحديث. يعتمد على تحليل كميات ضخمة من البيانات لاستخلاص أنماط معقدة. ومن أبرز تطبيقاته نظم التعرف على الصوت وترجمة اللغات وتوليد النصوص.
+>
+> *Artificial intelligence is one of the most prominent technological advances of our era. It relies on analyzing massive amounts of data to extract complex patterns. Among its most notable applications: speech recognition, language translation, and text generation.*

+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/04_long_text.wav"></audio>

 ---

+### 4. Phonetics test (seed=42)

+> الْجَوْدَةُ الْعَالِيَةُ لِتِقْنِيَّاتِ الذَّكَاءِ الاصْطِنَاعِيِّ تُسَاهِمُ فِي بِنَاءِ مُسْتَقْبَلٍ بَاهِرٍ لِلْأَجْيَالِ.
+>
+> *The high quality of AI technologies contributes to building a brilliant future for generations to come.*

+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/10_phonetics_s2.wav"></audio>

 ---

+### 5. Flow & rhythm (seed=42)

+> إِنَّ نِظَامَ بَيَانْسِينْث يَهْدِفُ إِلَى تَقْدِيمِ تَجْرِبَةٍ صَوْتِيَّةٍ فَرِيدَةٍ، تَجْمَعُ بَيْنَ دِقَّةِ النُّطْقِ وَجَمَالِ الْأَدَاءِ.
+>
+> *BayanSynth aims to deliver a unique voice experience that combines precise pronunciation with beauty of delivery.*

+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/08_flow.wav"></audio>

 ---

+### 6. Flow, alternate seed (seed=99)

+Same text, different prosody:

+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/11_flow_s2.wav"></audio>

 ---

+### 7. Challenge: tashkeel disambiguation

+> عَلِمَ الْعَالِمُ أَنَّ الْعَلَمَ يَعْلُو بِالْعِلْمِ، فَاسْتَعْلَمَ عَنْ عُلُومِ الْأَوَّلِينَ.
+>
+> *The scholar knew that the flag rises with knowledge, so he inquired about the sciences of the ancients.*

+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/09_challenge.wav"></audio>

 ---

+### 8. Instruct prompt: warm newsreader style

+> مَرْحَباً بِكُمْ. هَذَا مِثَالٌ عَلَى اسْتِخْدَامِ التَّوْجِيهِ لِضَبْطِ أُسْلُوبِ الصَّوْتِ.
+>
+> *Welcome. This is an example of using an instruct prompt to control voice style.*

+<audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/12_instruct.wav"></audio>

 ---

+## Files

+| File | Description |
+|------|-------------|
+| `epoch_28_whole.pt` | LoRA weights (LLM, 629 keys) - main checkpoint |
+| `samples/*.wav` | Pre-generated audio demos |

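The sample links in this README all follow the Hub's `resolve/main` URL pattern, so the files listed in the table can be addressed from code as well. The following is a minimal sketch, assuming the repository id `Ramendan/BayanSynthTTS-checkpoints` (read off the sample URLs) and that `huggingface_hub` is installed for the download step. The helper names `resolve_url` and `fetch_checkpoint` are illustrative only and are not part of BayanSynthTTS.

```python
REPO_ID = "Ramendan/BayanSynthTTS-checkpoints"  # assumption: taken from the sample URLs


def resolve_url(filename: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Build the direct-download URL for a file in the model repository."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def fetch_checkpoint(filename: str = "epoch_28_whole.pt") -> str:
    """Download a file into the local Hugging Face cache and return its path."""
    # Imported lazily so resolve_url() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=filename)


if __name__ == "__main__":
    # Prints the same URL used by the first audio demo.
    print(resolve_url("samples/01_basic.wav"))
```

Whether the `bayansynthtts` package fetches the checkpoint on its own is not stated here, so `fetch_checkpoint` may be unnecessary in practice.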
+## Usage

+```bash
+pip install bayansynthtts
+```

+```python
+from bayansynthtts import BayanSynthTTS

+tts = BayanSynthTTS()
+audio = tts.synthesize(
+    "مَرْحَبًا، أَنَا بَيَانْسِينْث، نِظَامٌ لِتَوْلِيدِ الْكَلَامِ الْعَرَبِيِّ.",
+    auto_tashkeel=False,
+)
+tts.save_wav(audio, "output.wav")
+```
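Demos 1 and 2 feed pre-diacritized text with auto-tashkeel off, while demo 3 relies on auto-tashkeel. One way to choose the flag per input is to check whether the text already carries Arabic diacritics. The sketch below is only an illustration under stated assumptions: `tashkeel_ratio`, `has_tashkeel`, `synthesize_smart`, and the 0.3 threshold are hypothetical, and the only library call it relies on is `synthesize(text, auto_tashkeel=...)` as shown in the usage snippet.

```python
# Arabic harakat from fathatan to sukun (U+064B..U+0652).
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}


def tashkeel_ratio(text: str) -> float:
    """Return diacritic marks per Arabic letter (0.0 if there are no letters)."""
    letters = sum(1 for ch in text if "\u0621" <= ch <= "\u064A")
    marks = sum(1 for ch in text if ch in TASHKEEL)
    return marks / letters if letters else 0.0


def has_tashkeel(text: str, threshold: float = 0.3) -> bool:
    """Heuristic: treat text as diacritized when marks are reasonably dense."""
    return tashkeel_ratio(text) >= threshold


def synthesize_smart(tts, text: str):
    """Call tts.synthesize, enabling auto-tashkeel only for bare text.

    `tts` is any object exposing synthesize(text, auto_tashkeel=...),
    such as the BayanSynthTTS instance from the usage snippet.
    """
    return tts.synthesize(text, auto_tashkeel=not has_tashkeel(text))
```

The threshold is a judgment call: partially diacritized text falls in between, and a stricter or looser cutoff may suit a given corpus better.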