Ramendan committed on
Commit 8e0ef58 · verified · 1 Parent(s): 3d4381a

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +67 -81
README.md CHANGED
@@ -1,138 +1,124 @@
  ---
  language:
- - ar
  tags:
- - text-to-speech
- - arabic
- - cosyvoice
- - lora
- license: apache-2.0
  ---

- # BayanSynthTTS Checkpoints
-
- Arabic TTS LoRA checkpoint fine-tuned on CosyVoice3.
- For the full library and usage instructions see the [BayanSynthTTS GitHub repo](
- https://github.com/Ramendan/BayanSynthTTS).

- ## Checkpoint

- `epoch_28_whole.pt`: LLM LoRA, epoch 28 (~1.9 GB)

- Place it at `checkpoints/llm/epoch_28_whole.pt` inside the BayanSynthTTS directory, then run `python scripts/setup_models.py` to download the base CosyVoice3 weights automatically.
 
  ## Audio Demos

- All samples were generated with this checkpoint. No post-processing applied.
-
- ### 1. Basic synthesis (auto-tashkeel on)

- Input: `مرحباً أنا بيانسينث، نظام لتوليد الكلام العربي`
- *(Hello, I am BayanSynth, an Arabic speech synthesis system)*

- <audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/01_basic.wav" type="audio/wav"></audio>

  ---
 
  ### 2. Pre-diacritized text (mishkal off)

- Input: `إِنَّ اللُّغَةَ الْعَرَبِيَّةَ كَنْزٌ مِنَ الثَّقَافَةِ وَالتُّرَاثِ.`
- *(The Arabic language is a treasure of culture and heritage.)*

- <audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/02_prediacritized.wav" type="audio/wav"></audio>

  ---
 
- ### 3. Voice cloning

- Reference voice (muffled-talking.wav trimmed to 10 s):

- <audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/ref_voice_muffled.wav" type="audio/wav"></audio>
-
- Input: `هَذَا الصَّوْتُ مُسْتَنْسَخٌ مِنْ مَقْطَعٍ صَوْتِيٍّ قَصِيرٍ.`
- *(This voice is cloned from a short audio clip.)*
-
- Cloned output:
-
- <audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/03_voice_cloning.wav" type="audio/wav"></audio>

  ---
 
- ### 4. Longer passage (AI topic, 3 sentences, speed=0.88)

- Input: `الذكاء الاصطناعي هو أحد أبرز التطورات التكنولوجية في عصرنا الحديث. يعتمد على تحليل كميات ضخمة من البيانات لاستخلاص أنماط معقدة. ومن أبرز تطبيقاته نظم التعرف على الصوت وترجمة اللغات وتوليد النصوص.`
- *(Artificial intelligence is one of the most prominent technological advances of our era. It relies on analyzing massive amounts of data to extract complex patterns. Among its most notable applications: speech recognition, language translation, and text generation.)*

- <audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/04_long_text.wav" type="audio/wav"></audio>

  ---
 
- ### 5. Speed control
-
- Slow (0.80x): `مَرْحَباً بِكُمْ فِي بَيَانْسِينْثِ. هَذَا تَوْلِيدٌ بِسُرْعَةٍ مُخَفَّضَةٍ لِلتَّوْضِيحِ.`
- *(Welcome to BayanSynth. This is synthesis at reduced speed for demonstration.)*

- <audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/05_slow_speed.wav" type="audio/wav"></audio>

- Fast (1.20x): `مَرْحَباً بِكُمْ فِي بَيَانْسِينْثِ. هَذَا تَوْلِيدٌ بِسُرْعَةٍ مُرْتَفَعَةٍ لِلتَّوْضِيحِ.`
- *(Welcome to BayanSynth. This is synthesis at elevated speed for demonstration.)*
-
- <audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/06_fast_speed.wav" type="audio/wav"></audio>

  ---
 
- ### 6. Instruct prompt: warm newsreader style

- Input: `مَرْحَباً بِكُمْ. هَذَا مِثَالٌ عَلَى اسْتِخْدَامِ التَّوْجِيهِ لِضَبْطِ أُسْلُوبِ الصَّوْتِ.`
- *(Welcome. This is an example of using an instruct prompt to control voice style.)*
- Instruct: *"Speak in a warm, clear newsreader style with careful diction."*

- <audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/12_instruct.wav" type="audio/wav"></audio>

  ---
 
- ### 7. Phonetics test: halqiyyat, tanwin, shaddah
-
- Input: `الْجَوْدَةُ الْعَالِيَةُ لِتَقْنِيَّاتِ الذَّكَاءِ الاصْطِنَاعِيِّ تُسَاهِمُ فِي بِنَاءِ مُسْتَقْبَلٍ بَاهِرٍ لِلْأَجْيَالِ.`
- *(The high quality of AI technologies contributes to building a brilliant future for generations.)*
-
- seed=42:

- <audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/07_phonetics.wav" type="audio/wav"></audio>

- seed=17 (different prosody):
-
- <audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/10_phonetics_s2.wav" type="audio/wav"></audio>

  ---
 
- ### 8. Flow and rhythm test
-
- Input: `إِنَّ نِظَامَ بَيَانِسِينْث يَهْدِفُ إِلَى تَقْدِيمِ تَجْرِبَةٍ صَوْتِيَّةٍ فَرِيدَةٍ، تَجْمَعُ بَيْنَ دِقَّةِ النُّطْقِ وَجَمَالِ الْأَدَاءِ.`
- *(BayanSynth aims to deliver a unique voice experience that combines precise pronunciation with beauty of delivery.)*

- seed=42:

- <audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/08_flow.wav" type="audio/wav"></audio>
-
- seed=99 (different prosody):
-
- <audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/11_flow_s2.wav" type="audio/wav"></audio>

  ---
 
- ### 9. Tashkeel disambiguation challenge

- Words `عَلِمَ / عَالِم / عَلَم / عِلْم` in a single sentence:
- *(he knew / scholar / flag / knowledge)*

- `عَلِمَ الْعَالِمُ أَنَّ الْعَلَمَ يَعْلُو بِالْعِلْمِ، فَاسْتَعْلَمَ عَنْ عُلُومِ الْأَوَّلِينَ.`
- *(The scholar knew that the flag rises with knowledge, so he inquired about the sciences of the ancients.)*

- <audio controls><source src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/09_challenge.wav" type="audio/wav"></audio>
-
- ---

- ## License

- Apache 2.0. LoRA checkpoint trained on Common Voice Arabic data is released under CC-BY 4.0.
 
  ---
+ license: apache-2.0
  language:
+ - ar
  tags:
+ - tts
+ - arabic
+ - cosyvoice
+ - lora
+ - speech-synthesis
  ---

+ # BayanSynthTTS: Arabic TTS Checkpoints

+ Fine-tuned LoRA weights for **CosyVoice 3** (Arabic).
+ Trained on ~4 h of diacritized Arabic speech.

+ **GitHub:** [Ramendan/BayanSynthTTS](https://github.com/Ramendan/BayanSynthTTS)

+ ---
 
  ## Audio Demos

+ ### 1. Basic synthesis (pre-diacritized)

+ > مَرْحَبًا، أَنَا بَيَانْسِينْث، نِظَامٌ لِتَوْلِيدِ الْكَلَامِ الْعَرَبِيِّ.
+ >
+ > *Hello, I am BayanSynth, a system for generating Arabic speech.*

+ <audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/01_basic.wav"></audio>

  ---
 
  ### 2. Pre-diacritized text (mishkal off)

+ > إِنَّ اللُّغَةَ الْعَرَبِيَّةَ كَنْزٌ مِنَ الثَّقَافَةِ وَالتُّرَاثِ.
+ >
+ > *The Arabic language is a treasure of culture and heritage.*

+ <audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/02_prediacritized.wav"></audio>

  ---
 
+ ### 3. Longer passage (auto-tashkeel, speed 0.88)

+ > الذكاء الاصطناعي هو أحد أبرز التطورات التكنولوجية في عصرنا الحديث. يعتمد على تحليل كميات ضخمة من البيانات لاستخلاص أنماط معقدة. ومن أبرز تطبيقاته نظم التعرف على الصوت وترجمة اللغات وتوليد النصوص.
+ >
+ > *Artificial intelligence is one of the most prominent technological advances of our era. It relies on analyzing massive amounts of data to extract complex patterns. Among its most notable applications: speech recognition, language translation, and text generation.*

+ <audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/04_long_text.wav"></audio>

  ---
 
+ ### 4. Phonetics test (seed=42)

+ > الْجَوْدَةُ الْعَالِيَةُ لِتَقْنِيَّاتِ الذَّكَاءِ الاصْطِنَاعِيِّ تُسَاهِمُ فِي بِنَاءِ مُسْتَقْبَلٍ بَاهِرٍ لِلْأَجْيَالِ.
+ >
+ > *The high quality of AI technologies contributes to building a brilliant future for generations to come.*

+ <audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/10_phonetics_s2.wav"></audio>

  ---
 
+ ### 5. Flow & rhythm (seed=42)

+ > إِنَّ نِظَامَ بَيَانِسِينْث يَهْدِفُ إِلَى تَقْدِيمِ تَجْرِبَةٍ صَوْتِيَّةٍ فَرِيدَةٍ، تَجْمَعُ بَيْنَ دِقَّةِ النُّطْقِ وَجَمَالِ الْأَدَاءِ.
+ >
+ > *BayanSynth aims to deliver a unique voice experience that combines precise pronunciation with beauty of delivery.*

+ <audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/08_flow.wav"></audio>

  ---
 
+ ### 6. Flow, alternate seed (seed=99)

+ Same text, different prosody:

+ <audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/11_flow_s2.wav"></audio>

  ---
 
+ ### 7. Challenge: tashkeel disambiguation

+ > عَلِمَ الْعَالِمُ أَنَّ الْعَلَمَ يَعْلُو بِالْعِلْمِ، فَاسْتَعْلَمَ عَنْ عُلُومِ الْأَوَّلِينَ.
+ >
+ > *The scholar knew that the flag rises with knowledge, so he inquired about the sciences of the ancients.*

+ <audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/09_challenge.wav"></audio>

  ---
 
+ ### 8. Instruct prompt: warm newsreader style

+ > مَرْحَباً بِكُمْ. هَذَا مِثَالٌ عَلَى اسْتِخْدَامِ التَّوْجِيهِ لِضَبْطِ أُسْلُوبِ الصَّوْتِ.
+ >
+ > *Welcome. This is an example of using an instruct prompt to control voice style.*

+ <audio controls src="https://huggingface.co/Ramendan/BayanSynthTTS-checkpoints/resolve/main/samples/12_instruct.wav"></audio>

  ---
 
+ ## Files

+ | File | Description |
+ |------|-------------|
+ | `epoch_28_whole.pt` | LoRA weights (LLM, 629 keys), main checkpoint |
+ | `samples/*.wav` | Pre-generated audio demos |

+ ## Usage

+ ```bash
+ pip install bayansynthtts
+ ```

+ ```python
+ from bayansynthtts import BayanSynthTTS

+ tts = BayanSynthTTS()
+ audio = tts.synthesize(
+     "مَرْحَبًا، أَنَا بَيَانْسِينْث، نِظَامٌ لِتَوْلِيدِ الْكَلَامِ الْعَرَبِيِّ.",
+     auto_tashkeel=False,
+ )
+ tts.save_wav(audio, "output.wav")
+ ```
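
The usage snippet passes `auto_tashkeel=False` because its input already carries diacritics. A caller who does not know in advance whether the text is diacritized could check first. The following is a minimal sketch using only the standard library. The helper names `tashkeel_ratio` and `looks_diacritized` and the 0.5 threshold are assumptions made for illustration; they are not part of `bayansynthtts`.

```python
# Hypothetical helpers (not part of bayansynthtts): estimate whether Arabic
# text already carries tashkeel, to decide on the auto_tashkeel flag.

# Tashkeel marks: fathatan (U+064B) through sukun (U+0652), shadda included.
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}


def _is_arabic_letter(ch: str) -> bool:
    # Basic Arabic letters, skipping tatweel (U+0640), which is not a letter.
    return "\u0621" <= ch <= "\u063A" or "\u0641" <= ch <= "\u064A"


def tashkeel_ratio(text: str) -> float:
    """Return the number of tashkeel marks per Arabic letter (0.0 if no letters)."""
    letters = sum(1 for ch in text if _is_arabic_letter(ch))
    marks = sum(1 for ch in text if ch in TASHKEEL)
    return marks / letters if letters else 0.0


def looks_diacritized(text: str, threshold: float = 0.5) -> bool:
    """Heuristic: treat text as pre-diacritized above an assumed mark density."""
    return tashkeel_ratio(text) >= threshold
```

With this in place, the flag could be derived from the input, for example `auto_tashkeel=not looks_diacritized(text)`. The threshold is arbitrary and may need tuning for partially diacritized text.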