fadi77 committed
Commit ec38903 · verified · 1 Parent(s): 06bd211

Update README.md

  1. README.md +21 -10
README.md CHANGED
@@ -14,6 +14,27 @@ hardware: H100
 
  This is an Arabic text-to-speech model based on StyleTTS2 architecture, specifically adapted for Arabic language synthesis. The model achieves good quality Arabic speech synthesis, though not yet state-of-the-art, and further experimentation is needed to optimize performance for Arabic language specifically. All training objectives from the original StyleTTS2 were maintained, except for the WavLM objectives which were removed as they were primarily designed for English speech.
 
  ## Model Details
 
  ### Model Description
@@ -130,16 +151,6 @@ The model combines:
  }
  ```
 
- ## Example
-
- Here is an example output from the model:
-
- #### Sample 1
- <audio controls>
- <source src="https://huggingface.co/fadi77/StyleTTS2-LibriTTS-arabic/resolve/main/synthesized_audio.wav" type="audio/wav">
- Your browser does not support the audio element.
- </audio>
-
  ## Model Card Contact
 
  GitHub: [@Fadi987](https://github.com/Fadi987)
 
 
  This is an Arabic text-to-speech model based on StyleTTS2 architecture, specifically adapted for Arabic language synthesis. The model achieves good quality Arabic speech synthesis, though not yet state-of-the-art, and further experimentation is needed to optimize performance for Arabic language specifically. All training objectives from the original StyleTTS2 were maintained, except for the WavLM objectives which were removed as they were primarily designed for English speech.
 
+ ## Efficiency and Performance
+
+ A key strength of this model lies in its efficiency and performance characteristics:
+
+ - **Compact Architecture**: Achieves impressive quality with <100M parameters
+ - **Limited Training Data**: Trained on only 22 hours of single-speaker audio
+ - **Transfer Learning**: Successfully fine-tuned from LibriTTS multi-speaker model to single-speaker Arabic
+ - **Resource Efficient**: Good quality achieved despite limited computational resources
+
+ Note: According to the StyleTTS2 authors, performance should improve further when training a single-speaker model from scratch rather than fine-tuning. This wasn't attempted in our case due to computational resource constraints, suggesting potential for even better results with more extensive training.
+
+ ## Example
+
+ Here is an example output from the model:
+
+ #### Sample 1
+ <audio controls>
+ <source src="https://huggingface.co/fadi77/StyleTTS2-LibriTTS-arabic/resolve/main/synthesized_audio.wav" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+
  ## Model Details
 
  ### Model Description
 
  }
  ```
 
  ## Model Card Contact
 
  GitHub: [@Fadi987](https://github.com/Fadi987)
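
The relocated Example section embeds the sample through a Hub "resolve" URL. For readers who want to fetch that sample outside the browser, here is a minimal sketch using only the Python standard library. The helper names `resolve_url` and `download` are hypothetical and not part of the repository; the repo id and filename are taken from the `<source>` tag in the diff.

```python
from urllib.parse import quote

HF_ENDPOINT = "https://huggingface.co"


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download ("resolve") URL for a file in a Hub repo.

    Hypothetical helper: it mirrors the URL pattern seen in the README's
    <source> tag rather than calling any official client library.
    """
    return (
        f"{HF_ENDPOINT}/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


def download(url: str, dest: str) -> str:
    """Download `url` to the local path `dest` and return that path."""
    from urllib.request import urlretrieve  # imported lazily; needs network

    path, _headers = urlretrieve(url, dest)
    return path


# Repo id and filename as they appear in the README's audio element.
SAMPLE_URL = resolve_url("fadi77/StyleTTS2-LibriTTS-arabic", "synthesized_audio.wav")

# To actually fetch the file (requires network access):
# download(SAMPLE_URL, "synthesized_audio.wav")
```

Building the URL is kept separate from the download so the former can be checked offline; whether the file is still present at that path on the `main` revision is not something this diff confirms.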