fadi77/StyleTTS2-LibriTTS-arabic

fadi77 committed on Apr 19, 2025

Commit ec38903 · verified · 1 Parent(s): 06bd211

Update README.md

Files changed (1)

README.md +21 -10

@@ -14,6 +14,27 @@ hardware: H100
 This is an Arabic text-to-speech model based on StyleTTS2 architecture, specifically adapted for Arabic language synthesis. The model achieves good quality Arabic speech synthesis, though not yet state-of-the-art, and further experimentation is needed to optimize performance for Arabic language specifically. All training objectives from the original StyleTTS2 were maintained, except for the WavLM objectives which were removed as they were primarily designed for English speech.
+## Efficiency and Performance
+A key strength of this model lies in its efficiency and performance characteristics:
+- **Compact Architecture**: Achieves impressive quality with <100M parameters
+- **Limited Training Data**: Trained on only 22 hours of single-speaker audio
+- **Transfer Learning**: Successfully fine-tuned from LibriTTS multi-speaker model to single-speaker Arabic
+- **Resource Efficient**: Good quality achieved despite limited computational resources
+Note: According to the StyleTTS2 authors, performance should improve further when training a single-speaker model from scratch rather than fine-tuning. This wasn't attempted in our case due to computational resource constraints, suggesting potential for even better results with more extensive training.
+## Example
+Here is an example output from the model:
+#### Sample 1
+<audio controls>
+  <source src="https://huggingface.co/fadi77/StyleTTS2-LibriTTS-arabic/resolve/main/synthesized_audio.wav" type="audio/wav">
+  Your browser does not support the audio element.
+</audio>
 ## Model Details
 ### Model Description
@@ -130,16 +151,6 @@ The model combines:
 }
 ```
-## Example
-Here is an example output from the model:
-#### Sample 1
-<audio controls>
-  <source src="https://huggingface.co/fadi77/StyleTTS2-LibriTTS-arabic/resolve/main/synthesized_audio.wav" type="audio/wav">
-  Your browser does not support the audio element.
-</audio>
 ## Model Card Contact
 GitHub: [@Fadi987](https://github.com/Fadi987)