Update README.md

README.md CHANGED
@@ -89,10 +89,7 @@ pip install -r requirements.txt
 python inference.py --config config.yml --model model.pth --text "الإِتْقَانُ يَحْتَاجُ إِلَى الْعَمَلِ وَالْمُثَابَرَة"
 ```

-Make sure
-- Set the config path to point to the configuration file from this Hugging Face repository
-- Install espeak-ng on your system as it's required for the phonemizer to work
-- Use properly diacritized Arabic text for best results
+Make sure to use properly diacritized Arabic text for best results

 ### Out-of-Scope Use

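The quickstart above asks for properly diacritized Arabic input. Below is a minimal sketch of how a caller could screen text for diacritic coverage before synthesis; the function names and the 0.5 threshold are illustrative assumptions, not part of this repository:

```python
# Arabic diacritics (tashkeel) occupy U+064B..U+0652 (fathatan .. sukun).
TASHKEEL = {chr(c) for c in range(0x064B, 0x0653)}
# Arabic base letters fall in U+0621..U+064A.
ARABIC_LETTERS = {chr(c) for c in range(0x0621, 0x064B)}

def diacritic_ratio(text: str) -> float:
    """Return diacritic marks per Arabic base letter (0.0 if no letters)."""
    letters = sum(ch in ARABIC_LETTERS for ch in text)
    marks = sum(ch in TASHKEEL for ch in text)
    return marks / letters if letters else 0.0

def looks_diacritized(text: str, threshold: float = 0.5) -> bool:
    # Heuristic: fully vocalized text carries roughly one mark per letter;
    # the 0.5 cutoff is an arbitrary illustrative choice.
    return diacritic_ratio(text) >= threshold
```

A fully vocalized sentence like the quickstart example passes this check, while bare (undiacritized) text does not.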
@@ -109,36 +106,23 @@ The model is specifically designed for Arabic text-to-speech synthesis and may n
 - Dataset: [fadi77/arabic-audiobook-dataset-24khz](https://huggingface.co/datasets/fadi77/arabic-audiobook-dataset-24khz)
 - The PL-BERT component was trained on fully diacritized Wikipedia Arabic text

-### Training
-- **Hardware:** Single NVIDIA H100 GPU
-- **Training Duration:** 20 epochs
-- **Validation Metrics:** Identical to original StyleTTS2 training methodology
-
-### Training Procedure
-
-#### Training Hyperparameters
+### Training Hyperparameters

 - **Number of epochs:** 20
 - **Diffusion training:** Started from epoch 5
-- **Training objectives:** All original StyleTTS2 objectives maintained, except WavLM adversarial training
-- **Validation methodology:** Identical to original StyleTTS2 training process
-- **Notable modifications:**
-  - Removed WavLM adversarial training component
-  - Custom PL-BERT trained for Arabic language
-
-## Technical Specifications
-
-### Model Architecture and Objective

-
-
-
-3. Modified training procedure without WavLM adversarial component
+
+### Objectives
+- **Training objectives:** All original StyleTTS2 objectives maintained, except WavLM adversarial training
+- **Validation objectives:** Identical to original StyleTTS2 validation process

 ### Compute Infrastructure
-
 - **Hardware Type:** NVIDIA H100 GPU
+
+### Notable Modifications from Original StyleTTS2 in Architecture and Objectives
+The architecture of the model follows that of StyleTTS2 with the following exceptions:
+- Removed WavLM adversarial training component
+- Custom PL-BERT trained for Arabic language
+

 ## Citation

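For scripting, the documented `inference.py` invocation can be assembled programmatically. The sketch below wraps only the flags shown in the quickstart (`--config`, `--model`, `--text`); the wrapper function names are illustrative assumptions, not part of this repository:

```python
import subprocess
import sys

def build_inference_cmd(config: str, model: str, text: str) -> list:
    """Assemble the documented inference.py invocation as an argv list."""
    return [sys.executable, "inference.py",
            "--config", config, "--model", model, "--text", text]

def synthesize(config: str, model: str, text: str) -> None:
    # Passing argv as a list avoids shell quoting issues with Arabic text.
    subprocess.run(build_inference_cmd(config, model, text), check=True)
```

Using an argv list (rather than a shell string) keeps diacritized Arabic intact regardless of the caller's shell and locale.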