Update README.md

README.md CHANGED
@@ -89,10 +89,7 @@ pip install -r requirements.txt
 python inference.py --config config.yml --model model.pth --text "الإِتْقَانُ يَحْتَاجُ إِلَى الْعَمَلِ وَالْمُثَابَرَة"
 ```

-Make sure
-- Set the config path to point to the configuration file from this Hugging Face repository
-- Install espeak-ng on your system as it's required for the phonemizer to work
-- Use properly diacritized Arabic text for best results
+Make sure to use properly diacritized Arabic text for best results

 ### Out-of-Scope Use

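The quickstart above asks for properly diacritized Arabic input. Below is a minimal sketch of how a caller could screen text for diacritic coverage before synthesis; the function names and the 0.5 threshold are illustrative assumptions, not part of this repository:

```python
# Arabic diacritics (tashkeel) occupy U+064B..U+0652 (fathatan .. sukun).
TASHKEEL = {chr(c) for c in range(0x064B, 0x0653)}
# Arabic base letters fall in U+0621..U+064A.
ARABIC_LETTERS = {chr(c) for c in range(0x0621, 0x064B)}

def diacritic_ratio(text: str) -> float:
    """Return diacritic marks per Arabic base letter (0.0 if no letters)."""
    letters = sum(ch in ARABIC_LETTERS for ch in text)
    marks = sum(ch in TASHKEEL for ch in text)
    return marks / letters if letters else 0.0

def looks_diacritized(text: str, threshold: float = 0.5) -> bool:
    # Heuristic: fully vocalized text carries roughly one mark per letter;
    # the 0.5 cutoff is an arbitrary illustrative choice.
    return diacritic_ratio(text) >= threshold
```

A fully vocalized sentence like the quickstart example passes this check, while bare (undiacritized) text does not.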
@@ -109,36 +106,23 @@ The model is specifically designed for Arabic text-to-speech synthesis and may n
 - Dataset: [fadi77/arabic-audiobook-dataset-24khz](https://huggingface.co/datasets/fadi77/arabic-audiobook-dataset-24khz)
 - The PL-BERT component was trained on fully diacritized Wikipedia Arabic text

-### Training
-- **Hardware:** Single NVIDIA H100 GPU
-- **Training Duration:** 20 epochs
-- **Validation Metrics:** Identical to original StyleTTS2 training methodology
-
-### Training Procedure
-
-#### Training Hyperparameters
+### Training Hyperparameters

 - **Number of epochs:** 20
 - **Diffusion training:** Started from epoch 5
-- **Training objectives:** All original StyleTTS2 objectives maintained, except WavLM adversarial training
-- **Validation methodology:** Identical to original StyleTTS2 training process
-- **Notable modifications:**
-  - Removed WavLM adversarial training component
-  - Custom PL-BERT trained for Arabic language
-
-## Technical Specifications
-
-### Model Architecture and Objective

-
-
-
-3. Modified training procedure without WavLM adversarial component
+
+### Objectives
+- **Training objectives:** All original StyleTTS2 objectives maintained, except WavLM adversarial training
+- **Validation objectives:** Identical to original StyleTTS2 validation process

 ### Compute Infrastructure
-
 - **Hardware Type:** NVIDIA H100 GPU
+
+### Notable Modifications from Original StyleTTS2 in Architecture and Objectives
+The architecture of the model follows that of StyleTTS2 with the following exceptions:
+- Removed WavLM adversarial training component
+- Custom PL-BERT trained for Arabic language
+

 ## Citation

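For scripting, the documented `inference.py` invocation can be assembled programmatically. The sketch below wraps only the flags shown in the quickstart (`--config`, `--model`, `--text`); the wrapper function names are illustrative assumptions, not part of this repository:

```python
import subprocess
import sys

def build_inference_cmd(config: str, model: str, text: str) -> list:
    """Assemble the documented inference.py invocation as an argv list."""
    return [sys.executable, "inference.py",
            "--config", config, "--model", model, "--text", text]

def synthesize(config: str, model: str, text: str) -> None:
    # Passing argv as a list avoids shell quoting issues with Arabic text.
    subprocess.run(build_inference_cmd(config, model, text), check=True)
```

Using an argv list (rather than a shell string) keeps diacritized Arabic intact regardless of the caller's shell and locale.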