inference: false
datasets:
- bookbot/sw-TZ-Victoria
- bookbot/sw-TZ-Victoria-syllables-word
- bookbot/sw-TZ-Victoria-v2
- bookbot/sw-TZ-VictoriaNeural-upsampled-48kHz
---

# LightSpeech MFA SW v4
LightSpeech MFA SW v4 is a text-to-mel-spectrogram model based on the [LightSpeech](https://arxiv.org/abs/2102.04040) architecture. This model was fine-tuned from [LightSpeech MFA SW v1](https://huggingface.co/bookbot/lightspeech-mfa-sw-v1) and trained on real and synthetic audio datasets. The list of speakers includes:

- sw-TZ-Victoria
- sw-TZ-Victoria-syllables-word
- sw-TZ-Victoria-v2
- sw-TZ-VictoriaNeural-upsampled-48kHz

We trained an acoustic Swahili model on our speech corpus using [Montreal Forced Aligner v3.0.0](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) and used it as the duration extractor. That model, and consequently our model, uses the IPA phone set for Swahili. We used [gruut](https://github.com/rhasspy/gruut) for phonemization. We followed these [steps](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/mfa_extraction) to perform duration extraction.
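At its core, the duration-extraction step above converts each MFA-aligned phone's time span into a whole number of mel-spectrogram frames, which duration-based models like LightSpeech consume as training targets. A minimal sketch of that conversion (the 22.05 kHz sample rate and 256-sample hop length are illustrative defaults, not confirmed values from this model's training configuration):

```python
# Convert MFA phone alignments (start/end times in seconds) into
# per-phone mel-frame counts for a duration-based acoustic model.
# NOTE: sample_rate and hop_length are assumed defaults here, not the
# confirmed configuration used to train this model.

def durations_in_frames(alignments, sample_rate=22050, hop_length=256):
    """alignments: list of (phone, start_sec, end_sec) tuples."""
    frames_per_sec = sample_rate / hop_length
    durations = []
    for phone, start, end in alignments:
        # Round the boundaries (not the spans) to frame indices so the
        # per-phone counts sum exactly to the utterance's frame total.
        start_frame = round(start * frames_per_sec)
        end_frame = round(end * frames_per_sec)
        durations.append((phone, end_frame - start_frame))
    return durations

# Hypothetical alignment for illustration only.
example = [("h", 0.00, 0.08), ("a", 0.08, 0.21), ("b", 0.21, 0.30)]
print(durations_in_frames(example))  # [('h', 7), ('a', 11), ('b', 8)]
```

Rounding boundaries rather than individual durations avoids the off-by-one drift that would otherwise make the duration targets disagree with the actual number of mel frames.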