Update README.md
README.md
CHANGED
@@ -88,7 +88,7 @@ Note: Although this example uses StyleTTS2, the model is compatible with other T
 
 ### Training data
 
-The model was trained on a phonemized Catalan corpus (any phonemizer can be used). The dataset includes sentences from speakers across Catalonia, Balearic Islands, and Valencia. It uses a consistent phoneme token set with boundary markers and masking tokens.
+The model was trained on a phonemized Catalan corpus (any phonemizer can be used) extracted from the [CATalog](https://huggingface.co/datasets/projecte-aina/CATalog) corpus. The dataset includes sentences from speakers across Catalonia, Balearic Islands, and Valencia. It uses a consistent phoneme token set with boundary markers and masking tokens.
 
 Tokenizer: custom (split using whitespaces)
 Phoneme masking strategy: word-level and phoneme-level masking and replacement
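
The README lines in this hunk name a whitespace tokenizer and a combined word-level and phoneme-level masking strategy without giving details. The following is a minimal sketch of how such a scheme could look, not the model's actual preprocessing: the `<mask>` token, the `|` boundary marker, the probabilities, and all function names are assumptions made for illustration.

```python
import random

MASK = "<mask>"   # assumed mask token; the real token set is not shown in the README
BOUNDARY = "|"    # assumed word-boundary marker


def tokenize(phonemized):
    """Whitespace tokenizer: every whitespace-separated symbol is one token."""
    return phonemized.split()


def split_words(tokens):
    """Group token indices into words, treating BOUNDARY as the separator."""
    words, current = [], []
    for i, tok in enumerate(tokens):
        if tok == BOUNDARY:
            if current:
                words.append(current)
            current = []
        else:
            current.append(i)
    if current:
        words.append(current)
    return words


def mask_tokens(tokens, vocab, rng, p_word=0.15, p_phoneme=0.1, p_replace=0.5):
    """Apply word-level masking first, then phoneme-level masking or replacement.

    All probabilities are illustrative defaults, not values from the source.
    """
    out = list(tokens)
    for word in split_words(tokens):
        if rng.random() < p_word:
            # Word-level: hide every phoneme of the selected word.
            for i in word:
                out[i] = MASK
            continue
        for i in word:
            if rng.random() < p_phoneme:
                # Phoneme-level: either swap in a random phoneme or mask it.
                out[i] = rng.choice(vocab) if rng.random() < p_replace else MASK
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = tokenize("b o n | d i a")
    vocab = sorted(set(tokens) - {BOUNDARY})
    print(mask_tokens(tokens, vocab, rng))
```

Boundary markers are never masked in this sketch, so the output always has the same length and word segmentation as the input.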