Update README.md
README.md
CHANGED
@@ -38,10 +38,10 @@ bnb_optimizer=false
 
 # Pre processing
 The transcriptions extracted from the data source have been preprocessed.
-From my understanding, punctuation is important because it teaches the model pauses and proper intonation, so it has been preserved.
+From my understanding, punctuation is important because it teaches the model pauses and proper intonation, so it has been preserved.
 The original Italian "text" field even contained direct-dialogue markers (long dashes), which have also been preserved, but it also contained
 a hyphen that was used to split a word across a line break (I don't know which process was used on the original dataset to create the text transcription),
-and so I removed those hyphens, merging the two parts of the word; otherwise the training would have been done on artifacts that don't affect the speech.
+and so I removed those hyphens, merging the two parts of the word; otherwise the training would have been done on artifacts that don't affect the speech.
 I'm only talking about the Italian data in cml-tts; I don't know if other languages are affected by this too.
 
 
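
The hyphen cleanup described in the README text above could be sketched roughly as follows. This is not the author's actual script, which the README does not show. It assumes the artifact appears as a letter, an ASCII hyphen, some whitespace (a newline or a space), and then another letter. The function name `merge_split_words` and the sample sentence are made up for illustration.

```python
import re

# Assumption: a line-break artifact looks like "letter" + "-" + whitespace + "letter",
# e.g. "cammi-\nnare" or "cammi- nare". The real artifact in the dataset may differ.
# [^\W\d_] matches any Unicode letter, so accented Italian vowels are covered.
_SPLIT_WORD = re.compile(r"(?<=[^\W\d_])-\s+(?=[^\W\d_])")


def merge_split_words(text: str) -> str:
    """Join words broken by a hyphen at a line break.

    Only the ASCII hyphen is targeted, so long dialogue dashes (U+2014)
    and all other punctuation are left untouched.
    """
    return _SPLIT_WORD.sub("", text)


if __name__ == "__main__":
    # "\u2014" is the long dash that opens direct dialogue (hypothetical sample).
    sample = "\u2014 Non posso cammi-\nnare, disse."
    print(merge_split_words(sample))
```

Compounds written without whitespace after the hyphen (for example `nord-est`) are not touched. A genuine compound that happened to be broken at a line end would be merged too, since the text alone cannot tell the two cases apart.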