DigitalLearningGmbH/educa-ai-voice-preview


LenDigLearn committed about 1 month ago

Commit b2f8b87 · verified · 1 Parent(s): 86b71bf

Update README.md

Files changed (1)

README.md +2 -2


@@ -47,8 +47,8 @@ Be advised that this is a preview model meant to showcase the base model's capab
 #### Pre-training
-We pre-trained the model in two stages, first training on billions of tokens of mixed audio and text data using a next-token-prediction objective.
-Then, we trained on tens of thousands of hours of German and English speech mixed with a little text instruction data to preserve the text understanding capability of the model.
+We pre-trained the model in two stages, first training on billions of tokens of mixed speech and text data using a next-token-prediction objective.
+Then, we trained on tens of thousands of hours of German and English TTS data mixed with a little text instruction data to preserve the text understanding capability of the model.
 We used the following datasets, as well as some in-house datasets:
 - [HuggingFaceFW/fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb)