LenDigLearn committed on
Commit b2f8b87 · verified · 1 Parent(s): 86b71bf

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -47,8 +47,8 @@ Be advised that this is a preview model meant to showcase the base model's capab
 
  #### Pre-training
 
- We pre-trained the model in two stages, first training on billions of tokens of mixed audio and text data using a next-token-prediction objective.
- Then, we trained on tens of thousands of hours of German and English speech mixed with a little text instruction data to preserve the text understanding capability of the model.
+ We pre-trained the model in two stages, first training on billions of tokens of mixed speech and text data using a next-token-prediction objective.
+ Then, we trained on tens of thousands of hours of German and English TTS data mixed with a little text instruction data to preserve the text understanding capability of the model.
 
  We used the following datasets, as well as some in-house datasets:
  - [HuggingFaceFW/fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb)