Update README.md
README.md
CHANGED
@@ -47,8 +47,8 @@ Be advised that this is a preview model meant to showcase the base model's capab

 #### Pre-training

-We pre-trained the model in two stages, first training on billions of tokens of mixed
-Then, we trained on tens of thousands of hours of German and English
+We pre-trained the model in two stages, first training on billions of tokens of mixed speech and text data using a next-token-prediction objective.
+Then, we trained on tens of thousands of hours of German and English TTS data mixed with a little text instruction data to preserve the text understanding capability of the model.

 We used the following datasets, as well as some in-house datasets:
 - [HuggingFaceFW/fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb)