This is a base (not instruction-tuned) large language model, continually pre-trained on Norwegian data starting from the English [OLMo2-13B](https://huggingface.co/allenai/OLMo-2-1124-13B) model.
Our training data mixture included [HPLTv3](https://huggingface.co/datasets/HPLT/HPLT3.0) Bokmål and Nynorsk, FinePDF Bokmål and Nynorsk, MADLAD400 Norwegian, OLMo-Mix, and a Northern Sami dataset.

The model was trained for 33 000 steps on around 300 billion tokens. Intermediate checkpoints are published here as branches.

Training was conducted as a part of the [HPLT project](https://hplt-project.org/).

_This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070350 and from UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee [grant number 10052546]_
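Since the intermediate checkpoints are published as branches, a specific one can be loaded by passing the branch name as the `revision` argument to the standard `transformers` loaders. A minimal sketch; the repo id and branch name in the usage example below are placeholders, not confirmed names — check this repository's branch list for the actual checkpoint names.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_checkpoint(repo_id: str, revision: str):
    """Load the model and tokenizer from a specific branch (intermediate checkpoint).

    `revision` may be any git revision: a branch name, tag, or commit hash.
    """
    tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)
    return model, tokenizer
```

For example, `load_checkpoint("<this repo id>", "step10000")` would fetch the checkpoint stored on a hypothetical `step10000` branch; omitting `revision` loads the final model from the default branch.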