Text Generation
Safetensors
Danish
English
llama
peter-sk committed on
Commit 6ecf91f · 1 Parent(s): b7bad0a

fixed typos

Files changed (1): README.md +6 -6
README.md CHANGED
@@ -12,12 +12,12 @@ pipeline_tag: text-generation
 ---
 # Munin-7B-Open-pt
 
- Munin-7B-open-pt is a 7 billion parameter language model continually pre-trained from [Comma v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t/) using 30B tokens using a mix of the [Dynaword](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword) and [the Comma v0.1 dataset](https://huggingface.co/datasets/common-pile/comma_v0.1_training_dataset), both comprising only public domain and openly licensed data.
+ Munin-7B-open-pt is a 7 billion parameter language model continually pre-trained from [Comma v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t) on 30B tokens, using a mix of [DynaWord](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword) and the [Comma v0.1 dataset](https://huggingface.co/datasets/common-pile/comma_v0.1_training_dataset), both comprising only public domain and openly licensed data.
 
- Munin-7B-open-pt is a base model that can be used a the starting point for fine-tuning and post-training. It has not been instruction-tuned and cannot directly be expected to function as a chat model.
+ Munin-7B-open-pt is a base model that can be used as the starting point for fine-tuning and post-training. It has not been instruction-tuned and cannot be expected to function directly as a chat model.
 
 ## Training details
- Munin-7B-open-pt has been trained using the [maester](https://github.com/rlrs/maester/tree/main/3aca26960eaa1a16250b3feda40303c240ba4ca1) framework developed as part of the [Danish Foundation Models project](https://foundationmodels.dk/). All training was performed on a single 8x Nvidia B200 node (the first of its kind in Denmark).
+ Munin-7B-open-pt has been trained using the [maester](https://github.com/rlrs/maester/tree/main/3aca26960eaa1a16250b3feda40303c240ba4ca1) framework developed as part of the [Danish Foundation Models project](https://foundationmodels.dk/). All training was performed on a single 8x NVIDIA B200 node (the first of its kind in Denmark).
 
 The training was performed in three stages, with data mix (open-stageK.py) and maester (open-stageK.toml) configuration files available in each subfolder. The datasets can be created using the create_dataset.py script provided in this repository.
 
@@ -25,9 +25,9 @@ The characteristics of the three pre-training stages are detailed in the followi
 
 | Stage | Batch size | Steps | HF path | Data mix | Comments |
 |-|-|-|-|-|-|
- | stage1 | 262,144 tok | 37,852| [subfolder="stage1"](https://huggingface.co/danish-foundation-models/munin-7b-open-pt/tree/main/stage1) | 2/3 [DynaWord](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword/tree/9e230b35e31a510e5ab909112ad5bfc9463b2c23); <br> 1/3 [Common-Pile](https://huggingface.co/common-pile/comma_v0.1_training_dataset/5afc546db324e7f39f297ba757c9a60547151e7c/) | Excludes depbank, jvj, nordjyllandnews, synne for DynaWord; <br> uses subsets and weighting from [Comma-v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t) cooldown phase for Common-Pile ; LR swchedule with 1000 steps warmup, constant 1e-5, 1000 steps cooldown |
- | stage2 | 524,288 tok | 18926 | [subfolder="stage2"](https://huggingface.co/danish-foundation-models/munin-7b-open-pt/tree/main/stage2) | 2/3 [DynaWord](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword/tree/9e230b35e31a510e5ab909112ad5bfc9463b2c23); <br> 1/3 [Common-Pile](https://huggingface.co/common-pile/comma_v0.1_training_dataset/5afc546db324e7f39f297ba757c9a60547151e7c/) | Excludes depbank, jvj, nordjyllandnews, synne for DynaWord; <br> uses subsets and weighting from [Comma-v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t) cooldown phase for Common-Pile; LR swchedule with 500 steps warmup, constant 1e-5, 500 steps cooldown |
- | stage3 | 524,288 tok | 18926 | [subfolder="stage3"](https://huggingface.co/danish-foundation-models/munin-7b-open-pt/tree/main/stage3) | 2/3 [DynaWord](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword/tree/9e230b35e31a510e5ab909112ad5bfc9463b2c23); <br> 1/3 [Common-Pile](https://huggingface.co/common-pile/comma_v0.1_training_dataset/5afc546db324e7f39f297ba757c9a60547151e7c/) | Excludes depbank, jvj, nordjyllandnews, synne for DynaWord; <br> uses subsets and weighting from [Comma-v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t) cooldown phase for Common-Pile; LR swchedule with 500 steps warmup, square root decay from 1e-5 |
+ | stage1 | 262,144 tok | 37,852 | [subfolder="stage1"](https://huggingface.co/danish-foundation-models/munin-7b-open-pt/tree/main/stage1) | 2/3 [DynaWord](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword/tree/9e230b35e31a510e5ab909112ad5bfc9463b2c23); <br> 1/3 [Common-Pile](https://huggingface.co/common-pile/comma_v0.1_training_dataset/5afc546db324e7f39f297ba757c9a60547151e7c) | Excludes depbank, jvj, nordjyllandnews, synne for DynaWord; <br> uses subsets and weighting from [Comma-v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t) cooldown phase for Common-Pile; LR schedule with 1000 steps warmup, constant 1e-5, 1000 steps cooldown |
+ | stage2 | 524,288 tok | 18,926 | [subfolder="stage2"](https://huggingface.co/danish-foundation-models/munin-7b-open-pt/tree/main/stage2) | 2/3 [DynaWord](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword/tree/9e230b35e31a510e5ab909112ad5bfc9463b2c23); <br> 1/3 [Common-Pile](https://huggingface.co/common-pile/comma_v0.1_training_dataset/5afc546db324e7f39f297ba757c9a60547151e7c) | Excludes depbank, jvj, nordjyllandnews, synne for DynaWord; <br> uses subsets and weighting from [Comma-v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t) cooldown phase for Common-Pile; LR schedule with 500 steps warmup, constant 1e-5, 500 steps cooldown |
+ | stage3 | 524,288 tok | 18,926 | [subfolder="stage3"](https://huggingface.co/danish-foundation-models/munin-7b-open-pt/tree/main/stage3) | 2/3 [DynaWord](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword/tree/9e230b35e31a510e5ab909112ad5bfc9463b2c23); <br> 1/3 [Common-Pile](https://huggingface.co/common-pile/comma_v0.1_training_dataset/5afc546db324e7f39f297ba757c9a60547151e7c) | Excludes depbank, jvj, nordjyllandnews, synne for DynaWord; <br> uses subsets and weighting from [Comma-v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t) cooldown phase for Common-Pile; LR schedule with 500 steps warmup, square root decay from 1e-5 |
 
 ## Limitations
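To make the base-model caveat in the card concrete, here is a minimal completion-style sketch using the standard transformers API. The repository id is taken from the HF path column above; dtype, device placement, and sampling settings are illustrative assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "danish-foundation-models/munin-7b-open-pt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A base model continues text; it does not follow instructions or chat turns.
prompt = "Danmark er et land, hvor"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```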
 
 
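Since the HF path column points at per-stage subfolders, the intermediate checkpoints should be loadable with the standard `subfolder` argument of `from_pretrained` (a sketch, assuming each stage folder holds a complete checkpoint):

```python
from transformers import AutoModelForCausalLM

# Load the stage-1 intermediate checkpoint instead of the final weights.
model = AutoModelForCausalLM.from_pretrained(
    "danish-foundation-models/munin-7b-open-pt",
    subfolder="stage1",
)
```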
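The Comments column describes two learning-rate shapes: warmup, then constant 1e-5, then cooldown (stages 1 and 2), and warmup, then square-root decay from 1e-5 (stage 3). A sketch of both curves follows; the exact forms used by maester (for example, the cooldown shape, assumed linear here) may differ:

```python
import math

PEAK = 1e-5  # constant/peak learning rate from the table

def lr_stages_1_2(step: int, total: int, warmup: int, cooldown: int) -> float:
    """Warmup, constant plateau, cooldown (stages 1 and 2)."""
    if step < warmup:
        return PEAK * step / warmup              # warmup to 1e-5 (linear assumed)
    if step >= total - cooldown:
        return PEAK * (total - step) / cooldown  # cooldown to 0 (shape assumed)
    return PEAK                                  # constant plateau

def lr_stage_3(step: int, warmup: int) -> float:
    """Warmup, then square-root decay from 1e-5 (stage 3)."""
    if step < warmup:
        return PEAK * step / warmup
    return PEAK / math.sqrt(step - warmup + 1)

# Example: stage 1 ran 37,852 steps with 1000 warmup and 1000 cooldown steps.
print(lr_stages_1_2(step=20_000, total=37_852, warmup=1_000, cooldown=1_000))
```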
 
 
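The 2/3 DynaWord, 1/3 Common-Pile mix can be approximated at the dataset level as below. This is only an illustration, not the logic of create_dataset.py (which is not reproduced here); the column names and the subset filter are assumptions:

```python
from datasets import interleave_datasets, load_dataset

# DynaWord subsets excluded per the Comments column; the "source" column
# name is an assumption about the DynaWord schema.
EXCLUDED = {"depbank", "jvj", "nordjyllandnews", "synne"}

dynaword = load_dataset(
    "danish-foundation-models/danish-dynaword", split="train", streaming=True
).filter(lambda ex: ex["source"] not in EXCLUDED)

common_pile = load_dataset(
    "common-pile/comma_v0.1_training_dataset", split="train", streaming=True
)

# Align both streams to a shared "text" column (assumed name), then sample
# 2/3 from DynaWord and 1/3 from Common-Pile.
mixed = interleave_datasets(
    [dynaword.select_columns(["text"]), common_pile.select_columns(["text"])],
    probabilities=[2 / 3, 1 / 3],
    seed=42,
)
```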