Commit 8786370
1 Parent(s): 48d7bf7

Update README.md (#1)

- Update README.md (885481cd528c11319e8a6645fb0879d07e5156ef)

Co-authored-by: Kenneth C. Enevoldsen <KennethEnevoldsen@users.noreply.huggingface.co>
README.md CHANGED

@@ -13,13 +13,23 @@ pipeline_tag: text-generation
 
 # Munin-7B-Open-pt
 
-Munin-7B-open-pt is a 7
+Munin-7B-open-pt is a 7-billion-parameter [open-source](https://opensource.org/ai/open-source-ai-definition) language model.
+Munin-7B-open-pt is a base model that can serve as a starting point for fine-tuning and post-training.
+It has not been instruction-tuned and cannot directly be expected to function as a chat model.
 
-
-
+
+| Model | Model Weights | Training Data | Training Code |
+|:------|:--------------|:--------------|:--------------|
+| Llama | Public with custom license | Private | Private |
+| Gemma | Public, openly licensed | Private | Private |
+| Apertus | Public, openly licensed | Reproducible, license unspecified | Public, openly licensed |
+| **Munin-7B-open-pt** (ours) | **Public, openly licensed** | **Public, openly licensed** | **Public, openly licensed** |
 
 ## Training details
 
+Munin-7B-open-pt is continually pre-trained from [Comma v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t) using 30B tokens, utilizing a mix of [Danish Dynaword](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword) and the [Comma v0.1 dataset](https://huggingface.co/datasets/common-pile/comma_v0.1_training_dataset), both comprising only public domain and openly licensed data.
+
+
 Munin-7B-open-pt has been trained using the [maester](https://github.com/rlrs/maester/tree/main/3aca26960eaa1a16250b3feda40303c240ba4ca1) framework developed as part of [Danish Foundation Models](https://foundationmodels.dk/). All training was performed on a single 8x NVIDIA B200 node (the first of its kind in Denmark) as part of the [SDU UCloud](https://cloud.sdu.dk/) research cloud.
 
 The training was performed in three stages, with data mix (open-stageK.py) and maester (open-stageK.toml) configuration files available in each subfolder. The datasets can be created using the `create_dataset.py` script provided in this repository.
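The updated README positions Munin-7B-open-pt as a base model for plain text continuation, not chat. A minimal sketch of loading it with the Hugging Face `transformers` library; the repository id used below is an assumption for illustration and is not stated in this commit:

```python
# Minimal sketch: load the base model and generate a continuation.
# NOTE: the repo id is a hypothetical placeholder; check the actual model page.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "danish-foundation-models/munin-7b-open-pt"  # assumed, not confirmed

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

# Base model: prompt for continuation, not instructions or chat turns.
inputs = tokenizer("Danmark er", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```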
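The training-data paragraph names two openly licensed corpora. A sketch of pulling both with the `datasets` library and interleaving them into one stream; the 50/50 ratio, split names, and the `text` field are illustrative assumptions, since the real per-stage mixes live in the open-stageK.py configs:

```python
# Sketch: stream the two corpora named in the README and interleave them.
# NOTE: the mixing ratio and column name are assumptions for illustration.
from datasets import load_dataset, interleave_datasets

dynaword = load_dataset("danish-foundation-models/danish-dynaword",
                        split="train", streaming=True)
comma = load_dataset("common-pile/comma_v0.1_training_dataset",
                     split="train", streaming=True)

mix = interleave_datasets([dynaword, comma], probabilities=[0.5, 0.5], seed=42)

for example in mix.take(3):
    print(example.get("text", "")[:100])
```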