pipeline_tag: text-generation
---

# DFM-Decoder-open-v0-7b-pt

DFM-Decoder-open-v0-7b-pt is a 7-billion-parameter [open-source](https://opensource.org/ai/open-source-ai-definition) language model.
DFM-Decoder-open-v0-7b-pt is a base model that can serve as a starting point for fine-tuning and post-training.
It has not been instruction-tuned and cannot directly be expected to function as a chat model.
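As a base model, it can still be loaded for raw text completion with the 🤗 Transformers library. Below is a minimal sketch; the repository ID is an assumption derived from the model name, so adjust it to the actual Hub path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub ID derived from the model name; replace with the actual repository path.
model_id = "danish-foundation-models/DFM-Decoder-open-v0-7b-pt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The model does plain next-token prediction, so prompt it with text to continue.
inputs = tokenizer("Danmark er et land i", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```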

| Model | Model Weights | Training Data | Training Code |
| ------------------------------------ | --------------------------- | --------------------------------- | --------------------------- |
| Llama | Public with custom license | Private | Private |
| Gemma | Public, openly licensed | Private | Private |
| Apertus | Public, openly licensed | Reproducible, license unspecified | Public, openly licensed |
| **DFM-Decoder-open-v0-7b-pt** (ours) | **Public, openly licensed** | **Public, openly licensed** | **Public, openly licensed** |

## Evaluation

The following plots show the model size on the x-axis and an aggregate performance score on the y-axis.

<img src="./images/performance_plot_da.png" width="600"/>

DFM-Decoder-open-v0-7b-pt was evaluated using the [EuroEval](https://euroeval.com/) framework, which includes benchmarks across seven task types covering more than 15 European languages.
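An evaluation run can be reproduced from Python. The sketch below is illustrative only: the `Benchmarker` interface is an assumption based on the EuroEval documentation, and the Hub ID is hypothetical.

```python
from euroeval import Benchmarker

# Assumed EuroEval API; see https://euroeval.com/ for the authoritative interface.
benchmarker = Benchmarker()

# Hypothetical Hub ID; replace with the actual repository path.
benchmarker.benchmark(model="danish-foundation-models/DFM-Decoder-open-v0-7b-pt")
```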

Below we report results for Danish (see English below) for all EuroEval-supported tasks: sentiment classification, named entity recognition, linguistic acceptability, reading comprehension, summarization, and knowledge and common-sense reasoning. In addition, we evaluate the model on DaLA, a Danish linguistic acceptability dataset focusing on real-world common errors.

We compare DFM-Decoder-open-v0-7b-pt at various training stages with its base model [Comma v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t)
and two models from the Pleias family ([Pleias-350M-Preview](https://huggingface.co/PleIAs/Pleias-350m-Preview) and [Pleias-1.2B-Preview](https://huggingface.co/PleIAs/Pleias-1.2b-Preview)).
All comparison models were trained exclusively on open data, either in the public domain or under a permissive license.

The following tables show the performance on each dataset.
For each, we report the respective main metric from EuroEval and the confidence interval.

| Model | scala-da (MCC) | dala (MCC) | angry-tweets (MCC) | dansk (Micro F1, No Misc) | danske-talemaader (MCC) | danish-citizen-tests (MCC) | multi-wiki-qa-da (F1) | hellaswag-da (MCC) | nordjylland-news (BERTScore) | average |
| ----------------------------------- | ------------- | ------------- | ------------------ | ------------------------- | ----------------------- | -------------------------- | --------------------- | ------------------ | ---------------------------- | ------- |
| base (comma-v0.1-2t) | 0.9 ± 0.8 | 0.2 ± 0.6 | 39.8 ± 1.4 | 32.0 ± 2.8 | 3.6 ± 2.3 | 10.7 ± 4.1 | 66.4 ± 0.8 | 3.8 ± 1.0 | 60.2 ± 1.7 | 24.2 |
| **Training Stages** | | | | | | | | | | |
| dfm-decoder-open-v0-7b-pt (stage 1) | 13.3 ± 2.9 | 12.7 ± 2.2 | **47.7** ± 1.7 | 40.0 ± 2.4 | 18.1 ± 0.9 | 32.8 ± 1.4 | **76.6** ± 0.6 | 12.9 ± 1.0 | 66.3 ± 0.7 | 35.6 |
| dfm-decoder-open-v0-7b-pt (stage 2) | 15.8 ± 3.1 | 14.4 ± 2.9 | 47.4 ± 2.3 | 40.4 ± 2.4 | 24.1 ± 1.8 | 36.1 ± 1.8 | 75.2 ± 0.7 | 13.1 ± 1.1 | 66.5 ± 0.6 | 37.0 |
| dfm-decoder-open-v0-7b-pt (stage 3) | **16.5** ± 1.4| **15.7** ± 1.7| 46.3 ± 2.1 | **41.1** ± 2.8 | **24.6** ± 2.0 | **36.2** ± 1.7 | 76.0 ± 0.7 | **13.2** ± 1.2 | **66.6** ± 0.6 | **37.4** |
| **Baselines** | | | | | | | | | | |
| Pleias-350m-Preview | -1.0 ± 1.5 | -1.8 ± 1.8 | 10.6 ± 2.9 | 12.9 ± 1.8 | 0.7 ± 2.6 | 4.6 ± 2.3 | 11.6 ± 0.9 | -0.3 ± 0.7 | 56.3 ± 1.5 | 10.4 |
| Pleias-1.2b-Preview | 0.2 ± 1.1 | 0.7 ± 1.0 | 27.7 ± 2.9 | 27.3 ± 2.2 | -0.6 ± 1.9 | 8.6 ± 3.2 | 35.2 ± 1.3 | -0.0 ± 1.5 | 60.3 ± 0.9 | 17.7 |

### Performance on English

The goal of this section is to demonstrate how performance on English deteriorates when adapting the model to Danish. Generally, we observe only performance degradation across tasks, with the exception of `squad`.

| Model | scala-en (MCC) | sst5 (MCC) | conll-en (Micro F1, No Misc) | life-in-the-uk (MCC) | squad (F1) | hellaswag (MCC) | cnn-dailymail (BERTScore) | average |
| ------------------------------------ | ------------- | ------------ | --------------------------- | -------------------- | ------------ | --------------- | ------------------------- | ------- |
| base (comma-v0.1-2t) | **29.7** ± 1.9 | **61.8** ± 2.1| **57.5** ± 2.8 | 41.6 ± 2.4 | **90.4** ± 0.4| **16.8** ± 0.6 | **63.3** ± 0.9 | **51.6** |
| **Training Stages** | | | | | | | | |
| dfm-decoder-open-v0-7b-pt (stage 1) | 17.1 ± 9.0 | 60.0 ± 1.7 | 56.6 ± 2.2 | 40.5 ± 1.7 | 90.1 ± 0.3 | 13.7 ± 0.7 | 59.6 ± 1.3 | 48.2 |
| dfm-decoder-open-v0-7b-pt (stage 2) | 27.7 ± 2.0 | 59.5 ± 1.6 | 56.6 ± 2.3 | 41.2 ± 1.7 | 90.2 ± 0.4 | 16.0 ± 0.9 | 60.3 ± 1.6 | 50.2 |
| dfm-decoder-open-v0-7b-pt (stage 3) | 29.0 ± 2.4 | 60.3 ± 1.4 | 56.9 ± 2.5 | **41.7** ± 1.8 | 89.9 ± 0.4 | 13.8 ± 0.9 | 59.2 ± 1.7 | 50.1 |
| **Baseline** | | | | | | | | |
| Pleias-350m-Preview | 0.7 ± 1.8 | 15.4 ± 7.3 | 31.8 ± 3.5 | -0.7 ± 2.1 | 31.1 ± 2.3 | 0.2 ± 1.4 | 53.8 ± 1.0 | 18.9 |
| Pleias-1.2b-Preview | 1.0 ± 2.4 | 48.2 ± 2.6 | 40.9 ± 3.3 | 2.6 ± 2.8 | 52.9 ± 2.5 | -0.1 ± 1.5 | 60.2 ± 1.6 | 29.4 |

## Training details

DFM-Decoder-open-v0-7b-pt is continually pre-trained from [Comma v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t) on 30B tokens drawn from a mix of [Danish Dynaword](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword) and the [Comma v0.1 dataset](https://huggingface.co/datasets/common-pile/comma_v0.1_training_dataset), both comprising only public domain and openly licensed data.
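The two sources can be streamed and mixed with the 🤗 `datasets` library. The sketch below is illustrative only: the 50/50 proportions and the presence of a `text` column are assumptions, and the real per-stage weights live in the `open-stageK.py` configuration files.

```python
from datasets import load_dataset, interleave_datasets

# Stream the two openly licensed sources (assuming both expose a "text" column).
dynaword = load_dataset(
    "danish-foundation-models/danish-dynaword", split="train", streaming=True
).select_columns(["text"])
comma = load_dataset(
    "common-pile/comma_v0.1_training_dataset", split="train", streaming=True
).select_columns(["text"])

# Illustrative 50/50 mix; the actual stage-specific proportions are defined
# in the open-stageK.py data-mix configs.
mixed = interleave_datasets([dynaword, comma], probabilities=[0.5, 0.5], seed=42)

for example in mixed.take(3):
    print(example["text"][:100])
```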

DFM-Decoder-open-v0-7b-pt has been trained using the [maester](https://github.com/rlrs/maester) framework developed as part of [Danish Foundation Models](https://foundationmodels.dk/). All training was performed on a single 8x NVIDIA B200 node (the first of its kind in Denmark) as part of the [SDU UCloud](https://cloud.sdu.dk/) research cloud.

The training was performed in three stages, with data mix (`open-stageK.py`) and maester (`open-stageK.toml`) configuration files available in each subfolder. The datasets can be created using the `create_dataset.py` script provided in this repository.

The characteristics of the three pre-training stages are detailed in the following table.

## Limitations

DFM-Decoder-open-v0-7b-pt was trained only on Danish- and English-language data and code from the 15 programming languages covered by the [stack-edu classifiers](https://huggingface.co/collections/HuggingFaceTB/the-ultimate-collection-of-code-classifiers-67b5aa3eb8994a4b71453005).
It will likely have poor performance on other languages or programming languages.

As a base model, DFM-Decoder-open-v0-7b-pt has not been aligned for safety and may, for example, reflect social biases present in its training data or potentially provide toxic or harmful information.

## License

The model is made available under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Project partners & funding

The development of DFM-Decoder-open-v0-7b-pt was carried out in close collaboration between [Aarhus University](https://chc.au.dk/), the [Alexandra Institute](https://alexandra.dk/), and the [University of Southern Denmark](https://www.sdu.dk/en/forskning/machine-learning) as part of [Danish Foundation Models](https://foundationmodels.dk/).

Funding was provided by the [Danish Ministry of Digital Affairs](https://www.english.digmin.dk/) and the [Danish Ministry of Higher Education and Science](https://ufm.dk/en).