---
library_name: transformers
license: apache-2.0
language:
- en
- fr
- de
- es
- it
- pt
- ru
- zh
- ja
base_model:
- DigitalLearningGmbH/educa-ai-nemo-sft
---

# Model Card for educa-ai-nemo-dpo

## Model Details

### Model Description

`educa-ai-nemo-dpo` is the preference-aligned version of our SFT model [DigitalLearningGmbH/educa-ai-nemo-sft](https://huggingface.co/DigitalLearningGmbH/educa-ai-nemo-sft),
trained on our internal dataset, which contains a unique mix of German and English preference data covering a multitude of domains.
In creating it, we paid special attention to data points that can improve performance in German, especially in the educational field (text analysis, supporting students in completing textual tasks, ...).

This is a preliminary release and is subject to changes or updates.

- **Developed by:** [Digital Learning GmbH](https://huggingface.co/DigitalLearningGmbH)
- **Funded by:** [Digital Learning GmbH](https://huggingface.co/DigitalLearningGmbH)
- **Shared by:** [Digital Learning GmbH](https://huggingface.co/DigitalLearningGmbH)
- **Model type:** Transformer Decoder LLM
- **Language(s) (NLP):** English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese
- **License:** [Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)
- **Finetuned from model:** [DigitalLearningGmbH/educa-ai-nemo-sft](https://huggingface.co/DigitalLearningGmbH/educa-ai-nemo-sft)

## Uses

As stated above, this is a preliminary release: we are still benchmarking the model and improving our datasets for possible further training.
We therefore do not recommend using this model in a production setting yet, and we look forward to engaging with the community about possible downstream uses and improvements.

## Bias, Risks, and Limitations

Refer to the [original model card](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) for an overview of the general risks associated with using this model.
Although this version has been preference-aligned with DPO, it may still output harmful content. Use it at your own discretion, taking the potential risks into account.

## How to Get Started with the Model

Refer to the [original model card](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) for code examples.
Be aware that this model uses a slightly different chat template from the original: system prompts are placed before the first user prompt (before the first instance of `[INST]`).
We include the updated template in the tokenizer config, so you can use `tokenizer.apply_chat_template`.
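
As a minimal sketch of how the bundled template could be applied, the snippet below builds a chat with a system prompt and renders it to a prompt string. The repository id `DigitalLearningGmbH/educa-ai-nemo-dpo` is an assumption inferred from this card's title and organization, and the helper names `build_messages` and `render_prompt` are hypothetical.

```python
def build_messages(system_prompt: str, user_prompt: str) -> list[dict[str, str]]:
    """Return a chat in the role/content format that apply_chat_template expects."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def render_prompt(
    messages: list[dict[str, str]],
    # Assumed repository id; adjust it if the model is published elsewhere.
    model_id: str = "DigitalLearningGmbH/educa-ai-nemo-dpo",
) -> str:
    """Render the chat with the template shipped in the tokenizer config."""
    # Imported lazily so build_messages can be used without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return the prompt as text, not token ids
        add_generation_prompt=True,  # request the assistant turn, if the template uses this flag
    )
```

Calling `render_prompt(build_messages("You are a helpful tutor.", "Explain photosynthesis."))` downloads the tokenizer and returns the rendered prompt, which makes it easy to check where the system prompt ends up relative to the first `[INST]`.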

## Training Details

Instead of the standard sigmoid DPO loss, we used [DPO-Positive](https://arxiv.org/abs/2402.13228), as we found it improved training stability and overall performance with our dataset.
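
To illustrate the difference between the two losses, here is a per-example sketch in plain Python. It follows the DPO-Positive formulation as we read it in the linked paper: the usual DPO margin, minus a penalty that is active only when the policy assigns the chosen response a lower log-probability than the reference model does. It is not the training code used for this model, and the `beta` and `lam` defaults are placeholder assumptions, since this card does not state the hyperparameters.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpop_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # placeholder value, not taken from this card
    lam: float = 5.0,   # placeholder value, not taken from this card
) -> float:
    """DPO-Positive loss for one preference pair, given sequence log-probabilities."""
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    # Penalty term: positive only if the policy has become less likely than
    # the reference model to produce the chosen response.
    penalty = max(0.0, ref_chosen_logp - policy_chosen_logp)
    logits = beta * (chosen_log_ratio - rejected_log_ratio - lam * penalty)
    return -log_sigmoid(logits)
```

With `lam=0.0` the function reduces to the standard sigmoid DPO loss, so the two can be compared on the same inputs.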

### Training Data

The model was trained on a mix of publicly available, permissively licensed data and a majority of unique internal datasets that we created.
Our data includes examples of up to 16384 tokens in length, further enhancing the model's long-context capability.

## Evaluation

We ran all benchmarks using [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) with `--apply_chat_template`.
For comparison, we ran the same benchmarks on the original Mistral-Nemo-Instruct-2407 model and on Llama-3.1-8B-Instruct, in the exact same environment and with the same parameters.
The best score in each row is shown in bold.
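
A comparable run could be launched roughly as follows. This sketch only assembles and executes an `lm_eval` command line with `--apply_chat_template`. The batch size, few-shot settings, and model arguments behind the reported numbers are not stated in this card, so the values here are assumptions, and the helper names are hypothetical.

```python
import subprocess


def build_lm_eval_command(
    model_id: str,
    tasks: list[str],
    batch_size: str = "auto",  # assumed; the card does not state the batch size
) -> list[str]:
    """Assemble an lm_eval CLI invocation that applies the model's chat template."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
        "--apply_chat_template",
    ]


def run_benchmarks(model_id: str, tasks: list[str]) -> None:
    """Run the harness; requires lm-eval to be installed and on PATH."""
    subprocess.run(build_lm_eval_command(model_id, tasks), check=True)
```

For example, `run_benchmarks("mistralai/Mistral-Nemo-Instruct-2407", ["hellaswag", "winogrande"])` would evaluate the original model on two of the tasks listed in the tables.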

### English Benchmarks

| Benchmark | Llama-3.1-8B-Instruct | Mistral-Nemo-Instruct-2407 | educa-ai-nemo-dpo |
| --- | --- | --- | --- |
| hellaswag (acc_norm) | 72.6% | 71.9% | **77.6%** |
| winogrande (acc) | 68.0% | 69.8% | **75.2%** |
| openbookqa (acc_norm) | **49.0%** | 45.8% | 47.0% |
| commonsense_qa (acc) | 64.9% | 74.4% | **75.4%** |
| truthfulqa_mc1 (acc) | 40.4% | 39.66% | **41.5%** |
| mmlu (acc) | 63.2% | 64.9% | **66.5%** |
| triviaqa (exact_match) | 5.3% | 12.3% | **23.99%** |
| agieval (acc) | 36.3% | 36.6% | **39.1%** |
| arc_challenge (acc_norm) | 54.1% | 52.5% | **54.4%** |
| arc_easy (acc_norm) | 75.7% | 74.1% | **76.0%** |
| piqa (acc_norm) | 79.6% | 78.9% | **81.5%** |
| leaderboard_bbh (acc_norm) | 37.4% | 49.1% | **53.0%** |
| leaderboard_gpqa (acc_norm) | 28.5% | **30.6%** | 29.4% |
| leaderboard_ifeval (inst_level_loose_acc) | **84.7%** | 72.8% | 75.1% |
| leaderboard_mmlu_pro (acc) | 16.2% | **35.1%** | 33.67% |
| leaderboard_musr (acc_norm) | 38.8% | 39.3% | **40.2%** |

### Multilingual Benchmarks

| Benchmark | Llama-3.1-8B-Instruct | Mistral-Nemo-Instruct-2407 | educa-ai-nemo-dpo |
| --- | --- | --- | --- |
| global_mmlu_full (acc) | | | |
| - de | 48.2% | 55.8% | **57.5%** |
| - en | 60.0% | 63.1% | **63.8%** |
| - es | 54.7% | 58.1% | **58.9%** |
| - fr | 48.3% | 56.3% | **58.1%** |
| - it | 51.0% | 58.1% | **59.6%** |
| - ja | 47.4% | 50.0% | **51.0%** |
| - pt | 23.0% | 43.5% | **55.7%** |
| - ru | 41.4% | 54.9% | **55.0%** |
| - zh | 49.7% | 52.2% | **55.6%** |
| arc_challenge_mt (acc_norm) | | | |
| - de | 39.9% | 42.6% | **46.8%** |
| - es | 42.8% | 45.6% | **47.3%** |
| - it | 43.9% | 44.3% | **46.7%** |
| - pt | 41.9% | 42.3% | **46.8%** |
| xnli (acc) | | | |
| - de | **48.1%** | 47.6% | 47.1% |
| - en | 52.4% | 57.3% | **57.8%** |
| - es | 46.3% | 45.0% | **47.0%** |
| - fr | **51.6%** | 38.5% | 40.0% |
| - ru | **48.1%** | 41.8% | 38.6% |
| - zh | **40.3%** | 36.3% | 36.1% |
| xquad (f1) | | | |
| - de | 30.4% | 22.7% | **35.6%** |
| - en | **35.0%** | 21.8% | 29.9% |
| - es | **31.2%** | 17.6% | 29.6% |
| - ru | **39.6%** | 24.6% | 37.3% |
| - zh | **28.8%** | 10.0% | 16.7% |

## Model Card Authors

This model card was written by [Lennard Michael Strohmeyer](https://huggingface.co/LenDigLearn).

## Model Card Contact

[Lennard Michael Strohmeyer](https://huggingface.co/LenDigLearn)