LenDigLearn committed · verified
Commit 1fc3a30 · 1 parent: d9e89fd

added preliminary benchmark numbers

Files changed (1): README.md (+25 −1)
README.md CHANGED
@@ -62,7 +62,31 @@ Our data encompasses examples of a length up to 16384 tokens, further enhancing
 
 ## Evaluation
 
-Evaluation results will be added soon.
+We ran our benchmarks with lighteval. The accuracy numbers obtained this way differ substantially from the base model's official benchmark results and from numbers produced by other benchmark suites.
+For comparison, we therefore ran the same lighteval benchmarks on the [base model](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) under identical conditions.
+As of 2025-01-24, we are re-running these benchmarks with a different suite and adding further German-specific benchmarks.
+
+### English Benchmarks
+| Benchmark | Mistral-Nemo-Instruct 2407 | educa-ai-nemo-sft |
+| --- | --- | --- |
+| HellaSwag (0-shot) | **44.33%** | 38.65% |
+| WinoGrande (0-shot) | 55.49% | **58.56%** |
+| OpenBookQA (0-shot) | **40.60%** | 36.40% |
+| CommonSenseQA (0-shot) | 37.26% | **39.31%** |
+| TruthfulQA (0-shot) | 56.12% | **59.94%** |
+| MMLU (5-shot) | 30.10% | **37.91%** |
+
+
+### Multilingual Benchmarks (MMLU)
+| Language | Mistral-Nemo-Instruct 2407 | educa-ai-nemo-sft |
+| --- | --- | --- |
+| French | **30.32%** | 29.05% |
+| German | 27.69% | **41.82%** |
+| Spanish | 24.69% | **30.25%** |
+| Italian | 31.29% | **34.81%** |
+| Portuguese | 24.16% | **28.81%** |
+| Chinese | 34.80% | **37.85%** |
+| Japanese | 34.27% | **35.18%** |
 
 
 ## Model Card Authors [optional]
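The lighteval runs mentioned in the diff could be reproduced along these lines. This is a sketch, not the authors' exact command: the CLI flags and the `suite|task|fewshot|truncation` task strings are assumptions that vary between lighteval releases, and the local model path `educa-ai-nemo-sft` is hypothetical — consult `lighteval --help` for the installed version.

```shell
# Sketch: evaluate the base model and the fine-tune with lighteval's
# accelerate backend under identical conditions.
# NOTE: flags and task identifiers below are assumptions; check your
# lighteval version's documentation before running.
pip install lighteval accelerate

for MODEL in mistralai/Mistral-Nemo-Instruct-2407 educa-ai-nemo-sft; do
  lighteval accelerate \
    --model_args "pretrained=${MODEL}" \
    --tasks "leaderboard|mmlu|5|0" \
    --output_dir ./lighteval-results
done
```

Running both models through the same harness, with the same few-shot settings, is what makes the side-by-side columns in the tables comparable even when the absolute numbers disagree with other suites.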