added preliminary benchmark numbers
README.md
@@ -62,7 +62,31 @@ Our data encompasses examples of a length up to 16384 tokens, further enhancing
## Evaluation

We ran our benchmarks with lighteval. The accuracy numbers obtained this way differ greatly from the base model's official benchmarks and from results produced with other benchmark suites.
For a like-for-like comparison, we therefore ran the same benchmarks with lighteval on the [base model](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) under the exact same conditions.
As of 2025-01-24, we are working on rerunning these benchmarks with a different suite and on running more German-specific benchmarks.

### English Benchmarks
| Benchmark | Mistral-Nemo-Instruct-2407 | educa-ai-nemo-sft |
| --- | --- | --- |
| HellaSwag (0-shot) | **44.33%** | 38.65% |
| WinoGrande (0-shot) | 55.49% | **58.56%** |
| OpenBookQA (0-shot) | **40.60%** | 36.40% |
| CommonSenseQA (0-shot) | 37.26% | **39.31%** |
| TruthfulQA (0-shot) | 56.12% | **59.94%** |
| MMLU (5-shot) | 30.10% | **37.91%** |

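
To make the English results easier to compare at a glance, the differences can be computed directly from the table. The snippet below is only an illustrative sketch: the names `ENGLISH_SCORES` and `deltas` are hypothetical and not part of the model's tooling, and the values are copied by hand from the table rather than produced by lighteval.

```python
# Accuracy in percent, copied by hand from the English benchmark table.
# Tuple order: (Mistral-Nemo-Instruct-2407, educa-ai-nemo-sft).
ENGLISH_SCORES = {
    "HellaSwag (0-shot)": (44.33, 38.65),
    "WinoGrande (0-shot)": (55.49, 58.56),
    "OpenBookQA (0-shot)": (40.60, 36.40),
    "CommonSenseQA (0-shot)": (37.26, 39.31),
    "TruthfulQA (0-shot)": (56.12, 59.94),
    "MMLU (5-shot)": (30.10, 37.91),
}


def deltas(scores):
    """Return fine-tuned minus base accuracy, in percentage points."""
    return {name: round(sft - base, 2) for name, (base, sft) in scores.items()}


english_deltas = deltas(ENGLISH_SCORES)

# Print one line per benchmark; positive values favour educa-ai-nemo-sft.
for name, delta in english_deltas.items():
    print(f"{name:<24} {delta:+.2f} pp")
```

Differences are reported in percentage points rather than relative percent, so they can be read directly against the table.
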
### Multilingual Benchmarks (MMLU)
| Language | Mistral-Nemo-Instruct-2407 | educa-ai-nemo-sft |
| --- | --- | --- |
| French | **30.32%** | 29.05% |
| German | 27.69% | **41.82%** |
| Spanish | 24.69% | **30.25%** |
| Italian | 31.29% | **34.81%** |
| Portuguese | 24.16% | **28.81%** |
| Chinese | 34.80% | **37.85%** |
| Japanese | 34.27% | **35.18%** |

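
The multilingual MMLU rows can be ranked by how much the fine-tuned model gains over the base model. This is a rough sketch under the same assumptions as the table itself: the values are transcribed by hand, and `MMLU_BY_LANGUAGE` and `ranked` are hypothetical names chosen for illustration.

```python
# MMLU accuracy in percent, copied by hand from the multilingual table.
# Tuple order: (Mistral-Nemo-Instruct-2407, educa-ai-nemo-sft).
MMLU_BY_LANGUAGE = {
    "French": (30.32, 29.05),
    "German": (27.69, 41.82),
    "Spanish": (24.69, 30.25),
    "Italian": (31.29, 34.81),
    "Portuguese": (24.16, 28.81),
    "Chinese": (34.80, 37.85),
    "Japanese": (34.27, 35.18),
}

# Gain of the fine-tuned model over the base model, in percentage points.
gains = {
    lang: round(sft - base, 2) for lang, (base, sft) in MMLU_BY_LANGUAGE.items()
}

# Sort languages from largest gain to largest regression.
ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)

for lang, gain in ranked:
    print(f"{lang:<12} {gain:+.2f} pp")
```
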
## Model Card Authors [optional]