added multilingual benchmarks
README.md CHANGED
@@ -90,7 +90,36 @@ For comparison, we performed the same benchmarks on the base model as well, in t
 ### Multilingual Benchmarks

-
+| Benchmark | Mistral-Nemo-Instruct-2407 | educa-ai-nemo-dpo |
+| --- | --- | --- |
+| global_mmlu_full (acc) | | |
+| * de | 55.8% | **57.5%** |
+| * en | 63.1% | **63.8%** |
+| * es | 58.1% | **58.9%** |
+| * fr | 56.3% | **58.1%** |
+| * it | 58.1% | **59.6%** |
+| * ja | 50.0% | **51.0%** |
+| * pt | 43.5% | **55.7%** |
+| * ru | 54.9% | **55.0%** |
+| * zh | 52.2% | **55.6%** |
+| arc_challenge_mt (acc_norm) | | |
+| * de | 42.6% | **46.8%** |
+| * es | 45.6% | **47.3%** |
+| * it | 44.3% | **46.7%** |
+| * pt | 42.3% | **46.8%** |
+| xnli (acc) | | |
+| * de | **47.6%** | 47.1% |
+| * en | 57.3% | **57.8%** |
+| * es | 45.0% | **47.0%** |
+| * fr | 38.5% | **40.0%** |
+| * ru | **41.8%** | 38.6% |
+| * zh | **36.3%** | 36.1% |
+| xquad (f1) | | |
+| * de | 22.7% | **35.6%** |
+| * en | 21.8% | **29.9%** |
+| * es | 17.6% | **29.6%** |
+| * ru | 24.6% | **37.3%** |
+| * zh | 10.0% | **16.7%** |

 ## Model Card Authors [optional]
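To make the per-language rows in the table above easier to compare, the following sketch averages each benchmark across its languages and reports the difference in percentage points. The scores are copied from the added table. The unweighted mean across languages and the names `SCORES` and `summarize` are assumptions made for illustration, not part of the README or of any evaluation tool.

```python
from statistics import mean

# Scores copied from the table: language -> (Mistral-Nemo-Instruct-2407, educa-ai-nemo-dpo), in percent.
# SCORES and summarize() are hypothetical names used only for this sketch.
SCORES = {
    "global_mmlu_full": {
        "de": (55.8, 57.5), "en": (63.1, 63.8), "es": (58.1, 58.9),
        "fr": (56.3, 58.1), "it": (58.1, 59.6), "ja": (50.0, 51.0),
        "pt": (43.5, 55.7), "ru": (54.9, 55.0), "zh": (52.2, 55.6),
    },
    "arc_challenge_mt": {
        "de": (42.6, 46.8), "es": (45.6, 47.3),
        "it": (44.3, 46.7), "pt": (42.3, 46.8),
    },
    "xnli": {
        "de": (47.6, 47.1), "en": (57.3, 57.8), "es": (45.0, 47.0),
        "fr": (38.5, 40.0), "ru": (41.8, 38.6), "zh": (36.3, 36.1),
    },
    "xquad": {
        "de": (22.7, 35.6), "en": (21.8, 29.9), "es": (17.6, 29.6),
        "ru": (24.6, 37.3), "zh": (10.0, 16.7),
    },
}


def summarize(scores):
    """Return {benchmark: (mean_base, mean_tuned, delta)}; delta is in percentage points."""
    summary = {}
    for bench, per_lang in scores.items():
        # Assumption: every language counts equally (unweighted mean).
        base = mean(b for b, _ in per_lang.values())
        tuned = mean(t for _, t in per_lang.values())
        summary[bench] = (base, tuned, tuned - base)
    return summary


if __name__ == "__main__":
    for bench, (base, tuned, delta) in summarize(SCORES).items():
        print(f"{bench}: {base:.1f}% -> {tuned:.1f}% ({delta:+.1f} pp)")
```

Because the benchmarks use different metrics (acc, acc_norm, f1), the averages are only comparable within a benchmark, not across benchmarks.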