LenDigLearn committed
Commit a3a4e3f · verified · 1 Parent(s): fc10053

added multilingual benchmarks

Files changed (1)
  1. README.md +30 -1
README.md CHANGED
@@ -90,7 +90,36 @@ For comparison, we performed the same benchmarks on the base model as well, in t
 
 ### Multilingual Benchmarks
 
-... coming soon!
+| Benchmark | Mistral-Nemo-Instruct-2407 | educa-ai-nemo-dpo |
+| --- | --- | --- |
+| global_mmlu_full (acc) | | |
+| * de | 55.8% | **57.5%** |
+| * en | 63.1% | **63.8%** |
+| * es | 58.1% | **58.9%** |
+| * fr | 56.3% | **58.1%** |
+| * it | 58.1% | **59.6%** |
+| * ja | 50.0% | **51.0%** |
+| * pt | 43.5% | **55.7%** |
+| * ru | 54.9% | **55.0%** |
+| * zh | 52.2% | **55.6%** |
+| arc_challenge_mt (acc_norm) | | |
+| * de | 42.6% | **46.8%** |
+| * es | 45.6% | **47.3%** |
+| * it | 44.3% | **46.7%** |
+| * pt | 42.3% | **46.8%** |
+| xnli (acc) | | |
+| * de | **47.6%** | 47.1% |
+| * en | 57.3% | **57.8%** |
+| * es | 45.0% | **47.0%** |
+| * fr | 38.5% | **40.0%** |
+| * ru | **41.8%** | 38.6% |
+| * zh | **36.3%** | 36.1% |
+| xquad (f1) | | |
+| * de | 22.7% | **35.6%** |
+| * en | 21.8% | **29.9%** |
+| * es | 17.6% | **29.6%** |
+| * ru | 24.6% | **37.3%** |
+| * zh | 10.0% | **16.7%** |
 
 ## Model Card Authors [optional]
 