INFO: 2024-07-13 14:29:23,827: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 14:29:23,828: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:23,828: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:23,892: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 14:29:23,896: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:23,896: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:24,151: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 14:29:24,154: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:24,154: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:24,345: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 14:29:24,346: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:24,346: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:25,729: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 14:29:25,731: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:25,731: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:27,678: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 14:29:27,678: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:27,678: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:29,484: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 14:29:29,484: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:29,484: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:33,887: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.40s
INFO: 2024-07-13 14:29:39,828: llmtf.base.daru/treewayextractive: Loading Dataset: 12.15s
INFO: 2024-07-13 14:29:42,885: llmtf.base.daru/treewayabstractive: Loading Dataset: 17.15s
INFO: 2024-07-13 14:29:45,765: llmtf.base.darumeru/MultiQ: Loading Dataset: 21.61s
INFO: 2024-07-13 14:30:53,478: llmtf.base.darumeru/ruMMLU: Loading Dataset: 89.58s
INFO: 2024-07-13 14:32:57,360: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 213.01s
INFO: 2024-07-13 14:33:24,939: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 231.05s
INFO: 2024-07-13 14:33:24,943: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-13 14:33:24,962: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.8278810271761903, 'len': 0.9977030047832767, 'lcs': 0.9847970468194288}
INFO: 2024-07-13 14:33:24,975: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:33:24,975: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:33:28,742: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.77s
INFO: 2024-07-13 14:33:45,284: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 261.46s
INFO: 2024-07-13 14:36:13,193: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 164.45s
INFO: 2024-07-13 14:36:13,226: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-13 14:36:13,244: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.424509793356442, 'len': 0.9995781033988959, 'lcs': 0.994055994028679}
INFO: 2024-07-13 14:36:13,246: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:36:13,246: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:36:17,469: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.22s
INFO: 2024-07-13 14:36:19,338: llmtf.base.daru/treewayextractive: Processing Dataset: 399.51s
INFO: 2024-07-13 14:36:19,340: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-13 14:36:19,799: llmtf.base.daru/treewayextractive: {'r-prec': 0.39738621933621937}
INFO: 2024-07-13 14:36:19,844: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:36:19,850: llmtf.base.evaluator: 
mean	daru/treewayextractive	darumeru/cp_sent_en	darumeru/cp_sent_ru
0.798	0.397	1.000	0.998
INFO: 2024-07-13 14:36:56,298: llmtf.base.darumeru/MultiQ: Processing Dataset: 430.53s
INFO: 2024-07-13 14:36:56,300: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-13 14:36:56,305: llmtf.base.darumeru/MultiQ: {'f1': 0.48425376524800046, 'em': 0.3795411089866157}
INFO: 2024-07-13 14:36:56,316: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:36:56,317: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:37:00,009: llmtf.base.darumeru/PARus: Loading Dataset: 3.69s
INFO: 2024-07-13 14:37:13,006: llmtf.base.darumeru/PARus: Processing Dataset: 13.00s
INFO: 2024-07-13 14:37:13,009: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-13 14:37:13,021: llmtf.base.darumeru/PARus: {'acc': 0.85}
INFO: 2024-07-13 14:37:13,023: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:37:13,023: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:37:16,908: llmtf.base.darumeru/RCB: Loading Dataset: 3.88s
INFO: 2024-07-13 14:37:39,047: llmtf.base.darumeru/RCB: Processing Dataset: 22.12s
INFO: 2024-07-13 14:37:39,050: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-13 14:37:39,056: llmtf.base.darumeru/RCB: {'acc': 0.5272727272727272, 'f1_macro': 0.43555405633327715}
INFO: 2024-07-13 14:37:39,058: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:37:39,059: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:37:53,697: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 14.64s
INFO: 2024-07-13 14:40:08,010: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 134.31s
INFO: 2024-07-13 14:40:08,013: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-13 14:40:08,027: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7680412371134021, 'f1_macro': 0.7680185950653384}
INFO: 2024-07-13 14:40:08,043: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:40:08,043: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:40:15,245: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.20s
INFO: 2024-07-13 14:41:10,015: llmtf.base.daru/treewayabstractive: Processing Dataset: 687.13s
INFO: 2024-07-13 14:41:10,017: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-13 14:41:10,037: llmtf.base.daru/treewayabstractive: {'rouge1': 0.360975899636531, 'rouge2': 0.1330737491255763}
INFO: 2024-07-13 14:41:10,042: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:41:10,069: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruOpenBookQA
0.647	0.247	0.397	0.432	0.850	0.481	1.000	0.998	0.768
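The per-task columns in these summary tables appear to be the unweighted mean of each task's logged metrics. For example, MultiQ's f1 (0.484) and em (0.380) average to 0.432, and RCB's acc (0.527) and f1_macro (0.436) average to 0.481. This rule is inferred from the logged numbers, not taken from llmtf's source, so the sketch below is an assumption. The copy tasks (cp_*) are left out because their columns do not follow it: averaging in symbol_per_token would give a value above 1. `RESULTS` and `task_score` are hypothetical names introduced for illustration.

```python
# Metric dicts copied verbatim from the "Results for ..." log lines above.
RESULTS = {
    "darumeru/MultiQ": {"f1": 0.48425376524800046, "em": 0.3795411089866157},
    "darumeru/RCB": {"acc": 0.5272727272727272, "f1_macro": 0.43555405633327715},
    "daru/treewayabstractive": {"rouge1": 0.360975899636531, "rouge2": 0.1330737491255763},
    "darumeru/ruOpenBookQA": {"acc": 0.7680412371134021, "f1_macro": 0.7680185950653384},
    "darumeru/PARus": {"acc": 0.85},
}


def task_score(metrics: dict[str, float]) -> float:
    """Assumed aggregation: unweighted mean of every metric reported for a task."""
    return sum(metrics.values()) / len(metrics)


if __name__ == "__main__":
    # Prints one tab-separated line per task, rounded like the summary table.
    for task, metrics in RESULTS.items():
        print(f"{task}\t{task_score(metrics):.3f}")
```

Each printed value matches the corresponding column of the summary table above (0.432, 0.481, 0.247, 0.768, 0.850).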
INFO: 2024-07-13 14:41:58,403: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 340.93s
INFO: 2024-07-13 14:41:58,453: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-13 14:41:58,457: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.9697516062295746, 'len': 0.9984044778480231, 'lcs': 0.9773285044731846}
INFO: 2024-07-13 14:41:58,459: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:41:58,459: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:42:02,784: llmtf.base.darumeru/cp_para_en: Loading Dataset: 4.32s
INFO: 2024-07-13 14:44:45,025: llmtf.base.darumeru/ruTiE: Processing Dataset: 269.78s
INFO: 2024-07-13 14:44:45,027: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-13 14:44:45,073: llmtf.base.darumeru/ruTiE: {'acc': 0.3511627906976744}
INFO: 2024-07-13 14:44:45,076: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:44:45,077: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:44:47,875: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.80s
INFO: 2024-07-13 14:44:55,693: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 7.80s
INFO: 2024-07-13 14:44:55,695: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-13 14:44:55,700: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8761904761904762, 'f1_macro': 0.8733631471423589}
INFO: 2024-07-13 14:44:55,701: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:44:55,701: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:45:00,084: llmtf.base.darumeru/RWSD: Loading Dataset: 4.38s
INFO: 2024-07-13 14:45:19,405: llmtf.base.darumeru/RWSD: Processing Dataset: 19.32s
INFO: 2024-07-13 14:45:19,421: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-13 14:45:19,425: llmtf.base.darumeru/RWSD: {'acc': 0.5441176470588235}
INFO: 2024-07-13 14:45:19,427: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:45:19,427: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:45:34,612: llmtf.base.darumeru/USE: Loading Dataset: 15.18s
INFO: 2024-07-13 14:46:14,635: llmtf.base.darumeru/cp_para_en: Processing Dataset: 251.85s
INFO: 2024-07-13 14:46:14,638: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-13 14:46:14,657: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.485777628533072, 'len': 0.999455845790753, 'lcs': 0.9727731185644367}
INFO: 2024-07-13 14:46:14,658: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:46:14,684: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree
0.684	0.247	0.397	0.432	0.850	0.481	0.544	0.973	0.977	1.000	0.998	0.768	0.351	0.875
INFO: 2024-07-13 14:48:58,982: llmtf.base.darumeru/USE: Processing Dataset: 204.37s
INFO: 2024-07-13 14:48:58,999: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-13 14:48:59,004: llmtf.base.darumeru/USE: {'grade_norm': 0.18725490196078434}
INFO: 2024-07-13 14:48:59,010: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:48:59,010: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:49:19,451: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 20.44s
INFO: 2024-07-13 14:50:14,250: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1036.87s
INFO: 2024-07-13 14:50:14,255: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-13 14:50:14,302: llmtf.base.nlpcoreteam/enMMLU:                                        metric
subject                                      
abstract_algebra                     0.350000
anatomy                              0.696296
astronomy                            0.730263
business_ethics                      0.700000
clinical_knowledge                   0.754717
college_biology                      0.812500
college_chemistry                    0.500000
college_computer_science             0.590000
college_mathematics                  0.330000
college_medicine                     0.670520
college_physics                      0.470588
computer_security                    0.780000
conceptual_physics                   0.570213
econometrics                         0.561404
electrical_engineering               0.634483
elementary_mathematics               0.439153
formal_logic                         0.507937
global_facts                         0.430000
high_school_biology                  0.800000
high_school_chemistry                0.517241
high_school_computer_science         0.760000
high_school_european_history         0.787879
high_school_geography                0.843434
high_school_government_and_politics  0.922280
high_school_macroeconomics           0.671795
high_school_mathematics              0.381481
high_school_microeconomics           0.764706
high_school_physics                  0.417219
high_school_psychology               0.847706
high_school_statistics               0.537037
high_school_us_history               0.833333
high_school_world_history            0.835443
human_aging                          0.730942
human_sexuality                      0.801527
international_law                    0.818182
jurisprudence                        0.759259
logical_fallacies                    0.766871
machine_learning                     0.544643
management                           0.825243
marketing                            0.901709
medical_genetics                     0.830000
miscellaneous                        0.842912
moral_disputes                       0.751445
moral_scenarios                      0.497207
nutrition                            0.754902
philosophy                           0.720257
prehistory                           0.753086
professional_accounting              0.556738
professional_law                     0.483051
professional_medicine                0.742647
professional_psychology              0.717320
public_relations                     0.690909
security_studies                     0.722449
sociology                            0.840796
us_foreign_policy                    0.840000
virology                             0.512048
world_religions                      0.818713
INFO: 2024-07-13 14:50:14,310: llmtf.base.nlpcoreteam/enMMLU:                                    metric
subject                                  
STEM                             0.564712
humanities                       0.717897
other (business, health, misc.)  0.710620
social sciences                  0.768694
INFO: 2024-07-13 14:50:14,318: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6904807286717012}
INFO: 2024-07-13 14:50:14,385: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:50:14,399: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU
0.651	0.247	0.397	0.432	0.850	0.481	0.544	0.187	0.973	0.977	1.000	0.998	0.768	0.351	0.875	0.690
INFO: 2024-07-13 14:51:55,784: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1262.30s
INFO: 2024-07-13 14:51:55,788: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-13 14:51:55,799: llmtf.base.darumeru/ruMMLU: {'acc': 0.5138182180983737}
INFO: 2024-07-13 14:51:55,888: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:51:55,906: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU
0.643	0.247	0.397	0.432	0.850	0.481	0.544	0.187	0.973	0.977	1.000	0.998	0.514	0.768	0.351	0.875	0.690
INFO: 2024-07-13 14:52:18,001: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 178.55s
INFO: 2024-07-13 14:52:18,002: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-13 14:52:18,035: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7115177610333692, 'mcc': 0.3362227509262135}
INFO: 2024-07-13 14:52:18,046: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:52:18,077: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU	russiannlp/rucola_custom
0.636	0.247	0.397	0.432	0.850	0.481	0.544	0.187	0.973	0.977	1.000	0.998	0.514	0.768	0.351	0.875	0.690	0.524
INFO: 2024-07-13 14:59:07,852: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1522.57s
INFO: 2024-07-13 14:59:07,871: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-13 14:59:07,917: llmtf.base.nlpcoreteam/ruMMLU:                                        metric
subject                                      
abstract_algebra                     0.330000
anatomy                              0.511111
astronomy                            0.651316
business_ethics                      0.680000
clinical_knowledge                   0.588679
college_biology                      0.534722
college_chemistry                    0.480000
college_computer_science             0.520000
college_mathematics                  0.350000
college_medicine                     0.549133
college_physics                      0.352941
computer_security                    0.720000
conceptual_physics                   0.540426
econometrics                         0.438596
electrical_engineering               0.572414
elementary_mathematics               0.417989
formal_logic                         0.396825
global_facts                         0.370000
high_school_biology                  0.664516
high_school_chemistry                0.394089
high_school_computer_science         0.690000
high_school_european_history         0.763636
high_school_geography                0.666667
high_school_government_and_politics  0.647668
high_school_macroeconomics           0.553846
high_school_mathematics              0.348148
high_school_microeconomics           0.546218
high_school_physics                  0.410596
high_school_psychology               0.682569
high_school_statistics               0.449074
high_school_us_history               0.691176
high_school_world_history            0.734177
human_aging                          0.538117
human_sexuality                      0.641221
international_law                    0.743802
jurisprudence                        0.657407
logical_fallacies                    0.558282
machine_learning                     0.401786
management                           0.689320
marketing                            0.730769
medical_genetics                     0.670000
miscellaneous                        0.650064
moral_disputes                       0.630058
moral_scenarios                      0.382123
nutrition                            0.604575
philosophy                           0.614148
prehistory                           0.574074
professional_accounting              0.397163
professional_law                     0.397001
professional_medicine                0.514706
professional_psychology              0.514706
public_relations                     0.609091
security_studies                     0.657143
sociology                            0.676617
us_foreign_policy                    0.740000
virology                             0.457831
world_religions                      0.695906
INFO: 2024-07-13 14:59:07,924: llmtf.base.nlpcoreteam/ruMMLU:                                    metric
subject                                  
STEM                             0.490445
humanities                       0.602971
other (business, health, misc.)  0.567962
social sciences                  0.614529
INFO: 2024-07-13 14:59:07,947: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5689766403256171}
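The overall 'acc' logged for each MMLU variant matches, to within rounding of the printed values, the unweighted mean of its four category scores. For enMMLU, (0.564712 + 0.717897 + 0.710620 + 0.768694) / 4 ≈ 0.690481, against a logged 0.6904807. This relationship is inferred from the log output rather than from llmtf's implementation. `CATEGORY_SCORES` and `macro_average` are hypothetical names used only for this check.

```python
# Category-level scores copied from the two MMLU breakdowns in the log (6 d.p.).
CATEGORY_SCORES = {
    "nlpcoreteam/enMMLU": {
        "STEM": 0.564712,
        "humanities": 0.717897,
        "other (business, health, misc.)": 0.710620,
        "social sciences": 0.768694,
    },
    "nlpcoreteam/ruMMLU": {
        "STEM": 0.490445,
        "humanities": 0.602971,
        "other (business, health, misc.)": 0.567962,
        "social sciences": 0.614529,
    },
}

# Overall 'acc' values as logged, for comparison.
LOGGED_ACC = {
    "nlpcoreteam/enMMLU": 0.6904807286717012,
    "nlpcoreteam/ruMMLU": 0.5689766403256171,
}


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean over categories (each category counts equally)."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    for name, scores in CATEGORY_SCORES.items():
        print(f"{name}: recomputed {macro_average(scores):.6f}, logged {LOGGED_ACC[name]:.6f}")
```

The small residual differences (under 1e-6) come from the categories being printed to six decimal places.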
INFO: 2024-07-13 14:59:08,029: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:59:08,049: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.632	0.247	0.397	0.432	0.850	0.481	0.544	0.187	0.973	0.977	1.000	0.998	0.514	0.768	0.351	0.875	0.690	0.569	0.524
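The `mean` column of each summary table can be checked against the task columns with a short script. The sketch below assumes that a summary consists of the two tab-separated lines printed after `llmtf.base.evaluator:`, and that `mean` is an unweighted average over all task columns. Because it works from the three-decimal values shown in the log, it can only confirm agreement to about 0.001. `parse_summary` and `recompute_mean` are hypothetical helpers, not part of llmtf.

```python
def parse_summary(header_line: str, value_line: str) -> dict[str, float]:
    """Pair each column name in a tab-separated summary header with its score."""
    names = header_line.strip().split("\t")
    values = [float(v) for v in value_line.strip().split("\t")]
    if len(names) != len(values):
        raise ValueError("header and value rows have different lengths")
    return dict(zip(names, values))


def recompute_mean(summary: dict[str, float]) -> float:
    """Unweighted mean over every task column, excluding the 'mean' column itself."""
    scores = [v for k, v in summary.items() if k != "mean"]
    return sum(scores) / len(scores)


# Final summary block, copied from the log (tab-separated).
HEADER = (
    "mean\tdaru/treewayabstractive\tdaru/treewayextractive\tdarumeru/MultiQ\t"
    "darumeru/PARus\tdarumeru/RCB\tdarumeru/RWSD\tdarumeru/USE\t"
    "darumeru/cp_para_en\tdarumeru/cp_para_ru\tdarumeru/cp_sent_en\t"
    "darumeru/cp_sent_ru\tdarumeru/ruMMLU\tdarumeru/ruOpenBookQA\t"
    "darumeru/ruTiE\tdarumeru/ruWorldTree\tnlpcoreteam/enMMLU\t"
    "nlpcoreteam/ruMMLU\trussiannlp/rucola_custom"
)
VALUES = (
    "0.632\t0.247\t0.397\t0.432\t0.850\t0.481\t0.544\t0.187\t0.973\t0.977\t"
    "1.000\t0.998\t0.514\t0.768\t0.351\t0.875\t0.690\t0.569\t0.524"
)

if __name__ == "__main__":
    summary = parse_summary(HEADER, VALUES)
    print(f"logged mean: {summary['mean']:.3f}, recomputed: {recompute_mean(summary):.3f}")
```

For the final table, the 18 task scores sum to 11.377, and 11.377 / 18 ≈ 0.632, which matches the logged mean.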