INFO: 2024-07-13 15:18:25,981: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 15:18:25,995: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:25,996: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:26,240: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 15:18:26,240: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:26,240: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:27,990: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 15:18:27,991: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:27,991: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:29,333: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 15:18:29,333: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:29,333: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:29,480: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.48s
INFO: 2024-07-13 15:18:30,985: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 15:18:30,985: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:30,985: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:33,199: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 15:18:33,200: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:33,200: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:33,694: llmtf.base.darumeru/ruMMLU: Loading Dataset: 7.45s
INFO: 2024-07-13 15:18:35,345: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.36s
INFO: 2024-07-13 15:18:35,432: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 15:18:35,433: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:35,433: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:37,953: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.52s
INFO: 2024-07-13 15:18:40,885: llmtf.base.daru/treewayextractive: Loading Dataset: 7.69s
INFO: 2024-07-13 15:23:40,040: llmtf.base.evaluator: Starting eval on ['darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'russiannlp/rucola_custom']
INFO: 2024-07-13 15:23:40,042: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:40,042: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:40,509: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu', 'daru/treewayextractive']
INFO: 2024-07-13 15:23:40,510: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:40,510: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:41,090: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu', 'nlpcoreteam/enmmlu']
INFO: 2024-07-13 15:23:41,091: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:41,091: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:42,369: llmtf.base.darumeru/PARus: Loading Dataset: 2.33s
INFO: 2024-07-13 15:23:43,206: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 15:23:43,207: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:43,207: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:45,405: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/use']
INFO: 2024-07-13 15:23:45,405: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:45,405: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:45,786: llmtf.base.darumeru/PARus: Processing Dataset: 3.42s
INFO: 2024-07-13 15:23:45,788: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-13 15:23:45,800: llmtf.base.darumeru/PARus: {'acc': 0.75}
INFO: 2024-07-13 15:23:45,801: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:45,801: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:47,436: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.23s
INFO: 2024-07-13 15:23:47,479: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_para_ru']
INFO: 2024-07-13 15:23:47,479: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:47,479: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:47,668: llmtf.base.darumeru/ruMMLU: Loading Dataset: 7.16s
INFO: 2024-07-13 15:23:47,812: llmtf.base.darumeru/RCB: Loading Dataset: 2.01s
INFO: 2024-07-13 15:23:49,390: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_en', 'darumeru/cp_para_en']
INFO: 2024-07-13 15:23:49,390: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:49,390: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:49,768: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.29s
INFO: 2024-07-13 15:23:51,703: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 2.31s
INFO: 2024-07-13 15:23:53,877: llmtf.base.darumeru/RCB: Processing Dataset: 6.06s
INFO: 2024-07-13 15:23:53,892: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-13 15:23:53,898: llmtf.base.darumeru/RCB: {'acc': 0.5227272727272727, 'f1_macro': 0.4428418803418803}
INFO: 2024-07-13 15:23:53,899: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:53,899: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:56,231: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 2.33s
INFO: 2024-07-13 15:24:00,471: llmtf.base.darumeru/MultiQ: Loading Dataset: 15.07s
INFO: 2024-07-13 15:24:35,821: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 39.59s
INFO: 2024-07-13 15:24:35,822: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-13 15:24:35,835: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7323883161512027, 'f1_macro': 0.7329226353930633}
INFO: 2024-07-13 15:24:35,842: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:24:35,842: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:24:40,146: llmtf.base.darumeru/ruTiE: Loading Dataset: 4.30s
INFO: 2024-07-13 15:26:07,845: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 146.75s
INFO: 2024-07-13 15:26:38,965: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 167.26s
INFO: 2024-07-13 15:26:38,982: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-13 15:26:38,999: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 3.895358639670925, 'len': 0.9974420397589191, 'lcs': 0.9801922792969053}
INFO: 2024-07-13 15:26:39,001: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:26:39,001: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:26:41,305: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.30s
INFO: 2024-07-13 15:27:49,425: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 239.65s
INFO: 2024-07-13 15:27:49,442: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-13 15:27:49,446: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.3734936335344794, 'len': 0.9922334558022529, 'lcs': 0.9153760193869099}
INFO: 2024-07-13 15:27:49,448: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:27:49,448: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:27:51,495: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.05s
INFO: 2024-07-13 15:28:59,772: llmtf.base.darumeru/ruTiE: Processing Dataset: 259.62s
INFO: 2024-07-13 15:28:59,774: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-13 15:28:59,815: llmtf.base.darumeru/ruTiE: {'acc': 0.5372093023255814}
INFO: 2024-07-13 15:28:59,818: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:28:59,819: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:29:01,873: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.05s
INFO: 2024-07-13 15:29:04,102: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.21s
INFO: 2024-07-13 15:29:04,104: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-13 15:29:04,109: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8761904761904762, 'f1_macro': 0.8744880624959789}
INFO: 2024-07-13 15:29:04,110: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:29:04,110: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:29:06,342: llmtf.base.darumeru/RWSD: Loading Dataset: 2.23s
INFO: 2024-07-13 15:29:13,292: llmtf.base.darumeru/RWSD: Processing Dataset: 6.95s
INFO: 2024-07-13 15:29:13,294: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-13 15:29:13,298: llmtf.base.darumeru/RWSD: {'acc': 0.5392156862745098}
INFO: 2024-07-13 15:29:13,299: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:29:13,299: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:29:16,878: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 3.58s
INFO: 2024-07-13 15:30:03,574: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 46.69s
INFO: 2024-07-13 15:30:03,578: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-13 15:30:03,605: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7366343738787227, 'mcc': 0.34075509260259335}
INFO: 2024-07-13 15:30:03,609: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:30:03,635: llmtf.base.evaluator: 
mean	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	russiannlp/rucola_custom
0.716	0.750	0.483	0.539	0.997	0.992	0.733	0.537	0.875	0.539
INFO: 2024-07-13 15:30:38,848: llmtf.base.darumeru/cp_para_en: Processing Dataset: 237.54s
INFO: 2024-07-13 15:30:38,850: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-13 15:30:38,867: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 3.964633931969539, 'len': 0.9963390331388082, 'lcs': 0.873438038674546}
INFO: 2024-07-13 15:30:38,867: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:30:38,877: llmtf.base.evaluator: 
mean	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/cp_para_en	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	russiannlp/rucola_custom
0.732	0.750	0.483	0.539	0.873	0.997	0.992	0.733	0.537	0.875	0.539
INFO: 2024-07-13 15:30:58,082: llmtf.base.darumeru/ruMMLU: Processing Dataset: 430.41s
INFO: 2024-07-13 15:30:58,085: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-13 15:30:58,123: llmtf.base.darumeru/ruMMLU: {'acc': 0.4818916492068243}
INFO: 2024-07-13 15:30:58,162: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:30:58,163: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:31:08,009: llmtf.base.daru/treewayextractive: Loading Dataset: 9.84s
INFO: 2024-07-13 15:31:29,151: llmtf.base.darumeru/MultiQ: Processing Dataset: 448.68s
INFO: 2024-07-13 15:31:29,154: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-13 15:31:29,174: llmtf.base.darumeru/MultiQ: {'f1': 0.2909161781249439, 'em': 0.16634799235181644}
INFO: 2024-07-13 15:31:29,179: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:31:29,179: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:31:32,357: llmtf.base.darumeru/USE: Loading Dataset: 3.18s
INFO: 2024-07-13 15:33:20,030: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 432.18s
INFO: 2024-07-13 15:33:20,032: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-13 15:33:20,078: llmtf.base.nlpcoreteam/ruMMLU:                                        metric
subject                                      
abstract_algebra                     0.360000
anatomy                              0.414815
astronomy                            0.565789
business_ethics                      0.550000
clinical_knowledge                   0.528302
college_biology                      0.486111
college_chemistry                    0.450000
college_computer_science             0.510000
college_mathematics                  0.390000
college_medicine                     0.508671
college_physics                      0.254902
computer_security                    0.580000
conceptual_physics                   0.434043
econometrics                         0.359649
electrical_engineering               0.489655
elementary_mathematics               0.370370
formal_logic                         0.325397
global_facts                         0.280000
high_school_biology                  0.590323
high_school_chemistry                0.374384
high_school_computer_science         0.600000
high_school_european_history         0.666667
high_school_geography                0.666667
high_school_government_and_politics  0.580311
high_school_macroeconomics           0.438462
high_school_mathematics              0.359259
high_school_microeconomics           0.478992
high_school_physics                  0.397351
high_school_psychology               0.625688
high_school_statistics               0.467593
high_school_us_history               0.681373
high_school_world_history            0.713080
human_aging                          0.515695
human_sexuality                      0.557252
international_law                    0.652893
jurisprudence                        0.527778
logical_fallacies                    0.423313
machine_learning                     0.321429
management                           0.631068
marketing                            0.675214
medical_genetics                     0.530000
miscellaneous                        0.624521
moral_disputes                       0.528902
moral_scenarios                      0.231285
nutrition                            0.555556
philosophy                           0.482315
prehistory                           0.490741
professional_accounting              0.382979
professional_law                     0.367014
professional_medicine                0.477941
professional_psychology              0.439542
public_relations                     0.554545
security_studies                     0.612245
sociology                            0.686567
us_foreign_policy                    0.730000
virology                             0.487952
world_religions                      0.690058
INFO: 2024-07-13 15:33:20,086: llmtf.base.nlpcoreteam/ruMMLU:                                    metric
subject                                  
STEM                             0.444512
humanities                       0.521601
other (business, health, misc.)  0.511622
social sciences                  0.560827
INFO: 2024-07-13 15:33:20,093: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5096404231777287}
INFO: 2024-07-13 15:33:20,128: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:33:20,129: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:34:10,886: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 379.39s
INFO: 2024-07-13 15:34:10,904: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-13 15:34:10,908: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.469064241004292, 'len': 0.9929789601006123, 'lcs': 0.843045621556421}
INFO: 2024-07-13 15:34:10,908: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:34:10,938: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.670	0.229	0.750	0.483	0.539	0.873	0.843	0.997	0.992	0.482	0.733	0.537	0.875	0.510	0.539
INFO: 2024-07-13 15:35:15,961: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 115.83s
INFO: 2024-07-13 15:36:14,977: llmtf.base.darumeru/USE: Processing Dataset: 282.62s
INFO: 2024-07-13 15:36:14,978: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-13 15:36:14,982: llmtf.base.darumeru/USE: {'grade_norm': 0.06568627450980391}
INFO: 2024-07-13 15:36:14,985: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:36:14,994: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.630	0.229	0.750	0.483	0.539	0.066	0.873	0.843	0.997	0.992	0.482	0.733	0.537	0.875	0.510	0.539
INFO: 2024-07-13 15:39:22,754: llmtf.base.daru/treewayextractive: Processing Dataset: 494.73s
INFO: 2024-07-13 15:39:22,774: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-13 15:39:23,003: llmtf.base.daru/treewayextractive: {'r-prec': 0.40769011544011546}
INFO: 2024-07-13 15:39:23,057: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:39:23,070: llmtf.base.evaluator: 
mean	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.616	0.408	0.229	0.750	0.483	0.539	0.066	0.873	0.843	0.997	0.992	0.482	0.733	0.537	0.875	0.510	0.539
INFO: 2024-07-13 15:40:06,403: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 290.44s
INFO: 2024-07-13 15:40:06,407: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-13 15:40:06,452: llmtf.base.nlpcoreteam/enMMLU:                                        metric
subject                                      
abstract_algebra                     0.340000
anatomy                              0.614815
astronomy                            0.651316
business_ethics                      0.650000
clinical_knowledge                   0.675472
college_biology                      0.729167
college_chemistry                    0.470000
college_computer_science             0.550000
college_mathematics                  0.380000
college_medicine                     0.641618
college_physics                      0.352941
computer_security                    0.700000
conceptual_physics                   0.553191
econometrics                         0.456140
electrical_engineering               0.593103
elementary_mathematics               0.412698
formal_logic                         0.476190
global_facts                         0.310000
high_school_biology                  0.790323
high_school_chemistry                0.458128
high_school_computer_science         0.670000
high_school_european_history         0.769697
high_school_geography                0.792929
high_school_government_and_politics  0.880829
high_school_macroeconomics           0.612821
high_school_mathematics              0.344444
high_school_microeconomics           0.642857
high_school_physics                  0.337748
high_school_psychology               0.823853
high_school_statistics               0.467593
high_school_us_history               0.794118
high_school_world_history            0.814346
human_aging                          0.721973
human_sexuality                      0.732824
international_law                    0.735537
jurisprudence                        0.722222
logical_fallacies                    0.760736
machine_learning                     0.455357
management                           0.776699
marketing                            0.858974
medical_genetics                     0.690000
miscellaneous                        0.830140
moral_disputes                       0.679191
moral_scenarios                      0.232402
nutrition                            0.709150
philosophy                           0.655949
prehistory                           0.672840
professional_accounting              0.460993
professional_law                     0.468057
professional_medicine                0.709559
professional_psychology              0.619281
public_relations                     0.645455
security_studies                     0.673469
sociology                            0.850746
us_foreign_policy                    0.850000
virology                             0.512048
world_religions                      0.853801
INFO: 2024-07-13 15:40:06,459: llmtf.base.nlpcoreteam/enMMLU:                                    metric
subject                                  
STEM                             0.514223
humanities                       0.664237
other (business, health, misc.)  0.654389
social sciences                  0.715100
INFO: 2024-07-13 15:40:06,466: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6369873391991915}
INFO: 2024-07-13 15:40:06,497: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:40:06,508: llmtf.base.evaluator: 
mean	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.617	0.408	0.229	0.750	0.483	0.539	0.066	0.873	0.843	0.997	0.992	0.482	0.733	0.537	0.875	0.637	0.510	0.539
INFO: 2024-07-13 15:44:34,352: llmtf.base.daru/treewayabstractive: Processing Dataset: 1246.91s
INFO: 2024-07-13 15:44:34,354: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-13 15:44:34,373: llmtf.base.daru/treewayabstractive: {'rouge1': 0.34479017541198337, 'rouge2': 0.12451437402782907}
INFO: 2024-07-13 15:44:34,376: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:44:34,403: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.596	0.235	0.408	0.229	0.750	0.483	0.539	0.066	0.873	0.843	0.997	0.992	0.482	0.733	0.537	0.875	0.637	0.510	0.539