File size: 20,221 Bytes
4fec6b2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
INFO: 2024-07-13 14:29:01,210: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 14:29:01,211: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:01,211: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:01,212: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 14:29:01,212: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:01,212: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:01,379: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 14:29:01,379: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:01,380: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:01,969: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 14:29:01,970: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:01,970: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:04,129: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 14:29:04,130: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:04,130: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:05,366: llmtf.base.darumeru/MultiQ: Loading Dataset: 4.15s
INFO: 2024-07-13 14:29:05,855: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 14:29:05,855: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:05,855: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:07,422: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 14:29:07,422: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:07,422: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:08,720: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.59s
INFO: 2024-07-13 14:29:09,722: llmtf.base.darumeru/ruMMLU: Loading Dataset: 8.51s
INFO: 2024-07-13 14:29:09,808: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.39s
INFO: 2024-07-13 14:29:18,031: llmtf.base.daru/treewayextractive: Loading Dataset: 12.17s
INFO: 2024-07-13 14:31:16,783: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 134.81s
INFO: 2024-07-13 14:31:18,578: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 137.20s
INFO: 2024-07-13 14:32:42,801: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 212.99s
INFO: 2024-07-13 14:32:42,818: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-13 14:32:42,822: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.8294160005417113, 'len': 0.993227090420785, 'lcs': 0.9520454300336516}
INFO: 2024-07-13 14:32:42,824: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:32:42,824: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:32:45,506: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 2.68s
INFO: 2024-07-13 14:35:04,924: llmtf.base.darumeru/ruMMLU: Processing Dataset: 355.20s
INFO: 2024-07-13 14:35:04,929: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-13 14:35:04,937: llmtf.base.darumeru/ruMMLU: {'acc': 0.5046393295420533}
INFO: 2024-07-13 14:35:04,978: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:35:04,984: llmtf.base.evaluator: 
mean	darumeru/cp_sent_ru	darumeru/ruMMLU
0.749	0.993	0.505
INFO: 2024-07-13 14:35:16,448: llmtf.base.darumeru/MultiQ: Processing Dataset: 371.08s
INFO: 2024-07-13 14:35:16,452: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-13 14:35:16,456: llmtf.base.darumeru/MultiQ: {'f1': 0.3370324579707962, 'em': 0.21032504780114722}
INFO: 2024-07-13 14:35:16,460: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:35:16,461: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:35:19,048: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 153.54s
INFO: 2024-07-13 14:35:19,050: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-13 14:35:19,083: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.424907714143083, 'len': 0.9996416196590585, 'lcs': 0.995460815828734}
INFO: 2024-07-13 14:35:19,084: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:35:19,085: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:35:19,188: llmtf.base.darumeru/PARus: Loading Dataset: 2.73s
INFO: 2024-07-13 14:35:20,825: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 1.74s
INFO: 2024-07-13 14:35:22,119: llmtf.base.darumeru/PARus: Processing Dataset: 2.93s
INFO: 2024-07-13 14:35:22,121: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-13 14:35:22,164: llmtf.base.darumeru/PARus: {'acc': 0.64}
INFO: 2024-07-13 14:35:22,165: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:35:22,165: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:35:24,196: llmtf.base.darumeru/RCB: Loading Dataset: 2.03s
INFO: 2024-07-13 14:35:29,614: llmtf.base.darumeru/RCB: Processing Dataset: 5.41s
INFO: 2024-07-13 14:35:29,616: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-13 14:35:29,622: llmtf.base.darumeru/RCB: {'acc': 0.4863636363636364, 'f1_macro': 0.4094575374734713}
INFO: 2024-07-13 14:35:29,624: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:35:29,624: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:35:32,722: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.10s
INFO: 2024-07-13 14:35:40,173: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 263.39s
INFO: 2024-07-13 14:35:40,174: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-13 14:35:40,219: llmtf.base.nlpcoreteam/enMMLU:                                        metric
subject                                      
abstract_algebra                     0.340000
anatomy                              0.718519
astronomy                            0.730263
business_ethics                      0.720000
clinical_knowledge                   0.735849
college_biology                      0.791667
college_chemistry                    0.460000
college_computer_science             0.600000
college_mathematics                  0.310000
college_medicine                     0.647399
college_physics                      0.480392
computer_security                    0.760000
conceptual_physics                   0.570213
econometrics                         0.517544
electrical_engineering               0.606897
elementary_mathematics               0.468254
formal_logic                         0.523810
global_facts                         0.410000
high_school_biology                  0.809677
high_school_chemistry                0.541872
high_school_computer_science         0.730000
high_school_european_history         0.733333
high_school_geography                0.823232
high_school_government_and_politics  0.865285
high_school_macroeconomics           0.630769
high_school_mathematics              0.370370
high_school_microeconomics           0.752101
high_school_physics                  0.410596
high_school_psychology               0.855046
high_school_statistics               0.532407
high_school_us_history               0.828431
high_school_world_history            0.839662
human_aging                          0.721973
human_sexuality                      0.778626
international_law                    0.760331
jurisprudence                        0.796296
logical_fallacies                    0.779141
machine_learning                     0.446429
management                           0.796117
marketing                            0.893162
medical_genetics                     0.780000
miscellaneous                        0.840358
moral_disputes                       0.696532
moral_scenarios                      0.293855
nutrition                            0.764706
philosophy                           0.720257
prehistory                           0.706790
professional_accounting              0.542553
professional_law                     0.481747
professional_medicine                0.731618
professional_psychology              0.674837
public_relations                     0.663636
security_studies                     0.714286
sociology                            0.825871
us_foreign_policy                    0.890000
virology                             0.487952
world_religions                      0.824561
INFO: 2024-07-13 14:35:40,227: llmtf.base.nlpcoreteam/enMMLU:                                    metric
subject                                  
STEM                             0.553280
humanities                       0.691134
other (business, health, misc.)  0.699300
social sciences                  0.749269
INFO: 2024-07-13 14:35:40,234: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6732459770237078}
INFO: 2024-07-13 14:35:40,267: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:35:40,273: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	nlpcoreteam/enMMLU
0.647	0.274	0.640	0.448	1.000	0.993	0.505	0.673
INFO: 2024-07-13 14:35:54,003: llmtf.base.daru/treewayextractive: Processing Dataset: 395.96s
INFO: 2024-07-13 14:35:54,004: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-13 14:35:54,481: llmtf.base.daru/treewayextractive: {'r-prec': 0.39738621933621937}
INFO: 2024-07-13 14:35:54,526: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:35:54,533: llmtf.base.evaluator: 
mean	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	nlpcoreteam/enMMLU
0.616	0.397	0.274	0.640	0.448	1.000	0.993	0.505	0.673
INFO: 2024-07-13 14:36:08,587: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 35.86s
INFO: 2024-07-13 14:36:08,588: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-13 14:36:08,601: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.6907216494845361, 'f1_macro': 0.6911297261861948}
INFO: 2024-07-13 14:36:08,608: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:36:08,608: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:36:16,304: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.69s
INFO: 2024-07-13 14:37:20,843: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 362.26s
INFO: 2024-07-13 14:37:20,846: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-13 14:37:20,893: llmtf.base.nlpcoreteam/ruMMLU:                                        metric
subject                                      
abstract_algebra                     0.300000
anatomy                              0.459259
astronomy                            0.651316
business_ethics                      0.600000
clinical_knowledge                   0.566038
college_biology                      0.541667
college_chemistry                    0.400000
college_computer_science             0.460000
college_mathematics                  0.320000
college_medicine                     0.502890
college_physics                      0.352941
computer_security                    0.570000
conceptual_physics                   0.485106
econometrics                         0.350877
electrical_engineering               0.551724
elementary_mathematics               0.410053
formal_logic                         0.380952
global_facts                         0.350000
high_school_biology                  0.638710
high_school_chemistry                0.423645
high_school_computer_science         0.610000
high_school_european_history         0.715152
high_school_geography                0.661616
high_school_government_and_politics  0.595855
high_school_macroeconomics           0.510256
high_school_mathematics              0.337037
high_school_microeconomics           0.495798
high_school_physics                  0.344371
high_school_psychology               0.669725
high_school_statistics               0.467593
high_school_us_history               0.651961
high_school_world_history            0.713080
human_aging                          0.551570
human_sexuality                      0.656489
international_law                    0.710744
jurisprudence                        0.592593
logical_fallacies                    0.527607
machine_learning                     0.357143
management                           0.669903
marketing                            0.705128
medical_genetics                     0.560000
miscellaneous                        0.646232
moral_disputes                       0.560694
moral_scenarios                      0.249162
nutrition                            0.598039
philosophy                           0.565916
prehistory                           0.558642
professional_accounting              0.386525
professional_law                     0.359192
professional_medicine                0.518382
professional_psychology              0.485294
public_relations                     0.572727
security_studies                     0.620408
sociology                            0.701493
us_foreign_policy                    0.750000
virology                             0.415663
world_religions                      0.695906
INFO: 2024-07-13 14:37:20,902: llmtf.base.nlpcoreteam/ruMMLU:                                    metric
subject                                  
STEM                             0.456739
humanities                       0.560123
other (business, health, misc.)  0.537831
social sciences                  0.589212
INFO: 2024-07-13 14:37:20,909: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5359761297506582}
INFO: 2024-07-13 14:37:20,942: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:37:21,003: llmtf.base.evaluator: 
mean	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU
0.616	0.397	0.274	0.640	0.448	1.000	0.993	0.505	0.691	0.673	0.536
INFO: 2024-07-13 14:38:13,255: llmtf.base.daru/treewayabstractive: Processing Dataset: 544.53s
INFO: 2024-07-13 14:38:13,256: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-13 14:38:13,260: llmtf.base.daru/treewayabstractive: {'rouge1': 0.35574041658645894, 'rouge2': 0.1282333481459036}
INFO: 2024-07-13 14:38:13,262: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:38:13,270: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU
0.582	0.242	0.397	0.274	0.640	0.448	1.000	0.993	0.505	0.691	0.673	0.536
INFO: 2024-07-13 14:40:26,872: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 306.04s
INFO: 2024-07-13 14:40:26,875: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-13 14:40:26,895: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.968660662438201, 'len': 0.9950114211220992, 'lcs': 0.9146147408713498}
INFO: 2024-07-13 14:40:26,896: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:40:26,896: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:40:28,747: llmtf.base.darumeru/cp_para_en: Loading Dataset: 1.85s
INFO: 2024-07-13 14:40:42,169: llmtf.base.darumeru/ruTiE: Processing Dataset: 265.86s
INFO: 2024-07-13 14:40:42,170: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-13 14:40:42,198: llmtf.base.darumeru/ruTiE: {'acc': 0.3511627906976744}
INFO: 2024-07-13 14:40:42,201: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:40:42,202: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:40:44,145: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 1.94s
INFO: 2024-07-13 14:40:46,061: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 1.92s
INFO: 2024-07-13 14:40:46,063: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-13 14:40:46,081: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8476190476190476, 'f1_macro': 0.8445201637796824}
INFO: 2024-07-13 14:40:46,082: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:40:46,082: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:40:48,101: llmtf.base.darumeru/RWSD: Loading Dataset: 2.02s
INFO: 2024-07-13 14:40:53,690: llmtf.base.darumeru/RWSD: Processing Dataset: 5.59s
INFO: 2024-07-13 14:40:53,692: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-13 14:40:53,696: llmtf.base.darumeru/RWSD: {'acc': 0.5490196078431373}
INFO: 2024-07-13 14:40:53,697: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:40:53,697: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:40:56,414: llmtf.base.darumeru/USE: Loading Dataset: 2.72s
INFO: 2024-07-13 14:44:03,848: llmtf.base.darumeru/cp_para_en: Processing Dataset: 215.10s
INFO: 2024-07-13 14:44:03,851: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-13 14:44:03,854: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.463140535341514, 'len': 0.9941296296409974, 'lcs': 0.955732821155511}
INFO: 2024-07-13 14:44:03,855: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:44:03,884: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU
0.626	0.242	0.397	0.274	0.640	0.448	0.549	0.956	0.915	1.000	0.993	0.505	0.691	0.351	0.846	0.673	0.536
INFO: 2024-07-13 14:45:47,572: llmtf.base.darumeru/USE: Processing Dataset: 291.16s
INFO: 2024-07-13 14:45:47,575: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-13 14:45:47,607: llmtf.base.darumeru/USE: {'grade_norm': 0.07941176470588233}
INFO: 2024-07-13 14:45:47,610: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:45:47,611: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:45:52,951: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 5.34s
INFO: 2024-07-13 14:46:34,251: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 41.30s
INFO: 2024-07-13 14:46:34,255: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-13 14:46:34,267: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7061356297093649, 'mcc': 0.2603067425656207}
INFO: 2024-07-13 14:46:34,271: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:46:34,283: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.588	0.242	0.397	0.274	0.640	0.448	0.549	0.079	0.956	0.915	1.000	0.993	0.505	0.691	0.351	0.846	0.673	0.536	0.483