INFO: 2024-07-13 15:18:29,367: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 15:18:29,368: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:29,368: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:30,101: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 15:18:30,101: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:30,102: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:31,006: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 15:18:31,007: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:31,007: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:33,846: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 15:18:33,846: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:33,846: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:34,873: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 15:18:34,874: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:34,874: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:36,947: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 15:18:36,948: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:36,948: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:39,585: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 15:18:39,585: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:39,585: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:42,261: llmtf.base.darumeru/MultiQ: Loading Dataset: 12.89s
INFO: 2024-07-13 15:18:43,245: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.66s
INFO: 2024-07-13 15:18:43,377: llmtf.base.daru/treewayabstractive: Loading Dataset: 8.50s
INFO: 2024-07-13 15:18:44,950: llmtf.base.daru/treewayextractive: Loading Dataset: 8.00s
INFO: 2024-07-13 15:19:21,718: llmtf.base.darumeru/ruMMLU: Loading Dataset: 51.62s
INFO: 2024-07-13 15:23:45,855: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 15:23:45,858: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:45,858: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:46,328: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 15:23:46,329: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:46,329: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:48,239: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 15:23:48,240: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:48,240: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:50,172: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 15:23:50,172: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:50,172: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:52,594: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 15:23:52,594: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:52,594: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:53,731: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 15:23:53,732: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:53,732: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:55,589: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 15:23:55,589: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:55,589: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:58,285: llmtf.base.darumeru/MultiQ: Loading Dataset: 12.43s
INFO: 2024-07-13 15:23:59,075: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.49s
INFO: 2024-07-13 15:24:00,764: llmtf.base.daru/treewayabstractive: Loading Dataset: 8.17s
INFO: 2024-07-13 15:24:01,255: llmtf.base.daru/treewayextractive: Loading Dataset: 7.52s
INFO: 2024-07-13 15:24:37,276: llmtf.base.darumeru/ruMMLU: Loading Dataset: 50.95s
INFO: 2024-07-13 15:27:06,687: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 196.51s
INFO: 2024-07-13 15:27:15,808: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 207.57s
INFO: 2024-07-13 15:29:44,399: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 345.32s
INFO: 2024-07-13 15:29:44,403: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-13 15:29:44,407: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.3701923347659983, 'len': 0.9987691197336923, 'lcs': 0.9819406016228798}
INFO: 2024-07-13 15:29:44,410: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:29:44,410: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:29:47,896: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.48s
INFO: 2024-07-13 15:32:14,981: llmtf.base.daru/treewayextractive: Processing Dataset: 493.72s
INFO: 2024-07-13 15:32:14,987: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-13 15:32:15,227: llmtf.base.daru/treewayextractive: {'r-prec': 0.40769011544011546}
INFO: 2024-07-13 15:32:15,287: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:32:15,293: llmtf.base.evaluator: 
mean	daru/treewayextractive	darumeru/cp_sent_ru
0.703	0.408	0.999
INFO: 2024-07-13 15:33:08,688: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 200.79s
INFO: 2024-07-13 15:33:08,691: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-13 15:33:08,708: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 3.8994152226580563, 'len': 0.9995035620835028, 'lcs': 0.9936840637058483}
INFO: 2024-07-13 15:33:08,711: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:33:08,711: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:33:11,469: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.76s
INFO: 2024-07-13 15:33:32,789: llmtf.base.darumeru/MultiQ: Processing Dataset: 574.49s
INFO: 2024-07-13 15:33:32,791: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-13 15:33:32,796: llmtf.base.darumeru/MultiQ: {'f1': 0.5726350715356451, 'em': 0.5019120458891013}
INFO: 2024-07-13 15:33:32,807: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:33:32,808: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:33:35,547: llmtf.base.darumeru/PARus: Loading Dataset: 2.74s
INFO: 2024-07-13 15:33:51,177: llmtf.base.darumeru/PARus: Processing Dataset: 15.63s
INFO: 2024-07-13 15:33:51,179: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-13 15:33:51,191: llmtf.base.darumeru/PARus: {'acc': 0.83}
INFO: 2024-07-13 15:33:51,193: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:33:51,193: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:33:54,244: llmtf.base.darumeru/RCB: Loading Dataset: 3.05s
INFO: 2024-07-13 15:34:20,224: llmtf.base.darumeru/RCB: Processing Dataset: 25.98s
INFO: 2024-07-13 15:34:20,241: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-13 15:34:20,248: llmtf.base.darumeru/RCB: {'acc': 0.5181818181818182, 'f1_macro': 0.46564877615699873}
INFO: 2024-07-13 15:34:20,250: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:34:20,250: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:34:28,734: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 8.48s
INFO: 2024-07-13 15:37:02,786: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 154.05s
INFO: 2024-07-13 15:37:02,802: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-13 15:37:02,816: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7525773195876289, 'f1_macro': 0.7540227232789819}
INFO: 2024-07-13 15:37:02,832: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:37:02,832: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:37:07,215: llmtf.base.darumeru/ruTiE: Loading Dataset: 4.38s
INFO: 2024-07-13 15:41:29,256: llmtf.base.darumeru/ruTiE: Processing Dataset: 262.04s
INFO: 2024-07-13 15:41:29,260: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-13 15:41:29,289: llmtf.base.darumeru/ruTiE: {'acc': 0.5372093023255814}
INFO: 2024-07-13 15:41:29,292: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:41:29,292: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:41:32,242: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.95s
INFO: 2024-07-13 15:41:41,454: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 9.21s
INFO: 2024-07-13 15:41:41,471: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-13 15:41:41,493: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8857142857142857, 'f1_macro': 0.8846523292790873}
INFO: 2024-07-13 15:41:41,494: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:41:41,494: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:41:45,149: llmtf.base.darumeru/RWSD: Loading Dataset: 3.65s
INFO: 2024-07-13 15:42:09,254: llmtf.base.darumeru/RWSD: Processing Dataset: 24.10s
INFO: 2024-07-13 15:42:09,256: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-13 15:42:09,261: llmtf.base.darumeru/RWSD: {'acc': 0.6078431372549019}
INFO: 2024-07-13 15:42:09,263: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:42:09,263: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:42:16,716: llmtf.base.darumeru/USE: Loading Dataset: 7.45s
INFO: 2024-07-13 15:46:19,569: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1152.88s
INFO: 2024-07-13 15:46:19,575: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-13 15:46:19,620: llmtf.base.nlpcoreteam/enMMLU:                                        metric
subject                                      
abstract_algebra                     0.330000
anatomy                              0.651852
astronomy                            0.671053
business_ethics                      0.650000
clinical_knowledge                   0.720755
college_biology                      0.770833
college_chemistry                    0.500000
college_computer_science             0.560000
college_mathematics                  0.400000
college_medicine                     0.676301
college_physics                      0.362745
computer_security                    0.760000
conceptual_physics                   0.578723
econometrics                         0.473684
electrical_engineering               0.551724
elementary_mathematics               0.394180
formal_logic                         0.492063
global_facts                         0.330000
high_school_biology                  0.770968
high_school_chemistry                0.487685
high_school_computer_science         0.690000
high_school_european_history         0.806061
high_school_geography                0.792929
high_school_government_and_politics  0.891192
high_school_macroeconomics           0.638462
high_school_mathematics              0.359259
high_school_microeconomics           0.655462
high_school_physics                  0.350993
high_school_psychology               0.834862
high_school_statistics               0.476852
high_school_us_history               0.823529
high_school_world_history            0.831224
human_aging                          0.717489
human_sexuality                      0.770992
international_law                    0.801653
jurisprudence                        0.750000
logical_fallacies                    0.797546
machine_learning                     0.508929
management                           0.844660
marketing                            0.880342
medical_genetics                     0.740000
miscellaneous                        0.826309
moral_disputes                       0.734104
moral_scenarios                      0.269274
nutrition                            0.725490
philosophy                           0.710611
prehistory                           0.762346
professional_accounting              0.475177
professional_law                     0.481747
professional_medicine                0.709559
professional_psychology              0.640523
public_relations                     0.654545
security_studies                     0.738776
sociology                            0.830846
us_foreign_policy                    0.850000
virology                             0.524096
world_religions                      0.818713
INFO: 2024-07-13 15:46:19,627: llmtf.base.nlpcoreteam/enMMLU:                                    metric
subject                                  
STEM                             0.529108
humanities                       0.698375
other (business, health, misc.)  0.676574
social sciences                  0.731023
INFO: 2024-07-13 15:46:19,635: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6587697555779514}
INFO: 2024-07-13 15:46:19,704: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:46:19,740: llmtf.base.evaluator: 
mean	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU
0.701	0.408	0.537	0.830	0.492	0.608	1.000	0.999	0.753	0.537	0.885	0.659
INFO: 2024-07-13 15:46:23,553: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 792.08s
INFO: 2024-07-13 15:46:23,572: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-13 15:46:23,603: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.4704173225051846, 'len': 0.9993025871189104, 'lcs': 0.9552661852470385}
INFO: 2024-07-13 15:46:23,606: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:46:23,606: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:46:26,330: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.72s
INFO: 2024-07-13 15:48:32,771: llmtf.base.darumeru/USE: Processing Dataset: 376.05s
INFO: 2024-07-13 15:48:32,775: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-13 15:48:32,780: llmtf.base.darumeru/USE: {'grade_norm': 0.12352941176470587}
INFO: 2024-07-13 15:48:32,787: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:48:32,787: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:48:44,556: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 11.77s
INFO: 2024-07-13 15:50:07,016: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1529.74s
INFO: 2024-07-13 15:50:07,019: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-13 15:50:07,028: llmtf.base.darumeru/ruMMLU: {'acc': 0.4868801755961289}
INFO: 2024-07-13 15:50:07,113: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:50:07,146: llmtf.base.evaluator: 
mean	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU
0.662	0.408	0.537	0.830	0.492	0.608	0.124	0.955	1.000	0.999	0.487	0.753	0.537	0.885	0.659
INFO: 2024-07-13 15:52:21,515: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 216.96s
INFO: 2024-07-13 15:52:21,520: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-13 15:52:21,533: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7384284176533907, 'mcc': 0.3763427268436289}
INFO: 2024-07-13 15:52:21,545: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:52:21,562: llmtf.base.evaluator: 
mean	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU	russiannlp/rucola_custom
0.655	0.408	0.537	0.830	0.492	0.608	0.124	0.955	1.000	0.999	0.487	0.753	0.537	0.885	0.659	0.557
INFO: 2024-07-13 15:54:48,357: llmtf.base.daru/treewayabstractive: Processing Dataset: 1847.59s
INFO: 2024-07-13 15:54:48,390: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-13 15:54:48,397: llmtf.base.daru/treewayabstractive: {'rouge1': 0.34956234095604516, 'rouge2': 0.13050451589110393}
INFO: 2024-07-13 15:54:48,402: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:54:48,429: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU	russiannlp/rucola_custom
0.629	0.240	0.408	0.537	0.830	0.492	0.608	0.124	0.955	1.000	0.999	0.487	0.753	0.537	0.885	0.659	0.557
INFO: 2024-07-13 15:55:03,040: llmtf.base.darumeru/cp_para_en: Processing Dataset: 516.71s
INFO: 2024-07-13 15:55:03,042: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-13 15:55:03,046: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 3.960763996832381, 'len': 0.9995281850843424, 'lcs': 0.9811766452032213}
INFO: 2024-07-13 15:55:03,048: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:55:03,057: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU	russiannlp/rucola_custom
0.650	0.240	0.408	0.537	0.830	0.492	0.608	0.124	0.981	0.955	1.000	0.999	0.487	0.753	0.537	0.885	0.659	0.557
INFO: 2024-07-13 15:57:18,212: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1802.40s
INFO: 2024-07-13 15:57:18,228: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-13 15:57:18,274: llmtf.base.nlpcoreteam/ruMMLU:                                        metric
subject                                      
abstract_algebra                     0.280000
anatomy                              0.392593
astronomy                            0.565789
business_ethics                      0.560000
clinical_knowledge                   0.554717
college_biology                      0.465278
college_chemistry                    0.410000
college_computer_science             0.500000
college_mathematics                  0.360000
college_medicine                     0.554913
college_physics                      0.333333
computer_security                    0.590000
conceptual_physics                   0.468085
econometrics                         0.403509
electrical_engineering               0.503448
elementary_mathematics               0.367725
formal_logic                         0.365079
global_facts                         0.330000
high_school_biology                  0.619355
high_school_chemistry                0.399015
high_school_computer_science         0.640000
high_school_european_history         0.678788
high_school_geography                0.676768
high_school_government_and_politics  0.647668
high_school_macroeconomics           0.512821
high_school_mathematics              0.314815
high_school_microeconomics           0.533613
high_school_physics                  0.344371
high_school_psychology               0.651376
high_school_statistics               0.416667
high_school_us_history               0.720588
high_school_world_history            0.679325
human_aging                          0.520179
human_sexuality                      0.618321
international_law                    0.719008
jurisprudence                        0.601852
logical_fallacies                    0.509202
machine_learning                     0.464286
management                           0.669903
marketing                            0.735043
medical_genetics                     0.530000
miscellaneous                        0.605364
moral_disputes                       0.580925
moral_scenarios                      0.189944
nutrition                            0.611111
philosophy                           0.581994
prehistory                           0.524691
professional_accounting              0.397163
professional_law                     0.361147
professional_medicine                0.441176
professional_psychology              0.486928
public_relations                     0.545455
security_studies                     0.595918
sociology                            0.681592
us_foreign_policy                    0.690000
virology                             0.427711
world_religions                      0.748538
INFO: 2024-07-13 15:57:18,281: llmtf.base.nlpcoreteam/ruMMLU:                                    metric
subject                                  
STEM                             0.446787
humanities                       0.558545
other (business, health, misc.)  0.523562
social sciences                  0.586997
INFO: 2024-07-13 15:57:18,303: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5289728961247521}
INFO: 2024-07-13 15:57:18,385: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:57:18,616: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruTiE	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.643	0.240	0.408	0.537	0.830	0.492	0.608	0.124	0.981	0.955	1.000	0.999	0.487	0.753	0.537	0.885	0.659	0.529	0.557