INFO: 2024-07-13 14:29:23,827: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 14:29:23,828: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:23,828: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:23,892: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 14:29:23,896: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:23,896: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:24,151: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 14:29:24,154: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:24,154: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:24,345: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 14:29:24,346: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:24,346: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:25,729: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 14:29:25,731: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:25,731: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:27,678: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 14:29:27,678: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:27,678: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:29,484: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 14:29:29,484: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:29,484: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:33,887: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.40s
INFO: 2024-07-13 14:29:39,828: llmtf.base.daru/treewayextractive: Loading Dataset: 12.15s
INFO: 2024-07-13 14:29:42,885: llmtf.base.daru/treewayabstractive: Loading Dataset: 17.15s
INFO: 2024-07-13 14:29:45,765: llmtf.base.darumeru/MultiQ: Loading Dataset: 21.61s
INFO: 2024-07-13 14:30:53,478: llmtf.base.darumeru/ruMMLU: Loading Dataset: 89.58s
INFO: 2024-07-13 14:32:57,360: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 213.01s
INFO: 2024-07-13 14:33:24,939: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 231.05s
INFO: 2024-07-13 14:33:24,943: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-13 14:33:24,962: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.8278810271761903, 'len': 0.9977030047832767, 'lcs': 0.9847970468194288}
INFO: 2024-07-13 14:33:24,975: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:33:24,975: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:33:28,742: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.77s
INFO: 2024-07-13 14:33:45,284: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 261.46s
INFO: 2024-07-13 14:36:13,193: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 164.45s
INFO: 2024-07-13 14:36:13,226: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-13 14:36:13,244: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.424509793356442, 'len': 0.9995781033988959, 'lcs': 0.994055994028679}
INFO: 2024-07-13 14:36:13,246: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:36:13,246: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:36:17,469: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.22s
INFO: 2024-07-13 14:36:19,338: llmtf.base.daru/treewayextractive: Processing Dataset: 399.51s
INFO: 2024-07-13 14:36:19,340: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-13 14:36:19,799: llmtf.base.daru/treewayextractive: {'r-prec': 0.39738621933621937}
INFO: 2024-07-13 14:36:19,844: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:36:19,850: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/cp_sent_en darumeru/cp_sent_ru
0.798 0.397 1.000 0.998
INFO: 2024-07-13 14:36:56,298: llmtf.base.darumeru/MultiQ: Processing Dataset: 430.53s
INFO: 2024-07-13 14:36:56,300: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-13 14:36:56,305: llmtf.base.darumeru/MultiQ: {'f1': 0.48425376524800046, 'em': 0.3795411089866157}
INFO: 2024-07-13 14:36:56,316: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:36:56,317: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:37:00,009: llmtf.base.darumeru/PARus: Loading Dataset: 3.69s
INFO: 2024-07-13 14:37:13,006: llmtf.base.darumeru/PARus: Processing Dataset: 13.00s
INFO: 2024-07-13 14:37:13,009: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-13 14:37:13,021: llmtf.base.darumeru/PARus: {'acc': 0.85}
INFO: 2024-07-13 14:37:13,023: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:37:13,023: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:37:16,908: llmtf.base.darumeru/RCB: Loading Dataset: 3.88s
INFO: 2024-07-13 14:37:39,047: llmtf.base.darumeru/RCB: Processing Dataset: 22.12s
INFO: 2024-07-13 14:37:39,050: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-13 14:37:39,056: llmtf.base.darumeru/RCB: {'acc': 0.5272727272727272, 'f1_macro': 0.43555405633327715}
INFO: 2024-07-13 14:37:39,058: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:37:39,059: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:37:53,697: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 14.64s
INFO: 2024-07-13 14:40:08,010: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 134.31s
INFO: 2024-07-13 14:40:08,013: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-13 14:40:08,027: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7680412371134021, 'f1_macro': 0.7680185950653384}
INFO: 2024-07-13 14:40:08,043: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:40:08,043: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:40:15,245: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.20s
INFO: 2024-07-13 14:41:10,015: llmtf.base.daru/treewayabstractive: Processing Dataset: 687.13s
INFO: 2024-07-13 14:41:10,017: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-13 14:41:10,037: llmtf.base.daru/treewayabstractive: {'rouge1': 0.360975899636531, 'rouge2': 0.1330737491255763}
INFO: 2024-07-13 14:41:10,042: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:41:10,069: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA
0.647 0.247 0.397 0.432 0.850 0.481 1.000 0.998 0.768
INFO: 2024-07-13 14:41:58,403: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 340.93s
INFO: 2024-07-13 14:41:58,453: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-13 14:41:58,457: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.9697516062295746, 'len': 0.9984044778480231, 'lcs': 0.9773285044731846}
INFO: 2024-07-13 14:41:58,459: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:41:58,459: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:42:02,784: llmtf.base.darumeru/cp_para_en: Loading Dataset: 4.32s
INFO: 2024-07-13 14:44:45,025: llmtf.base.darumeru/ruTiE: Processing Dataset: 269.78s
INFO: 2024-07-13 14:44:45,027: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-13 14:44:45,073: llmtf.base.darumeru/ruTiE: {'acc': 0.3511627906976744}
INFO: 2024-07-13 14:44:45,076: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:44:45,077: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:44:47,875: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.80s
INFO: 2024-07-13 14:44:55,693: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 7.80s
INFO: 2024-07-13 14:44:55,695: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-13 14:44:55,700: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8761904761904762, 'f1_macro': 0.8733631471423589}
INFO: 2024-07-13 14:44:55,701: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:44:55,701: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:45:00,084: llmtf.base.darumeru/RWSD: Loading Dataset: 4.38s
INFO: 2024-07-13 14:45:19,405: llmtf.base.darumeru/RWSD: Processing Dataset: 19.32s
INFO: 2024-07-13 14:45:19,421: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-13 14:45:19,425: llmtf.base.darumeru/RWSD: {'acc': 0.5441176470588235}
INFO: 2024-07-13 14:45:19,427: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:45:19,427: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:45:34,612: llmtf.base.darumeru/USE: Loading Dataset: 15.18s
INFO: 2024-07-13 14:46:14,635: llmtf.base.darumeru/cp_para_en: Processing Dataset: 251.85s
INFO: 2024-07-13 14:46:14,638: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-13 14:46:14,657: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.485777628533072, 'len': 0.999455845790753, 'lcs': 0.9727731185644367}
INFO: 2024-07-13 14:46:14,658: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:46:14,684: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree
0.684 0.247 0.397 0.432 0.850 0.481 0.544 0.973 0.977 1.000 0.998 0.768 0.351 0.875
INFO: 2024-07-13 14:48:58,982: llmtf.base.darumeru/USE: Processing Dataset: 204.37s
INFO: 2024-07-13 14:48:58,999: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-13 14:48:59,004: llmtf.base.darumeru/USE: {'grade_norm': 0.18725490196078434}
INFO: 2024-07-13 14:48:59,010: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:48:59,010: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:49:19,451: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 20.44s
INFO: 2024-07-13 14:50:14,250: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1036.87s
INFO: 2024-07-13 14:50:14,255: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-13 14:50:14,302: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.350000
anatomy 0.696296
astronomy 0.730263
business_ethics 0.700000
clinical_knowledge 0.754717
college_biology 0.812500
college_chemistry 0.500000
college_computer_science 0.590000
college_mathematics 0.330000
college_medicine 0.670520
college_physics 0.470588
computer_security 0.780000
conceptual_physics 0.570213
econometrics 0.561404
electrical_engineering 0.634483
elementary_mathematics 0.439153
formal_logic 0.507937
global_facts 0.430000
high_school_biology 0.800000
high_school_chemistry 0.517241
high_school_computer_science 0.760000
high_school_european_history 0.787879
high_school_geography 0.843434
high_school_government_and_politics 0.922280
high_school_macroeconomics 0.671795
high_school_mathematics 0.381481
high_school_microeconomics 0.764706
high_school_physics 0.417219
high_school_psychology 0.847706
high_school_statistics 0.537037
high_school_us_history 0.833333
high_school_world_history 0.835443
human_aging 0.730942
human_sexuality 0.801527
international_law 0.818182
jurisprudence 0.759259
logical_fallacies 0.766871
machine_learning 0.544643
management 0.825243
marketing 0.901709
medical_genetics 0.830000
miscellaneous 0.842912
moral_disputes 0.751445
moral_scenarios 0.497207
nutrition 0.754902
philosophy 0.720257
prehistory 0.753086
professional_accounting 0.556738
professional_law 0.483051
professional_medicine 0.742647
professional_psychology 0.717320
public_relations 0.690909
security_studies 0.722449
sociology 0.840796
us_foreign_policy 0.840000
virology 0.512048
world_religions 0.818713
INFO: 2024-07-13 14:50:14,310: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.564712
humanities 0.717897
other (business, health, misc.) 0.710620
social sciences 0.768694
INFO: 2024-07-13 14:50:14,318: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6904807286717012}
INFO: 2024-07-13 14:50:14,385: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:50:14,399: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU
0.651 0.247 0.397 0.432 0.850 0.481 0.544 0.187 0.973 0.977 1.000 0.998 0.768 0.351 0.875 0.690
INFO: 2024-07-13 14:51:55,784: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1262.30s
INFO: 2024-07-13 14:51:55,788: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-13 14:51:55,799: llmtf.base.darumeru/ruMMLU: {'acc': 0.5138182180983737}
INFO: 2024-07-13 14:51:55,888: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:51:55,906: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU
0.643 0.247 0.397 0.432 0.850 0.481 0.544 0.187 0.973 0.977 1.000 0.998 0.514 0.768 0.351 0.875 0.690
INFO: 2024-07-13 14:52:18,001: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 178.55s
INFO: 2024-07-13 14:52:18,002: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-13 14:52:18,035: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7115177610333692, 'mcc': 0.3362227509262135}
INFO: 2024-07-13 14:52:18,046: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:52:18,077: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.636 0.247 0.397 0.432 0.850 0.481 0.544 0.187 0.973 0.977 1.000 0.998 0.514 0.768 0.351 0.875 0.690 0.524
INFO: 2024-07-13 14:59:07,852: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1522.57s
INFO: 2024-07-13 14:59:07,871: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-13 14:59:07,917: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.330000
anatomy 0.511111
astronomy 0.651316
business_ethics 0.680000
clinical_knowledge 0.588679
college_biology 0.534722
college_chemistry 0.480000
college_computer_science 0.520000
college_mathematics 0.350000
college_medicine 0.549133
college_physics 0.352941
computer_security 0.720000
conceptual_physics 0.540426
econometrics 0.438596
electrical_engineering 0.572414
elementary_mathematics 0.417989
formal_logic 0.396825
global_facts 0.370000
high_school_biology 0.664516
high_school_chemistry 0.394089
high_school_computer_science 0.690000
high_school_european_history 0.763636
high_school_geography 0.666667
high_school_government_and_politics 0.647668
high_school_macroeconomics 0.553846
high_school_mathematics 0.348148
high_school_microeconomics 0.546218
high_school_physics 0.410596
high_school_psychology 0.682569
high_school_statistics 0.449074
high_school_us_history 0.691176
high_school_world_history 0.734177
human_aging 0.538117
human_sexuality 0.641221
international_law 0.743802
jurisprudence 0.657407
logical_fallacies 0.558282
machine_learning 0.401786
management 0.689320
marketing 0.730769
medical_genetics 0.670000
miscellaneous 0.650064
moral_disputes 0.630058
moral_scenarios 0.382123
nutrition 0.604575
philosophy 0.614148
prehistory 0.574074
professional_accounting 0.397163
professional_law 0.397001
professional_medicine 0.514706
professional_psychology 0.514706
public_relations 0.609091
security_studies 0.657143
sociology 0.676617
us_foreign_policy 0.740000
virology 0.457831
world_religions 0.695906
INFO: 2024-07-13 14:59:07,924: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.490445
humanities 0.602971
other (business, health, misc.) 0.567962
social sciences 0.614529
INFO: 2024-07-13 14:59:07,947: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5689766403256171}
INFO: 2024-07-13 14:59:08,029: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:59:08,049: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.632 0.247 0.397 0.432 0.850 0.481 0.544 0.187 0.973 0.977 1.000 0.998 0.514 0.768 0.351 0.875 0.690 0.569 0.524