INFO: 2024-07-13 15:18:25,981: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 15:18:25,995: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:25,996: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:26,240: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 15:18:26,240: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:26,240: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:27,990: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 15:18:27,991: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:27,991: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:29,333: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 15:18:29,333: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:29,333: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:29,480: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.48s
INFO: 2024-07-13 15:18:30,985: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 15:18:30,985: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:30,985: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:33,199: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 15:18:33,200: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:33,200: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:33,694: llmtf.base.darumeru/ruMMLU: Loading Dataset: 7.45s
INFO: 2024-07-13 15:18:35,345: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.36s
INFO: 2024-07-13 15:18:35,432: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 15:18:35,433: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:35,433: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:37,953: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.52s
INFO: 2024-07-13 15:18:40,885: llmtf.base.daru/treewayextractive: Loading Dataset: 7.69s
INFO: 2024-07-13 15:23:40,040: llmtf.base.evaluator: Starting eval on ['darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'russiannlp/rucola_custom']
INFO: 2024-07-13 15:23:40,042: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:40,042: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:40,509: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu', 'daru/treewayextractive']
INFO: 2024-07-13 15:23:40,510: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:40,510: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:41,090: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu', 'nlpcoreteam/enmmlu']
INFO: 2024-07-13 15:23:41,091: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:41,091: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:42,369: llmtf.base.darumeru/PARus: Loading Dataset: 2.33s
INFO: 2024-07-13 15:23:43,206: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 15:23:43,207: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:43,207: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:45,405: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/use']
INFO: 2024-07-13 15:23:45,405: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:45,405: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:45,786: llmtf.base.darumeru/PARus: Processing Dataset: 3.42s
INFO: 2024-07-13 15:23:45,788: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-13 15:23:45,800: llmtf.base.darumeru/PARus: {'acc': 0.75}
INFO: 2024-07-13 15:23:45,801: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:45,801: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:47,436: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.23s
INFO: 2024-07-13 15:23:47,479: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_para_ru']
INFO: 2024-07-13 15:23:47,479: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:47,479: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:47,668: llmtf.base.darumeru/ruMMLU: Loading Dataset: 7.16s
INFO: 2024-07-13 15:23:47,812: llmtf.base.darumeru/RCB: Loading Dataset: 2.01s
INFO: 2024-07-13 15:23:49,390: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_en', 'darumeru/cp_para_en']
INFO: 2024-07-13 15:23:49,390: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:49,390: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:49,768: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.29s
INFO: 2024-07-13 15:23:51,703: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 2.31s
INFO: 2024-07-13 15:23:53,877: llmtf.base.darumeru/RCB: Processing Dataset: 6.06s
INFO: 2024-07-13 15:23:53,892: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-13 15:23:53,898: llmtf.base.darumeru/RCB: {'acc': 0.5227272727272727, 'f1_macro': 0.4428418803418803}
INFO: 2024-07-13 15:23:53,899: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:53,899: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:56,231: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 2.33s
INFO: 2024-07-13 15:24:00,471: llmtf.base.darumeru/MultiQ: Loading Dataset: 15.07s
INFO: 2024-07-13 15:24:35,821: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 39.59s
INFO: 2024-07-13 15:24:35,822: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-13 15:24:35,835: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7323883161512027, 'f1_macro': 0.7329226353930633}
INFO: 2024-07-13 15:24:35,842: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:24:35,842: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:24:40,146: llmtf.base.darumeru/ruTiE: Loading Dataset: 4.30s
INFO: 2024-07-13 15:26:07,845: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 146.75s
INFO: 2024-07-13 15:26:38,965: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 167.26s
INFO: 2024-07-13 15:26:38,982: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-13 15:26:38,999: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 3.895358639670925, 'len': 0.9974420397589191, 'lcs': 0.9801922792969053}
INFO: 2024-07-13 15:26:39,001: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:26:39,001: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:26:41,305: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.30s
INFO: 2024-07-13 15:27:49,425: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 239.65s
INFO: 2024-07-13 15:27:49,442: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-13 15:27:49,446: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.3734936335344794, 'len': 0.9922334558022529, 'lcs': 0.9153760193869099}
INFO: 2024-07-13 15:27:49,448: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:27:49,448: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:27:51,495: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.05s
INFO: 2024-07-13 15:28:59,772: llmtf.base.darumeru/ruTiE: Processing Dataset: 259.62s
INFO: 2024-07-13 15:28:59,774: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-13 15:28:59,815: llmtf.base.darumeru/ruTiE: {'acc': 0.5372093023255814}
INFO: 2024-07-13 15:28:59,818: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:28:59,819: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:29:01,873: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.05s
INFO: 2024-07-13 15:29:04,102: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.21s
INFO: 2024-07-13 15:29:04,104: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-13 15:29:04,109: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8761904761904762, 'f1_macro': 0.8744880624959789}
INFO: 2024-07-13 15:29:04,110: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:29:04,110: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:29:06,342: llmtf.base.darumeru/RWSD: Loading Dataset: 2.23s
INFO: 2024-07-13 15:29:13,292: llmtf.base.darumeru/RWSD: Processing Dataset: 6.95s
INFO: 2024-07-13 15:29:13,294: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-13 15:29:13,298: llmtf.base.darumeru/RWSD: {'acc': 0.5392156862745098}
INFO: 2024-07-13 15:29:13,299: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:29:13,299: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:29:16,878: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 3.58s
INFO: 2024-07-13 15:30:03,574: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 46.69s
INFO: 2024-07-13 15:30:03,578: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-13 15:30:03,605: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7366343738787227, 'mcc': 0.34075509260259335}
INFO: 2024-07-13 15:30:03,609: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:30:03,635: llmtf.base.evaluator:
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree russiannlp/rucola_custom
0.716 0.750 0.483 0.539 0.997 0.992 0.733 0.537 0.875 0.539
INFO: 2024-07-13 15:30:38,848: llmtf.base.darumeru/cp_para_en: Processing Dataset: 237.54s
INFO: 2024-07-13 15:30:38,850: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-13 15:30:38,867: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 3.964633931969539, 'len': 0.9963390331388082, 'lcs': 0.873438038674546}
INFO: 2024-07-13 15:30:38,867: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:30:38,877: llmtf.base.evaluator:
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_en darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree russiannlp/rucola_custom
0.732 0.750 0.483 0.539 0.873 0.997 0.992 0.733 0.537 0.875 0.539
INFO: 2024-07-13 15:30:58,082: llmtf.base.darumeru/ruMMLU: Processing Dataset: 430.41s
INFO: 2024-07-13 15:30:58,085: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-13 15:30:58,123: llmtf.base.darumeru/ruMMLU: {'acc': 0.4818916492068243}
INFO: 2024-07-13 15:30:58,162: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:30:58,163: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:31:08,009: llmtf.base.daru/treewayextractive: Loading Dataset: 9.84s
INFO: 2024-07-13 15:31:29,151: llmtf.base.darumeru/MultiQ: Processing Dataset: 448.68s
INFO: 2024-07-13 15:31:29,154: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-13 15:31:29,174: llmtf.base.darumeru/MultiQ: {'f1': 0.2909161781249439, 'em': 0.16634799235181644}
INFO: 2024-07-13 15:31:29,179: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:31:29,179: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:31:32,357: llmtf.base.darumeru/USE: Loading Dataset: 3.18s
INFO: 2024-07-13 15:33:20,030: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 432.18s
INFO: 2024-07-13 15:33:20,032: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-13 15:33:20,078: llmtf.base.nlpcoreteam/ruMMLU:
subject                                metric
abstract_algebra                     0.360000
anatomy                              0.414815
astronomy                            0.565789
business_ethics                      0.550000
clinical_knowledge                   0.528302
college_biology                      0.486111
college_chemistry                    0.450000
college_computer_science             0.510000
college_mathematics                  0.390000
college_medicine                     0.508671
college_physics                      0.254902
computer_security                    0.580000
conceptual_physics                   0.434043
econometrics                         0.359649
electrical_engineering               0.489655
elementary_mathematics               0.370370
formal_logic                         0.325397
global_facts                         0.280000
high_school_biology                  0.590323
high_school_chemistry                0.374384
high_school_computer_science         0.600000
high_school_european_history         0.666667
high_school_geography                0.666667
high_school_government_and_politics  0.580311
high_school_macroeconomics           0.438462
high_school_mathematics              0.359259
high_school_microeconomics           0.478992
high_school_physics                  0.397351
high_school_psychology               0.625688
high_school_statistics               0.467593
high_school_us_history               0.681373
high_school_world_history            0.713080
human_aging                          0.515695
human_sexuality                      0.557252
international_law                    0.652893
jurisprudence                        0.527778
logical_fallacies                    0.423313
machine_learning                     0.321429
management                           0.631068
marketing                            0.675214
medical_genetics                     0.530000
miscellaneous                        0.624521
moral_disputes                       0.528902
moral_scenarios                      0.231285
nutrition                            0.555556
philosophy                           0.482315
prehistory                           0.490741
professional_accounting              0.382979
professional_law                     0.367014
professional_medicine                0.477941
professional_psychology              0.439542
public_relations                     0.554545
security_studies                     0.612245
sociology                            0.686567
us_foreign_policy                    0.730000
virology                             0.487952
world_religions                      0.690058
INFO: 2024-07-13 15:33:20,086: llmtf.base.nlpcoreteam/ruMMLU:
subject                            metric
STEM                             0.444512
humanities                       0.521601
other (business, health, misc.)  0.511622
social sciences                  0.560827
INFO: 2024-07-13 15:33:20,093: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5096404231777287}
INFO: 2024-07-13 15:33:20,128: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:33:20,129: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:34:10,886: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 379.39s
INFO: 2024-07-13 15:34:10,904: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-13 15:34:10,908: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.469064241004292, 'len': 0.9929789601006123, 'lcs': 0.843045621556421}
INFO: 2024-07-13 15:34:10,908: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:34:10,938: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.670 0.229 0.750 0.483 0.539 0.873 0.843 0.997 0.992 0.482 0.733 0.537 0.875 0.510 0.539
INFO: 2024-07-13 15:35:15,961: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 115.83s
INFO: 2024-07-13 15:36:14,977: llmtf.base.darumeru/USE: Processing Dataset: 282.62s
INFO: 2024-07-13 15:36:14,978: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-13 15:36:14,982: llmtf.base.darumeru/USE: {'grade_norm': 0.06568627450980391}
INFO: 2024-07-13 15:36:14,985: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:36:14,994: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.630 0.229 0.750 0.483 0.539 0.066 0.873 0.843 0.997 0.992 0.482 0.733 0.537 0.875 0.510 0.539
INFO: 2024-07-13 15:39:22,754: llmtf.base.daru/treewayextractive: Processing Dataset: 494.73s
INFO: 2024-07-13 15:39:22,774: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-13 15:39:23,003: llmtf.base.daru/treewayextractive: {'r-prec': 0.40769011544011546}
INFO: 2024-07-13 15:39:23,057: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:39:23,070: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.616 0.408 0.229 0.750 0.483 0.539 0.066 0.873 0.843 0.997 0.992 0.482 0.733 0.537 0.875 0.510 0.539
INFO: 2024-07-13 15:40:06,403: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 290.44s
INFO: 2024-07-13 15:40:06,407: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-13 15:40:06,452: llmtf.base.nlpcoreteam/enMMLU:
subject                                metric
abstract_algebra                     0.340000
anatomy                              0.614815
astronomy                            0.651316
business_ethics                      0.650000
clinical_knowledge                   0.675472
college_biology                      0.729167
college_chemistry                    0.470000
college_computer_science             0.550000
college_mathematics                  0.380000
college_medicine                     0.641618
college_physics                      0.352941
computer_security                    0.700000
conceptual_physics                   0.553191
econometrics                         0.456140
electrical_engineering               0.593103
elementary_mathematics               0.412698
formal_logic                         0.476190
global_facts                         0.310000
high_school_biology                  0.790323
high_school_chemistry                0.458128
high_school_computer_science         0.670000
high_school_european_history         0.769697
high_school_geography                0.792929
high_school_government_and_politics  0.880829
high_school_macroeconomics           0.612821
high_school_mathematics              0.344444
high_school_microeconomics           0.642857
high_school_physics                  0.337748
high_school_psychology               0.823853
high_school_statistics               0.467593
high_school_us_history               0.794118
high_school_world_history            0.814346
human_aging                          0.721973
human_sexuality                      0.732824
international_law                    0.735537
jurisprudence                        0.722222
logical_fallacies                    0.760736
machine_learning                     0.455357
management                           0.776699
marketing                            0.858974
medical_genetics                     0.690000
miscellaneous                        0.830140
moral_disputes                       0.679191
moral_scenarios                      0.232402
nutrition                            0.709150
philosophy                           0.655949
prehistory                           0.672840
professional_accounting              0.460993
professional_law                     0.468057
professional_medicine                0.709559
professional_psychology              0.619281
public_relations                     0.645455
security_studies                     0.673469
sociology                            0.850746
us_foreign_policy                    0.850000
virology                             0.512048
world_religions                      0.853801
INFO: 2024-07-13 15:40:06,459: llmtf.base.nlpcoreteam/enMMLU:
subject                            metric
STEM                             0.514223
humanities                       0.664237
other (business, health, misc.)  0.654389
social sciences                  0.715100
INFO: 2024-07-13 15:40:06,466: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6369873391991915}
INFO: 2024-07-13 15:40:06,497: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:40:06,508: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.617 0.408 0.229 0.750 0.483 0.539 0.066 0.873 0.843 0.997 0.992 0.482 0.733 0.537 0.875 0.637 0.510 0.539
INFO: 2024-07-13 15:44:34,352: llmtf.base.daru/treewayabstractive: Processing Dataset: 1246.91s
INFO: 2024-07-13 15:44:34,354: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-13 15:44:34,373: llmtf.base.daru/treewayabstractive: {'rouge1': 0.34479017541198337, 'rouge2': 0.12451437402782907}
INFO: 2024-07-13 15:44:34,376: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:44:34,403: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.596 0.235 0.408 0.229 0.750 0.483 0.539 0.066 0.873 0.843 0.997 0.992 0.482 0.733 0.537 0.875 0.637 0.510 0.539