INFO: 2024-07-13 15:18:29,367: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 15:18:29,368: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:29,368: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:30,101: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 15:18:30,101: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:30,102: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:31,006: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 15:18:31,007: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:31,007: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:33,846: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 15:18:33,846: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:33,846: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:34,873: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 15:18:34,874: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:34,874: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:36,947: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 15:18:36,948: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:36,948: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:39,585: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 15:18:39,585: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:39,585: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:42,261: llmtf.base.darumeru/MultiQ: Loading Dataset: 12.89s
INFO: 2024-07-13 15:18:43,245: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.66s
INFO: 2024-07-13 15:18:43,377: llmtf.base.daru/treewayabstractive: Loading Dataset: 8.50s
INFO: 2024-07-13 15:18:44,950: llmtf.base.daru/treewayextractive: Loading Dataset: 8.00s
INFO: 2024-07-13 15:19:21,718: llmtf.base.darumeru/ruMMLU: Loading Dataset: 51.62s
INFO: 2024-07-13 15:23:45,855: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 15:23:45,858: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:45,858: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:46,328: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 15:23:46,329: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:46,329: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:48,239: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 15:23:48,240: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:48,240: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:50,172: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 15:23:50,172: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:50,172: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:52,594: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 15:23:52,594: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:52,594: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:53,731: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 15:23:53,732: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:53,732: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:55,589: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 15:23:55,589: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:55,589: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:58,285: llmtf.base.darumeru/MultiQ: Loading Dataset: 12.43s
INFO: 2024-07-13 15:23:59,075: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.49s
INFO: 2024-07-13 15:24:00,764: llmtf.base.daru/treewayabstractive: Loading Dataset: 8.17s
INFO: 2024-07-13 15:24:01,255: llmtf.base.daru/treewayextractive: Loading Dataset: 7.52s
INFO: 2024-07-13 15:24:37,276: llmtf.base.darumeru/ruMMLU: Loading Dataset: 50.95s
INFO: 2024-07-13 15:27:06,687: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 196.51s
INFO: 2024-07-13 15:27:15,808: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 207.57s
INFO: 2024-07-13 15:29:44,399: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 345.32s
INFO: 2024-07-13 15:29:44,403: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-13 15:29:44,407: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.3701923347659983, 'len': 0.9987691197336923, 'lcs': 0.9819406016228798}
INFO: 2024-07-13 15:29:44,410: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:29:44,410: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:29:47,896: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.48s
INFO: 2024-07-13 15:32:14,981: llmtf.base.daru/treewayextractive: Processing Dataset: 493.72s
INFO: 2024-07-13 15:32:14,987: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-13 15:32:15,227: llmtf.base.daru/treewayextractive: {'r-prec': 0.40769011544011546}
INFO: 2024-07-13 15:32:15,287: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:32:15,293: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/cp_sent_ru
0.703 0.408 0.999
INFO: 2024-07-13 15:33:08,688: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 200.79s
INFO: 2024-07-13 15:33:08,691: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-13 15:33:08,708: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 3.8994152226580563, 'len': 0.9995035620835028, 'lcs': 0.9936840637058483}
INFO: 2024-07-13 15:33:08,711: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:33:08,711: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:33:11,469: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.76s
INFO: 2024-07-13 15:33:32,789: llmtf.base.darumeru/MultiQ: Processing Dataset: 574.49s
INFO: 2024-07-13 15:33:32,791: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-13 15:33:32,796: llmtf.base.darumeru/MultiQ: {'f1': 0.5726350715356451, 'em': 0.5019120458891013}
INFO: 2024-07-13 15:33:32,807: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:33:32,808: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:33:35,547: llmtf.base.darumeru/PARus: Loading Dataset: 2.74s
INFO: 2024-07-13 15:33:51,177: llmtf.base.darumeru/PARus: Processing Dataset: 15.63s
INFO: 2024-07-13 15:33:51,179: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-13 15:33:51,191: llmtf.base.darumeru/PARus: {'acc': 0.83}
INFO: 2024-07-13 15:33:51,193: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:33:51,193: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:33:54,244: llmtf.base.darumeru/RCB: Loading Dataset: 3.05s
INFO: 2024-07-13 15:34:20,224: llmtf.base.darumeru/RCB: Processing Dataset: 25.98s
INFO: 2024-07-13 15:34:20,241: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-13 15:34:20,248: llmtf.base.darumeru/RCB: {'acc': 0.5181818181818182, 'f1_macro': 0.46564877615699873}
INFO: 2024-07-13 15:34:20,250: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:34:20,250: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:34:28,734: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 8.48s
INFO: 2024-07-13 15:37:02,786: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 154.05s
INFO: 2024-07-13 15:37:02,802: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-13 15:37:02,816: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7525773195876289, 'f1_macro': 0.7540227232789819}
INFO: 2024-07-13 15:37:02,832: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:37:02,832: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:37:07,215: llmtf.base.darumeru/ruTiE: Loading Dataset: 4.38s
INFO: 2024-07-13 15:41:29,256: llmtf.base.darumeru/ruTiE: Processing Dataset: 262.04s
INFO: 2024-07-13 15:41:29,260: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-13 15:41:29,289: llmtf.base.darumeru/ruTiE: {'acc': 0.5372093023255814}
INFO: 2024-07-13 15:41:29,292: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:41:29,292: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:41:32,242: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.95s
INFO: 2024-07-13 15:41:41,454: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 9.21s
INFO: 2024-07-13 15:41:41,471: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-13 15:41:41,493: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8857142857142857, 'f1_macro': 0.8846523292790873}
INFO: 2024-07-13 15:41:41,494: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:41:41,494: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:41:45,149: llmtf.base.darumeru/RWSD: Loading Dataset: 3.65s
INFO: 2024-07-13 15:42:09,254: llmtf.base.darumeru/RWSD: Processing Dataset: 24.10s
INFO: 2024-07-13 15:42:09,256: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-13 15:42:09,261: llmtf.base.darumeru/RWSD: {'acc': 0.6078431372549019}
INFO: 2024-07-13 15:42:09,263: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:42:09,263: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:42:16,716: llmtf.base.darumeru/USE: Loading Dataset: 7.45s
INFO: 2024-07-13 15:46:19,569: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1152.88s
INFO: 2024-07-13 15:46:19,575: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-13 15:46:19,620: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.330000
anatomy 0.651852
astronomy 0.671053
business_ethics 0.650000
clinical_knowledge 0.720755
college_biology 0.770833
college_chemistry 0.500000
college_computer_science 0.560000
college_mathematics 0.400000
college_medicine 0.676301
college_physics 0.362745
computer_security 0.760000
conceptual_physics 0.578723
econometrics 0.473684
electrical_engineering 0.551724
elementary_mathematics 0.394180
formal_logic 0.492063
global_facts 0.330000
high_school_biology 0.770968
high_school_chemistry 0.487685
high_school_computer_science 0.690000
high_school_european_history 0.806061
high_school_geography 0.792929
high_school_government_and_politics 0.891192
high_school_macroeconomics 0.638462
high_school_mathematics 0.359259
high_school_microeconomics 0.655462
high_school_physics 0.350993
high_school_psychology 0.834862
high_school_statistics 0.476852
high_school_us_history 0.823529
high_school_world_history 0.831224
human_aging 0.717489
human_sexuality 0.770992
international_law 0.801653
jurisprudence 0.750000
logical_fallacies 0.797546
machine_learning 0.508929
management 0.844660
marketing 0.880342
medical_genetics 0.740000
miscellaneous 0.826309
moral_disputes 0.734104
moral_scenarios 0.269274
nutrition 0.725490
philosophy 0.710611
prehistory 0.762346
professional_accounting 0.475177
professional_law 0.481747
professional_medicine 0.709559
professional_psychology 0.640523
public_relations 0.654545
security_studies 0.738776
sociology 0.830846
us_foreign_policy 0.850000
virology 0.524096
world_religions 0.818713
INFO: 2024-07-13 15:46:19,627: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.529108
humanities 0.698375
other (business, health, misc.) 0.676574
social sciences 0.731023
INFO: 2024-07-13 15:46:19,635: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6587697555779514}
INFO: 2024-07-13 15:46:19,704: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:46:19,740: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU
0.701 0.408 0.537 0.830 0.492 0.608 1.000 0.999 0.753 0.537 0.885 0.659
INFO: 2024-07-13 15:46:23,553: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 792.08s
INFO: 2024-07-13 15:46:23,572: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-13 15:46:23,603: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.4704173225051846, 'len': 0.9993025871189104, 'lcs': 0.9552661852470385}
INFO: 2024-07-13 15:46:23,606: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:46:23,606: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:46:26,330: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.72s
INFO: 2024-07-13 15:48:32,771: llmtf.base.darumeru/USE: Processing Dataset: 376.05s
INFO: 2024-07-13 15:48:32,775: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-13 15:48:32,780: llmtf.base.darumeru/USE: {'grade_norm': 0.12352941176470587}
INFO: 2024-07-13 15:48:32,787: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:48:32,787: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:48:44,556: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 11.77s
INFO: 2024-07-13 15:50:07,016: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1529.74s
INFO: 2024-07-13 15:50:07,019: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-13 15:50:07,028: llmtf.base.darumeru/ruMMLU: {'acc': 0.4868801755961289}
INFO: 2024-07-13 15:50:07,113: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:50:07,146: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU
0.662 0.408 0.537 0.830 0.492 0.608 0.124 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659
INFO: 2024-07-13 15:52:21,515: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 216.96s
INFO: 2024-07-13 15:52:21,520: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-13 15:52:21,533: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7384284176533907, 'mcc': 0.3763427268436289}
INFO: 2024-07-13 15:52:21,545: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:52:21,562: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.655 0.408 0.537 0.830 0.492 0.608 0.124 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.557
INFO: 2024-07-13 15:54:48,357: llmtf.base.daru/treewayabstractive: Processing Dataset: 1847.59s
INFO: 2024-07-13 15:54:48,390: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-13 15:54:48,397: llmtf.base.daru/treewayabstractive: {'rouge1': 0.34956234095604516, 'rouge2': 0.13050451589110393}
INFO: 2024-07-13 15:54:48,402: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:54:48,429: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.629 0.240 0.408 0.537 0.830 0.492 0.608 0.124 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.557
INFO: 2024-07-13 15:55:03,040: llmtf.base.darumeru/cp_para_en: Processing Dataset: 516.71s
INFO: 2024-07-13 15:55:03,042: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-13 15:55:03,046: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 3.960763996832381, 'len': 0.9995281850843424, 'lcs': 0.9811766452032213}
INFO: 2024-07-13 15:55:03,048: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:55:03,057: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.650 0.240 0.408 0.537 0.830 0.492 0.608 0.124 0.981 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.557
INFO: 2024-07-13 15:57:18,212: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1802.40s
INFO: 2024-07-13 15:57:18,228: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-13 15:57:18,274: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.280000
anatomy 0.392593
astronomy 0.565789
business_ethics 0.560000
clinical_knowledge 0.554717
college_biology 0.465278
college_chemistry 0.410000
college_computer_science 0.500000
college_mathematics 0.360000
college_medicine 0.554913
college_physics 0.333333
computer_security 0.590000
conceptual_physics 0.468085
econometrics 0.403509
electrical_engineering 0.503448
elementary_mathematics 0.367725
formal_logic 0.365079
global_facts 0.330000
high_school_biology 0.619355
high_school_chemistry 0.399015
high_school_computer_science 0.640000
high_school_european_history 0.678788
high_school_geography 0.676768
high_school_government_and_politics 0.647668
high_school_macroeconomics 0.512821
high_school_mathematics 0.314815
high_school_microeconomics 0.533613
high_school_physics 0.344371
high_school_psychology 0.651376
high_school_statistics 0.416667
high_school_us_history 0.720588
high_school_world_history 0.679325
human_aging 0.520179
human_sexuality 0.618321
international_law 0.719008
jurisprudence 0.601852
logical_fallacies 0.509202
machine_learning 0.464286
management 0.669903
marketing 0.735043
medical_genetics 0.530000
miscellaneous 0.605364
moral_disputes 0.580925
moral_scenarios 0.189944
nutrition 0.611111
philosophy 0.581994
prehistory 0.524691
professional_accounting 0.397163
professional_law 0.361147
professional_medicine 0.441176
professional_psychology 0.486928
public_relations 0.545455
security_studies 0.595918
sociology 0.681592
us_foreign_policy 0.690000
virology 0.427711
world_religions 0.748538
INFO: 2024-07-13 15:57:18,281: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.446787
humanities 0.558545
other (business, health, misc.) 0.523562
social sciences 0.586997
INFO: 2024-07-13 15:57:18,303: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5289728961247521}
INFO: 2024-07-13 15:57:18,385: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:57:18,616: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.643 0.240 0.408 0.537 0.830 0.492 0.608 0.124 0.981 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.529 0.557