INFO: 2024-07-12 12:31:54,102: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-12 12:31:54,104: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:31:54,104: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:31:54,851: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-12 12:31:54,851: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:31:54,851: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:31:56,886: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-12 12:31:56,888: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-12 12:31:56,888: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 12:31:58,472: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-12 12:31:58,472: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-12 12:31:58,472: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 12:32:00,857: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-12 12:32:00,858: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:32:00,858: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:32:02,761: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-12 12:32:02,762: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-12 12:32:02,762: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 12:32:04,488: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-12 12:32:04,489: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:32:04,489: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:32:09,155: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.67s
INFO: 2024-07-12 12:32:14,891: llmtf.base.daru/treewayextractive: Loading Dataset: 12.13s
INFO: 2024-07-12 12:32:15,608: llmtf.base.darumeru/MultiQ: Loading Dataset: 21.50s
INFO: 2024-07-12 12:32:17,484: llmtf.base.daru/treewayabstractive: Loading Dataset: 16.63s
INFO: 2024-07-12 12:33:23,447: llmtf.base.darumeru/ruMMLU: Loading Dataset: 88.60s
INFO: 2024-07-12 12:35:29,475: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 211.00s
INFO: 2024-07-12 12:36:18,872: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 261.98s
INFO: 2024-07-12 12:41:00,316: llmtf.base.darumeru/MultiQ: Processing Dataset: 524.71s
INFO: 2024-07-12 12:41:00,319: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-12 12:41:00,324: llmtf.base.darumeru/MultiQ: {'f1': 0.478766497045627, 'em': 0.37858508604206503}
INFO: 2024-07-12 12:41:00,335: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:41:00,335: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:41:03,781: llmtf.base.darumeru/PARus: Loading Dataset: 3.45s
INFO: 2024-07-12 12:41:18,655: llmtf.base.darumeru/PARus: Processing Dataset: 14.87s
INFO: 2024-07-12 12:41:18,658: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-12 12:41:18,688: llmtf.base.darumeru/PARus: {'acc': 0.85}
INFO: 2024-07-12 12:41:18,691: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:41:18,691: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:41:22,570: llmtf.base.darumeru/RCB: Loading Dataset: 3.88s
INFO: 2024-07-12 12:41:46,109: llmtf.base.darumeru/RCB: Processing Dataset: 23.53s
INFO: 2024-07-12 12:41:46,111: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-12 12:41:46,134: llmtf.base.darumeru/RCB: {'acc': 0.5363636363636364, 'f1_macro': 0.44417678744688677}
INFO: 2024-07-12 12:41:46,136: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:41:46,136: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:42:01,187: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 15.05s
INFO: 2024-07-12 12:42:07,490: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 598.33s
INFO: 2024-07-12 12:42:07,494: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-12 12:42:07,514: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.8275470644342664, 'len': 0.9976152623010418, 'lcs': 0.9848046802324042}
INFO: 2024-07-12 12:42:07,517: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:42:07,517: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:42:11,744: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 4.23s
INFO: 2024-07-12 12:44:32,297: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 151.11s
INFO: 2024-07-12 12:44:32,299: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-12 12:44:32,313: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7689003436426117, 'f1_macro': 0.76891271419294}
INFO: 2024-07-12 12:44:32,329: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:44:32,329: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:44:39,798: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.47s
INFO: 2024-07-12 12:49:10,076: llmtf.base.darumeru/ruTiE: Processing Dataset: 270.28s
INFO: 2024-07-12 12:49:10,077: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-12 12:49:10,105: llmtf.base.darumeru/ruTiE: {'acc': 0.3511627906976744}
INFO: 2024-07-12 12:49:10,109: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:49:10,109: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:49:12,787: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.68s
INFO: 2024-07-12 12:49:21,195: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 8.41s
INFO: 2024-07-12 12:49:21,197: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-12 12:49:21,226: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8666666666666667, 'f1_macro': 0.8628249661014111}
INFO: 2024-07-12 12:49:21,227: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:49:21,227: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:49:25,118: llmtf.base.darumeru/RWSD: Loading Dataset: 3.89s
INFO: 2024-07-12 12:49:45,525: llmtf.base.darumeru/RWSD: Processing Dataset: 20.40s
INFO: 2024-07-12 12:49:45,527: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-12 12:49:45,531: llmtf.base.darumeru/RWSD: {'acc': 0.5392156862745098}
INFO: 2024-07-12 12:49:45,533: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:49:45,533: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:50:00,952: llmtf.base.darumeru/USE: Loading Dataset: 15.42s
INFO: 2024-07-12 12:50:23,763: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 492.02s
INFO: 2024-07-12 12:50:23,765: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-12 12:50:23,785: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.424509793356442, 'len': 0.9995781033988959, 'lcs': 0.994055994028679}
INFO: 2024-07-12 12:50:23,788: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:50:23,788: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:50:27,725: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.94s
INFO: 2024-07-12 12:53:49,377: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1225.92s
INFO: 2024-07-12 12:53:49,380: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-12 12:53:49,389: llmtf.base.darumeru/ruMMLU: {'acc': 0.5154145465429512}
INFO: 2024-07-12 12:53:49,476: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 12:53:49,488: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree
0.681 0.429 0.850 0.490 0.539 1.000 0.998 0.515 0.769 0.351 0.865
INFO: 2024-07-12 12:54:13,002: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1123.52s
INFO: 2024-07-12 12:54:13,004: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-12 12:54:13,044: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.350000
anatomy 0.696296
astronomy 0.723684
business_ethics 0.700000
clinical_knowledge 0.750943
college_biology 0.805556
college_chemistry 0.510000
college_computer_science 0.600000
college_mathematics 0.330000
college_medicine 0.676301
college_physics 0.480392
computer_security 0.780000
conceptual_physics 0.565957
econometrics 0.561404
electrical_engineering 0.627586
elementary_mathematics 0.447090
formal_logic 0.492063
global_facts 0.430000
high_school_biology 0.800000
high_school_chemistry 0.512315
high_school_computer_science 0.760000
high_school_european_history 0.787879
high_school_geography 0.853535
high_school_government_and_politics 0.922280
high_school_macroeconomics 0.674359
high_school_mathematics 0.377778
high_school_microeconomics 0.773109
high_school_physics 0.423841
high_school_psychology 0.840367
high_school_statistics 0.537037
high_school_us_history 0.828431
high_school_world_history 0.839662
human_aging 0.735426
human_sexuality 0.809160
international_law 0.826446
jurisprudence 0.777778
logical_fallacies 0.760736
machine_learning 0.544643
management 0.825243
marketing 0.905983
medical_genetics 0.830000
miscellaneous 0.841635
moral_disputes 0.751445
moral_scenarios 0.498324
nutrition 0.754902
philosophy 0.726688
prehistory 0.759259
professional_accounting 0.560284
professional_law 0.486310
professional_medicine 0.750000
professional_psychology 0.717320
public_relations 0.690909
security_studies 0.722449
sociology 0.845771
us_foreign_policy 0.830000
virology 0.524096
world_religions 0.818713
INFO: 2024-07-12 12:54:13,052: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.565327
humanities 0.719518
other (business, health, misc.) 0.712936
social sciences 0.770055
INFO: 2024-07-12 12:54:13,060: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6919591187019263}
INFO: 2024-07-12 12:54:13,132: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 12:54:13,141: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU
0.682 0.429 0.850 0.490 0.539 1.000 0.998 0.515 0.769 0.351 0.865 0.692
INFO: 2024-07-12 12:54:45,001: llmtf.base.daru/treewayextractive: Processing Dataset: 1350.11s
INFO: 2024-07-12 12:54:45,002: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-12 12:54:45,249: llmtf.base.daru/treewayextractive: {'r-prec': 0.3960751082251082}
INFO: 2024-07-12 12:54:45,733: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 12:54:45,743: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU
0.658 0.396 0.429 0.850 0.490 0.539 1.000 0.998 0.515 0.769 0.351 0.865 0.692
INFO: 2024-07-12 12:55:01,556: llmtf.base.darumeru/USE: Processing Dataset: 300.60s
INFO: 2024-07-12 12:55:01,575: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-12 12:55:01,596: llmtf.base.darumeru/USE: {'grade_norm': 0.1852941176470588}
INFO: 2024-07-12 12:55:01,602: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-12 12:55:01,603: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 12:55:22,870: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 21.27s
INFO: 2024-07-12 12:58:48,313: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 205.44s
INFO: 2024-07-12 12:58:48,314: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-12 12:58:48,340: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7115177610333692, 'mcc': 0.3351158798730935}
INFO: 2024-07-12 12:58:48,351: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 12:58:48,359: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.614 0.396 0.429 0.850 0.490 0.539 0.185 1.000 0.998 0.515 0.769 0.351 0.865 0.692 0.523
INFO: 2024-07-12 12:59:33,950: llmtf.base.daru/treewayabstractive: Processing Dataset: 1636.45s
INFO: 2024-07-12 12:59:33,951: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-12 12:59:33,956: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3631169847404013, 'rouge2': 0.13422177695865692}
INFO: 2024-07-12 12:59:33,960: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 12:59:33,969: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.590 0.249 0.396 0.429 0.850 0.490 0.539 0.185 1.000 0.998 0.515 0.769 0.351 0.865 0.692 0.523
INFO: 2024-07-12 13:02:17,260: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1558.37s
INFO: 2024-07-12 13:02:17,262: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-12 13:02:17,302: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.340000
anatomy 0.518519
astronomy 0.657895
business_ethics 0.690000
clinical_knowledge 0.588679
college_biology 0.541667
college_chemistry 0.470000
college_computer_science 0.520000
college_mathematics 0.360000
college_medicine 0.554913
college_physics 0.343137
computer_security 0.710000
conceptual_physics 0.523404
econometrics 0.447368
electrical_engineering 0.551724
elementary_mathematics 0.417989
formal_logic 0.412698
global_facts 0.360000
high_school_biology 0.661290
high_school_chemistry 0.399015
high_school_computer_science 0.700000
high_school_european_history 0.763636
high_school_geography 0.666667
high_school_government_and_politics 0.652850
high_school_macroeconomics 0.556410
high_school_mathematics 0.359259
high_school_microeconomics 0.554622
high_school_physics 0.397351
high_school_psychology 0.680734
high_school_statistics 0.453704
high_school_us_history 0.696078
high_school_world_history 0.734177
human_aging 0.542601
human_sexuality 0.641221
international_law 0.752066
jurisprudence 0.657407
logical_fallacies 0.564417
machine_learning 0.383929
management 0.689320
marketing 0.735043
medical_genetics 0.660000
miscellaneous 0.648787
moral_disputes 0.630058
moral_scenarios 0.382123
nutrition 0.601307
philosophy 0.617363
prehistory 0.570988
professional_accounting 0.397163
professional_law 0.398957
professional_medicine 0.503676
professional_psychology 0.522876
public_relations 0.609091
security_studies 0.653061
sociology 0.686567
us_foreign_policy 0.740000
virology 0.463855
world_religions 0.695906
INFO: 2024-07-12 13:02:17,309: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.488354
humanities 0.605837
other (business, health, misc.) 0.568133
social sciences 0.617622
INFO: 2024-07-12 13:02:17,317: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5699864045459677}
INFO: 2024-07-12 13:02:17,400: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 13:02:17,437: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.589 0.249 0.396 0.429 0.850 0.490 0.539 0.185 1.000 0.998 0.515 0.769 0.351 0.865 0.692 0.570 0.523
INFO: 2024-07-12 13:03:33,021: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 785.29s
INFO: 2024-07-12 13:03:33,024: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-12 13:03:33,042: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.9697516062295746, 'len': 0.9984044778480231, 'lcs': 0.9773285044731846}
INFO: 2024-07-12 13:03:33,044: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 13:03:33,044: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 13:03:37,463: llmtf.base.darumeru/cp_para_en: Loading Dataset: 4.42s
INFO: 2024-07-12 13:14:17,087: llmtf.base.darumeru/cp_para_en: Processing Dataset: 639.62s
INFO: 2024-07-12 13:14:17,090: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-12 13:14:17,110: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.485692967910676, 'len': 0.9994082267431339, 'lcs': 0.9723279130849847}
INFO: 2024-07-12 13:14:17,112: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 13:14:17,137: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.632 0.249 0.396 0.429 0.850 0.490 0.539 0.185 0.972 0.977 1.000 0.998 0.515 0.769 0.351 0.865 0.692 0.570 0.523