INFO: 2024-07-13 15:18:29,367: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 15:18:29,368: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:29,368: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:30,101: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 15:18:30,101: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:30,102: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:31,006: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 15:18:31,007: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:31,007: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:33,846: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 15:18:33,846: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:33,846: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:34,873: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 15:18:34,874: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:34,874: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:36,947: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 15:18:36,948: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:36,948: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:39,585: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 15:18:39,585: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:18:39,585: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:18:42,261: llmtf.base.darumeru/MultiQ: Loading Dataset: 12.89s
INFO: 2024-07-13 15:18:43,245: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.66s
INFO: 2024-07-13 15:18:43,377: llmtf.base.daru/treewayabstractive: Loading Dataset: 8.50s
INFO: 2024-07-13 15:18:44,950: llmtf.base.daru/treewayextractive: Loading Dataset: 8.00s
INFO: 2024-07-13 15:19:21,718: llmtf.base.darumeru/ruMMLU: Loading Dataset: 51.62s
INFO: 2024-07-13 15:23:45,855: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 15:23:45,858: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:45,858: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:46,328: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 15:23:46,329: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:46,329: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:48,239: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 15:23:48,240: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:48,240: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:50,172: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 15:23:50,172: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:50,172: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:52,594: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 15:23:52,594: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:52,594: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:53,731: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 15:23:53,732: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:53,732: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:55,589: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 15:23:55,589: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:23:55,589: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:23:58,285: llmtf.base.darumeru/MultiQ: Loading Dataset: 12.43s
INFO: 2024-07-13 15:23:59,075: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.49s
INFO: 2024-07-13 15:24:00,764: llmtf.base.daru/treewayabstractive: Loading Dataset: 8.17s
INFO: 2024-07-13 15:24:01,255: llmtf.base.daru/treewayextractive: Loading Dataset: 7.52s
INFO: 2024-07-13 15:24:37,276: llmtf.base.darumeru/ruMMLU: Loading Dataset: 50.95s
INFO: 2024-07-13 15:27:06,687: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 196.51s
INFO: 2024-07-13 15:27:15,808: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 207.57s
INFO: 2024-07-13 15:29:44,399: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 345.32s
INFO: 2024-07-13 15:29:44,403: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-13 15:29:44,407: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.3701923347659983, 'len': 0.9987691197336923, 'lcs': 0.9819406016228798}
INFO: 2024-07-13 15:29:44,410: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:29:44,410: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:29:47,896: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.48s
INFO: 2024-07-13 15:32:14,981: llmtf.base.daru/treewayextractive: Processing Dataset: 493.72s
INFO: 2024-07-13 15:32:14,987: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-13 15:32:15,227: llmtf.base.daru/treewayextractive: {'r-prec': 0.40769011544011546}
INFO: 2024-07-13 15:32:15,287: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:32:15,293: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/cp_sent_ru
0.703 0.408 0.999
INFO: 2024-07-13 15:33:08,688: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 200.79s
INFO: 2024-07-13 15:33:08,691: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-13 15:33:08,708: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 3.8994152226580563, 'len': 0.9995035620835028, 'lcs': 0.9936840637058483}
INFO: 2024-07-13 15:33:08,711: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:33:08,711: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:33:11,469: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.76s
INFO: 2024-07-13 15:33:32,789: llmtf.base.darumeru/MultiQ: Processing Dataset: 574.49s
INFO: 2024-07-13 15:33:32,791: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-13 15:33:32,796: llmtf.base.darumeru/MultiQ: {'f1': 0.5726350715356451, 'em': 0.5019120458891013}
INFO: 2024-07-13 15:33:32,807: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:33:32,808: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:33:35,547: llmtf.base.darumeru/PARus: Loading Dataset: 2.74s
INFO: 2024-07-13 15:33:51,177: llmtf.base.darumeru/PARus: Processing Dataset: 15.63s
INFO: 2024-07-13 15:33:51,179: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-13 15:33:51,191: llmtf.base.darumeru/PARus: {'acc': 0.83}
INFO: 2024-07-13 15:33:51,193: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:33:51,193: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:33:54,244: llmtf.base.darumeru/RCB: Loading Dataset: 3.05s
INFO: 2024-07-13 15:34:20,224: llmtf.base.darumeru/RCB: Processing Dataset: 25.98s
INFO: 2024-07-13 15:34:20,241: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-13 15:34:20,248: llmtf.base.darumeru/RCB: {'acc': 0.5181818181818182, 'f1_macro': 0.46564877615699873}
INFO: 2024-07-13 15:34:20,250: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:34:20,250: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:34:28,734: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 8.48s
INFO: 2024-07-13 15:37:02,786: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 154.05s
INFO: 2024-07-13 15:37:02,802: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-13 15:37:02,816: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7525773195876289, 'f1_macro': 0.7540227232789819}
INFO: 2024-07-13 15:37:02,832: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:37:02,832: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:37:07,215: llmtf.base.darumeru/ruTiE: Loading Dataset: 4.38s
INFO: 2024-07-13 15:41:29,256: llmtf.base.darumeru/ruTiE: Processing Dataset: 262.04s
INFO: 2024-07-13 15:41:29,260: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-13 15:41:29,289: llmtf.base.darumeru/ruTiE: {'acc': 0.5372093023255814}
INFO: 2024-07-13 15:41:29,292: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:41:29,292: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:41:32,242: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.95s
INFO: 2024-07-13 15:41:41,454: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 9.21s
INFO: 2024-07-13 15:41:41,471: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-13 15:41:41,493: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8857142857142857, 'f1_macro': 0.8846523292790873}
INFO: 2024-07-13 15:41:41,494: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:41:41,494: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:41:45,149: llmtf.base.darumeru/RWSD: Loading Dataset: 3.65s
INFO: 2024-07-13 15:42:09,254: llmtf.base.darumeru/RWSD: Processing Dataset: 24.10s
INFO: 2024-07-13 15:42:09,256: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-13 15:42:09,261: llmtf.base.darumeru/RWSD: {'acc': 0.6078431372549019}
INFO: 2024-07-13 15:42:09,263: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:42:09,263: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:42:16,716: llmtf.base.darumeru/USE: Loading Dataset: 7.45s
INFO: 2024-07-13 15:46:19,569: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1152.88s
INFO: 2024-07-13 15:46:19,575: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-13 15:46:19,620: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.330000
anatomy 0.651852
astronomy 0.671053
business_ethics 0.650000
clinical_knowledge 0.720755
college_biology 0.770833
college_chemistry 0.500000
college_computer_science 0.560000
college_mathematics 0.400000
college_medicine 0.676301
college_physics 0.362745
computer_security 0.760000
conceptual_physics 0.578723
econometrics 0.473684
electrical_engineering 0.551724
elementary_mathematics 0.394180
formal_logic 0.492063
global_facts 0.330000
high_school_biology 0.770968
high_school_chemistry 0.487685
high_school_computer_science 0.690000
high_school_european_history 0.806061
high_school_geography 0.792929
high_school_government_and_politics 0.891192
high_school_macroeconomics 0.638462
high_school_mathematics 0.359259
high_school_microeconomics 0.655462
high_school_physics 0.350993
high_school_psychology 0.834862
high_school_statistics 0.476852
high_school_us_history 0.823529
high_school_world_history 0.831224
human_aging 0.717489
human_sexuality 0.770992
international_law 0.801653
jurisprudence 0.750000
logical_fallacies 0.797546
machine_learning 0.508929
management 0.844660
marketing 0.880342
medical_genetics 0.740000
miscellaneous 0.826309
moral_disputes 0.734104
moral_scenarios 0.269274
nutrition 0.725490
philosophy 0.710611
prehistory 0.762346
professional_accounting 0.475177
professional_law 0.481747
professional_medicine 0.709559
professional_psychology 0.640523
public_relations 0.654545
security_studies 0.738776
sociology 0.830846
us_foreign_policy 0.850000
virology 0.524096
world_religions 0.818713
INFO: 2024-07-13 15:46:19,627: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.529108
humanities 0.698375
other (business, health, misc.) 0.676574
social sciences 0.731023
INFO: 2024-07-13 15:46:19,635: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6587697555779514}
INFO: 2024-07-13 15:46:19,704: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:46:19,740: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU
0.701 0.408 0.537 0.830 0.492 0.608 1.000 0.999 0.753 0.537 0.885 0.659
INFO: 2024-07-13 15:46:23,553: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 792.08s
INFO: 2024-07-13 15:46:23,572: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-13 15:46:23,603: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.4704173225051846, 'len': 0.9993025871189104, 'lcs': 0.9552661852470385}
INFO: 2024-07-13 15:46:23,606: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:46:23,606: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:46:26,330: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.72s
INFO: 2024-07-13 15:48:32,771: llmtf.base.darumeru/USE: Processing Dataset: 376.05s
INFO: 2024-07-13 15:48:32,775: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-13 15:48:32,780: llmtf.base.darumeru/USE: {'grade_norm': 0.12352941176470587}
INFO: 2024-07-13 15:48:32,787: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:48:32,787: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:48:44,556: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 11.77s
INFO: 2024-07-13 15:50:07,016: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1529.74s
INFO: 2024-07-13 15:50:07,019: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-13 15:50:07,028: llmtf.base.darumeru/ruMMLU: {'acc': 0.4868801755961289}
INFO: 2024-07-13 15:50:07,113: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:50:07,146: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU
0.662 0.408 0.537 0.830 0.492 0.608 0.124 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659
INFO: 2024-07-13 15:52:21,515: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 216.96s
INFO: 2024-07-13 15:52:21,520: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-13 15:52:21,533: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7384284176533907, 'mcc': 0.3763427268436289}
INFO: 2024-07-13 15:52:21,545: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:52:21,562: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.655 0.408 0.537 0.830 0.492 0.608 0.124 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.557
INFO: 2024-07-13 15:54:48,357: llmtf.base.daru/treewayabstractive: Processing Dataset: 1847.59s
INFO: 2024-07-13 15:54:48,390: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-13 15:54:48,397: llmtf.base.daru/treewayabstractive: {'rouge1': 0.34956234095604516, 'rouge2': 0.13050451589110393}
INFO: 2024-07-13 15:54:48,402: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:54:48,429: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.629 0.240 0.408 0.537 0.830 0.492 0.608 0.124 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.557
INFO: 2024-07-13 15:55:03,040: llmtf.base.darumeru/cp_para_en: Processing Dataset: 516.71s
INFO: 2024-07-13 15:55:03,042: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-13 15:55:03,046: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 3.960763996832381, 'len': 0.9995281850843424, 'lcs': 0.9811766452032213}
INFO: 2024-07-13 15:55:03,048: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:55:03,057: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.650 0.240 0.408 0.537 0.830 0.492 0.608 0.124 0.981 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.557
INFO: 2024-07-13 15:57:18,212: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1802.40s
INFO: 2024-07-13 15:57:18,228: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-13 15:57:18,274: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.280000
anatomy 0.392593
astronomy 0.565789
business_ethics 0.560000
clinical_knowledge 0.554717
college_biology 0.465278
college_chemistry 0.410000
college_computer_science 0.500000
college_mathematics 0.360000
college_medicine 0.554913
college_physics 0.333333
computer_security 0.590000
conceptual_physics 0.468085
econometrics 0.403509
electrical_engineering 0.503448
elementary_mathematics 0.367725
formal_logic 0.365079
global_facts 0.330000
high_school_biology 0.619355
high_school_chemistry 0.399015
high_school_computer_science 0.640000
high_school_european_history 0.678788
high_school_geography 0.676768
high_school_government_and_politics 0.647668
high_school_macroeconomics 0.512821
high_school_mathematics 0.314815
high_school_microeconomics 0.533613
high_school_physics 0.344371
high_school_psychology 0.651376
high_school_statistics 0.416667
high_school_us_history 0.720588
high_school_world_history 0.679325
human_aging 0.520179
human_sexuality 0.618321
international_law 0.719008
jurisprudence 0.601852
logical_fallacies 0.509202
machine_learning 0.464286
management 0.669903
marketing 0.735043
medical_genetics 0.530000
miscellaneous 0.605364
moral_disputes 0.580925
moral_scenarios 0.189944
nutrition 0.611111
philosophy 0.581994
prehistory 0.524691
professional_accounting 0.397163
professional_law 0.361147
professional_medicine 0.441176
professional_psychology 0.486928
public_relations 0.545455
security_studies 0.595918
sociology 0.681592
us_foreign_policy 0.690000
virology 0.427711
world_religions 0.748538
INFO: 2024-07-13 15:57:18,281: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.446787
humanities 0.558545
other (business, health, misc.) 0.523562
social sciences 0.586997
INFO: 2024-07-13 15:57:18,303: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5289728961247521}
INFO: 2024-07-13 15:57:18,385: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 15:57:18,616: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.643 0.240 0.408 0.537 0.830 0.492 0.608 0.124 0.981 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.529 0.557