INFO: 2024-07-13 15:18:29,367: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] INFO: 2024-07-13 15:18:29,368: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:18:29,368: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:18:30,101: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] INFO: 2024-07-13 15:18:30,101: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:18:30,102: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:18:31,006: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] INFO: 2024-07-13 15:18:31,007: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:18:31,007: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:18:33,846: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] INFO: 2024-07-13 15:18:33,846: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:18:33,846: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:18:34,873: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] INFO: 2024-07-13 15:18:34,874: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:18:34,874: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:18:36,947: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] INFO: 2024-07-13 15:18:36,948: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:18:36,948: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:18:39,585: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] INFO: 2024-07-13 15:18:39,585: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:18:39,585: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:18:42,261: llmtf.base.darumeru/MultiQ: Loading Dataset: 12.89s INFO: 2024-07-13 15:18:43,245: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.66s INFO: 2024-07-13 15:18:43,377: llmtf.base.daru/treewayabstractive: Loading Dataset: 8.50s INFO: 2024-07-13 15:18:44,950: llmtf.base.daru/treewayextractive: Loading Dataset: 8.00s INFO: 2024-07-13 15:19:21,718: llmtf.base.darumeru/ruMMLU: Loading Dataset: 51.62s INFO: 2024-07-13 15:23:45,855: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] INFO: 2024-07-13 15:23:45,858: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:23:45,858: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:23:46,328: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] INFO: 2024-07-13 15:23:46,329: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:23:46,329: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:23:48,239: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] INFO: 2024-07-13 15:23:48,240: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:23:48,240: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:23:50,172: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] INFO: 2024-07-13 15:23:50,172: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:23:50,172: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:23:52,594: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] INFO: 2024-07-13 15:23:52,594: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:23:52,594: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:23:53,731: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] INFO: 2024-07-13 15:23:53,732: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:23:53,732: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:23:55,589: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] INFO: 2024-07-13 15:23:55,589: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:23:55,589: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:23:58,285: llmtf.base.darumeru/MultiQ: Loading Dataset: 12.43s INFO: 2024-07-13 15:23:59,075: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.49s INFO: 2024-07-13 15:24:00,764: llmtf.base.daru/treewayabstractive: Loading Dataset: 8.17s INFO: 2024-07-13 15:24:01,255: llmtf.base.daru/treewayextractive: Loading Dataset: 7.52s INFO: 2024-07-13 15:24:37,276: llmtf.base.darumeru/ruMMLU: Loading Dataset: 50.95s INFO: 2024-07-13 15:27:06,687: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 196.51s INFO: 2024-07-13 15:27:15,808: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 207.57s INFO: 2024-07-13 15:29:44,399: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 345.32s INFO: 2024-07-13 15:29:44,403: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: INFO: 2024-07-13 15:29:44,407: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.3701923347659983, 'len': 0.9987691197336923, 'lcs': 0.9819406016228798} INFO: 2024-07-13 15:29:44,410: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:29:44,410: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:29:47,896: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.48s INFO: 2024-07-13 15:32:14,981: llmtf.base.daru/treewayextractive: Processing Dataset: 493.72s INFO: 2024-07-13 15:32:14,987: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: INFO: 2024-07-13 15:32:15,227: llmtf.base.daru/treewayextractive: {'r-prec': 0.40769011544011546} INFO: 2024-07-13 15:32:15,287: llmtf.base.evaluator: Ended eval INFO: 2024-07-13 15:32:15,293: llmtf.base.evaluator: mean daru/treewayextractive darumeru/cp_sent_ru 0.703 0.408 0.999 INFO: 2024-07-13 15:33:08,688: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 200.79s INFO: 2024-07-13 15:33:08,691: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: INFO: 2024-07-13 15:33:08,708: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 3.8994152226580563, 'len': 0.9995035620835028, 'lcs': 0.9936840637058483} INFO: 2024-07-13 15:33:08,711: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:33:08,711: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:33:11,469: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.76s INFO: 2024-07-13 15:33:32,789: llmtf.base.darumeru/MultiQ: Processing Dataset: 574.49s INFO: 2024-07-13 15:33:32,791: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: INFO: 2024-07-13 15:33:32,796: llmtf.base.darumeru/MultiQ: {'f1': 0.5726350715356451, 'em': 0.5019120458891013} INFO: 2024-07-13 15:33:32,807: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:33:32,808: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:33:35,547: llmtf.base.darumeru/PARus: Loading Dataset: 2.74s INFO: 2024-07-13 15:33:51,177: llmtf.base.darumeru/PARus: Processing Dataset: 15.63s INFO: 2024-07-13 15:33:51,179: llmtf.base.darumeru/PARus: Results for darumeru/PARus: INFO: 2024-07-13 15:33:51,191: llmtf.base.darumeru/PARus: {'acc': 0.83} INFO: 2024-07-13 15:33:51,193: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:33:51,193: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:33:54,244: llmtf.base.darumeru/RCB: Loading Dataset: 3.05s INFO: 2024-07-13 15:34:20,224: llmtf.base.darumeru/RCB: Processing Dataset: 25.98s INFO: 2024-07-13 15:34:20,241: llmtf.base.darumeru/RCB: Results for darumeru/RCB: INFO: 2024-07-13 15:34:20,248: llmtf.base.darumeru/RCB: {'acc': 0.5181818181818182, 'f1_macro': 0.46564877615699873} INFO: 2024-07-13 15:34:20,250: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:34:20,250: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:34:28,734: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 8.48s INFO: 2024-07-13 15:37:02,786: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 154.05s INFO: 2024-07-13 15:37:02,802: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: INFO: 2024-07-13 15:37:02,816: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7525773195876289, 'f1_macro': 0.7540227232789819} INFO: 2024-07-13 15:37:02,832: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:37:02,832: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:37:07,215: llmtf.base.darumeru/ruTiE: Loading Dataset: 4.38s INFO: 2024-07-13 15:41:29,256: llmtf.base.darumeru/ruTiE: Processing Dataset: 262.04s INFO: 2024-07-13 15:41:29,260: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE: INFO: 2024-07-13 15:41:29,289: llmtf.base.darumeru/ruTiE: {'acc': 0.5372093023255814} INFO: 2024-07-13 15:41:29,292: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:41:29,292: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:41:32,242: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.95s INFO: 2024-07-13 15:41:41,454: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 9.21s INFO: 2024-07-13 15:41:41,471: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: INFO: 2024-07-13 15:41:41,493: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8857142857142857, 'f1_macro': 0.8846523292790873} INFO: 2024-07-13 15:41:41,494: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:41:41,494: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:41:45,149: llmtf.base.darumeru/RWSD: Loading Dataset: 3.65s INFO: 2024-07-13 15:42:09,254: llmtf.base.darumeru/RWSD: Processing Dataset: 24.10s INFO: 2024-07-13 15:42:09,256: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: INFO: 2024-07-13 15:42:09,261: llmtf.base.darumeru/RWSD: {'acc': 0.6078431372549019} INFO: 2024-07-13 15:42:09,263: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:42:09,263: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:42:16,716: llmtf.base.darumeru/USE: Loading Dataset: 7.45s INFO: 2024-07-13 15:46:19,569: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1152.88s INFO: 2024-07-13 15:46:19,575: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: INFO: 2024-07-13 15:46:19,620: llmtf.base.nlpcoreteam/enMMLU: metric subject abstract_algebra 0.330000 anatomy 0.651852 astronomy 0.671053 business_ethics 0.650000 clinical_knowledge 0.720755 college_biology 0.770833 college_chemistry 0.500000 college_computer_science 0.560000 college_mathematics 0.400000 college_medicine 0.676301 college_physics 0.362745 computer_security 0.760000 conceptual_physics 0.578723 econometrics 0.473684 electrical_engineering 0.551724 elementary_mathematics 0.394180 formal_logic 0.492063 global_facts 0.330000 high_school_biology 0.770968 high_school_chemistry 0.487685 high_school_computer_science 0.690000 high_school_european_history 0.806061 high_school_geography 0.792929 high_school_government_and_politics 0.891192 high_school_macroeconomics 0.638462 high_school_mathematics 0.359259 high_school_microeconomics 0.655462 high_school_physics 0.350993 high_school_psychology 0.834862 high_school_statistics 0.476852 high_school_us_history 0.823529 high_school_world_history 0.831224 human_aging 0.717489 human_sexuality 0.770992 international_law 0.801653 jurisprudence 0.750000 logical_fallacies 0.797546 machine_learning 0.508929 management 0.844660 marketing 0.880342 medical_genetics 0.740000 miscellaneous 0.826309 moral_disputes 0.734104 moral_scenarios 0.269274 nutrition 0.725490 philosophy 0.710611 prehistory 0.762346 professional_accounting 0.475177 professional_law 0.481747 professional_medicine 0.709559 professional_psychology 0.640523 public_relations 0.654545 security_studies 0.738776 sociology 0.830846 us_foreign_policy 0.850000 virology 0.524096 world_religions 0.818713 INFO: 2024-07-13 15:46:19,627: llmtf.base.nlpcoreteam/enMMLU: metric subject STEM 0.529108 humanities 0.698375 other (business, health, misc.) 0.676574 social sciences 0.731023 INFO: 2024-07-13 15:46:19,635: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6587697555779514} INFO: 2024-07-13 15:46:19,704: llmtf.base.evaluator: Ended eval INFO: 2024-07-13 15:46:19,740: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU 0.701 0.408 0.537 0.830 0.492 0.608 1.000 0.999 0.753 0.537 0.885 0.659 INFO: 2024-07-13 15:46:23,553: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 792.08s INFO: 2024-07-13 15:46:23,572: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: INFO: 2024-07-13 15:46:23,603: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.4704173225051846, 'len': 0.9993025871189104, 'lcs': 0.9552661852470385} INFO: 2024-07-13 15:46:23,606: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:46:23,606: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:46:26,330: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.72s INFO: 2024-07-13 15:48:32,771: llmtf.base.darumeru/USE: Processing Dataset: 376.05s INFO: 2024-07-13 15:48:32,775: llmtf.base.darumeru/USE: Results for darumeru/USE: INFO: 2024-07-13 15:48:32,780: llmtf.base.darumeru/USE: {'grade_norm': 0.12352941176470587} INFO: 2024-07-13 15:48:32,787: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] INFO: 2024-07-13 15:48:32,787: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] INFO: 2024-07-13 15:48:44,556: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 11.77s INFO: 2024-07-13 15:50:07,016: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1529.74s INFO: 2024-07-13 15:50:07,019: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: INFO: 2024-07-13 15:50:07,028: llmtf.base.darumeru/ruMMLU: {'acc': 0.4868801755961289} INFO: 2024-07-13 15:50:07,113: llmtf.base.evaluator: Ended eval INFO: 2024-07-13 15:50:07,146: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU 0.662 0.408 0.537 0.830 0.492 0.608 0.124 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 INFO: 2024-07-13 15:52:21,515: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 216.96s INFO: 2024-07-13 15:52:21,520: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: INFO: 2024-07-13 15:52:21,533: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7384284176533907, 'mcc': 0.3763427268436289} INFO: 2024-07-13 15:52:21,545: llmtf.base.evaluator: Ended eval INFO: 2024-07-13 15:52:21,562: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom 0.655 0.408 0.537 0.830 0.492 0.608 0.124 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.557 INFO: 2024-07-13 15:54:48,357: llmtf.base.daru/treewayabstractive: Processing Dataset: 1847.59s INFO: 2024-07-13 15:54:48,390: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: INFO: 2024-07-13 15:54:48,397: llmtf.base.daru/treewayabstractive: {'rouge1': 0.34956234095604516, 'rouge2': 0.13050451589110393} INFO: 2024-07-13 15:54:48,402: llmtf.base.evaluator: Ended eval INFO: 2024-07-13 15:54:48,429: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom 0.629 0.240 0.408 0.537 0.830 0.492 0.608 0.124 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.557 INFO: 2024-07-13 15:55:03,040: llmtf.base.darumeru/cp_para_en: Processing Dataset: 516.71s INFO: 2024-07-13 15:55:03,042: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en: INFO: 2024-07-13 15:55:03,046: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 3.960763996832381, 'len': 0.9995281850843424, 'lcs': 0.9811766452032213} INFO: 2024-07-13 15:55:03,048: llmtf.base.evaluator: Ended eval INFO: 2024-07-13 15:55:03,057: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom 0.650 0.240 0.408 0.537 0.830 0.492 0.608 0.124 0.981 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.557 INFO: 2024-07-13 15:57:18,212: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1802.40s INFO: 2024-07-13 15:57:18,228: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: INFO: 2024-07-13 15:57:18,274: llmtf.base.nlpcoreteam/ruMMLU: metric subject abstract_algebra 0.280000 anatomy 0.392593 astronomy 0.565789 business_ethics 0.560000 clinical_knowledge 0.554717 college_biology 0.465278 college_chemistry 0.410000 college_computer_science 0.500000 college_mathematics 0.360000 college_medicine 0.554913 college_physics 0.333333 computer_security 0.590000 conceptual_physics 0.468085 econometrics 0.403509 electrical_engineering 0.503448 elementary_mathematics 0.367725 formal_logic 0.365079 global_facts 0.330000 high_school_biology 0.619355 high_school_chemistry 0.399015 high_school_computer_science 0.640000 high_school_european_history 0.678788 high_school_geography 0.676768 high_school_government_and_politics 0.647668 high_school_macroeconomics 0.512821 high_school_mathematics 0.314815 high_school_microeconomics 0.533613 high_school_physics 0.344371 high_school_psychology 0.651376 high_school_statistics 0.416667 high_school_us_history 0.720588 high_school_world_history 0.679325 human_aging 0.520179 human_sexuality 0.618321 international_law 0.719008 jurisprudence 0.601852 logical_fallacies 0.509202 machine_learning 0.464286 management 0.669903 marketing 0.735043 medical_genetics 0.530000 miscellaneous 0.605364 moral_disputes 0.580925 moral_scenarios 0.189944 nutrition 0.611111 philosophy 0.581994 prehistory 0.524691 professional_accounting 0.397163 professional_law 0.361147 professional_medicine 0.441176 professional_psychology 0.486928 public_relations 0.545455 security_studies 0.595918 sociology 0.681592 us_foreign_policy 0.690000 virology 0.427711 world_religions 0.748538 INFO: 2024-07-13 15:57:18,281: llmtf.base.nlpcoreteam/ruMMLU: metric subject STEM 0.446787 humanities 0.558545 other (business, health, misc.) 0.523562 social sciences 0.586997 INFO: 2024-07-13 15:57:18,303: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5289728961247521} INFO: 2024-07-13 15:57:18,385: llmtf.base.evaluator: Ended eval INFO: 2024-07-13 15:57:18,616: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom 0.643 0.240 0.408 0.537 0.830 0.492 0.608 0.124 0.981 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.529 0.557