| INFO: 2024-07-13 15:18:29,367: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] | |
| INFO: 2024-07-13 15:18:29,368: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:18:29,368: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:18:30,101: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] | |
| INFO: 2024-07-13 15:18:30,101: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:18:30,102: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:18:31,006: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] | |
| INFO: 2024-07-13 15:18:31,007: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:18:31,007: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:18:33,846: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] | |
| INFO: 2024-07-13 15:18:33,846: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:18:33,846: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:18:34,873: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] | |
| INFO: 2024-07-13 15:18:34,874: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:18:34,874: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:18:36,947: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] | |
| INFO: 2024-07-13 15:18:36,948: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:18:36,948: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:18:39,585: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] | |
| INFO: 2024-07-13 15:18:39,585: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:18:39,585: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:18:42,261: llmtf.base.darumeru/MultiQ: Loading Dataset: 12.89s | |
| INFO: 2024-07-13 15:18:43,245: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.66s | |
| INFO: 2024-07-13 15:18:43,377: llmtf.base.daru/treewayabstractive: Loading Dataset: 8.50s | |
| INFO: 2024-07-13 15:18:44,950: llmtf.base.daru/treewayextractive: Loading Dataset: 8.00s | |
| INFO: 2024-07-13 15:19:21,718: llmtf.base.darumeru/ruMMLU: Loading Dataset: 51.62s | |
| INFO: 2024-07-13 15:23:45,855: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] | |
| INFO: 2024-07-13 15:23:45,858: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:23:45,858: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:23:46,328: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] | |
| INFO: 2024-07-13 15:23:46,329: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:23:46,329: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:23:48,239: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] | |
| INFO: 2024-07-13 15:23:48,240: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:23:48,240: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:23:50,172: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] | |
| INFO: 2024-07-13 15:23:50,172: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:23:50,172: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:23:52,594: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] | |
| INFO: 2024-07-13 15:23:52,594: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:23:52,594: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:23:53,731: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] | |
| INFO: 2024-07-13 15:23:53,732: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:23:53,732: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:23:55,589: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] | |
| INFO: 2024-07-13 15:23:55,589: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:23:55,589: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:23:58,285: llmtf.base.darumeru/MultiQ: Loading Dataset: 12.43s | |
| INFO: 2024-07-13 15:23:59,075: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.49s | |
| INFO: 2024-07-13 15:24:00,764: llmtf.base.daru/treewayabstractive: Loading Dataset: 8.17s | |
| INFO: 2024-07-13 15:24:01,255: llmtf.base.daru/treewayextractive: Loading Dataset: 7.52s | |
| INFO: 2024-07-13 15:24:37,276: llmtf.base.darumeru/ruMMLU: Loading Dataset: 50.95s | |
| INFO: 2024-07-13 15:27:06,687: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 196.51s | |
| INFO: 2024-07-13 15:27:15,808: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 207.57s | |
| INFO: 2024-07-13 15:29:44,399: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 345.32s | |
| INFO: 2024-07-13 15:29:44,403: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: | |
| INFO: 2024-07-13 15:29:44,407: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.3701923347659983, 'len': 0.9987691197336923, 'lcs': 0.9819406016228798} | |
| INFO: 2024-07-13 15:29:44,410: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:29:44,410: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:29:47,896: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.48s | |
| INFO: 2024-07-13 15:32:14,981: llmtf.base.daru/treewayextractive: Processing Dataset: 493.72s | |
| INFO: 2024-07-13 15:32:14,987: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: | |
| INFO: 2024-07-13 15:32:15,227: llmtf.base.daru/treewayextractive: {'r-prec': 0.40769011544011546} | |
| INFO: 2024-07-13 15:32:15,287: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-07-13 15:32:15,293: llmtf.base.evaluator: | |
| mean daru/treewayextractive darumeru/cp_sent_ru | |
| 0.703 0.408 0.999 | |
| INFO: 2024-07-13 15:33:08,688: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 200.79s | |
| INFO: 2024-07-13 15:33:08,691: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: | |
| INFO: 2024-07-13 15:33:08,708: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 3.8994152226580563, 'len': 0.9995035620835028, 'lcs': 0.9936840637058483} | |
| INFO: 2024-07-13 15:33:08,711: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:33:08,711: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:33:11,469: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.76s | |
| INFO: 2024-07-13 15:33:32,789: llmtf.base.darumeru/MultiQ: Processing Dataset: 574.49s | |
| INFO: 2024-07-13 15:33:32,791: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: | |
| INFO: 2024-07-13 15:33:32,796: llmtf.base.darumeru/MultiQ: {'f1': 0.5726350715356451, 'em': 0.5019120458891013} | |
| INFO: 2024-07-13 15:33:32,807: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:33:32,808: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:33:35,547: llmtf.base.darumeru/PARus: Loading Dataset: 2.74s | |
| INFO: 2024-07-13 15:33:51,177: llmtf.base.darumeru/PARus: Processing Dataset: 15.63s | |
| INFO: 2024-07-13 15:33:51,179: llmtf.base.darumeru/PARus: Results for darumeru/PARus: | |
| INFO: 2024-07-13 15:33:51,191: llmtf.base.darumeru/PARus: {'acc': 0.83} | |
| INFO: 2024-07-13 15:33:51,193: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:33:51,193: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:33:54,244: llmtf.base.darumeru/RCB: Loading Dataset: 3.05s | |
| INFO: 2024-07-13 15:34:20,224: llmtf.base.darumeru/RCB: Processing Dataset: 25.98s | |
| INFO: 2024-07-13 15:34:20,241: llmtf.base.darumeru/RCB: Results for darumeru/RCB: | |
| INFO: 2024-07-13 15:34:20,248: llmtf.base.darumeru/RCB: {'acc': 0.5181818181818182, 'f1_macro': 0.46564877615699873} | |
| INFO: 2024-07-13 15:34:20,250: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:34:20,250: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:34:28,734: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 8.48s | |
| INFO: 2024-07-13 15:37:02,786: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 154.05s | |
| INFO: 2024-07-13 15:37:02,802: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: | |
| INFO: 2024-07-13 15:37:02,816: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7525773195876289, 'f1_macro': 0.7540227232789819} | |
| INFO: 2024-07-13 15:37:02,832: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:37:02,832: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:37:07,215: llmtf.base.darumeru/ruTiE: Loading Dataset: 4.38s | |
| INFO: 2024-07-13 15:41:29,256: llmtf.base.darumeru/ruTiE: Processing Dataset: 262.04s | |
| INFO: 2024-07-13 15:41:29,260: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE: | |
| INFO: 2024-07-13 15:41:29,289: llmtf.base.darumeru/ruTiE: {'acc': 0.5372093023255814} | |
| INFO: 2024-07-13 15:41:29,292: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:41:29,292: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:41:32,242: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.95s | |
| INFO: 2024-07-13 15:41:41,454: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 9.21s | |
| INFO: 2024-07-13 15:41:41,471: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: | |
| INFO: 2024-07-13 15:41:41,493: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8857142857142857, 'f1_macro': 0.8846523292790873} | |
| INFO: 2024-07-13 15:41:41,494: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:41:41,494: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:41:45,149: llmtf.base.darumeru/RWSD: Loading Dataset: 3.65s | |
| INFO: 2024-07-13 15:42:09,254: llmtf.base.darumeru/RWSD: Processing Dataset: 24.10s | |
| INFO: 2024-07-13 15:42:09,256: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: | |
| INFO: 2024-07-13 15:42:09,261: llmtf.base.darumeru/RWSD: {'acc': 0.6078431372549019} | |
| INFO: 2024-07-13 15:42:09,263: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:42:09,263: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:42:16,716: llmtf.base.darumeru/USE: Loading Dataset: 7.45s | |
| INFO: 2024-07-13 15:46:19,569: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1152.88s | |
| INFO: 2024-07-13 15:46:19,575: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: | |
| INFO: 2024-07-13 15:46:19,620: llmtf.base.nlpcoreteam/enMMLU: metric | |
| subject | |
| abstract_algebra 0.330000 | |
| anatomy 0.651852 | |
| astronomy 0.671053 | |
| business_ethics 0.650000 | |
| clinical_knowledge 0.720755 | |
| college_biology 0.770833 | |
| college_chemistry 0.500000 | |
| college_computer_science 0.560000 | |
| college_mathematics 0.400000 | |
| college_medicine 0.676301 | |
| college_physics 0.362745 | |
| computer_security 0.760000 | |
| conceptual_physics 0.578723 | |
| econometrics 0.473684 | |
| electrical_engineering 0.551724 | |
| elementary_mathematics 0.394180 | |
| formal_logic 0.492063 | |
| global_facts 0.330000 | |
| high_school_biology 0.770968 | |
| high_school_chemistry 0.487685 | |
| high_school_computer_science 0.690000 | |
| high_school_european_history 0.806061 | |
| high_school_geography 0.792929 | |
| high_school_government_and_politics 0.891192 | |
| high_school_macroeconomics 0.638462 | |
| high_school_mathematics 0.359259 | |
| high_school_microeconomics 0.655462 | |
| high_school_physics 0.350993 | |
| high_school_psychology 0.834862 | |
| high_school_statistics 0.476852 | |
| high_school_us_history 0.823529 | |
| high_school_world_history 0.831224 | |
| human_aging 0.717489 | |
| human_sexuality 0.770992 | |
| international_law 0.801653 | |
| jurisprudence 0.750000 | |
| logical_fallacies 0.797546 | |
| machine_learning 0.508929 | |
| management 0.844660 | |
| marketing 0.880342 | |
| medical_genetics 0.740000 | |
| miscellaneous 0.826309 | |
| moral_disputes 0.734104 | |
| moral_scenarios 0.269274 | |
| nutrition 0.725490 | |
| philosophy 0.710611 | |
| prehistory 0.762346 | |
| professional_accounting 0.475177 | |
| professional_law 0.481747 | |
| professional_medicine 0.709559 | |
| professional_psychology 0.640523 | |
| public_relations 0.654545 | |
| security_studies 0.738776 | |
| sociology 0.830846 | |
| us_foreign_policy 0.850000 | |
| virology 0.524096 | |
| world_religions 0.818713 | |
| INFO: 2024-07-13 15:46:19,627: llmtf.base.nlpcoreteam/enMMLU: metric | |
| subject | |
| STEM 0.529108 | |
| humanities 0.698375 | |
| other (business, health, misc.) 0.676574 | |
| social sciences 0.731023 | |
| INFO: 2024-07-13 15:46:19,635: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6587697555779514} | |
| INFO: 2024-07-13 15:46:19,704: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-07-13 15:46:19,740: llmtf.base.evaluator: | |
| mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU | |
| 0.701 0.408 0.537 0.830 0.492 0.608 1.000 0.999 0.753 0.537 0.885 0.659 | |
| INFO: 2024-07-13 15:46:23,553: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 792.08s | |
| INFO: 2024-07-13 15:46:23,572: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: | |
| INFO: 2024-07-13 15:46:23,603: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.4704173225051846, 'len': 0.9993025871189104, 'lcs': 0.9552661852470385} | |
| INFO: 2024-07-13 15:46:23,606: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:46:23,606: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:46:26,330: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.72s | |
| INFO: 2024-07-13 15:48:32,771: llmtf.base.darumeru/USE: Processing Dataset: 376.05s | |
| INFO: 2024-07-13 15:48:32,775: llmtf.base.darumeru/USE: Results for darumeru/USE: | |
| INFO: 2024-07-13 15:48:32,780: llmtf.base.darumeru/USE: {'grade_norm': 0.12352941176470587} | |
| INFO: 2024-07-13 15:48:32,787: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] | |
| INFO: 2024-07-13 15:48:32,787: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] | |
| INFO: 2024-07-13 15:48:44,556: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 11.77s | |
| INFO: 2024-07-13 15:50:07,016: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1529.74s | |
| INFO: 2024-07-13 15:50:07,019: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: | |
| INFO: 2024-07-13 15:50:07,028: llmtf.base.darumeru/ruMMLU: {'acc': 0.4868801755961289} | |
| INFO: 2024-07-13 15:50:07,113: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-07-13 15:50:07,146: llmtf.base.evaluator: | |
| mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU | |
| 0.662 0.408 0.537 0.830 0.492 0.608 0.124 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 | |
| INFO: 2024-07-13 15:52:21,515: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 216.96s | |
| INFO: 2024-07-13 15:52:21,520: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: | |
| INFO: 2024-07-13 15:52:21,533: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7384284176533907, 'mcc': 0.3763427268436289} | |
| INFO: 2024-07-13 15:52:21,545: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-07-13 15:52:21,562: llmtf.base.evaluator: | |
| mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom | |
| 0.655 0.408 0.537 0.830 0.492 0.608 0.124 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.557 | |
| INFO: 2024-07-13 15:54:48,357: llmtf.base.daru/treewayabstractive: Processing Dataset: 1847.59s | |
| INFO: 2024-07-13 15:54:48,390: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: | |
| INFO: 2024-07-13 15:54:48,397: llmtf.base.daru/treewayabstractive: {'rouge1': 0.34956234095604516, 'rouge2': 0.13050451589110393} | |
| INFO: 2024-07-13 15:54:48,402: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-07-13 15:54:48,429: llmtf.base.evaluator: | |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom | |
| 0.629 0.240 0.408 0.537 0.830 0.492 0.608 0.124 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.557 | |
| INFO: 2024-07-13 15:55:03,040: llmtf.base.darumeru/cp_para_en: Processing Dataset: 516.71s | |
| INFO: 2024-07-13 15:55:03,042: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en: | |
| INFO: 2024-07-13 15:55:03,046: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 3.960763996832381, 'len': 0.9995281850843424, 'lcs': 0.9811766452032213} | |
| INFO: 2024-07-13 15:55:03,048: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-07-13 15:55:03,057: llmtf.base.evaluator: | |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom | |
| 0.650 0.240 0.408 0.537 0.830 0.492 0.608 0.124 0.981 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.557 | |
| INFO: 2024-07-13 15:57:18,212: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1802.40s | |
| INFO: 2024-07-13 15:57:18,228: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: | |
| INFO: 2024-07-13 15:57:18,274: llmtf.base.nlpcoreteam/ruMMLU: metric | |
| subject | |
| abstract_algebra 0.280000 | |
| anatomy 0.392593 | |
| astronomy 0.565789 | |
| business_ethics 0.560000 | |
| clinical_knowledge 0.554717 | |
| college_biology 0.465278 | |
| college_chemistry 0.410000 | |
| college_computer_science 0.500000 | |
| college_mathematics 0.360000 | |
| college_medicine 0.554913 | |
| college_physics 0.333333 | |
| computer_security 0.590000 | |
| conceptual_physics 0.468085 | |
| econometrics 0.403509 | |
| electrical_engineering 0.503448 | |
| elementary_mathematics 0.367725 | |
| formal_logic 0.365079 | |
| global_facts 0.330000 | |
| high_school_biology 0.619355 | |
| high_school_chemistry 0.399015 | |
| high_school_computer_science 0.640000 | |
| high_school_european_history 0.678788 | |
| high_school_geography 0.676768 | |
| high_school_government_and_politics 0.647668 | |
| high_school_macroeconomics 0.512821 | |
| high_school_mathematics 0.314815 | |
| high_school_microeconomics 0.533613 | |
| high_school_physics 0.344371 | |
| high_school_psychology 0.651376 | |
| high_school_statistics 0.416667 | |
| high_school_us_history 0.720588 | |
| high_school_world_history 0.679325 | |
| human_aging 0.520179 | |
| human_sexuality 0.618321 | |
| international_law 0.719008 | |
| jurisprudence 0.601852 | |
| logical_fallacies 0.509202 | |
| machine_learning 0.464286 | |
| management 0.669903 | |
| marketing 0.735043 | |
| medical_genetics 0.530000 | |
| miscellaneous 0.605364 | |
| moral_disputes 0.580925 | |
| moral_scenarios 0.189944 | |
| nutrition 0.611111 | |
| philosophy 0.581994 | |
| prehistory 0.524691 | |
| professional_accounting 0.397163 | |
| professional_law 0.361147 | |
| professional_medicine 0.441176 | |
| professional_psychology 0.486928 | |
| public_relations 0.545455 | |
| security_studies 0.595918 | |
| sociology 0.681592 | |
| us_foreign_policy 0.690000 | |
| virology 0.427711 | |
| world_religions 0.748538 | |
| INFO: 2024-07-13 15:57:18,281: llmtf.base.nlpcoreteam/ruMMLU: metric | |
| subject | |
| STEM 0.446787 | |
| humanities 0.558545 | |
| other (business, health, misc.) 0.523562 | |
| social sciences 0.586997 | |
| INFO: 2024-07-13 15:57:18,303: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5289728961247521} | |
| INFO: 2024-07-13 15:57:18,385: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-07-13 15:57:18,616: llmtf.base.evaluator: | |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom | |
| 0.643 0.240 0.408 0.537 0.830 0.492 0.608 0.124 0.981 0.955 1.000 0.999 0.487 0.753 0.537 0.885 0.659 0.529 0.557 | |