| INFO: 2024-07-13 15:58:14,457: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] |
| INFO: 2024-07-13 15:58:14,458: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 15:58:14,458: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 15:58:16,153: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] |
| INFO: 2024-07-13 15:58:16,153: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 15:58:16,153: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 15:58:18,397: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] |
| INFO: 2024-07-13 15:58:18,400: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 15:58:18,400: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 15:58:20,105: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] |
| INFO: 2024-07-13 15:58:20,105: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 15:58:20,105: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 15:58:22,211: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
| INFO: 2024-07-13 15:58:22,212: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 15:58:22,212: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 15:58:24,449: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] |
| INFO: 2024-07-13 15:58:24,451: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 15:58:24,451: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 15:58:25,653: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] |
| INFO: 2024-07-13 15:58:25,654: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 15:58:25,654: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 15:58:27,286: llmtf.base.darumeru/MultiQ: Loading Dataset: 12.83s |
| INFO: 2024-07-13 15:58:29,574: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.92s |
| INFO: 2024-07-13 15:58:29,934: llmtf.base.daru/treewayabstractive: Loading Dataset: 7.72s |
| INFO: 2024-07-13 15:58:32,475: llmtf.base.daru/treewayextractive: Loading Dataset: 8.02s |
| INFO: 2024-07-13 15:59:07,707: llmtf.base.darumeru/ruMMLU: Loading Dataset: 51.55s |
| INFO: 2024-07-13 16:01:36,434: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 196.33s |
| INFO: 2024-07-13 16:01:41,380: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 202.98s |
| INFO: 2024-07-13 16:06:57,037: llmtf.base.daru/treewayextractive: Processing Dataset: 504.56s |
| INFO: 2024-07-13 16:06:57,042: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
| INFO: 2024-07-13 16:06:57,306: llmtf.base.daru/treewayextractive: {'r-prec': 0.4038567821067821} |
| INFO: 2024-07-13 16:06:57,370: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 16:06:57,375: llmtf.base.evaluator: |
| mean daru/treewayextractive |
| 0.404 0.404 |
| INFO: 2024-07-13 16:07:09,314: llmtf.base.darumeru/MultiQ: Processing Dataset: 522.01s |
| INFO: 2024-07-13 16:07:09,315: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
| INFO: 2024-07-13 16:07:09,320: llmtf.base.darumeru/MultiQ: {'f1': 0.5686661277961469, 'em': 0.4980879541108987} |
| INFO: 2024-07-13 16:07:09,331: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 16:07:09,331: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 16:07:12,054: llmtf.base.darumeru/PARus: Loading Dataset: 2.72s |
| INFO: 2024-07-13 16:07:30,044: llmtf.base.darumeru/PARus: Processing Dataset: 17.99s |
| INFO: 2024-07-13 16:07:30,046: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
| INFO: 2024-07-13 16:07:30,092: llmtf.base.darumeru/PARus: {'acc': 0.83} |
| INFO: 2024-07-13 16:07:30,093: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 16:07:30,093: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 16:07:32,671: llmtf.base.darumeru/RCB: Loading Dataset: 2.58s |
| INFO: 2024-07-13 16:07:59,066: llmtf.base.darumeru/RCB: Processing Dataset: 26.39s |
| INFO: 2024-07-13 16:07:59,068: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
| INFO: 2024-07-13 16:07:59,092: llmtf.base.darumeru/RCB: {'acc': 0.5318181818181819, 'f1_macro': 0.4819804386277897} |
| INFO: 2024-07-13 16:07:59,094: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 16:07:59,095: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 16:08:07,609: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 8.51s |
| INFO: 2024-07-13 16:09:56,386: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 686.81s |
| INFO: 2024-07-13 16:09:56,404: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: |
| INFO: 2024-07-13 16:09:56,408: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.369814709505374, 'len': 0.9988464362460299, 'lcs': 0.9808426093801238} |
| INFO: 2024-07-13 16:09:56,411: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 16:09:56,412: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 16:09:59,952: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.54s |
| INFO: 2024-07-13 16:11:01,969: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 174.36s |
| INFO: 2024-07-13 16:11:01,971: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
| INFO: 2024-07-13 16:11:01,984: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7538659793814433, 'f1_macro': 0.7551200071805053} |
| INFO: 2024-07-13 16:11:02,001: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 16:11:02,001: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 16:11:06,320: llmtf.base.darumeru/ruTiE: Loading Dataset: 4.32s |
| INFO: 2024-07-13 16:15:29,583: llmtf.base.darumeru/ruTiE: Processing Dataset: 263.25s |
| INFO: 2024-07-13 16:15:29,589: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE: |
| INFO: 2024-07-13 16:15:29,619: llmtf.base.darumeru/ruTiE: {'acc': 0.5395348837209303} |
| INFO: 2024-07-13 16:15:29,622: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 16:15:29,623: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 16:15:31,996: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.37s |
| INFO: 2024-07-13 16:15:42,231: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 10.23s |
| INFO: 2024-07-13 16:15:42,234: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
| INFO: 2024-07-13 16:15:42,253: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8761904761904762, 'f1_macro': 0.8761420630173862} |
| INFO: 2024-07-13 16:15:42,255: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 16:15:42,255: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 16:15:45,297: llmtf.base.darumeru/RWSD: Loading Dataset: 3.04s |
| INFO: 2024-07-13 16:16:08,974: llmtf.base.darumeru/RWSD: Processing Dataset: 23.66s |
| INFO: 2024-07-13 16:16:08,993: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
| INFO: 2024-07-13 16:16:08,997: llmtf.base.darumeru/RWSD: {'acc': 0.6078431372549019} |
| INFO: 2024-07-13 16:16:08,999: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 16:16:08,999: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 16:16:16,493: llmtf.base.darumeru/USE: Loading Dataset: 7.49s |
| INFO: 2024-07-13 16:18:57,294: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 537.33s |
| INFO: 2024-07-13 16:18:57,297: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: |
| INFO: 2024-07-13 16:18:57,316: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 3.8994152226580563, 'len': 0.9995035620835028, 'lcs': 0.9936840637058483} |
| INFO: 2024-07-13 16:18:57,319: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 16:18:57,319: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 16:19:00,359: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.04s |
| INFO: 2024-07-13 16:22:33,063: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1405.35s |
| INFO: 2024-07-13 16:22:33,065: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: |
| INFO: 2024-07-13 16:22:33,095: llmtf.base.darumeru/ruMMLU: {'acc': 0.48737902823505935} |
| INFO: 2024-07-13 16:22:33,217: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 16:22:33,361: llmtf.base.evaluator: |
| mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree |
| 0.685 0.404 0.533 0.830 0.507 0.608 1.000 0.999 0.487 0.754 0.540 0.876 |
| INFO: 2024-07-13 16:22:46,117: llmtf.base.darumeru/USE: Processing Dataset: 389.62s |
| INFO: 2024-07-13 16:22:46,120: llmtf.base.darumeru/USE: Results for darumeru/USE: |
| INFO: 2024-07-13 16:22:46,141: llmtf.base.darumeru/USE: {'grade_norm': 0.12156862745098038} |
| INFO: 2024-07-13 16:22:46,149: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 16:22:46,149: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 16:22:52,510: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1276.06s |
| INFO: 2024-07-13 16:22:52,530: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
| INFO: 2024-07-13 16:22:52,571: llmtf.base.nlpcoreteam/enMMLU: metric |
| subject |
| abstract_algebra 0.310000 |
| anatomy 0.644444 |
| astronomy 0.677632 |
| business_ethics 0.650000 |
| clinical_knowledge 0.720755 |
| college_biology 0.763889 |
| college_chemistry 0.480000 |
| college_computer_science 0.570000 |
| college_mathematics 0.400000 |
| college_medicine 0.676301 |
| college_physics 0.372549 |
| computer_security 0.760000 |
| conceptual_physics 0.591489 |
| econometrics 0.473684 |
| electrical_engineering 0.551724 |
| elementary_mathematics 0.396825 |
| formal_logic 0.492063 |
| global_facts 0.320000 |
| high_school_biology 0.780645 |
| high_school_chemistry 0.487685 |
| high_school_computer_science 0.680000 |
| high_school_european_history 0.806061 |
| high_school_geography 0.787879 |
| high_school_government_and_politics 0.891192 |
| high_school_macroeconomics 0.643590 |
| high_school_mathematics 0.355556 |
| high_school_microeconomics 0.663866 |
| high_school_physics 0.364238 |
| high_school_psychology 0.834862 |
| high_school_statistics 0.486111 |
| high_school_us_history 0.838235 |
| high_school_world_history 0.835443 |
| human_aging 0.708520 |
| human_sexuality 0.763359 |
| international_law 0.809917 |
| jurisprudence 0.750000 |
| logical_fallacies 0.791411 |
| machine_learning 0.491071 |
| management 0.834951 |
| marketing 0.880342 |
| medical_genetics 0.740000 |
| miscellaneous 0.822478 |
| moral_disputes 0.728324 |
| moral_scenarios 0.271508 |
| nutrition 0.722222 |
| philosophy 0.710611 |
| prehistory 0.762346 |
| professional_accounting 0.492908 |
| professional_law 0.481095 |
| professional_medicine 0.713235 |
| professional_psychology 0.638889 |
| public_relations 0.645455 |
| security_studies 0.742857 |
| sociology 0.840796 |
| us_foreign_policy 0.840000 |
| virology 0.530120 |
| world_religions 0.818713 |
| INFO: 2024-07-13 16:22:52,578: llmtf.base.nlpcoreteam/enMMLU: metric |
| subject |
| STEM 0.528856 |
| humanities 0.699671 |
| other (business, health, misc.) 0.675448 |
| social sciences 0.730536 |
| INFO: 2024-07-13 16:22:52,586: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6586279387925342} |
| INFO: 2024-07-13 16:22:52,657: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 16:22:52,667: llmtf.base.evaluator: |
| mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU |
| 0.640 0.404 0.533 0.830 0.507 0.608 0.122 1.000 0.999 0.487 0.754 0.540 0.876 0.659 |
| INFO: 2024-07-13 16:22:58,449: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 12.30s |
| INFO: 2024-07-13 16:27:06,757: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 248.31s |
| INFO: 2024-07-13 16:27:06,762: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: |
| INFO: 2024-07-13 16:27:06,773: llmtf.base.russiannlp/rucola_custom: {'acc': 0.736275565123789, 'mcc': 0.37026925316854403} |
| INFO: 2024-07-13 16:27:06,786: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 16:27:06,830: llmtf.base.evaluator: |
| mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom |
| 0.634 0.404 0.533 0.830 0.507 0.608 0.122 1.000 0.999 0.487 0.754 0.540 0.876 0.659 0.553 |
| INFO: 2024-07-13 16:31:58,387: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1817.00s |
| INFO: 2024-07-13 16:31:58,391: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
| INFO: 2024-07-13 16:31:58,435: llmtf.base.nlpcoreteam/ruMMLU: metric |
| subject |
| abstract_algebra 0.300000 |
| anatomy 0.392593 |
| astronomy 0.565789 |
| business_ethics 0.560000 |
| clinical_knowledge 0.554717 |
| college_biology 0.465278 |
| college_chemistry 0.410000 |
| college_computer_science 0.500000 |
| college_mathematics 0.370000 |
| college_medicine 0.560694 |
| college_physics 0.333333 |
| computer_security 0.580000 |
| conceptual_physics 0.472340 |
| econometrics 0.403509 |
| electrical_engineering 0.503448 |
| elementary_mathematics 0.362434 |
| formal_logic 0.357143 |
| global_facts 0.320000 |
| high_school_biology 0.609677 |
| high_school_chemistry 0.389163 |
| high_school_computer_science 0.640000 |
| high_school_european_history 0.672727 |
| high_school_geography 0.671717 |
| high_school_government_and_politics 0.652850 |
| high_school_macroeconomics 0.515385 |
| high_school_mathematics 0.318519 |
| high_school_microeconomics 0.521008 |
| high_school_physics 0.337748 |
| high_school_psychology 0.656881 |
| high_school_statistics 0.430556 |
| high_school_us_history 0.725490 |
| high_school_world_history 0.691983 |
| human_aging 0.520179 |
| human_sexuality 0.610687 |
| international_law 0.710744 |
| jurisprudence 0.592593 |
| logical_fallacies 0.503067 |
| machine_learning 0.446429 |
| management 0.669903 |
| marketing 0.735043 |
| medical_genetics 0.540000 |
| miscellaneous 0.607918 |
| moral_disputes 0.580925 |
| moral_scenarios 0.188827 |
| nutrition 0.611111 |
| philosophy 0.575563 |
| prehistory 0.527778 |
| professional_accounting 0.397163 |
| professional_law 0.365059 |
| professional_medicine 0.437500 |
| professional_psychology 0.493464 |
| public_relations 0.545455 |
| security_studies 0.595918 |
| sociology 0.681592 |
| us_foreign_policy 0.680000 |
| virology 0.433735 |
| world_religions 0.748538 |
| INFO: 2024-07-13 16:31:58,442: llmtf.base.nlpcoreteam/ruMMLU: metric |
| subject |
| STEM 0.446373 |
| humanities 0.556957 |
| other (business, health, misc.) 0.524325 |
| social sciences 0.585705 |
| INFO: 2024-07-13 16:31:58,451: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5283401236619901} |
| INFO: 2024-07-13 16:31:58,534: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 16:31:58,610: llmtf.base.evaluator: |
| mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
| 0.627 0.404 0.533 0.830 0.507 0.608 0.122 1.000 0.999 0.487 0.754 0.540 0.876 0.659 0.528 0.553 |
| INFO: 2024-07-13 16:33:42,900: llmtf.base.daru/treewayabstractive: Processing Dataset: 2112.96s |
| INFO: 2024-07-13 16:33:42,904: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
| INFO: 2024-07-13 16:33:42,937: llmtf.base.daru/treewayabstractive: {'rouge1': 0.357438599714093, 'rouge2': 0.13372912507444903} |
| INFO: 2024-07-13 16:33:42,941: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 16:33:42,951: llmtf.base.evaluator: |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
| 0.603 0.246 0.404 0.533 0.830 0.507 0.608 0.122 1.000 0.999 0.487 0.754 0.540 0.876 0.659 0.528 0.553 |
| INFO: 2024-07-13 16:34:08,837: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 908.48s |
| INFO: 2024-07-13 16:34:08,840: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
| INFO: 2024-07-13 16:34:08,857: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.4702341373518975, 'len': 0.9993717494721948, 'lcs': 0.958885193897962} |
| INFO: 2024-07-13 16:34:08,859: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-13 16:34:08,859: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 16:34:11,992: llmtf.base.darumeru/cp_para_en: Loading Dataset: 3.13s |
| INFO: 2024-07-13 16:45:47,059: llmtf.base.darumeru/cp_para_en: Processing Dataset: 695.07s |
| INFO: 2024-07-13 16:45:47,066: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en: |
| INFO: 2024-07-13 16:45:47,099: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 3.960763996832381, 'len': 0.9995281850843424, 'lcs': 0.9811766452032213} |
| INFO: 2024-07-13 16:45:47,100: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 16:45:47,126: llmtf.base.evaluator: |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
| 0.644 0.246 0.404 0.533 0.830 0.507 0.608 0.122 0.981 0.959 1.000 0.999 0.487 0.754 0.540 0.876 0.659 0.528 0.553 |