| INFO: 2024-07-12 12:01:15,409: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] |
| INFO: 2024-07-12 12:01:15,410: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:01:15,410: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:01:16,412: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] |
| INFO: 2024-07-12 12:01:16,412: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:01:16,412: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:01:17,736: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] |
| INFO: 2024-07-12 12:01:17,737: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-12 12:01:17,737: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-12 12:01:19,755: llmtf.base.darumeru/MultiQ: Loading Dataset: 4.34s |
| INFO: 2024-07-12 12:01:20,121: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] |
| INFO: 2024-07-12 12:01:20,121: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-12 12:01:20,121: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-12 12:01:21,970: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
| INFO: 2024-07-12 12:01:21,970: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:01:21,970: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:01:23,966: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] |
| INFO: 2024-07-12 12:01:23,969: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-12 12:01:23,970: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-12 12:01:25,583: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] |
| INFO: 2024-07-12 12:01:25,589: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:01:25,589: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:01:27,070: llmtf.base.daru/treewayabstractive: Loading Dataset: 5.10s |
| INFO: 2024-07-12 12:01:28,289: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.70s |
| INFO: 2024-07-12 12:01:28,621: llmtf.base.darumeru/ruMMLU: Loading Dataset: 12.21s |
| INFO: 2024-07-12 12:01:31,710: llmtf.base.daru/treewayextractive: Loading Dataset: 7.74s |
| INFO: 2024-07-12 12:03:37,385: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 139.65s |
| INFO: 2024-07-12 12:03:39,839: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 139.72s |
| INFO: 2024-07-12 12:07:14,847: llmtf.base.darumeru/MultiQ: Processing Dataset: 355.09s |
| INFO: 2024-07-12 12:07:14,849: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
| INFO: 2024-07-12 12:07:14,853: llmtf.base.darumeru/MultiQ: {'f1': 0.5552794495909491, 'em': 0.4751434034416826} |
| INFO: 2024-07-12 12:07:14,860: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:07:14,860: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:07:17,397: llmtf.base.darumeru/PARus: Loading Dataset: 2.54s |
| INFO: 2024-07-12 12:07:26,411: llmtf.base.darumeru/PARus: Processing Dataset: 9.01s |
| INFO: 2024-07-12 12:07:26,412: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
| INFO: 2024-07-12 12:07:26,436: llmtf.base.darumeru/PARus: {'acc': 0.84} |
| INFO: 2024-07-12 12:07:26,437: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:07:26,437: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:07:28,524: llmtf.base.darumeru/RCB: Loading Dataset: 2.09s |
| INFO: 2024-07-12 12:07:40,423: llmtf.base.darumeru/RCB: Processing Dataset: 11.88s |
| INFO: 2024-07-12 12:07:40,425: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
| INFO: 2024-07-12 12:07:40,431: llmtf.base.darumeru/RCB: {'acc': 0.5181818181818182, 'f1_macro': 0.4444347650097234} |
| INFO: 2024-07-12 12:07:40,432: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:07:40,432: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:07:44,279: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.85s |
| INFO: 2024-07-12 12:09:26,905: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 102.62s |
| INFO: 2024-07-12 12:09:26,914: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
| INFO: 2024-07-12 12:09:26,952: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7422680412371134, 'f1_macro': 0.742617154065763} |
| INFO: 2024-07-12 12:09:26,967: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:09:26,967: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:09:31,320: llmtf.base.darumeru/ruTiE: Loading Dataset: 4.35s |
| INFO: 2024-07-12 12:11:39,095: llmtf.base.darumeru/ruMMLU: Processing Dataset: 610.47s |
| INFO: 2024-07-12 12:11:39,097: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: |
| INFO: 2024-07-12 12:11:39,121: llmtf.base.darumeru/ruMMLU: {'acc': 0.4805946323456051} |
| INFO: 2024-07-12 12:11:39,172: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 12:11:39,182: llmtf.base.evaluator: |
| mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/ruMMLU darumeru/ruOpenBookQA |
| 0.612 0.515 0.840 0.481 0.481 0.742 |
| INFO: 2024-07-12 12:12:36,742: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 668.45s |
| INFO: 2024-07-12 12:12:36,745: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: |
| INFO: 2024-07-12 12:12:36,749: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.368351864419177, 'len': 0.9975712237925127, 'lcs': 0.9673485397373391} |
| INFO: 2024-07-12 12:12:36,750: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:12:36,751: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:12:38,981: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 2.23s |
| INFO: 2024-07-12 12:13:54,250: llmtf.base.darumeru/ruTiE: Processing Dataset: 262.93s |
| INFO: 2024-07-12 12:13:54,253: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE: |
| INFO: 2024-07-12 12:13:54,282: llmtf.base.darumeru/ruTiE: {'acc': 0.5395348837209303} |
| INFO: 2024-07-12 12:13:54,286: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:13:54,286: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:13:56,875: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.59s |
| INFO: 2024-07-12 12:14:01,832: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.96s |
| INFO: 2024-07-12 12:14:01,833: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
| INFO: 2024-07-12 12:14:01,838: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8666666666666667, 'f1_macro': 0.8655425965568433} |
| INFO: 2024-07-12 12:14:01,839: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:14:01,839: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:14:04,137: llmtf.base.darumeru/RWSD: Loading Dataset: 2.30s |
| INFO: 2024-07-12 12:14:14,784: llmtf.base.darumeru/RWSD: Processing Dataset: 10.65s |
| INFO: 2024-07-12 12:14:14,786: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
| INFO: 2024-07-12 12:14:14,790: llmtf.base.darumeru/RWSD: {'acc': 0.5833333333333334} |
| INFO: 2024-07-12 12:14:14,791: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:14:14,791: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:14:17,901: llmtf.base.darumeru/USE: Loading Dataset: 3.11s |
| INFO: 2024-07-12 12:15:22,773: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 702.93s |
| INFO: 2024-07-12 12:15:22,777: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
| INFO: 2024-07-12 12:15:22,782: llmtf.base.daru/treewayextractive: Processing Dataset: 831.07s |
| INFO: 2024-07-12 12:15:22,787: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
| INFO: 2024-07-12 12:15:22,819: llmtf.base.nlpcoreteam/enMMLU: metric |
| subject |
| abstract_algebra 0.310000 |
| anatomy 0.674074 |
| astronomy 0.651316 |
| business_ethics 0.630000 |
| clinical_knowledge 0.716981 |
| college_biology 0.701389 |
| college_chemistry 0.460000 |
| college_computer_science 0.540000 |
| college_mathematics 0.340000 |
| college_medicine 0.664740 |
| college_physics 0.372549 |
| computer_security 0.760000 |
| conceptual_physics 0.544681 |
| econometrics 0.482456 |
| electrical_engineering 0.558621 |
| elementary_mathematics 0.412698 |
| formal_logic 0.500000 |
| global_facts 0.370000 |
| high_school_biology 0.767742 |
| high_school_chemistry 0.438424 |
| high_school_computer_science 0.690000 |
| high_school_european_history 0.793939 |
| high_school_geography 0.782828 |
| high_school_government_and_politics 0.896373 |
| high_school_macroeconomics 0.623077 |
| high_school_mathematics 0.344444 |
| high_school_microeconomics 0.672269 |
| high_school_physics 0.370861 |
| high_school_psychology 0.822018 |
| high_school_statistics 0.509259 |
| high_school_us_history 0.828431 |
| high_school_world_history 0.801688 |
| human_aging 0.699552 |
| human_sexuality 0.732824 |
| international_law 0.809917 |
| jurisprudence 0.750000 |
| logical_fallacies 0.748466 |
| machine_learning 0.500000 |
| management 0.796117 |
| marketing 0.871795 |
| medical_genetics 0.730000 |
| miscellaneous 0.827586 |
| moral_disputes 0.728324 |
| moral_scenarios 0.174302 |
| nutrition 0.738562 |
| philosophy 0.707395 |
| prehistory 0.743827 |
| professional_accounting 0.489362 |
| professional_law 0.478488 |
| professional_medicine 0.661765 |
| professional_psychology 0.642157 |
| public_relations 0.645455 |
| security_studies 0.751020 |
| sociology 0.845771 |
| us_foreign_policy 0.890000 |
| virology 0.500000 |
| world_religions 0.824561 |
| INFO: 2024-07-12 12:15:22,827: llmtf.base.nlpcoreteam/enMMLU: metric |
| subject |
| STEM 0.515110 |
| humanities 0.683795 |
| other (business, health, misc.) 0.669324 |
| social sciences 0.732187 |
| INFO: 2024-07-12 12:15:22,859: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6501041813275048} |
| INFO: 2024-07-12 12:15:22,901: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 12:15:22,909: llmtf.base.evaluator: |
| mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU |
| 0.670 0.515 0.840 0.481 0.583 0.998 0.481 0.742 0.540 0.866 0.650 |
| INFO: 2024-07-12 12:15:23,042: llmtf.base.daru/treewayextractive: {'r-prec': 0.4038567821067821} |
| INFO: 2024-07-12 12:15:23,640: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 12:15:23,779: llmtf.base.evaluator: |
| mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU |
| 0.645 0.404 0.515 0.840 0.481 0.583 0.998 0.481 0.742 0.540 0.866 0.650 |
| INFO: 2024-07-12 12:18:08,369: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 870.98s |
| INFO: 2024-07-12 12:18:08,388: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
| INFO: 2024-07-12 12:18:08,427: llmtf.base.nlpcoreteam/ruMMLU: metric |
| subject |
| abstract_algebra 0.340000 |
| anatomy 0.385185 |
| astronomy 0.565789 |
| business_ethics 0.540000 |
| clinical_knowledge 0.539623 |
| college_biology 0.472222 |
| college_chemistry 0.400000 |
| college_computer_science 0.460000 |
| college_mathematics 0.400000 |
| college_medicine 0.537572 |
| college_physics 0.323529 |
| computer_security 0.610000 |
| conceptual_physics 0.493617 |
| econometrics 0.421053 |
| electrical_engineering 0.503448 |
| elementary_mathematics 0.365079 |
| formal_logic 0.365079 |
| global_facts 0.330000 |
| high_school_biology 0.612903 |
| high_school_chemistry 0.384236 |
| high_school_computer_science 0.580000 |
| high_school_european_history 0.696970 |
| high_school_geography 0.661616 |
| high_school_government_and_politics 0.611399 |
| high_school_macroeconomics 0.464103 |
| high_school_mathematics 0.340741 |
| high_school_microeconomics 0.504202 |
| high_school_physics 0.357616 |
| high_school_psychology 0.605505 |
| high_school_statistics 0.430556 |
| high_school_us_history 0.725490 |
| high_school_world_history 0.704641 |
| human_aging 0.493274 |
| human_sexuality 0.572519 |
| international_law 0.685950 |
| jurisprudence 0.564815 |
| logical_fallacies 0.472393 |
| machine_learning 0.410714 |
| management 0.631068 |
| marketing 0.730769 |
| medical_genetics 0.540000 |
| miscellaneous 0.615581 |
| moral_disputes 0.575145 |
| moral_scenarios 0.158659 |
| nutrition 0.568627 |
| philosophy 0.530547 |
| prehistory 0.537037 |
| professional_accounting 0.354610 |
| professional_law 0.364407 |
| professional_medicine 0.419118 |
| professional_psychology 0.455882 |
| public_relations 0.509091 |
| security_studies 0.644898 |
| sociology 0.701493 |
| us_foreign_policy 0.700000 |
| virology 0.457831 |
| world_religions 0.736842 |
| INFO: 2024-07-12 12:18:08,434: llmtf.base.nlpcoreteam/ruMMLU: metric |
| subject |
| STEM 0.447247 |
| humanities 0.547537 |
| other (business, health, misc.) 0.510233 |
| social sciences 0.570980 |
| INFO: 2024-07-12 12:18:08,457: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5189991335159189} |
| INFO: 2024-07-12 12:18:08,503: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 12:18:08,517: llmtf.base.evaluator: |
| mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU |
| 0.635 0.404 0.515 0.840 0.481 0.583 0.998 0.481 0.742 0.540 0.866 0.650 0.519 |
| INFO: 2024-07-12 12:20:19,190: llmtf.base.darumeru/USE: Processing Dataset: 361.28s |
| INFO: 2024-07-12 12:20:19,194: llmtf.base.darumeru/USE: Results for darumeru/USE: |
| INFO: 2024-07-12 12:20:19,199: llmtf.base.darumeru/USE: {'grade_norm': 0.08921568627450979} |
| INFO: 2024-07-12 12:20:19,203: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000] |
| INFO: 2024-07-12 12:20:19,203: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-12 12:20:24,193: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 4.99s |
| INFO: 2024-07-12 12:21:26,040: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 527.06s |
| INFO: 2024-07-12 12:21:26,042: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: |
| INFO: 2024-07-12 12:21:26,048: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 3.9001608324310224, 'len': 0.9990183431863008, 'lcs': 0.9938456701457182} |
| INFO: 2024-07-12 12:21:26,050: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:21:26,050: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:21:28,516: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.47s |
| INFO: 2024-07-12 12:22:31,014: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 126.82s |
| INFO: 2024-07-12 12:22:31,016: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: |
| INFO: 2024-07-12 12:22:31,059: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7312522425547183, 'mcc': 0.32496683222587364} |
| INFO: 2024-07-12 12:22:31,065: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 12:22:31,076: llmtf.base.evaluator: |
| mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
| 0.616 0.404 0.515 0.840 0.481 0.583 0.089 0.999 0.998 0.481 0.742 0.540 0.866 0.650 0.519 0.528 |
| INFO: 2024-07-12 12:36:23,699: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 895.18s |
| INFO: 2024-07-12 12:36:23,704: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
| INFO: 2024-07-12 12:36:23,738: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.4722468821961323, 'len': 0.996050202820598, 'lcs': 0.900415560077835} |
| INFO: 2024-07-12 12:36:23,746: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13] |
| INFO: 2024-07-12 12:36:23,746: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:36:25,765: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.02s |
| INFO: 2024-07-12 12:41:45,524: llmtf.base.daru/treewayabstractive: Processing Dataset: 2418.44s |
| INFO: 2024-07-12 12:41:45,528: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
| INFO: 2024-07-12 12:41:45,533: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3563853188681492, 'rouge2': 0.12951199754927947} |
| INFO: 2024-07-12 12:41:45,536: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 12:41:45,564: llmtf.base.evaluator: |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
| 0.611 0.243 0.404 0.515 0.840 0.481 0.583 0.089 0.900 0.999 0.998 0.481 0.742 0.540 0.866 0.650 0.519 0.528 |
| INFO: 2024-07-12 12:47:44,505: llmtf.base.darumeru/cp_para_en: Processing Dataset: 678.73s |
| INFO: 2024-07-12 12:47:44,507: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en: |
| INFO: 2024-07-12 12:47:44,528: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 3.961010453365225, 'len': 0.9994091346932804, 'lcs': 0.9754829484099882} |
| INFO: 2024-07-12 12:47:44,528: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 12:47:44,542: llmtf.base.evaluator: |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
| 0.631 0.243 0.404 0.515 0.840 0.481 0.583 0.089 0.975 0.900 0.999 0.998 0.481 0.742 0.540 0.866 0.650 0.519 0.528 |