INFO: 2024-07-12 12:31:54,102: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-12 12:31:54,104: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:31:54,104: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:31:54,851: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-12 12:31:54,851: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:31:54,851: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:31:56,886: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-12 12:31:56,888: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-12 12:31:56,888: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 12:31:58,472: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-12 12:31:58,472: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-12 12:31:58,472: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 12:32:00,857: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-12 12:32:00,858: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:32:00,858: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:32:02,761: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-12 12:32:02,762: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-12 12:32:02,762: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 12:32:04,488: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-12 12:32:04,489: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:32:04,489: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:32:09,155: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.67s
INFO: 2024-07-12 12:32:14,891: llmtf.base.daru/treewayextractive: Loading Dataset: 12.13s
INFO: 2024-07-12 12:32:15,608: llmtf.base.darumeru/MultiQ: Loading Dataset: 21.50s
INFO: 2024-07-12 12:32:17,484: llmtf.base.daru/treewayabstractive: Loading Dataset: 16.63s
INFO: 2024-07-12 12:33:23,447: llmtf.base.darumeru/ruMMLU: Loading Dataset: 88.60s
INFO: 2024-07-12 12:35:29,475: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 211.00s
INFO: 2024-07-12 12:36:18,872: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 261.98s
INFO: 2024-07-12 12:41:00,316: llmtf.base.darumeru/MultiQ: Processing Dataset: 524.71s
INFO: 2024-07-12 12:41:00,319: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-12 12:41:00,324: llmtf.base.darumeru/MultiQ: {'f1': 0.478766497045627, 'em': 0.37858508604206503}
INFO: 2024-07-12 12:41:00,335: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:41:00,335: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:41:03,781: llmtf.base.darumeru/PARus: Loading Dataset: 3.45s
INFO: 2024-07-12 12:41:18,655: llmtf.base.darumeru/PARus: Processing Dataset: 14.87s
INFO: 2024-07-12 12:41:18,658: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-12 12:41:18,688: llmtf.base.darumeru/PARus: {'acc': 0.85}
INFO: 2024-07-12 12:41:18,691: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:41:18,691: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:41:22,570: llmtf.base.darumeru/RCB: Loading Dataset: 3.88s
INFO: 2024-07-12 12:41:46,109: llmtf.base.darumeru/RCB: Processing Dataset: 23.53s
INFO: 2024-07-12 12:41:46,111: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-12 12:41:46,134: llmtf.base.darumeru/RCB: {'acc': 0.5363636363636364, 'f1_macro': 0.44417678744688677}
INFO: 2024-07-12 12:41:46,136: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:41:46,136: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:42:01,187: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 15.05s
INFO: 2024-07-12 12:42:07,490: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 598.33s
INFO: 2024-07-12 12:42:07,494: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-12 12:42:07,514: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.8275470644342664, 'len': 0.9976152623010418, 'lcs': 0.9848046802324042}
INFO: 2024-07-12 12:42:07,517: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:42:07,517: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:42:11,744: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 4.23s
INFO: 2024-07-12 12:44:32,297: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 151.11s
INFO: 2024-07-12 12:44:32,299: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-12 12:44:32,313: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7689003436426117, 'f1_macro': 0.76891271419294}
INFO: 2024-07-12 12:44:32,329: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:44:32,329: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:44:39,798: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.47s
INFO: 2024-07-12 12:49:10,076: llmtf.base.darumeru/ruTiE: Processing Dataset: 270.28s
INFO: 2024-07-12 12:49:10,077: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-12 12:49:10,105: llmtf.base.darumeru/ruTiE: {'acc': 0.3511627906976744}
INFO: 2024-07-12 12:49:10,109: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:49:10,109: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:49:12,787: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.68s
INFO: 2024-07-12 12:49:21,195: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 8.41s
INFO: 2024-07-12 12:49:21,197: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-12 12:49:21,226: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8666666666666667, 'f1_macro': 0.8628249661014111}
INFO: 2024-07-12 12:49:21,227: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:49:21,227: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:49:25,118: llmtf.base.darumeru/RWSD: Loading Dataset: 3.89s
INFO: 2024-07-12 12:49:45,525: llmtf.base.darumeru/RWSD: Processing Dataset: 20.40s
INFO: 2024-07-12 12:49:45,527: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-12 12:49:45,531: llmtf.base.darumeru/RWSD: {'acc': 0.5392156862745098}
INFO: 2024-07-12 12:49:45,533: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:49:45,533: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:50:00,952: llmtf.base.darumeru/USE: Loading Dataset: 15.42s
INFO: 2024-07-12 12:50:23,763: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 492.02s
INFO: 2024-07-12 12:50:23,765: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-12 12:50:23,785: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.424509793356442, 'len': 0.9995781033988959, 'lcs': 0.994055994028679}
INFO: 2024-07-12 12:50:23,788: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 12:50:23,788: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:50:27,725: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.94s
INFO: 2024-07-12 12:53:49,377: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1225.92s
INFO: 2024-07-12 12:53:49,380: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-12 12:53:49,389: llmtf.base.darumeru/ruMMLU: {'acc': 0.5154145465429512}
INFO: 2024-07-12 12:53:49,476: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 12:53:49,488: llmtf.base.evaluator:
 mean  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree
0.681            0.429           0.850         0.490          0.539                1.000                0.998            0.515                  0.769           0.351                 0.865
INFO: 2024-07-12 12:54:13,002: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1123.52s
INFO: 2024-07-12 12:54:13,004: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-12 12:54:13,044: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.350000
anatomy 0.696296
astronomy 0.723684
business_ethics 0.700000
clinical_knowledge 0.750943
college_biology 0.805556
college_chemistry 0.510000
college_computer_science 0.600000
college_mathematics 0.330000
college_medicine 0.676301
college_physics 0.480392
computer_security 0.780000
conceptual_physics 0.565957
econometrics 0.561404
electrical_engineering 0.627586
elementary_mathematics 0.447090
formal_logic 0.492063
global_facts 0.430000
high_school_biology 0.800000
high_school_chemistry 0.512315
high_school_computer_science 0.760000
high_school_european_history 0.787879
high_school_geography 0.853535
high_school_government_and_politics 0.922280
high_school_macroeconomics 0.674359
high_school_mathematics 0.377778
high_school_microeconomics 0.773109
high_school_physics 0.423841
high_school_psychology 0.840367
high_school_statistics 0.537037
high_school_us_history 0.828431
high_school_world_history 0.839662
human_aging 0.735426
human_sexuality 0.809160
international_law 0.826446
jurisprudence 0.777778
logical_fallacies 0.760736
machine_learning 0.544643
management 0.825243
marketing 0.905983
medical_genetics 0.830000
miscellaneous 0.841635
moral_disputes 0.751445
moral_scenarios 0.498324
nutrition 0.754902
philosophy 0.726688
prehistory 0.759259
professional_accounting 0.560284
professional_law 0.486310
professional_medicine 0.750000
professional_psychology 0.717320
public_relations 0.690909
security_studies 0.722449
sociology 0.845771
us_foreign_policy 0.830000
virology 0.524096
world_religions 0.818713
INFO: 2024-07-12 12:54:13,052: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.565327
humanities 0.719518
other (business, health, misc.) 0.712936
social sciences 0.770055
INFO: 2024-07-12 12:54:13,060: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6919591187019263}
INFO: 2024-07-12 12:54:13,132: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 12:54:13,141: llmtf.base.evaluator:
 mean  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU
0.682            0.429           0.850         0.490          0.539                1.000                0.998            0.515                  0.769           0.351                 0.865               0.692
INFO: 2024-07-12 12:54:45,001: llmtf.base.daru/treewayextractive: Processing Dataset: 1350.11s
INFO: 2024-07-12 12:54:45,002: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-12 12:54:45,249: llmtf.base.daru/treewayextractive: {'r-prec': 0.3960751082251082}
INFO: 2024-07-12 12:54:45,733: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 12:54:45,743: llmtf.base.evaluator:
 mean  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU
0.658                   0.396            0.429           0.850         0.490          0.539                1.000                0.998            0.515                  0.769           0.351                 0.865               0.692
INFO: 2024-07-12 12:55:01,556: llmtf.base.darumeru/USE: Processing Dataset: 300.60s
INFO: 2024-07-12 12:55:01,575: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-12 12:55:01,596: llmtf.base.darumeru/USE: {'grade_norm': 0.1852941176470588}
INFO: 2024-07-12 12:55:01,602: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-12 12:55:01,603: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 12:55:22,870: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 21.27s
INFO: 2024-07-12 12:58:48,313: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 205.44s
INFO: 2024-07-12 12:58:48,314: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-12 12:58:48,340: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7115177610333692, 'mcc': 0.3351158798730935}
INFO: 2024-07-12 12:58:48,351: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 12:58:48,359: llmtf.base.evaluator:
 mean  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  russiannlp/rucola_custom
0.614                   0.396            0.429           0.850         0.490          0.539         0.185                1.000                0.998            0.515                  0.769           0.351                 0.865               0.692                     0.523
INFO: 2024-07-12 12:59:33,950: llmtf.base.daru/treewayabstractive: Processing Dataset: 1636.45s
INFO: 2024-07-12 12:59:33,951: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-12 12:59:33,956: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3631169847404013, 'rouge2': 0.13422177695865692}
INFO: 2024-07-12 12:59:33,960: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 12:59:33,969: llmtf.base.evaluator:
 mean  daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  russiannlp/rucola_custom
0.590                    0.249                   0.396            0.429           0.850         0.490          0.539         0.185                1.000                0.998            0.515                  0.769           0.351                 0.865               0.692                     0.523
INFO: 2024-07-12 13:02:17,260: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1558.37s
INFO: 2024-07-12 13:02:17,262: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-12 13:02:17,302: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.340000
anatomy 0.518519
astronomy 0.657895
business_ethics 0.690000
clinical_knowledge 0.588679
college_biology 0.541667
college_chemistry 0.470000
college_computer_science 0.520000
college_mathematics 0.360000
college_medicine 0.554913
college_physics 0.343137
computer_security 0.710000
conceptual_physics 0.523404
econometrics 0.447368
electrical_engineering 0.551724
elementary_mathematics 0.417989
formal_logic 0.412698
global_facts 0.360000
high_school_biology 0.661290
high_school_chemistry 0.399015
high_school_computer_science 0.700000
high_school_european_history 0.763636
high_school_geography 0.666667
high_school_government_and_politics 0.652850
high_school_macroeconomics 0.556410
high_school_mathematics 0.359259
high_school_microeconomics 0.554622
high_school_physics 0.397351
high_school_psychology 0.680734
high_school_statistics 0.453704
high_school_us_history 0.696078
high_school_world_history 0.734177
human_aging 0.542601
human_sexuality 0.641221
international_law 0.752066
jurisprudence 0.657407
logical_fallacies 0.564417
machine_learning 0.383929
management 0.689320
marketing 0.735043
medical_genetics 0.660000
miscellaneous 0.648787
moral_disputes 0.630058
moral_scenarios 0.382123
nutrition 0.601307
philosophy 0.617363
prehistory 0.570988
professional_accounting 0.397163
professional_law 0.398957
professional_medicine 0.503676
professional_psychology 0.522876
public_relations 0.609091
security_studies 0.653061
sociology 0.686567
us_foreign_policy 0.740000
virology 0.463855
world_religions 0.695906
INFO: 2024-07-12 13:02:17,309: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.488354
humanities 0.605837
other (business, health, misc.) 0.568133
social sciences 0.617622
INFO: 2024-07-12 13:02:17,317: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5699864045459677}
INFO: 2024-07-12 13:02:17,400: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 13:02:17,437: llmtf.base.evaluator:
 mean  daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom
0.589                    0.249                   0.396            0.429           0.850         0.490          0.539         0.185                1.000                0.998            0.515                  0.769           0.351                 0.865               0.692               0.570                     0.523
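A consistency check on the two MMLU runs above: for both nlpcoreteam/enMMLU and nlpcoreteam/ruMMLU, the overall 'acc' equals the unweighted mean of the four category scores (STEM, humanities, other, social sciences) to within the 6-decimal rounding of the category table, e.g. (0.488354 + 0.605837 + 0.568133 + 0.617622) / 4 ≈ 0.569986 for ruMMLU. The Python sketch below recomputes this from values copied out of this log. It is an arithmetic check on the logged numbers, not llmtf's own aggregation code; the names CATEGORY_SCORES, LOGGED_ACC and macro_average are hypothetical.

```python
# Category-level scores copied from the nlpcoreteam/enMMLU and
# nlpcoreteam/ruMMLU results in this log (printed with 6 decimals).
CATEGORY_SCORES = {
    "nlpcoreteam/enMMLU": {
        "STEM": 0.565327,
        "humanities": 0.719518,
        "other (business, health, misc.)": 0.712936,
        "social sciences": 0.770055,
    },
    "nlpcoreteam/ruMMLU": {
        "STEM": 0.488354,
        "humanities": 0.605837,
        "other (business, health, misc.)": 0.568133,
        "social sciences": 0.617622,
    },
}

# Overall 'acc' exactly as logged for each task.
LOGGED_ACC = {
    "nlpcoreteam/enMMLU": 0.6919591187019263,
    "nlpcoreteam/ruMMLU": 0.5699864045459677,
}


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean over categories: every category counts once."""
    return sum(scores.values()) / len(scores)


for task, categories in CATEGORY_SCORES.items():
    recomputed = macro_average(categories)
    print(f"{task}: recomputed={recomputed:.6f} logged={LOGGED_ACC[task]:.6f}")
    # The category inputs are rounded to 6 decimals, so allow a small tolerance.
    assert abs(recomputed - LOGGED_ACC[task]) < 1e-5
```

Because each category counts once regardless of how many questions it contains, this kind of macro average can differ from plain accuracy over all questions.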
INFO: 2024-07-12 13:03:33,021: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 785.29s
INFO: 2024-07-12 13:03:33,024: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-12 13:03:33,042: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.9697516062295746, 'len': 0.9984044778480231, 'lcs': 0.9773285044731846}
INFO: 2024-07-12 13:03:33,044: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271]
INFO: 2024-07-12 13:03:33,044: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 13:03:37,463: llmtf.base.darumeru/cp_para_en: Loading Dataset: 4.42s
INFO: 2024-07-12 13:14:17,087: llmtf.base.darumeru/cp_para_en: Processing Dataset: 639.62s
INFO: 2024-07-12 13:14:17,090: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-12 13:14:17,110: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.485692967910676, 'len': 0.9994082267431339, 'lcs': 0.9723279130849847}
INFO: 2024-07-12 13:14:17,112: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 13:14:17,137: llmtf.base.evaluator:
 mean  daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom
0.632                    0.249                   0.396            0.429           0.850         0.490          0.539         0.185                0.972                0.977                1.000                0.998            0.515                  0.769           0.351                 0.865               0.692               0.570                     0.523
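The mean column in these summary tables behaves like a plain unweighted average of the per-task scores, and for tasks that log several metrics the per-task score matches the mean of those metrics (for example darumeru/RCB: (0.5364 + 0.4442) / 2 ≈ 0.490). The Python sketch below checks both observations against the final table. It is a hypothetical reconstruction from this log, not llmtf's implementation, and the names TASK_METRICS, SUMMARY and task_score are illustrative. The cp_* tasks are left out of the per-task check because their summary values are not the mean of their three logged metrics.

```python
from statistics import fmean

# Per-task metrics copied from the "Results for ..." lines of this run.
# cp_* tasks are omitted: their summary column is not the mean of their metrics.
TASK_METRICS = {
    "daru/treewayabstractive": {"rouge1": 0.3631169847404013, "rouge2": 0.13422177695865692},
    "daru/treewayextractive": {"r-prec": 0.3960751082251082},
    "darumeru/MultiQ": {"f1": 0.478766497045627, "em": 0.37858508604206503},
    "darumeru/PARus": {"acc": 0.85},
    "darumeru/RCB": {"acc": 0.5363636363636364, "f1_macro": 0.44417678744688677},
    "darumeru/RWSD": {"acc": 0.5392156862745098},
    "darumeru/USE": {"grade_norm": 0.1852941176470588},
    "darumeru/ruMMLU": {"acc": 0.5154145465429512},
    "darumeru/ruOpenBookQA": {"acc": 0.7689003436426117, "f1_macro": 0.76891271419294},
    "darumeru/ruTiE": {"acc": 0.3511627906976744},
    "darumeru/ruWorldTree": {"acc": 0.8666666666666667, "f1_macro": 0.8628249661014111},
    "nlpcoreteam/enMMLU": {"acc": 0.6919591187019263},
    "nlpcoreteam/ruMMLU": {"acc": 0.5699864045459677},
    "russiannlp/rucola_custom": {"acc": 0.7115177610333692, "mcc": 0.3351158798730935},
}

# Rounded per-task scores and the overall mean from the final summary table.
SUMMARY = {
    "daru/treewayabstractive": 0.249,
    "daru/treewayextractive": 0.396,
    "darumeru/MultiQ": 0.429,
    "darumeru/PARus": 0.850,
    "darumeru/RCB": 0.490,
    "darumeru/RWSD": 0.539,
    "darumeru/USE": 0.185,
    "darumeru/cp_para_en": 0.972,
    "darumeru/cp_para_ru": 0.977,
    "darumeru/cp_sent_en": 1.000,
    "darumeru/cp_sent_ru": 0.998,
    "darumeru/ruMMLU": 0.515,
    "darumeru/ruOpenBookQA": 0.769,
    "darumeru/ruTiE": 0.351,
    "darumeru/ruWorldTree": 0.865,
    "nlpcoreteam/enMMLU": 0.692,
    "nlpcoreteam/ruMMLU": 0.570,
    "russiannlp/rucola_custom": 0.523,
}
LOGGED_MEAN = 0.632


def task_score(metrics: dict[str, float]) -> float:
    """Assumed per-task score: unweighted mean of the task's logged metrics."""
    return fmean(metrics.values())


# Each recomputed task score should round to the 3-decimal value in the table.
for task, metrics in TASK_METRICS.items():
    assert abs(task_score(metrics) - SUMMARY[task]) <= 5e-4 + 1e-9, task

# Overall mean over all 18 tasks. The inputs are already rounded to 3 decimals,
# so allow up to 1e-3 of drift against the logged value.
overall = fmean(SUMMARY.values())
print(f"recomputed mean = {overall:.4f}, logged mean = {LOGGED_MEAN:.3f}")
assert abs(overall - LOGGED_MEAN) < 1e-3
```

Averaging at the task level means every benchmark carries equal weight in the mean column, so a weak task such as darumeru/USE pulls the overall figure down as much as a strong one pushes it up.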