| INFO: 2024-07-12 11:50:11,759: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] |
| INFO: 2024-07-12 11:50:11,760: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 11:50:11,760: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 11:50:13,518: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] |
| INFO: 2024-07-12 11:50:13,519: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 11:50:13,519: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 11:50:15,480: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] |
| INFO: 2024-07-12 11:50:15,480: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-12 11:50:15,480: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-12 11:50:17,355: llmtf.base.darumeru/MultiQ: Loading Dataset: 5.59s |
| INFO: 2024-07-12 11:50:17,449: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] |
| INFO: 2024-07-12 11:50:17,450: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-12 11:50:17,450: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-12 11:50:19,384: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
| INFO: 2024-07-12 11:50:19,384: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 11:50:19,384: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 11:50:21,378: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] |
| INFO: 2024-07-12 11:50:21,378: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-12 11:50:21,378: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-12 11:50:23,238: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] |
| INFO: 2024-07-12 11:50:23,240: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 11:50:23,240: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 11:50:25,353: llmtf.base.daru/treewayabstractive: Loading Dataset: 5.97s |
| INFO: 2024-07-12 11:50:26,005: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.77s |
| INFO: 2024-07-12 11:50:31,706: llmtf.base.darumeru/ruMMLU: Loading Dataset: 18.19s |
| INFO: 2024-07-12 11:50:33,694: llmtf.base.daru/treewayextractive: Loading Dataset: 12.32s |
| INFO: 2024-07-12 11:52:36,206: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 138.76s |
| INFO: 2024-07-12 11:52:42,827: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 147.35s |
| INFO: 2024-07-12 11:56:33,888: llmtf.base.darumeru/MultiQ: Processing Dataset: 376.53s |
| INFO: 2024-07-12 11:56:33,892: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
| INFO: 2024-07-12 11:56:33,897: llmtf.base.darumeru/MultiQ: {'f1': 0.4655063954998404, 'em': 0.3527724665391969} |
| INFO: 2024-07-12 11:56:33,903: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 11:56:33,903: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 11:56:36,189: llmtf.base.darumeru/PARus: Loading Dataset: 2.29s |
| INFO: 2024-07-12 11:56:43,408: llmtf.base.darumeru/PARus: Processing Dataset: 7.22s |
| INFO: 2024-07-12 11:56:43,410: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
| INFO: 2024-07-12 11:56:43,422: llmtf.base.darumeru/PARus: {'acc': 0.81} |
| INFO: 2024-07-12 11:56:43,423: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 11:56:43,423: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 11:56:45,896: llmtf.base.darumeru/RCB: Loading Dataset: 2.47s |
| INFO: 2024-07-12 11:56:55,195: llmtf.base.darumeru/RCB: Processing Dataset: 9.30s |
| INFO: 2024-07-12 11:56:55,197: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
| INFO: 2024-07-12 11:56:55,203: llmtf.base.darumeru/RCB: {'acc': 0.5318181818181819, 'f1_macro': 0.439336917562724} |
| INFO: 2024-07-12 11:56:55,204: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 11:56:55,204: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 11:56:59,484: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 4.28s |
| INFO: 2024-07-12 11:58:22,725: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 83.22s |
| INFO: 2024-07-12 11:58:22,728: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
| INFO: 2024-07-12 11:58:22,741: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7246563573883161, 'f1_macro': 0.7252454850132016} |
| INFO: 2024-07-12 11:58:22,750: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 11:58:22,750: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 11:58:30,226: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.47s |
| INFO: 2024-07-12 11:58:41,473: llmtf.base.darumeru/ruMMLU: Processing Dataset: 489.77s |
| INFO: 2024-07-12 11:58:41,476: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: |
| INFO: 2024-07-12 11:58:41,497: llmtf.base.darumeru/ruMMLU: {'acc': 0.501047590541754} |
| INFO: 2024-07-12 11:58:41,543: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 11:58:41,550: llmtf.base.evaluator: |
| mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/ruMMLU  darumeru/ruOpenBookQA |
| 0.586  0.409            0.810           0.486         0.501            0.725                 |
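Judging from the numbers in this first summary, each task's score looks like the unweighted mean of the metrics in its "Results" line (MultiQ: (0.4655 + 0.3528) / 2 ≈ 0.409), and `mean` looks like the unweighted mean of the task columns. The Python sketch below checks that reading against the logged values. The aggregation rule is inferred from the numbers, not taken from llmtf's source, and it does not fit the cp_* tasks, whose summary scores are not the mean of `symbol_per_token`, `len` and `lcs`.

```python
from statistics import mean

# Metric dicts copied from the "Results for ..." lines of the log.
results = {
    "darumeru/MultiQ": {"f1": 0.4655063954998404, "em": 0.3527724665391969},
    "darumeru/PARus": {"acc": 0.81},
    "darumeru/RCB": {"acc": 0.5318181818181819, "f1_macro": 0.439336917562724},
    "darumeru/ruMMLU": {"acc": 0.501047590541754},
    "darumeru/ruOpenBookQA": {"acc": 0.7246563573883161, "f1_macro": 0.7252454850132016},
}

# Assumption: a task's score is the unweighted mean of its metrics.
task_scores = {task: mean(list(m.values())) for task, m in results.items()}

# Assumption: the "mean" column is the unweighted mean over tasks.
overall = mean(list(task_scores.values()))

for task, score in task_scores.items():
    print(f"{task:<24} {score:.3f}")
print(f"{'mean':<24} {overall:.3f}")
```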
| INFO: 2024-07-12 12:00:02,774: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 576.76s |
| INFO: 2024-07-12 12:00:02,777: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: |
| INFO: 2024-07-12 12:00:02,781: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.8288105617933392, 'len': 0.9965909600064454, 'lcs': 0.9803935769826266} |
| INFO: 2024-07-12 12:00:02,783: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 12:00:02,783: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:00:05,357: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 2.57s |
| INFO: 2024-07-12 12:02:14,754: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 578.55s |
| INFO: 2024-07-12 12:02:14,757: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
| INFO: 2024-07-12 12:02:14,803: llmtf.base.nlpcoreteam/enMMLU: metric |
| subject |
| abstract_algebra 0.350000 |
| anatomy 0.674074 |
| astronomy 0.730263 |
| business_ethics 0.740000 |
| clinical_knowledge 0.747170 |
| college_biology 0.777778 |
| college_chemistry 0.460000 |
| college_computer_science 0.560000 |
| college_mathematics 0.310000 |
| college_medicine 0.653179 |
| college_physics 0.450980 |
| computer_security 0.780000 |
| conceptual_physics 0.561702 |
| econometrics 0.570175 |
| electrical_engineering 0.662069 |
| elementary_mathematics 0.462963 |
| formal_logic 0.468254 |
| global_facts 0.330000 |
| high_school_biology 0.793548 |
| high_school_chemistry 0.502463 |
| high_school_computer_science 0.740000 |
| high_school_european_history 0.751515 |
| high_school_geography 0.828283 |
| high_school_government_and_politics 0.922280 |
| high_school_macroeconomics 0.669231 |
| high_school_mathematics 0.355556 |
| high_school_microeconomics 0.731092 |
| high_school_physics 0.470199 |
| high_school_psychology 0.834862 |
| high_school_statistics 0.537037 |
| high_school_us_history 0.833333 |
| high_school_world_history 0.852321 |
| human_aging 0.726457 |
| human_sexuality 0.770992 |
| international_law 0.818182 |
| jurisprudence 0.740741 |
| logical_fallacies 0.754601 |
| machine_learning 0.500000 |
| management 0.815534 |
| marketing 0.893162 |
| medical_genetics 0.840000 |
| miscellaneous 0.825032 |
| moral_disputes 0.728324 |
| moral_scenarios 0.427933 |
| nutrition 0.758170 |
| philosophy 0.700965 |
| prehistory 0.722222 |
| professional_accounting 0.535461 |
| professional_law 0.489570 |
| professional_medicine 0.731618 |
| professional_psychology 0.681373 |
| public_relations 0.700000 |
| security_studies 0.726531 |
| sociology 0.805970 |
| us_foreign_policy 0.860000 |
| virology 0.518072 |
| world_religions 0.818713 |
| INFO: 2024-07-12 12:02:14,811: llmtf.base.nlpcoreteam/enMMLU: metric |
| subject |
| STEM 0.555809 |
| humanities 0.700513 |
| other (business, health, misc.) 0.699138 |
| social sciences 0.758399 |
| INFO: 2024-07-12 12:02:14,847: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6784647703166143} |
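The overall `acc` for nlpcoreteam/enMMLU (0.6784648) agrees, within rounding, with the unweighted mean of the four category scores listed just above, and nlpcoreteam/ruMMLU later in the log shows the same relationship (0.5452728). A short Python check, assuming this macro average over categories is how the figure is produced; the log itself does not state the rule.

```python
from statistics import mean

# Category scores copied from the per-category tables in the log.
categories = {
    "nlpcoreteam/enMMLU": {
        "STEM": 0.555809,
        "humanities": 0.700513,
        "other (business, health, misc.)": 0.699138,
        "social sciences": 0.758399,
    },
    "nlpcoreteam/ruMMLU": {
        "STEM": 0.458004,
        "humanities": 0.581123,
        "other (business, health, misc.)": 0.553640,
        "social sciences": 0.588324,
    },
}
reported_acc = {
    "nlpcoreteam/enMMLU": 0.6784647703166143,
    "nlpcoreteam/ruMMLU": 0.5452728350581164,
}

for task, scores in categories.items():
    # Assumption: acc is the unweighted (macro) mean over the four categories.
    macro = mean(list(scores.values()))
    # Category scores are printed to 6 decimals, so allow a small tolerance.
    print(f"{task}: macro={macro:.6f} reported={reported_acc[task]:.6f}")
    assert abs(macro - reported_acc[task]) < 1e-6
```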
| INFO: 2024-07-12 12:02:14,911: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 12:02:14,918: llmtf.base.evaluator: |
| mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  nlpcoreteam/enMMLU |
| 0.658  0.409            0.810           0.486         0.997                0.501            0.725                  0.678              |
| INFO: 2024-07-12 12:02:58,876: llmtf.base.darumeru/ruTiE: Processing Dataset: 268.65s |
| INFO: 2024-07-12 12:02:58,877: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE: |
| INFO: 2024-07-12 12:02:58,905: llmtf.base.darumeru/ruTiE: {'acc': 0.3511627906976744} |
| INFO: 2024-07-12 12:02:58,908: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 12:02:58,908: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:03:01,487: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.58s |
| INFO: 2024-07-12 12:03:05,587: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.10s |
| INFO: 2024-07-12 12:03:05,589: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
| INFO: 2024-07-12 12:03:05,594: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8571428571428571, 'f1_macro': 0.8524773742696758} |
| INFO: 2024-07-12 12:03:05,595: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 12:03:05,595: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:03:07,532: llmtf.base.darumeru/RWSD: Loading Dataset: 1.94s |
| INFO: 2024-07-12 12:03:16,066: llmtf.base.darumeru/RWSD: Processing Dataset: 8.53s |
| INFO: 2024-07-12 12:03:16,083: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
| INFO: 2024-07-12 12:03:16,087: llmtf.base.darumeru/RWSD: {'acc': 0.5196078431372549} |
| INFO: 2024-07-12 12:03:16,088: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 12:03:16,088: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:03:20,273: llmtf.base.darumeru/USE: Loading Dataset: 4.18s |
| INFO: 2024-07-12 12:04:31,400: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 708.57s |
| INFO: 2024-07-12 12:04:31,403: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
| INFO: 2024-07-12 12:04:31,442: llmtf.base.nlpcoreteam/ruMMLU: metric |
| subject |
| abstract_algebra 0.320000 |
| anatomy 0.444444 |
| astronomy 0.684211 |
| business_ethics 0.640000 |
| clinical_knowledge 0.607547 |
| college_biology 0.493056 |
| college_chemistry 0.410000 |
| college_computer_science 0.430000 |
| college_mathematics 0.320000 |
| college_medicine 0.508671 |
| college_physics 0.313725 |
| computer_security 0.670000 |
| conceptual_physics 0.497872 |
| econometrics 0.412281 |
| electrical_engineering 0.572414 |
| elementary_mathematics 0.391534 |
| formal_logic 0.428571 |
| global_facts 0.330000 |
| high_school_biology 0.596774 |
| high_school_chemistry 0.403941 |
| high_school_computer_science 0.620000 |
| high_school_european_history 0.751515 |
| high_school_geography 0.661616 |
| high_school_government_and_politics 0.606218 |
| high_school_macroeconomics 0.523077 |
| high_school_mathematics 0.325926 |
| high_school_microeconomics 0.516807 |
| high_school_physics 0.384106 |
| high_school_psychology 0.658716 |
| high_school_statistics 0.444444 |
| high_school_us_history 0.691176 |
| high_school_world_history 0.729958 |
| human_aging 0.538117 |
| human_sexuality 0.625954 |
| international_law 0.702479 |
| jurisprudence 0.620370 |
| logical_fallacies 0.484663 |
| machine_learning 0.366071 |
| management 0.689320 |
| marketing 0.747863 |
| medical_genetics 0.640000 |
| miscellaneous 0.646232 |
| moral_disputes 0.589595 |
| moral_scenarios 0.302793 |
| nutrition 0.598039 |
| philosophy 0.630225 |
| prehistory 0.540123 |
| professional_accounting 0.368794 |
| professional_law 0.387223 |
| professional_medicine 0.522059 |
| professional_psychology 0.501634 |
| public_relations 0.581818 |
| security_studies 0.644898 |
| sociology 0.626866 |
| us_foreign_policy 0.700000 |
| virology 0.469880 |
| world_religions 0.695906 |
| INFO: 2024-07-12 12:04:31,450: llmtf.base.nlpcoreteam/ruMMLU: metric |
| subject |
| STEM 0.458004 |
| humanities 0.581123 |
| other (business, health, misc.) 0.553640 |
| social sciences 0.588324 |
| INFO: 2024-07-12 12:04:31,458: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5452728350581164} |
| INFO: 2024-07-12 12:04:31,502: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 12:04:31,511: llmtf.base.evaluator: |
| mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU |
| 0.625  0.409            0.810           0.486         0.520          0.997                0.501            0.725                  0.351           0.855                 0.678               0.545              |
| INFO: 2024-07-12 12:07:56,870: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 471.51s |
| INFO: 2024-07-12 12:07:56,872: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: |
| INFO: 2024-07-12 12:07:56,906: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.424364613333213, 'len': 0.9995781033988959, 'lcs': 0.9939894734160288} |
| INFO: 2024-07-12 12:07:56,907: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 12:07:56,907: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:07:58,934: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.03s |
| INFO: 2024-07-12 12:08:00,959: llmtf.base.darumeru/USE: Processing Dataset: 280.67s |
| INFO: 2024-07-12 12:08:00,961: llmtf.base.darumeru/USE: Results for darumeru/USE: |
| INFO: 2024-07-12 12:08:00,979: llmtf.base.darumeru/USE: {'grade_norm': 0.13333333333333333} |
| INFO: 2024-07-12 12:08:00,983: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-12 12:08:00,983: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-12 12:08:07,042: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 6.06s |
| INFO: 2024-07-12 12:09:47,993: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 100.95s |
| INFO: 2024-07-12 12:09:47,995: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: |
| INFO: 2024-07-12 12:09:48,022: llmtf.base.russiannlp/rucola_custom: {'acc': 0.6925008970218873, 'mcc': 0.28632822760421195} |
| INFO: 2024-07-12 12:09:48,027: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 12:09:48,035: llmtf.base.evaluator: |
| mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom |
| 0.607  0.409            0.810           0.486         0.520          0.133         1.000                0.997                0.501            0.725                  0.351           0.855                 0.678               0.545               0.489                    |
| INFO: 2024-07-12 12:11:09,066: llmtf.base.daru/treewayextractive: Processing Dataset: 1235.37s |
| INFO: 2024-07-12 12:11:09,067: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
| INFO: 2024-07-12 12:11:09,315: llmtf.base.daru/treewayextractive: {'r-prec': 0.3960751082251082} |
| INFO: 2024-07-12 12:11:09,704: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 12:11:09,714: llmtf.base.evaluator: |
| mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom |
| 0.593  0.396                   0.409            0.810           0.486         0.520          0.133         1.000                0.997                0.501            0.725                  0.351           0.855                 0.678               0.545               0.489                    |
| INFO: 2024-07-12 12:18:07,404: llmtf.base.daru/treewayabstractive: Processing Dataset: 1662.05s |
| INFO: 2024-07-12 12:18:07,410: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
| INFO: 2024-07-12 12:18:07,414: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3588531675380842, 'rouge2': 0.1296570302856779} |
| INFO: 2024-07-12 12:18:07,417: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 12:18:07,462: llmtf.base.evaluator: |
| mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom |
| 0.571  0.244                    0.396                   0.409            0.810           0.486         0.520          0.133         1.000                0.997                0.501            0.725                  0.351           0.855                 0.678               0.545               0.489                    |
| INFO: 2024-07-12 12:20:36,611: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 757.68s |
| INFO: 2024-07-12 12:20:36,614: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
| INFO: 2024-07-12 12:20:36,618: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.9696581328597187, 'len': 0.995839361053339, 'lcs': 0.9354973649884331} |
| INFO: 2024-07-12 12:20:36,619: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009, 198, 271] |
| INFO: 2024-07-12 12:20:36,619: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
| INFO: 2024-07-12 12:20:39,355: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.74s |
| INFO: 2024-07-12 12:30:57,631: llmtf.base.darumeru/cp_para_en: Processing Dataset: 618.27s |
| INFO: 2024-07-12 12:30:57,647: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en: |
| INFO: 2024-07-12 12:30:57,651: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.479971277716798, 'len': 0.9952123972535741, 'lcs': 0.9731059690999859} |
| INFO: 2024-07-12 12:30:57,652: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-12 12:30:57,730: llmtf.base.evaluator: |
| mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom |
| 0.614  0.244                    0.396                   0.409            0.810           0.486         0.520          0.133         0.973                0.935                1.000                0.997                0.501            0.725                  0.351           0.855                 0.678               0.545               0.489                    |
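The `mean` in this final summary can be recomputed from its 18 task columns. A minimal Python sketch that splits the two rows on whitespace and averages the task scores; the row strings are copied from the log, and the parsing is an illustration rather than an llmtf API.

```python
from statistics import mean

# Header and value rows of the last summary table, copied from the log.
header = (
    "mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ "
    "darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en "
    "darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU "
    "darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU "
    "nlpcoreteam/ruMMLU russiannlp/rucola_custom"
)
values = (
    "0.614 0.244 0.396 0.409 0.810 0.486 0.520 0.133 0.973 0.935 "
    "1.000 0.997 0.501 0.725 0.351 0.855 0.678 0.545 0.489"
)

# Pair each column name with its value, then separate out the reported mean.
row = dict(zip(header.split(), map(float, values.split())))
reported = row.pop("mean")

# Per-task scores are already rounded to 3 decimals, so the recomputed
# mean may differ from the reported one in the last digit.
recomputed = mean(list(row.values()))
print(f"{len(row)} tasks, recomputed mean {recomputed:.4f}, reported {reported:.3f}")
```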