INFO: 2024-07-13 15:58:14,457: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 15:58:14,458: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:58:14,458: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:58:16,153: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 15:58:16,153: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:58:16,153: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:58:18,397: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 15:58:18,400: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:58:18,400: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:58:20,105: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 15:58:20,105: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:58:20,105: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:58:22,211: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 15:58:22,212: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:58:22,212: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:58:24,449: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 15:58:24,451: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:58:24,451: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:58:25,653: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 15:58:25,654: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 15:58:25,654: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 15:58:27,286: llmtf.base.darumeru/MultiQ: Loading Dataset: 12.83s
INFO: 2024-07-13 15:58:29,574: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.92s
INFO: 2024-07-13 15:58:29,934: llmtf.base.daru/treewayabstractive: Loading Dataset: 7.72s
INFO: 2024-07-13 15:58:32,475: llmtf.base.daru/treewayextractive: Loading Dataset: 8.02s
INFO: 2024-07-13 15:59:07,707: llmtf.base.darumeru/ruMMLU: Loading Dataset: 51.55s
INFO: 2024-07-13 16:01:36,434: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 196.33s
INFO: 2024-07-13 16:01:41,380: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 202.98s
INFO: 2024-07-13 16:06:57,037: llmtf.base.daru/treewayextractive: Processing Dataset: 504.56s
INFO: 2024-07-13 16:06:57,042: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-13 16:06:57,306: llmtf.base.daru/treewayextractive: {'r-prec': 0.4038567821067821}
INFO: 2024-07-13 16:06:57,370: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 16:06:57,375: llmtf.base.evaluator:
mean daru/treewayextractive
0.404 0.404
INFO: 2024-07-13 16:07:09,314: llmtf.base.darumeru/MultiQ: Processing Dataset: 522.01s
INFO: 2024-07-13 16:07:09,315: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-13 16:07:09,320: llmtf.base.darumeru/MultiQ: {'f1': 0.5686661277961469, 'em': 0.4980879541108987}
INFO: 2024-07-13 16:07:09,331: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 16:07:09,331: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 16:07:12,054: llmtf.base.darumeru/PARus: Loading Dataset: 2.72s
INFO: 2024-07-13 16:07:30,044: llmtf.base.darumeru/PARus: Processing Dataset: 17.99s
INFO: 2024-07-13 16:07:30,046: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-13 16:07:30,092: llmtf.base.darumeru/PARus: {'acc': 0.83}
INFO: 2024-07-13 16:07:30,093: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 16:07:30,093: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 16:07:32,671: llmtf.base.darumeru/RCB: Loading Dataset: 2.58s
INFO: 2024-07-13 16:07:59,066: llmtf.base.darumeru/RCB: Processing Dataset: 26.39s
INFO: 2024-07-13 16:07:59,068: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-13 16:07:59,092: llmtf.base.darumeru/RCB: {'acc': 0.5318181818181819, 'f1_macro': 0.4819804386277897}
INFO: 2024-07-13 16:07:59,094: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 16:07:59,095: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 16:08:07,609: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 8.51s
INFO: 2024-07-13 16:09:56,386: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 686.81s
INFO: 2024-07-13 16:09:56,404: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-13 16:09:56,408: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.369814709505374, 'len': 0.9988464362460299, 'lcs': 0.9808426093801238}
INFO: 2024-07-13 16:09:56,411: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 16:09:56,412: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 16:09:59,952: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.54s
INFO: 2024-07-13 16:11:01,969: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 174.36s
INFO: 2024-07-13 16:11:01,971: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-13 16:11:01,984: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7538659793814433, 'f1_macro': 0.7551200071805053}
INFO: 2024-07-13 16:11:02,001: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 16:11:02,001: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 16:11:06,320: llmtf.base.darumeru/ruTiE: Loading Dataset: 4.32s
INFO: 2024-07-13 16:15:29,583: llmtf.base.darumeru/ruTiE: Processing Dataset: 263.25s
INFO: 2024-07-13 16:15:29,589: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-13 16:15:29,619: llmtf.base.darumeru/ruTiE: {'acc': 0.5395348837209303}
INFO: 2024-07-13 16:15:29,622: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 16:15:29,623: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 16:15:31,996: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.37s
INFO: 2024-07-13 16:15:42,231: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 10.23s
INFO: 2024-07-13 16:15:42,234: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-13 16:15:42,253: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8761904761904762, 'f1_macro': 0.8761420630173862}
INFO: 2024-07-13 16:15:42,255: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 16:15:42,255: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 16:15:45,297: llmtf.base.darumeru/RWSD: Loading Dataset: 3.04s
INFO: 2024-07-13 16:16:08,974: llmtf.base.darumeru/RWSD: Processing Dataset: 23.66s
INFO: 2024-07-13 16:16:08,993: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-13 16:16:08,997: llmtf.base.darumeru/RWSD: {'acc': 0.6078431372549019}
INFO: 2024-07-13 16:16:08,999: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 16:16:08,999: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 16:16:16,493: llmtf.base.darumeru/USE: Loading Dataset: 7.49s
INFO: 2024-07-13 16:18:57,294: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 537.33s
INFO: 2024-07-13 16:18:57,297: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-13 16:18:57,316: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 3.8994152226580563, 'len': 0.9995035620835028, 'lcs': 0.9936840637058483}
INFO: 2024-07-13 16:18:57,319: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 16:18:57,319: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 16:19:00,359: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.04s
INFO: 2024-07-13 16:22:33,063: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1405.35s
INFO: 2024-07-13 16:22:33,065: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-13 16:22:33,095: llmtf.base.darumeru/ruMMLU: {'acc': 0.48737902823505935}
INFO: 2024-07-13 16:22:33,217: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 16:22:33,361: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree
0.685 0.404 0.533 0.830 0.507 0.608 1.000 0.999 0.487 0.754 0.540 0.876
INFO: 2024-07-13 16:22:46,117: llmtf.base.darumeru/USE: Processing Dataset: 389.62s
INFO: 2024-07-13 16:22:46,120: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-13 16:22:46,141: llmtf.base.darumeru/USE: {'grade_norm': 0.12156862745098038}
INFO: 2024-07-13 16:22:46,149: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 16:22:46,149: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 16:22:52,510: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1276.06s
INFO: 2024-07-13 16:22:52,530: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-13 16:22:52,571: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.310000
anatomy 0.644444
astronomy 0.677632
business_ethics 0.650000
clinical_knowledge 0.720755
college_biology 0.763889
college_chemistry 0.480000
college_computer_science 0.570000
college_mathematics 0.400000
college_medicine 0.676301
college_physics 0.372549
computer_security 0.760000
conceptual_physics 0.591489
econometrics 0.473684
electrical_engineering 0.551724
elementary_mathematics 0.396825
formal_logic 0.492063
global_facts 0.320000
high_school_biology 0.780645
high_school_chemistry 0.487685
high_school_computer_science 0.680000
high_school_european_history 0.806061
high_school_geography 0.787879
high_school_government_and_politics 0.891192
high_school_macroeconomics 0.643590
high_school_mathematics 0.355556
high_school_microeconomics 0.663866
high_school_physics 0.364238
high_school_psychology 0.834862
high_school_statistics 0.486111
high_school_us_history 0.838235
high_school_world_history 0.835443
human_aging 0.708520
human_sexuality 0.763359
international_law 0.809917
jurisprudence 0.750000
logical_fallacies 0.791411
machine_learning 0.491071
management 0.834951
marketing 0.880342
medical_genetics 0.740000
miscellaneous 0.822478
moral_disputes 0.728324
moral_scenarios 0.271508
nutrition 0.722222
philosophy 0.710611
prehistory 0.762346
professional_accounting 0.492908
professional_law 0.481095
professional_medicine 0.713235
professional_psychology 0.638889
public_relations 0.645455
security_studies 0.742857
sociology 0.840796
us_foreign_policy 0.840000
virology 0.530120
world_religions 0.818713
INFO: 2024-07-13 16:22:52,578: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.528856
humanities 0.699671
other (business, health, misc.) 0.675448
social sciences 0.730536
INFO: 2024-07-13 16:22:52,586: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6586279387925342}
INFO: 2024-07-13 16:22:52,657: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 16:22:52,667: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU
0.640 0.404 0.533 0.830 0.507 0.608 0.122 1.000 0.999 0.487 0.754 0.540 0.876 0.659
INFO: 2024-07-13 16:22:58,449: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 12.30s
INFO: 2024-07-13 16:27:06,757: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 248.31s
INFO: 2024-07-13 16:27:06,762: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-13 16:27:06,773: llmtf.base.russiannlp/rucola_custom: {'acc': 0.736275565123789, 'mcc': 0.37026925316854403}
INFO: 2024-07-13 16:27:06,786: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 16:27:06,830: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.634 0.404 0.533 0.830 0.507 0.608 0.122 1.000 0.999 0.487 0.754 0.540 0.876 0.659 0.553
INFO: 2024-07-13 16:31:58,387: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1817.00s
INFO: 2024-07-13 16:31:58,391: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-13 16:31:58,435: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.300000
anatomy 0.392593
astronomy 0.565789
business_ethics 0.560000
clinical_knowledge 0.554717
college_biology 0.465278
college_chemistry 0.410000
college_computer_science 0.500000
college_mathematics 0.370000
college_medicine 0.560694
college_physics 0.333333
computer_security 0.580000
conceptual_physics 0.472340
econometrics 0.403509
electrical_engineering 0.503448
elementary_mathematics 0.362434
formal_logic 0.357143
global_facts 0.320000
high_school_biology 0.609677
high_school_chemistry 0.389163
high_school_computer_science 0.640000
high_school_european_history 0.672727
high_school_geography 0.671717
high_school_government_and_politics 0.652850
high_school_macroeconomics 0.515385
high_school_mathematics 0.318519
high_school_microeconomics 0.521008
high_school_physics 0.337748
high_school_psychology 0.656881
high_school_statistics 0.430556
high_school_us_history 0.725490
high_school_world_history 0.691983
human_aging 0.520179
human_sexuality 0.610687
international_law 0.710744
jurisprudence 0.592593
logical_fallacies 0.503067
machine_learning 0.446429
management 0.669903
marketing 0.735043
medical_genetics 0.540000
miscellaneous 0.607918
moral_disputes 0.580925
moral_scenarios 0.188827
nutrition 0.611111
philosophy 0.575563
prehistory 0.527778
professional_accounting 0.397163
professional_law 0.365059
professional_medicine 0.437500
professional_psychology 0.493464
public_relations 0.545455
security_studies 0.595918
sociology 0.681592
us_foreign_policy 0.680000
virology 0.433735
world_religions 0.748538
INFO: 2024-07-13 16:31:58,442: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.446373
humanities 0.556957
other (business, health, misc.) 0.524325
social sciences 0.585705
INFO: 2024-07-13 16:31:58,451: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5283401236619901}
INFO: 2024-07-13 16:31:58,534: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 16:31:58,610: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.627 0.404 0.533 0.830 0.507 0.608 0.122 1.000 0.999 0.487 0.754 0.540 0.876 0.659 0.528 0.553
INFO: 2024-07-13 16:33:42,900: llmtf.base.daru/treewayabstractive: Processing Dataset: 2112.96s
INFO: 2024-07-13 16:33:42,904: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-13 16:33:42,937: llmtf.base.daru/treewayabstractive: {'rouge1': 0.357438599714093, 'rouge2': 0.13372912507444903}
INFO: 2024-07-13 16:33:42,941: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 16:33:42,951: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.603 0.246 0.404 0.533 0.830 0.507 0.608 0.122 1.000 0.999 0.487 0.754 0.540 0.876 0.659 0.528 0.553
INFO: 2024-07-13 16:34:08,837: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 908.48s
INFO: 2024-07-13 16:34:08,840: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-13 16:34:08,857: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.4702341373518975, 'len': 0.9993717494721948, 'lcs': 0.958885193897962}
INFO: 2024-07-13 16:34:08,859: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-13 16:34:08,859: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 16:34:11,992: llmtf.base.darumeru/cp_para_en: Loading Dataset: 3.13s
INFO: 2024-07-13 16:45:47,059: llmtf.base.darumeru/cp_para_en: Processing Dataset: 695.07s
INFO: 2024-07-13 16:45:47,066: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-13 16:45:47,099: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 3.960763996832381, 'len': 0.9995281850843424, 'lcs': 0.9811766452032213}
INFO: 2024-07-13 16:45:47,100: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 16:45:47,126: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.644 0.246 0.404 0.533 0.830 0.507 0.608 0.122 0.981 0.959 1.000 0.999 0.487 0.754 0.540 0.876 0.659 0.528 0.553