INFO: 2024-07-13 14:29:23,827: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 14:29:23,828: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:23,828: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:23,892: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 14:29:23,896: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:23,896: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:24,151: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 14:29:24,154: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:24,154: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:24,345: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 14:29:24,346: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:24,346: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:25,729: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 14:29:25,731: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:25,731: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:27,678: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 14:29:27,678: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:27,678: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:29,484: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 14:29:29,484: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:29:29,484: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:29:33,887: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.40s
INFO: 2024-07-13 14:29:39,828: llmtf.base.daru/treewayextractive: Loading Dataset: 12.15s
INFO: 2024-07-13 14:29:42,885: llmtf.base.daru/treewayabstractive: Loading Dataset: 17.15s
INFO: 2024-07-13 14:29:45,765: llmtf.base.darumeru/MultiQ: Loading Dataset: 21.61s
INFO: 2024-07-13 14:30:53,478: llmtf.base.darumeru/ruMMLU: Loading Dataset: 89.58s
INFO: 2024-07-13 14:32:57,360: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 213.01s
INFO: 2024-07-13 14:33:24,939: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 231.05s
INFO: 2024-07-13 14:33:24,943: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-13 14:33:24,962: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.8278810271761903, 'len': 0.9977030047832767, 'lcs': 0.9847970468194288}
INFO: 2024-07-13 14:33:24,975: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:33:24,975: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:33:28,742: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.77s
INFO: 2024-07-13 14:33:45,284: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 261.46s
INFO: 2024-07-13 14:36:13,193: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 164.45s
INFO: 2024-07-13 14:36:13,226: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-13 14:36:13,244: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.424509793356442, 'len': 0.9995781033988959, 'lcs': 0.994055994028679}
INFO: 2024-07-13 14:36:13,246: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:36:13,246: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:36:17,469: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.22s
INFO: 2024-07-13 14:36:19,338: llmtf.base.daru/treewayextractive: Processing Dataset: 399.51s
INFO: 2024-07-13 14:36:19,340: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-13 14:36:19,799: llmtf.base.daru/treewayextractive: {'r-prec': 0.39738621933621937}
INFO: 2024-07-13 14:36:19,844: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:36:19,850: llmtf.base.evaluator:
mean   daru/treewayextractive  darumeru/cp_sent_en  darumeru/cp_sent_ru
0.798                   0.397                1.000                0.998
INFO: 2024-07-13 14:36:56,298: llmtf.base.darumeru/MultiQ: Processing Dataset: 430.53s
INFO: 2024-07-13 14:36:56,300: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-13 14:36:56,305: llmtf.base.darumeru/MultiQ: {'f1': 0.48425376524800046, 'em': 0.3795411089866157}
INFO: 2024-07-13 14:36:56,316: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:36:56,317: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:37:00,009: llmtf.base.darumeru/PARus: Loading Dataset: 3.69s
INFO: 2024-07-13 14:37:13,006: llmtf.base.darumeru/PARus: Processing Dataset: 13.00s
INFO: 2024-07-13 14:37:13,009: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-13 14:37:13,021: llmtf.base.darumeru/PARus: {'acc': 0.85}
INFO: 2024-07-13 14:37:13,023: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:37:13,023: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:37:16,908: llmtf.base.darumeru/RCB: Loading Dataset: 3.88s
INFO: 2024-07-13 14:37:39,047: llmtf.base.darumeru/RCB: Processing Dataset: 22.12s
INFO: 2024-07-13 14:37:39,050: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-13 14:37:39,056: llmtf.base.darumeru/RCB: {'acc': 0.5272727272727272, 'f1_macro': 0.43555405633327715}
INFO: 2024-07-13 14:37:39,058: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:37:39,059: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:37:53,697: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 14.64s
INFO: 2024-07-13 14:40:08,010: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 134.31s
INFO: 2024-07-13 14:40:08,013: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-13 14:40:08,027: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7680412371134021, 'f1_macro': 0.7680185950653384}
INFO: 2024-07-13 14:40:08,043: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:40:08,043: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:40:15,245: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.20s
INFO: 2024-07-13 14:41:10,015: llmtf.base.daru/treewayabstractive: Processing Dataset: 687.13s
INFO: 2024-07-13 14:41:10,017: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-13 14:41:10,037: llmtf.base.daru/treewayabstractive: {'rouge1': 0.360975899636531, 'rouge2': 0.1330737491255763}
INFO: 2024-07-13 14:41:10,042: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:41:10,069: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruOpenBookQA
0.647                    0.247                   0.397            0.432           0.850         0.481                1.000                0.998                  0.768
INFO: 2024-07-13 14:41:58,403: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 340.93s
INFO: 2024-07-13 14:41:58,453: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-13 14:41:58,457: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.9697516062295746, 'len': 0.9984044778480231, 'lcs': 0.9773285044731846}
INFO: 2024-07-13 14:41:58,459: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:41:58,459: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:42:02,784: llmtf.base.darumeru/cp_para_en: Loading Dataset: 4.32s
INFO: 2024-07-13 14:44:45,025: llmtf.base.darumeru/ruTiE: Processing Dataset: 269.78s
INFO: 2024-07-13 14:44:45,027: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-13 14:44:45,073: llmtf.base.darumeru/ruTiE: {'acc': 0.3511627906976744}
INFO: 2024-07-13 14:44:45,076: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:44:45,077: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:44:47,875: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.80s
INFO: 2024-07-13 14:44:55,693: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 7.80s
INFO: 2024-07-13 14:44:55,695: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-13 14:44:55,700: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8761904761904762, 'f1_macro': 0.8733631471423589}
INFO: 2024-07-13 14:44:55,701: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:44:55,701: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:45:00,084: llmtf.base.darumeru/RWSD: Loading Dataset: 4.38s
INFO: 2024-07-13 14:45:19,405: llmtf.base.darumeru/RWSD: Processing Dataset: 19.32s
INFO: 2024-07-13 14:45:19,421: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-13 14:45:19,425: llmtf.base.darumeru/RWSD: {'acc': 0.5441176470588235}
INFO: 2024-07-13 14:45:19,427: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:45:19,427: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:45:34,612: llmtf.base.darumeru/USE: Loading Dataset: 15.18s
INFO: 2024-07-13 14:46:14,635: llmtf.base.darumeru/cp_para_en: Processing Dataset: 251.85s
INFO: 2024-07-13 14:46:14,638: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-13 14:46:14,657: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.485777628533072, 'len': 0.999455845790753, 'lcs': 0.9727731185644367}
INFO: 2024-07-13 14:46:14,658: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:46:14,684: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree
0.684                    0.247                   0.397            0.432           0.850         0.481          0.544                0.973                0.977                1.000                0.998                  0.768           0.351                 0.875
INFO: 2024-07-13 14:48:58,982: llmtf.base.darumeru/USE: Processing Dataset: 204.37s
INFO: 2024-07-13 14:48:58,999: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-13 14:48:59,004: llmtf.base.darumeru/USE: {'grade_norm': 0.18725490196078434}
INFO: 2024-07-13 14:48:59,010: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 14:48:59,010: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 14:49:19,451: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 20.44s
INFO: 2024-07-13 14:50:14,250: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1036.87s
INFO: 2024-07-13 14:50:14,255: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-13 14:50:14,302: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.350000
anatomy 0.696296
astronomy 0.730263
business_ethics 0.700000
clinical_knowledge 0.754717
college_biology 0.812500
college_chemistry 0.500000
college_computer_science 0.590000
college_mathematics 0.330000
college_medicine 0.670520
college_physics 0.470588
computer_security 0.780000
conceptual_physics 0.570213
econometrics 0.561404
electrical_engineering 0.634483
elementary_mathematics 0.439153
formal_logic 0.507937
global_facts 0.430000
high_school_biology 0.800000
high_school_chemistry 0.517241
high_school_computer_science 0.760000
high_school_european_history 0.787879
high_school_geography 0.843434
high_school_government_and_politics 0.922280
high_school_macroeconomics 0.671795
high_school_mathematics 0.381481
high_school_microeconomics 0.764706
high_school_physics 0.417219
high_school_psychology 0.847706
high_school_statistics 0.537037
high_school_us_history 0.833333
high_school_world_history 0.835443
human_aging 0.730942
human_sexuality 0.801527
international_law 0.818182
jurisprudence 0.759259
logical_fallacies 0.766871
machine_learning 0.544643
management 0.825243
marketing 0.901709
medical_genetics 0.830000
miscellaneous 0.842912
moral_disputes 0.751445
moral_scenarios 0.497207
nutrition 0.754902
philosophy 0.720257
prehistory 0.753086
professional_accounting 0.556738
professional_law 0.483051
professional_medicine 0.742647
professional_psychology 0.717320
public_relations 0.690909
security_studies 0.722449
sociology 0.840796
us_foreign_policy 0.840000
virology 0.512048
world_religions 0.818713
INFO: 2024-07-13 14:50:14,310: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.564712
humanities 0.717897
other (business, health, misc.) 0.710620
social sciences 0.768694
INFO: 2024-07-13 14:50:14,318: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6904807286717012}
INFO: 2024-07-13 14:50:14,385: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:50:14,399: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU
0.651                    0.247                   0.397            0.432           0.850         0.481          0.544         0.187                0.973                0.977                1.000                0.998                  0.768           0.351                 0.875               0.690
INFO: 2024-07-13 14:51:55,784: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1262.30s
INFO: 2024-07-13 14:51:55,788: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-13 14:51:55,799: llmtf.base.darumeru/ruMMLU: {'acc': 0.5138182180983737}
INFO: 2024-07-13 14:51:55,888: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:51:55,906: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU
0.643                    0.247                   0.397            0.432           0.850         0.481          0.544         0.187                0.973                0.977                1.000                0.998            0.514                  0.768           0.351                 0.875               0.690
INFO: 2024-07-13 14:52:18,001: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 178.55s
INFO: 2024-07-13 14:52:18,002: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-13 14:52:18,035: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7115177610333692, 'mcc': 0.3362227509262135}
INFO: 2024-07-13 14:52:18,046: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:52:18,077: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  russiannlp/rucola_custom
0.636                    0.247                   0.397            0.432           0.850         0.481          0.544         0.187                0.973                0.977                1.000                0.998            0.514                  0.768           0.351                 0.875               0.690                     0.524
INFO: 2024-07-13 14:59:07,852: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1522.57s
INFO: 2024-07-13 14:59:07,871: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-13 14:59:07,917: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.330000
anatomy 0.511111
astronomy 0.651316
business_ethics 0.680000
clinical_knowledge 0.588679
college_biology 0.534722
college_chemistry 0.480000
college_computer_science 0.520000
college_mathematics 0.350000
college_medicine 0.549133
college_physics 0.352941
computer_security 0.720000
conceptual_physics 0.540426
econometrics 0.438596
electrical_engineering 0.572414
elementary_mathematics 0.417989
formal_logic 0.396825
global_facts 0.370000
high_school_biology 0.664516
high_school_chemistry 0.394089
high_school_computer_science 0.690000
high_school_european_history 0.763636
high_school_geography 0.666667
high_school_government_and_politics 0.647668
high_school_macroeconomics 0.553846
high_school_mathematics 0.348148
high_school_microeconomics 0.546218
high_school_physics 0.410596
high_school_psychology 0.682569
high_school_statistics 0.449074
high_school_us_history 0.691176
high_school_world_history 0.734177
human_aging 0.538117
human_sexuality 0.641221
international_law 0.743802
jurisprudence 0.657407
logical_fallacies 0.558282
machine_learning 0.401786
management 0.689320
marketing 0.730769
medical_genetics 0.670000
miscellaneous 0.650064
moral_disputes 0.630058
moral_scenarios 0.382123
nutrition 0.604575
philosophy 0.614148
prehistory 0.574074
professional_accounting 0.397163
professional_law 0.397001
professional_medicine 0.514706
professional_psychology 0.514706
public_relations 0.609091
security_studies 0.657143
sociology 0.676617
us_foreign_policy 0.740000
virology 0.457831
world_religions 0.695906
INFO: 2024-07-13 14:59:07,924: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.490445
humanities 0.602971
other (business, health, misc.) 0.567962
social sciences 0.614529
INFO: 2024-07-13 14:59:07,947: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5689766403256171}
INFO: 2024-07-13 14:59:08,029: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 14:59:08,049: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom
0.632                    0.247                   0.397            0.432           0.850         0.481          0.544         0.187                0.973                0.977                1.000                0.998            0.514                  0.768           0.351                 0.875               0.690               0.569                     0.524
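
Note on reading the summary rows: the numbers in this log are consistent with two simple aggregations, sketched below. This is an inference from the logged values only, not a description of llmtf's code. For several tasks the per-task score in the summary row matches the unweighted mean of that task's logged metrics, and the `mean` column matches the unweighted mean of the per-task scores. The names `raw_metrics`, `task_score`, `final_row` and `overall` are hypothetical and exist only for this illustration. The `cp_*` tasks are left out of the first check because their summary values do not obviously follow the same averaging.

```python
from statistics import mean

# Raw metrics copied from the "Results for ..." lines of this log.
raw_metrics = {
    "darumeru/MultiQ": {"f1": 0.48425376524800046, "em": 0.3795411089866157},
    "darumeru/RCB": {"acc": 0.5272727272727272, "f1_macro": 0.43555405633327715},
    "daru/treewayabstractive": {"rouge1": 0.360975899636531, "rouge2": 0.1330737491255763},
    "russiannlp/rucola_custom": {"acc": 0.7115177610333692, "mcc": 0.3362227509262135},
}


def task_score(metrics):
    """Assumed aggregation: unweighted mean of one task's metrics."""
    return mean(metrics.values())


# Per-task scores copied from the last summary row of the log.
final_row = {
    "daru/treewayabstractive": 0.247,
    "daru/treewayextractive": 0.397,
    "darumeru/MultiQ": 0.432,
    "darumeru/PARus": 0.850,
    "darumeru/RCB": 0.481,
    "darumeru/RWSD": 0.544,
    "darumeru/USE": 0.187,
    "darumeru/cp_para_en": 0.973,
    "darumeru/cp_para_ru": 0.977,
    "darumeru/cp_sent_en": 1.000,
    "darumeru/cp_sent_ru": 0.998,
    "darumeru/ruMMLU": 0.514,
    "darumeru/ruOpenBookQA": 0.768,
    "darumeru/ruTiE": 0.351,
    "darumeru/ruWorldTree": 0.875,
    "nlpcoreteam/enMMLU": 0.690,
    "nlpcoreteam/ruMMLU": 0.569,
    "russiannlp/rucola_custom": 0.524,
}

# Assumed aggregation: unweighted mean over all per-task scores.
overall = mean(final_row.values())

for name, metrics in raw_metrics.items():
    print(f"{name}: {task_score(metrics):.3f} (summary row: {final_row[name]:.3f})")
print(f"mean: {overall:.3f}")
```

Because every task carries equal weight under this reading, a single low score such as darumeru/USE lowers the overall mean as much as a task with far more examples would.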