| INFO: 2024-07-13 14:29:23,827: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] |
| INFO: 2024-07-13 14:29:23,828: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:29:23,828: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:29:23,892: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] |
| INFO: 2024-07-13 14:29:23,896: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:29:23,896: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:29:24,151: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] |
| INFO: 2024-07-13 14:29:24,154: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:29:24,154: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:29:24,345: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] |
| INFO: 2024-07-13 14:29:24,346: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:29:24,346: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:29:25,729: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
| INFO: 2024-07-13 14:29:25,731: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:29:25,731: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:29:27,678: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] |
| INFO: 2024-07-13 14:29:27,678: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:29:27,678: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:29:29,484: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] |
| INFO: 2024-07-13 14:29:29,484: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:29:29,484: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:29:33,887: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.40s |
| INFO: 2024-07-13 14:29:39,828: llmtf.base.daru/treewayextractive: Loading Dataset: 12.15s |
| INFO: 2024-07-13 14:29:42,885: llmtf.base.daru/treewayabstractive: Loading Dataset: 17.15s |
| INFO: 2024-07-13 14:29:45,765: llmtf.base.darumeru/MultiQ: Loading Dataset: 21.61s |
| INFO: 2024-07-13 14:30:53,478: llmtf.base.darumeru/ruMMLU: Loading Dataset: 89.58s |
| INFO: 2024-07-13 14:32:57,360: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 213.01s |
| INFO: 2024-07-13 14:33:24,939: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 231.05s |
| INFO: 2024-07-13 14:33:24,943: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: |
| INFO: 2024-07-13 14:33:24,962: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.8278810271761903, 'len': 0.9977030047832767, 'lcs': 0.9847970468194288} |
| INFO: 2024-07-13 14:33:24,975: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:33:24,975: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:33:28,742: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.77s |
| INFO: 2024-07-13 14:33:45,284: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 261.46s |
| INFO: 2024-07-13 14:36:13,193: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 164.45s |
| INFO: 2024-07-13 14:36:13,226: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: |
| INFO: 2024-07-13 14:36:13,244: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.424509793356442, 'len': 0.9995781033988959, 'lcs': 0.994055994028679} |
| INFO: 2024-07-13 14:36:13,246: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:36:13,246: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:36:17,469: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.22s |
| INFO: 2024-07-13 14:36:19,338: llmtf.base.daru/treewayextractive: Processing Dataset: 399.51s |
| INFO: 2024-07-13 14:36:19,340: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
| INFO: 2024-07-13 14:36:19,799: llmtf.base.daru/treewayextractive: {'r-prec': 0.39738621933621937} |
| INFO: 2024-07-13 14:36:19,844: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 14:36:19,850: llmtf.base.evaluator: |
| mean daru/treewayextractive darumeru/cp_sent_en darumeru/cp_sent_ru |
| 0.798 0.397 1.000 0.998 |
| INFO: 2024-07-13 14:36:56,298: llmtf.base.darumeru/MultiQ: Processing Dataset: 430.53s |
| INFO: 2024-07-13 14:36:56,300: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
| INFO: 2024-07-13 14:36:56,305: llmtf.base.darumeru/MultiQ: {'f1': 0.48425376524800046, 'em': 0.3795411089866157} |
| INFO: 2024-07-13 14:36:56,316: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:36:56,317: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:37:00,009: llmtf.base.darumeru/PARus: Loading Dataset: 3.69s |
| INFO: 2024-07-13 14:37:13,006: llmtf.base.darumeru/PARus: Processing Dataset: 13.00s |
| INFO: 2024-07-13 14:37:13,009: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
| INFO: 2024-07-13 14:37:13,021: llmtf.base.darumeru/PARus: {'acc': 0.85} |
| INFO: 2024-07-13 14:37:13,023: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:37:13,023: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:37:16,908: llmtf.base.darumeru/RCB: Loading Dataset: 3.88s |
| INFO: 2024-07-13 14:37:39,047: llmtf.base.darumeru/RCB: Processing Dataset: 22.12s |
| INFO: 2024-07-13 14:37:39,050: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
| INFO: 2024-07-13 14:37:39,056: llmtf.base.darumeru/RCB: {'acc': 0.5272727272727272, 'f1_macro': 0.43555405633327715} |
| INFO: 2024-07-13 14:37:39,058: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:37:39,059: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:37:53,697: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 14.64s |
| INFO: 2024-07-13 14:40:08,010: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 134.31s |
| INFO: 2024-07-13 14:40:08,013: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
| INFO: 2024-07-13 14:40:08,027: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7680412371134021, 'f1_macro': 0.7680185950653384} |
| INFO: 2024-07-13 14:40:08,043: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:40:08,043: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:40:15,245: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.20s |
| INFO: 2024-07-13 14:41:10,015: llmtf.base.daru/treewayabstractive: Processing Dataset: 687.13s |
| INFO: 2024-07-13 14:41:10,017: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
| INFO: 2024-07-13 14:41:10,037: llmtf.base.daru/treewayabstractive: {'rouge1': 0.360975899636531, 'rouge2': 0.1330737491255763} |
| INFO: 2024-07-13 14:41:10,042: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 14:41:10,069: llmtf.base.evaluator: |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA |
| 0.647 0.247 0.397 0.432 0.850 0.481 1.000 0.998 0.768 |
| INFO: 2024-07-13 14:41:58,403: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 340.93s |
| INFO: 2024-07-13 14:41:58,453: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
| INFO: 2024-07-13 14:41:58,457: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.9697516062295746, 'len': 0.9984044778480231, 'lcs': 0.9773285044731846} |
| INFO: 2024-07-13 14:41:58,459: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:41:58,459: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:42:02,784: llmtf.base.darumeru/cp_para_en: Loading Dataset: 4.32s |
| INFO: 2024-07-13 14:44:45,025: llmtf.base.darumeru/ruTiE: Processing Dataset: 269.78s |
| INFO: 2024-07-13 14:44:45,027: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE: |
| INFO: 2024-07-13 14:44:45,073: llmtf.base.darumeru/ruTiE: {'acc': 0.3511627906976744} |
| INFO: 2024-07-13 14:44:45,076: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:44:45,077: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:44:47,875: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.80s |
| INFO: 2024-07-13 14:44:55,693: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 7.80s |
| INFO: 2024-07-13 14:44:55,695: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
| INFO: 2024-07-13 14:44:55,700: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8761904761904762, 'f1_macro': 0.8733631471423589} |
| INFO: 2024-07-13 14:44:55,701: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:44:55,701: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:45:00,084: llmtf.base.darumeru/RWSD: Loading Dataset: 4.38s |
| INFO: 2024-07-13 14:45:19,405: llmtf.base.darumeru/RWSD: Processing Dataset: 19.32s |
| INFO: 2024-07-13 14:45:19,421: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
| INFO: 2024-07-13 14:45:19,425: llmtf.base.darumeru/RWSD: {'acc': 0.5441176470588235} |
| INFO: 2024-07-13 14:45:19,427: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:45:19,427: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:45:34,612: llmtf.base.darumeru/USE: Loading Dataset: 15.18s |
| INFO: 2024-07-13 14:46:14,635: llmtf.base.darumeru/cp_para_en: Processing Dataset: 251.85s |
| INFO: 2024-07-13 14:46:14,638: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en: |
| INFO: 2024-07-13 14:46:14,657: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.485777628533072, 'len': 0.999455845790753, 'lcs': 0.9727731185644367} |
| INFO: 2024-07-13 14:46:14,658: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 14:46:14,684: llmtf.base.evaluator: |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree |
| 0.684 0.247 0.397 0.432 0.850 0.481 0.544 0.973 0.977 1.000 0.998 0.768 0.351 0.875 |
| INFO: 2024-07-13 14:48:58,982: llmtf.base.darumeru/USE: Processing Dataset: 204.37s |
| INFO: 2024-07-13 14:48:58,999: llmtf.base.darumeru/USE: Results for darumeru/USE: |
| INFO: 2024-07-13 14:48:59,004: llmtf.base.darumeru/USE: {'grade_norm': 0.18725490196078434} |
| INFO: 2024-07-13 14:48:59,010: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009] |
| INFO: 2024-07-13 14:48:59,010: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
| INFO: 2024-07-13 14:49:19,451: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 20.44s |
| INFO: 2024-07-13 14:50:14,250: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1036.87s |
| INFO: 2024-07-13 14:50:14,255: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
| INFO: 2024-07-13 14:50:14,302: llmtf.base.nlpcoreteam/enMMLU: metric |
| subject |
| abstract_algebra 0.350000 |
| anatomy 0.696296 |
| astronomy 0.730263 |
| business_ethics 0.700000 |
| clinical_knowledge 0.754717 |
| college_biology 0.812500 |
| college_chemistry 0.500000 |
| college_computer_science 0.590000 |
| college_mathematics 0.330000 |
| college_medicine 0.670520 |
| college_physics 0.470588 |
| computer_security 0.780000 |
| conceptual_physics 0.570213 |
| econometrics 0.561404 |
| electrical_engineering 0.634483 |
| elementary_mathematics 0.439153 |
| formal_logic 0.507937 |
| global_facts 0.430000 |
| high_school_biology 0.800000 |
| high_school_chemistry 0.517241 |
| high_school_computer_science 0.760000 |
| high_school_european_history 0.787879 |
| high_school_geography 0.843434 |
| high_school_government_and_politics 0.922280 |
| high_school_macroeconomics 0.671795 |
| high_school_mathematics 0.381481 |
| high_school_microeconomics 0.764706 |
| high_school_physics 0.417219 |
| high_school_psychology 0.847706 |
| high_school_statistics 0.537037 |
| high_school_us_history 0.833333 |
| high_school_world_history 0.835443 |
| human_aging 0.730942 |
| human_sexuality 0.801527 |
| international_law 0.818182 |
| jurisprudence 0.759259 |
| logical_fallacies 0.766871 |
| machine_learning 0.544643 |
| management 0.825243 |
| marketing 0.901709 |
| medical_genetics 0.830000 |
| miscellaneous 0.842912 |
| moral_disputes 0.751445 |
| moral_scenarios 0.497207 |
| nutrition 0.754902 |
| philosophy 0.720257 |
| prehistory 0.753086 |
| professional_accounting 0.556738 |
| professional_law 0.483051 |
| professional_medicine 0.742647 |
| professional_psychology 0.717320 |
| public_relations 0.690909 |
| security_studies 0.722449 |
| sociology 0.840796 |
| us_foreign_policy 0.840000 |
| virology 0.512048 |
| world_religions 0.818713 |
| INFO: 2024-07-13 14:50:14,310: llmtf.base.nlpcoreteam/enMMLU: metric |
| subject |
| STEM 0.564712 |
| humanities 0.717897 |
| other (business, health, misc.) 0.710620 |
| social sciences 0.768694 |
| INFO: 2024-07-13 14:50:14,318: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6904807286717012} |
| INFO: 2024-07-13 14:50:14,385: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 14:50:14,399: llmtf.base.evaluator: |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU |
| 0.651 0.247 0.397 0.432 0.850 0.481 0.544 0.187 0.973 0.977 1.000 0.998 0.768 0.351 0.875 0.690 |
| INFO: 2024-07-13 14:51:55,784: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1262.30s |
| INFO: 2024-07-13 14:51:55,788: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: |
| INFO: 2024-07-13 14:51:55,799: llmtf.base.darumeru/ruMMLU: {'acc': 0.5138182180983737} |
| INFO: 2024-07-13 14:51:55,888: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 14:51:55,906: llmtf.base.evaluator: |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU |
| 0.643 0.247 0.397 0.432 0.850 0.481 0.544 0.187 0.973 0.977 1.000 0.998 0.514 0.768 0.351 0.875 0.690 |
| INFO: 2024-07-13 14:52:18,001: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 178.55s |
| INFO: 2024-07-13 14:52:18,002: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: |
| INFO: 2024-07-13 14:52:18,035: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7115177610333692, 'mcc': 0.3362227509262135} |
| INFO: 2024-07-13 14:52:18,046: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 14:52:18,077: llmtf.base.evaluator: |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom |
| 0.636 0.247 0.397 0.432 0.850 0.481 0.544 0.187 0.973 0.977 1.000 0.998 0.514 0.768 0.351 0.875 0.690 0.524 |
| INFO: 2024-07-13 14:59:07,852: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1522.57s |
| INFO: 2024-07-13 14:59:07,871: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
| INFO: 2024-07-13 14:59:07,917: llmtf.base.nlpcoreteam/ruMMLU: metric |
| subject |
| abstract_algebra 0.330000 |
| anatomy 0.511111 |
| astronomy 0.651316 |
| business_ethics 0.680000 |
| clinical_knowledge 0.588679 |
| college_biology 0.534722 |
| college_chemistry 0.480000 |
| college_computer_science 0.520000 |
| college_mathematics 0.350000 |
| college_medicine 0.549133 |
| college_physics 0.352941 |
| computer_security 0.720000 |
| conceptual_physics 0.540426 |
| econometrics 0.438596 |
| electrical_engineering 0.572414 |
| elementary_mathematics 0.417989 |
| formal_logic 0.396825 |
| global_facts 0.370000 |
| high_school_biology 0.664516 |
| high_school_chemistry 0.394089 |
| high_school_computer_science 0.690000 |
| high_school_european_history 0.763636 |
| high_school_geography 0.666667 |
| high_school_government_and_politics 0.647668 |
| high_school_macroeconomics 0.553846 |
| high_school_mathematics 0.348148 |
| high_school_microeconomics 0.546218 |
| high_school_physics 0.410596 |
| high_school_psychology 0.682569 |
| high_school_statistics 0.449074 |
| high_school_us_history 0.691176 |
| high_school_world_history 0.734177 |
| human_aging 0.538117 |
| human_sexuality 0.641221 |
| international_law 0.743802 |
| jurisprudence 0.657407 |
| logical_fallacies 0.558282 |
| machine_learning 0.401786 |
| management 0.689320 |
| marketing 0.730769 |
| medical_genetics 0.670000 |
| miscellaneous 0.650064 |
| moral_disputes 0.630058 |
| moral_scenarios 0.382123 |
| nutrition 0.604575 |
| philosophy 0.614148 |
| prehistory 0.574074 |
| professional_accounting 0.397163 |
| professional_law 0.397001 |
| professional_medicine 0.514706 |
| professional_psychology 0.514706 |
| public_relations 0.609091 |
| security_studies 0.657143 |
| sociology 0.676617 |
| us_foreign_policy 0.740000 |
| virology 0.457831 |
| world_religions 0.695906 |
| INFO: 2024-07-13 14:59:07,924: llmtf.base.nlpcoreteam/ruMMLU: metric |
| subject |
| STEM 0.490445 |
| humanities 0.602971 |
| other (business, health, misc.) 0.567962 |
| social sciences 0.614529 |
| INFO: 2024-07-13 14:59:07,947: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5689766403256171} |
| INFO: 2024-07-13 14:59:08,029: llmtf.base.evaluator: Ended eval |
| INFO: 2024-07-13 14:59:08,049: llmtf.base.evaluator: |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
| 0.632 0.247 0.397 0.432 0.850 0.481 0.544 0.187 0.973 0.977 1.000 0.998 0.514 0.768 0.351 0.875 0.690 0.569 0.524 |