INFO: 2024-07-13 13:32:05,327: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-13 13:32:05,328: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:32:05,328: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:32:07,455: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-13 13:32:07,455: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:32:07,455: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:32:07,493: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-13 13:32:07,494: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:32:07,494: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:32:07,655: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-13 13:32:07,655: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:32:07,655: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:32:07,686: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-13 13:32:07,686: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:32:07,686: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:32:07,745: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-13 13:32:07,746: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:32:07,746: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:32:07,865: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-13 13:32:07,866: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:32:07,866: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:32:08,209: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.88s
INFO: 2024-07-13 13:32:11,413: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.96s
INFO: 2024-07-13 13:32:12,529: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.66s
INFO: 2024-07-13 13:32:16,292: llmtf.base.darumeru/ruMMLU: Loading Dataset: 8.61s
INFO: 2024-07-13 13:32:19,927: llmtf.base.daru/treewayextractive: Loading Dataset: 12.27s
INFO: 2024-07-13 13:34:12,990: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 124.78s
INFO: 2024-07-13 13:34:12,992: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-13 13:34:12,996: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.82914536342449, 'len': 0.993343330079649, 'lcs': 0.953698746500658}
INFO: 2024-07-13 13:34:12,997: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:34:12,998: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:34:15,015: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 2.02s
INFO: 2024-07-13 13:34:20,882: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 133.39s
INFO: 2024-07-13 13:34:23,420: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 135.67s
INFO: 2024-07-13 13:35:39,473: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 84.46s
INFO: 2024-07-13 13:35:39,476: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-13 13:35:39,481: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.424907714143083, 'len': 0.9996416196590585, 'lcs': 0.995460815828734}
INFO: 2024-07-13 13:35:39,482: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:35:39,482: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:35:41,571: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.09s
INFO: 2024-07-13 13:36:31,701: llmtf.base.darumeru/MultiQ: Processing Dataset: 260.29s
INFO: 2024-07-13 13:36:31,704: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-13 13:36:31,722: llmtf.base.darumeru/MultiQ: {'f1': 0.3347231559740819, 'em': 0.2055449330783939}
INFO: 2024-07-13 13:36:31,726: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:36:31,726: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:36:33,871: llmtf.base.darumeru/PARus: Loading Dataset: 2.14s
INFO: 2024-07-13 13:36:36,545: llmtf.base.darumeru/PARus: Processing Dataset: 2.67s
INFO: 2024-07-13 13:36:36,562: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-13 13:36:36,574: llmtf.base.darumeru/PARus: {'acc': 0.64}
INFO: 2024-07-13 13:36:36,575: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:36:36,575: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:36:38,646: llmtf.base.darumeru/RCB: Loading Dataset: 2.07s
INFO: 2024-07-13 13:36:44,261: llmtf.base.darumeru/RCB: Processing Dataset: 5.61s
INFO: 2024-07-13 13:36:44,263: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-13 13:36:44,269: llmtf.base.darumeru/RCB: {'acc': 0.4954545454545455, 'f1_macro': 0.42697840772670775}
INFO: 2024-07-13 13:36:44,270: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:36:44,270: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:36:47,516: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.25s
INFO: 2024-07-13 13:37:20,884: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 33.37s
INFO: 2024-07-13 13:37:20,885: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-13 13:37:20,912: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.6911512027491409, 'f1_macro': 0.6914435564575607}
INFO: 2024-07-13 13:37:20,918: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:37:20,918: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:37:28,618: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.70s
INFO: 2024-07-13 13:38:01,706: llmtf.base.daru/treewayabstractive: Processing Dataset: 349.18s
INFO: 2024-07-13 13:38:01,711: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-13 13:38:01,715: llmtf.base.daru/treewayabstractive: {'rouge1': 0.35425172563213586, 'rouge2': 0.12878361258702994}
INFO: 2024-07-13 13:38:01,717: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 13:38:01,742: llmtf.base.evaluator:
mean   daru/treewayabstractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruOpenBookQA
0.614  0.242                    0.270            0.640           0.461         1.000                0.993                0.691
INFO: 2024-07-13 13:38:18,320: llmtf.base.darumeru/ruTiE: Processing Dataset: 49.68s
ERROR: 2024-07-13 13:38:18,323: llmtf.base.evaluator: CUDA out of memory. Tried to allocate 29.55 GiB. GPU
ERROR: 2024-07-13 13:38:18,344: llmtf.base.evaluator: Traceback (most recent call last):
  File "/scratch/tikhomirov/workdir/projects/llmtf_open/llmtf/evaluator.py", line 42, in evaluate
    self.evaluate_dataset(task, model, output_dir, prompt_max_len, few_shot_count, generation_config, batch_size, max_sample_per_dataset)
  File "/scratch/tikhomirov/workdir/projects/llmtf_open/llmtf/evaluator.py", line 65, in evaluate_dataset
    prompts, y_preds, infos = getattr(model, task.method + '_batch')(**messages_batch)
  File "/scratch/tikhomirov/workdir/projects/llmtf_open/llmtf/model.py", line 366, in calculate_tokens_proba_batch
    outputs = self.model(**data)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1196, in forward
    logits = logits.float()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 29.55 GiB. GPU
INFO: 2024-07-13 13:38:40,987: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 260.10s
INFO: 2024-07-13 13:38:40,988: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-13 13:38:41,032: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.360000
anatomy 0.718519
astronomy 0.736842
business_ethics 0.730000
clinical_knowledge 0.735849
college_biology 0.791667
college_chemistry 0.470000
college_computer_science 0.600000
college_mathematics 0.300000
college_medicine 0.647399
college_physics 0.490196
computer_security 0.760000
conceptual_physics 0.574468
econometrics 0.517544
electrical_engineering 0.606897
elementary_mathematics 0.481481
formal_logic 0.523810
global_facts 0.430000
high_school_biology 0.806452
high_school_chemistry 0.551724
high_school_computer_science 0.730000
high_school_european_history 0.733333
high_school_geography 0.828283
high_school_government_and_politics 0.865285
high_school_macroeconomics 0.630769
high_school_mathematics 0.374074
high_school_microeconomics 0.747899
high_school_physics 0.410596
high_school_psychology 0.856881
high_school_statistics 0.546296
high_school_us_history 0.828431
high_school_world_history 0.839662
human_aging 0.721973
human_sexuality 0.778626
international_law 0.760331
jurisprudence 0.787037
logical_fallacies 0.785276
machine_learning 0.464286
management 0.805825
marketing 0.893162
medical_genetics 0.780000
miscellaneous 0.840358
moral_disputes 0.687861
moral_scenarios 0.293855
nutrition 0.764706
philosophy 0.717042
prehistory 0.700617
professional_accounting 0.539007
professional_law 0.482399
professional_medicine 0.738971
professional_psychology 0.676471
public_relations 0.645455
security_studies 0.714286
sociology 0.825871
us_foreign_policy 0.890000
virology 0.487952
world_religions 0.830409
INFO: 2024-07-13 13:38:41,039: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.558610
humanities 0.690005
other (business, health, misc.) 0.702409
social sciences 0.748114
INFO: 2024-07-13 13:38:41,047: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6747843602567992}
INFO: 2024-07-13 13:38:41,076: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 13:38:41,085: llmtf.base.evaluator:
mean   daru/treewayabstractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruOpenBookQA  nlpcoreteam/enMMLU
0.621  0.242                    0.270            0.640           0.461         1.000                0.993                0.691                  0.675
INFO: 2024-07-13 13:38:44,800: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 183.23s
INFO: 2024-07-13 13:38:44,802: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-13 13:38:44,806: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.9681178664729675, 'len': 0.9946813313624652, 'lcs': 0.9149641417867646}
INFO: 2024-07-13 13:38:44,806: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [128001, 128009]
INFO: 2024-07-13 13:38:44,806: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-13 13:38:46,864: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.06s
INFO: 2024-07-13 13:39:03,671: llmtf.base.darumeru/ruMMLU: Processing Dataset: 407.38s
INFO: 2024-07-13 13:39:03,675: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-13 13:39:03,682: llmtf.base.darumeru/ruMMLU: {'acc': 0.5040407063753367}
INFO: 2024-07-13 13:39:03,716: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 13:39:03,724: llmtf.base.evaluator:
mean   daru/treewayabstractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  nlpcoreteam/enMMLU
0.639  0.242                    0.270            0.640           0.461         0.915                1.000                0.993                0.504            0.691                  0.675
INFO: 2024-07-13 13:39:21,125: llmtf.base.daru/treewayextractive: Processing Dataset: 421.20s
INFO: 2024-07-13 13:39:21,127: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-13 13:39:21,343: llmtf.base.daru/treewayextractive: {'r-prec': 0.39497193362193367}
INFO: 2024-07-13 13:39:21,388: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 13:39:21,397: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  nlpcoreteam/enMMLU
0.617  0.242                    0.395                   0.270            0.640           0.461         0.915                1.000                0.993                0.504            0.691                  0.675
INFO: 2024-07-13 13:40:36,169: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 372.74s
INFO: 2024-07-13 13:40:36,171: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-13 13:40:36,214: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.290000
anatomy 0.459259
astronomy 0.657895
business_ethics 0.600000
clinical_knowledge 0.562264
college_biology 0.541667
college_chemistry 0.400000
college_computer_science 0.460000
college_mathematics 0.320000
college_medicine 0.497110
college_physics 0.352941
computer_security 0.570000
conceptual_physics 0.472340
econometrics 0.359649
electrical_engineering 0.544828
elementary_mathematics 0.417989
formal_logic 0.396825
global_facts 0.350000
high_school_biology 0.632258
high_school_chemistry 0.418719
high_school_computer_science 0.610000
high_school_european_history 0.715152
high_school_geography 0.656566
high_school_government_and_politics 0.595855
high_school_macroeconomics 0.512821
high_school_mathematics 0.333333
high_school_microeconomics 0.500000
high_school_physics 0.350993
high_school_psychology 0.667890
high_school_statistics 0.462963
high_school_us_history 0.656863
high_school_world_history 0.713080
human_aging 0.547085
human_sexuality 0.648855
international_law 0.702479
jurisprudence 0.592593
logical_fallacies 0.527607
machine_learning 0.357143
management 0.669903
marketing 0.700855
medical_genetics 0.560000
miscellaneous 0.641124
moral_disputes 0.560694
moral_scenarios 0.251397
nutrition 0.594771
philosophy 0.565916
prehistory 0.561728
professional_accounting 0.386525
professional_law 0.356584
professional_medicine 0.518382
professional_psychology 0.482026
public_relations 0.572727
security_studies 0.620408
sociology 0.696517
us_foreign_policy 0.750000
virology 0.421687
world_religions 0.690058
INFO: 2024-07-13 13:40:36,222: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.455171
humanities 0.560844
other (business, health, misc.) 0.536355
social sciences 0.588610
INFO: 2024-07-13 13:40:36,259: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5352447672872421}
INFO: 2024-07-13 13:40:36,291: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 13:40:36,320: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU
0.610  0.242                    0.395                   0.270            0.640           0.461         0.915                1.000                0.993                0.504            0.691                  0.675               0.535
INFO: 2024-07-13 13:40:51,661: llmtf.base.darumeru/cp_para_en: Processing Dataset: 124.80s
INFO: 2024-07-13 13:40:51,663: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-13 13:40:51,667: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.463061170262149, 'len': 0.9941296296409974, 'lcs': 0.9527227031116661}
INFO: 2024-07-13 13:40:51,667: llmtf.base.evaluator: Ended eval
INFO: 2024-07-13 13:40:51,674: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU
0.636  0.242                    0.395                   0.270            0.640           0.461         0.953                0.915                1.000                0.993                0.504            0.691                  0.675               0.535