| /egr/research-optml/wangc168/anaconda3/envs/SOUL/lib/python3.9/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead. |
| warnings.warn( |
| 2026-04-03:16:35:59,252 INFO [__main__.py:279] Verbosity set to INFO |
| 2026-04-03:16:35:59,682 INFO [__init__.py:491] `group` and `group_alias` keys in TaskConfigs are deprecated and will be removed in v0.4.5 of lm_eval. The new `tag` field will be used to allow for a shortcut to a group of tasks one does not wish to aggregate metrics across. `group`s which aggregate across subtasks must be only defined in a separate group config file, which will be the official way to create groups that support cross-task aggregation as in `mmlu`. Please see the v0.4.4 patch notes and our documentation: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md#advanced-group-configs for more information. |
| 2026-04-03:16:36:07,612 INFO [__main__.py:376] Selected Tasks: ['mmlu'] |
| 2026-04-03:16:36:07,615 INFO [evaluator.py:161] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 |
| 2026-04-03:16:36:07,615 INFO [evaluator.py:198] Initializing hf model, with arguments: {'pretrained': '/egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6'} |
| 2026-04-03:16:36:07,760 INFO [huggingface.py:130] Using device 'cuda:0' |
| 2026-04-03:16:36:07,992 INFO [huggingface.py:366] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:0'} |
|
Loading checkpoint shards: 100%|██████████| 3/3 [00:03<00:00, 1.05s/it] |
| 2026-04-03:16:37:53,518 INFO [evaluator.py:279] Setting fewshot random generator seed to 1234 |
| 2026-04-03:16:37:53,521 WARNING [model.py:422] model.chat_template was called with the chat_template set to False or None. Therefore no chat template will be applied. Make sure this is an intended behavior. |
| 2026-04-03:16:37:53,529 INFO [task.py:423] Building contexts for mmlu_high_school_mathematics on rank 0... |
|
100%|██████████| 270/270 [00:00<00:00, 1038.65it/s] |
| 2026-04-03:16:37:53,804 INFO [task.py:423] Building contexts for mmlu_college_chemistry on rank 0... |
|
100%|██████████| 100/100 [00:00<00:00, 1178.71it/s] |
| 2026-04-03:16:37:53,891 INFO [task.py:423] Building contexts for mmlu_computer_security on rank 0... |
|
100%|██████████| 100/100 [00:00<00:00, 1190.39it/s] |
| 2026-04-03:16:37:53,978 INFO [task.py:423] Building contexts for mmlu_conceptual_physics on rank 0... |
|
100%|██████████| 235/235 [00:00<00:00, 1192.51it/s] |
| 2026-04-03:16:37:54,181 INFO [task.py:423] Building contexts for mmlu_college_biology on rank 0... |
|
100%|██████████| 144/144 [00:00<00:00, 1193.02it/s] |
| 2026-04-03:16:37:54,306 INFO [task.py:423] Building contexts for mmlu_college_computer_science on rank 0... |
|
100%|██████████| 100/100 [00:00<00:00, 1187.27it/s] |
| 2026-04-03:16:37:54,392 INFO [task.py:423] Building contexts for mmlu_abstract_algebra on rank 0... |
|
100%|██████████| 100/100 [00:00<00:00, 1204.80it/s] |
| 2026-04-03:16:37:54,478 INFO [task.py:423] Building contexts for mmlu_machine_learning on rank 0... |
|
100%|██████████| 112/112 [00:00<00:00, 1202.25it/s] |
| 2026-04-03:16:37:54,575 INFO [task.py:423] Building contexts for mmlu_high_school_computer_science on rank 0... |
|
100%|██████████| 100/100 [00:00<00:00, 1191.12it/s] |
| 2026-04-03:16:37:54,661 INFO [task.py:423] Building contexts for mmlu_college_physics on rank 0... |
|
100%|██████████| 102/102 [00:00<00:00, 1194.07it/s] |
| 2026-04-03:16:37:54,750 INFO [task.py:423] Building contexts for mmlu_high_school_statistics on rank 0... |
|
100%|██████████| 216/216 [00:00<00:00, 1187.95it/s] |
| 2026-04-03:16:37:54,937 INFO [task.py:423] Building contexts for mmlu_high_school_physics on rank 0... |
|
100%|██████████| 151/151 [00:00<00:00, 1188.83it/s] |
| 2026-04-03:16:37:55,068 INFO [task.py:423] Building contexts for mmlu_elementary_mathematics on rank 0... |
|
100%|██████████| 378/378 [00:00<00:00, 1183.84it/s] |
| 2026-04-03:16:37:55,396 INFO [task.py:423] Building contexts for mmlu_electrical_engineering on rank 0... |
|
100%|██████████| 145/145 [00:00<00:00, 1195.24it/s] |
| 2026-04-03:16:37:55,521 INFO [task.py:423] Building contexts for mmlu_high_school_biology on rank 0... |
|
100%|██████████| 310/310 [00:00<00:00, 1189.86it/s] |
| 2026-04-03:16:37:55,790 INFO [task.py:423] Building contexts for mmlu_astronomy on rank 0... |
|
100%|██████████| 152/152 [00:00<00:00, 1191.56it/s] |
| 2026-04-03:16:37:55,921 INFO [task.py:423] Building contexts for mmlu_anatomy on rank 0... |
|
100%|██████████| 135/135 [00:00<00:00, 1196.98it/s] |
| 2026-04-03:16:37:56,038 INFO [task.py:423] Building contexts for mmlu_college_mathematics on rank 0... |
|
100%|██████████| 100/100 [00:00<00:00, 1187.91it/s] |
| 2026-04-03:16:37:56,124 INFO [task.py:423] Building contexts for mmlu_high_school_chemistry on rank 0... |
|
100%|██████████| 203/203 [00:00<00:00, 1195.32it/s] |
| 2026-04-03:16:37:56,299 INFO [task.py:423] Building contexts for mmlu_management on rank 0... |
|
100%|██████████| 103/103 [00:00<00:00, 1196.33it/s] |
| 2026-04-03:16:37:56,388 INFO [task.py:423] Building contexts for mmlu_clinical_knowledge on rank 0... |
|
100%|██████████| 265/265 [00:00<00:00, 1188.80it/s] |
| 2026-04-03:16:37:56,618 INFO [task.py:423] Building contexts for mmlu_professional_medicine on rank 0... |
|
100%|██████████| 272/272 [00:00<00:00, 1191.97it/s] |
| 2026-04-03:16:37:56,853 INFO [task.py:423] Building contexts for mmlu_human_aging on rank 0... |
|
100%|██████████| 223/223 [00:00<00:00, 650.37it/s] |
| 2026-04-03:16:37:57,201 INFO [task.py:423] Building contexts for mmlu_professional_accounting on rank 0... |
|
100%|██████████| 282/282 [00:00<00:00, 1185.98it/s] |
| 2026-04-03:16:37:57,447 INFO [task.py:423] Building contexts for mmlu_college_medicine on rank 0... |
|
100%|██████████| 173/173 [00:00<00:00, 1191.94it/s] |
| 2026-04-03:16:37:57,596 INFO [task.py:423] Building contexts for mmlu_medical_genetics on rank 0... |
|
100%|██████████| 100/100 [00:00<00:00, 1192.51it/s] |
| 2026-04-03:16:37:57,683 INFO [task.py:423] Building contexts for mmlu_miscellaneous on rank 0... |
|
100%|██████████| 783/783 [00:00<00:00, 1190.75it/s] |
| 2026-04-03:16:37:58,358 INFO [task.py:423] Building contexts for mmlu_global_facts on rank 0... |
|
100%|██████████| 100/100 [00:00<00:00, 1197.45it/s] |
| 2026-04-03:16:37:58,445 INFO [task.py:423] Building contexts for mmlu_virology on rank 0... |
|
100%|██████████| 166/166 [00:00<00:00, 1197.94it/s] |
| 2026-04-03:16:37:58,587 INFO [task.py:423] Building contexts for mmlu_business_ethics on rank 0... |
|
100%|██████████| 100/100 [00:00<00:00, 1196.69it/s] |
| 2026-04-03:16:37:58,674 INFO [task.py:423] Building contexts for mmlu_nutrition on rank 0... |
|
100%|██████████| 306/306 [00:00<00:00, 1193.85it/s] |
| 2026-04-03:16:37:58,937 INFO [task.py:423] Building contexts for mmlu_marketing on rank 0... |
|
100%|██████████| 234/234 [00:00<00:00, 1194.43it/s] |
| 2026-04-03:16:37:59,139 INFO [task.py:423] Building contexts for mmlu_professional_psychology on rank 0... |
|
100%|██████████| 612/612 [00:00<00:00, 1192.52it/s] |
| 2026-04-03:16:37:59,667 INFO [task.py:423] Building contexts for mmlu_high_school_geography on rank 0... |
|
100%|██████████| 198/198 [00:00<00:00, 1190.10it/s] |
| 2026-04-03:16:37:59,838 INFO [task.py:423] Building contexts for mmlu_high_school_psychology on rank 0... |
|
100%|██████████| 545/545 [00:00<00:00, 1191.37it/s] |
| 2026-04-03:16:38:00,309 INFO [task.py:423] Building contexts for mmlu_us_foreign_policy on rank 0... |
|
100%|██████████| 100/100 [00:00<00:00, 1186.03it/s] |
| 2026-04-03:16:38:00,396 INFO [task.py:423] Building contexts for mmlu_high_school_microeconomics on rank 0... |
|
100%|██████████| 238/238 [00:00<00:00, 1190.68it/s] |
| 2026-04-03:16:38:00,602 INFO [task.py:423] Building contexts for mmlu_sociology on rank 0... |
|
100%|██████████| 201/201 [00:00<00:00, 1189.86it/s] |
| 2026-04-03:16:38:00,776 INFO [task.py:423] Building contexts for mmlu_security_studies on rank 0... |
|
100%|██████████| 245/245 [00:00<00:00, 1186.74it/s] |
| 2026-04-03:16:38:00,989 INFO [task.py:423] Building contexts for mmlu_public_relations on rank 0... |
|
100%|██████████| 110/110 [00:00<00:00, 1199.83it/s] |
| 2026-04-03:16:38:01,083 INFO [task.py:423] Building contexts for mmlu_human_sexuality on rank 0... |
|
100%|██████████| 131/131 [00:00<00:00, 1196.95it/s] |
| 2026-04-03:16:38:01,196 INFO [task.py:423] Building contexts for mmlu_high_school_government_and_politics on rank 0... |
|
100%|██████████| 193/193 [00:00<00:00, 1188.85it/s] |
| 2026-04-03:16:38:01,364 INFO [task.py:423] Building contexts for mmlu_econometrics on rank 0... |
|
100%|██████████| 114/114 [00:00<00:00, 1193.03it/s] |
| 2026-04-03:16:38:01,462 INFO [task.py:423] Building contexts for mmlu_high_school_macroeconomics on rank 0... |
|
100%|██████████| 390/390 [00:00<00:00, 1191.50it/s] |
| 2026-04-03:16:38:01,799 INFO [task.py:423] Building contexts for mmlu_formal_logic on rank 0... |
|
100%|██████████| 126/126 [00:00<00:00, 1186.32it/s] |
| 2026-04-03:16:38:01,908 INFO [task.py:423] Building contexts for mmlu_moral_disputes on rank 0... |
|
100%|██████████| 346/346 [00:00<00:00, 1187.89it/s] |
| 2026-04-03:16:38:02,208 INFO [task.py:423] Building contexts for mmlu_moral_scenarios on rank 0... |
|
100%|██████████| 895/895 [00:00<00:00, 1188.78it/s] |
| 2026-04-03:16:38:02,982 INFO [task.py:423] Building contexts for mmlu_jurisprudence on rank 0... |
|
100%|██████████| 108/108 [00:00<00:00, 1196.02it/s] |
| 2026-04-03:16:38:03,075 INFO [task.py:423] Building contexts for mmlu_high_school_european_history on rank 0... |
|
100%|██████████| 165/165 [00:00<00:00, 451.69it/s] |
| 2026-04-03:16:38:03,445 INFO [task.py:423] Building contexts for mmlu_professional_law on rank 0... |
|
100%|██████████| 1534/1534 [00:01<00:00, 1190.33it/s] |
| 2026-04-03:16:38:04,772 INFO [task.py:423] Building contexts for mmlu_prehistory on rank 0... |
|
100%|██████████| 324/324 [00:00<00:00, 1184.38it/s] |
| 2026-04-03:16:38:05,054 INFO [task.py:423] Building contexts for mmlu_world_religions on rank 0... |
|
100%|██████████| 171/171 [00:00<00:00, 1191.67it/s] |
| 2026-04-03:16:38:05,202 INFO [task.py:423] Building contexts for mmlu_high_school_world_history on rank 0... |
|
100%|██████████| 237/237 [00:00<00:00, 1186.20it/s] |
| 2026-04-03:16:38:05,408 INFO [task.py:423] Building contexts for mmlu_international_law on rank 0... |
|
100%|██████████| 121/121 [00:00<00:00, 1191.33it/s] |
| 2026-04-03:16:38:05,513 INFO [task.py:423] Building contexts for mmlu_logical_fallacies on rank 0... |
|
100%|██████████| 163/163 [00:00<00:00, 1202.62it/s] |
| 2026-04-03:16:38:05,652 INFO [task.py:423] Building contexts for mmlu_philosophy on rank 0... |
|
100%|██████████| 311/311 [00:00<00:00, 1196.14it/s] |
| 2026-04-03:16:38:05,920 INFO [task.py:423] Building contexts for mmlu_high_school_us_history on rank 0... |
|
100%|██████████| 204/204 [00:00<00:00, 1177.55it/s] |
| 2026-04-03:16:38:06,099 INFO [evaluator.py:465] Running loglikelihood requests |
|
Running loglikelihood requests: 0%|          | 0/56168 [00:00<?, ?it/s]
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache) |
|
Running loglikelihood requests: 16%|█▌        | 9025/56168 [00:36<02:02, 384.85it/s]
Running loglikelihood requests: 16%|ββ | 9089/56168 [00:36<02:01, 388.13it/s]
Running loglikelihood requests: 16%|ββ | 9153/56168 [00:36<02:00, 391.06it/s]
Running loglikelihood requests: 16%|ββ | 9217/56168 [00:36<01:59, 394.06it/s]
Running loglikelihood requests: 17%|ββ | 9281/56168 [00:36<01:58, 396.43it/s]
Running loglikelihood requests: 17%|ββ | 9345/56168 [00:37<01:57, 398.24it/s]
Running loglikelihood requests: 17%|ββ | 9409/56168 [00:37<01:56, 400.10it/s]
Running loglikelihood requests: 17%|ββ | 9473/56168 [00:37<01:56, 401.15it/s]
Running loglikelihood requests: 17%|ββ | 9537/56168 [00:37<01:56, 401.96it/s]
Running loglikelihood requests: 17%|ββ | 9601/56168 [00:37<01:55, 402.61it/s]
Running loglikelihood requests: 17%|ββ | 9665/56168 [00:37<01:55, 404.16it/s]
Running loglikelihood requests: 17%|ββ | 9729/56168 [00:37<01:54, 406.12it/s]
Running loglikelihood requests: 17%|ββ | 9793/56168 [00:38<01:53, 408.03it/s]
Running loglikelihood requests: 18%|ββ | 9857/56168 [00:38<01:50, 417.88it/s]
Running loglikelihood requests: 18%|ββ | 9921/56168 [00:38<01:48, 424.88it/s]
Running loglikelihood requests: 18%|ββ | 9985/56168 [00:38<01:47, 430.87it/s]
Running loglikelihood requests: 18%|ββ | 10049/56168 [00:38<01:46, 433.66it/s]
Running loglikelihood requests: 18%|ββ | 10113/56168 [00:38<01:45, 436.69it/s]
Running loglikelihood requests: 18%|ββ | 10177/56168 [00:39<01:44, 439.13it/s]
Running loglikelihood requests: 18%|ββ | 10241/56168 [00:39<01:43, 441.95it/s]
Running loglikelihood requests: 18%|ββ | 10305/56168 [00:39<01:43, 442.74it/s]
Running loglikelihood requests: 18%|ββ | 10369/56168 [00:39<01:43, 442.15it/s]
Running loglikelihood requests: 19%|ββ | 10433/56168 [00:39<01:43, 442.86it/s]
Running loglikelihood requests: 19%|ββ | 10497/56168 [00:39<01:42, 443.64it/s]
Running loglikelihood requests: 19%|ββ | 10561/56168 [00:39<01:42, 445.49it/s]
Running loglikelihood requests: 19%|ββ | 10625/56168 [00:40<01:41, 446.51it/s]
Running loglikelihood requests: 19%|ββ | 10689/56168 [00:40<01:40, 450.46it/s]
Running loglikelihood requests: 19%|ββ | 10753/56168 [00:40<01:40, 452.07it/s]
Running loglikelihood requests: 19%|ββ | 10817/56168 [00:40<01:39, 457.77it/s]
Running loglikelihood requests: 19%|ββ | 10881/56168 [00:40<01:37, 462.47it/s]
Running loglikelihood requests: 19%|ββ | 10945/56168 [00:40<01:37, 464.88it/s]
Running loglikelihood requests: 20%|ββ | 11009/56168 [00:40<01:36, 465.88it/s]
Running loglikelihood requests: 20%|ββ | 11073/56168 [00:40<01:36, 467.75it/s]
Running loglikelihood requests: 20%|ββ | 11137/56168 [00:41<01:36, 467.46it/s]
Running loglikelihood requests: 20%|ββ | 11201/56168 [00:41<01:35, 469.75it/s]
Running loglikelihood requests: 20%|ββ | 11265/56168 [00:41<01:35, 470.90it/s]
Running loglikelihood requests: 20%|ββ | 11329/56168 [00:41<01:34, 472.05it/s]
Running loglikelihood requests: 20%|ββ | 11393/56168 [00:41<01:34, 473.01it/s]
Running loglikelihood requests: 20%|ββ | 11457/56168 [00:41<01:34, 473.98it/s]
Running loglikelihood requests: 21%|ββ | 11521/56168 [00:41<01:34, 474.18it/s]
Running loglikelihood requests: 21%|ββ | 11585/56168 [00:42<01:34, 473.78it/s]
Running loglikelihood requests: 21%|ββ | 11649/56168 [00:42<01:33, 474.26it/s]
Running loglikelihood requests: 21%|ββ | 11713/56168 [00:42<01:33, 476.23it/s]
Running loglikelihood requests: 21%|ββ | 11777/56168 [00:42<01:33, 476.85it/s]
Running loglikelihood requests: 21%|ββ | 11841/56168 [00:42<01:32, 477.72it/s]
Running loglikelihood requests: 21%|ββ | 11909/56168 [00:42<01:30, 487.06it/s]
Running loglikelihood requests: 21%|βββ | 11973/56168 [00:42<01:31, 485.07it/s]
Running loglikelihood requests: 21%|βββ | 12037/56168 [00:42<01:30, 490.30it/s]
Running loglikelihood requests: 22%|βββ | 12101/56168 [00:43<01:29, 493.88it/s]
Running loglikelihood requests: 22%|βββ | 12165/56168 [00:43<01:28, 497.46it/s]
Running loglikelihood requests: 22%|βββ | 12229/56168 [00:43<01:27, 499.91it/s]
Running loglikelihood requests: 22%|βββ | 12293/56168 [00:43<01:27, 501.32it/s]
Running loglikelihood requests: 22%|βββ | 12357/56168 [00:43<01:26, 504.28it/s]
Running loglikelihood requests: 22%|βββ | 12421/56168 [00:43<01:26, 506.45it/s]
Running loglikelihood requests: 22%|βββ | 12485/56168 [00:43<01:26, 506.91it/s]
Running loglikelihood requests: 22%|βββ | 12549/56168 [00:43<01:25, 507.81it/s]
Running loglikelihood requests: 22%|βββ | 12613/56168 [00:44<01:25, 509.37it/s]
Running loglikelihood requests: 23%|βββ | 12677/56168 [00:44<01:25, 509.98it/s]
Running loglikelihood requests: 23%|βββ | 12741/56168 [00:44<01:25, 510.84it/s]
Running loglikelihood requests: 23%|βββ | 12805/56168 [00:44<01:24, 510.62it/s]
Running loglikelihood requests: 23%|βββ | 12869/56168 [00:44<01:24, 512.37it/s]
Running loglikelihood requests: 23%|βββ | 12933/56168 [00:44<01:24, 513.35it/s]
Running loglikelihood requests: 23%|βββ | 12997/56168 [00:44<01:24, 513.13it/s]
Running loglikelihood requests: 23%|βββ | 13061/56168 [00:44<01:24, 512.71it/s]
Running loglikelihood requests: 23%|βββ | 13125/56168 [00:45<01:23, 513.88it/s]
Running loglikelihood requests: 23%|βββ | 13189/56168 [00:45<01:23, 514.65it/s]
Running loglikelihood requests: 24%|βββ | 13253/56168 [00:45<01:23, 514.41it/s]
Running loglikelihood requests: 24%|βββ | 13317/56168 [00:45<01:23, 514.12it/s]
Running loglikelihood requests: 24%|βββ | 13381/56168 [00:45<01:22, 516.30it/s]
Running loglikelihood requests: 24%|βββ | 13445/56168 [00:45<01:22, 517.73it/s]
Running loglikelihood requests: 24%|βββ | 13509/56168 [00:45<01:22, 518.59it/s]
Running loglikelihood requests: 24%|βββ | 13573/56168 [00:45<01:21, 519.82it/s]
Running loglikelihood requests: 24%|βββ | 13637/56168 [00:46<01:21, 520.31it/s]
Running loglikelihood requests: 24%|βββ | 13701/56168 [00:46<01:21, 521.59it/s]
Running loglikelihood requests: 25%|βββ | 13765/56168 [00:46<01:21, 522.62it/s]
Running loglikelihood requests: 25%|βββ | 13829/56168 [00:46<01:21, 522.62it/s]
Running loglikelihood requests: 25%|βββ | 13893/56168 [00:46<01:20, 523.36it/s]
Running loglikelihood requests: 25%|βββ | 13957/56168 [00:46<01:20, 524.81it/s]
Running loglikelihood requests: 25%|βββ | 14021/56168 [00:46<01:18, 535.11it/s]
Running loglikelihood requests: 25%|βββ | 14085/56168 [00:46<01:17, 543.72it/s]
Running loglikelihood requests: 25%|βββ | 14149/56168 [00:47<01:16, 548.69it/s]
Running loglikelihood requests: 25%|βββ | 14213/56168 [00:47<01:15, 553.24it/s]
Running loglikelihood requests: 25%|βββ | 14277/56168 [00:47<01:15, 554.69it/s]
Running loglikelihood requests: 26%|βββ | 14341/56168 [00:47<01:15, 555.34it/s]
Running loglikelihood requests: 26%|βββ | 14405/56168 [00:47<01:14, 558.36it/s]
Running loglikelihood requests: 26%|βββ | 14469/56168 [00:47<01:14, 561.01it/s]
Running loglikelihood requests: 26%|βββ | 14533/56168 [00:47<01:14, 560.88it/s]
Running loglikelihood requests: 26%|βββ | 14597/56168 [00:47<01:13, 562.33it/s]
Running loglikelihood requests: 26%|βββ | 14661/56168 [00:47<01:13, 563.45it/s]
Running loglikelihood requests: 26%|βββ | 14725/56168 [00:48<01:13, 563.84it/s]
Running loglikelihood requests: 26%|βββ | 14789/56168 [00:48<01:13, 564.70it/s]
Running loglikelihood requests: 26%|βββ | 14853/56168 [00:48<01:13, 565.71it/s]
Running loglikelihood requests: 27%|βββ | 14917/56168 [00:48<01:12, 566.31it/s]
Running loglikelihood requests: 27%|βββ | 14981/56168 [00:48<01:12, 566.67it/s]
Running loglikelihood requests: 27%|βββ | 15045/56168 [00:48<01:12, 566.69it/s]
Running loglikelihood requests: 27%|βββ | 15109/56168 [00:48<01:12, 564.55it/s]
Running loglikelihood requests: 27%|βββ | 15173/56168 [00:48<01:12, 567.34it/s]
Running loglikelihood requests: 27%|βββ | 15237/56168 [00:48<01:12, 566.77it/s]
Running loglikelihood requests: 27%|βββ | 15301/56168 [00:49<01:12, 566.24it/s]
Running loglikelihood requests: 27%|βββ | 15365/56168 [00:49<01:11, 568.40it/s]
Running loglikelihood requests: 27%|βββ | 15429/56168 [00:49<01:11, 567.08it/s]
Running loglikelihood requests: 28%|βββ | 15493/56168 [00:49<01:11, 567.89it/s]
Running loglikelihood requests: 28%|βββ | 15557/56168 [00:49<01:11, 569.00it/s]
Running loglikelihood requests: 28%|βββ | 15621/56168 [00:49<01:11, 567.93it/s]
Running loglikelihood requests: 28%|βββ | 15685/56168 [00:49<01:11, 568.91it/s]
Running loglikelihood requests: 28%|βββ | 15749/56168 [00:49<01:10, 569.49it/s]
Running loglikelihood requests: 28%|βββ | 15813/56168 [00:49<01:10, 569.58it/s]
Running loglikelihood requests: 28%|βββ | 15877/56168 [00:50<01:10, 570.61it/s]
Running loglikelihood requests: 28%|βββ | 15941/56168 [00:50<01:10, 569.72it/s]
Running loglikelihood requests: 28%|βββ | 16005/56168 [00:50<01:10, 570.14it/s]
Running loglikelihood requests: 29%|βββ | 16069/56168 [00:50<01:10, 569.00it/s]
Running loglikelihood requests: 29%|βββ | 16133/56168 [00:50<01:10, 567.09it/s]
Running loglikelihood requests: 29%|βββ | 16197/56168 [00:50<01:10, 567.70it/s]
Running loglikelihood requests: 29%|βββ | 16261/56168 [00:50<01:10, 567.06it/s]
Running loglikelihood requests: 29%|βββ | 16325/56168 [00:50<01:10, 566.18it/s]
Running loglikelihood requests: 29%|βββ | 16389/56168 [00:51<01:10, 568.22it/s]
Running loglikelihood requests: 29%|βββ | 16453/56168 [00:51<01:09, 568.73it/s]
Running loglikelihood requests: 29%|βββ | 16517/56168 [00:51<01:09, 568.30it/s]
Running loglikelihood requests: 30%|βββ | 16581/56168 [00:51<01:09, 568.65it/s]
Running loglikelihood requests: 30%|βββ | 16645/56168 [00:51<01:09, 568.86it/s]
Running loglikelihood requests: 30%|βββ | 16709/56168 [00:51<01:09, 569.88it/s]
Running loglikelihood requests: 30%|βββ | 16773/56168 [00:51<01:09, 569.79it/s]
Running loglikelihood requests: 30%|βββ | 16837/56168 [00:51<01:09, 568.06it/s]
Running loglikelihood requests: 30%|βββ | 16901/56168 [00:51<01:09, 569.01it/s]
Running loglikelihood requests: 30%|βββ | 16965/56168 [00:52<01:08, 568.73it/s]
Running loglikelihood requests: 30%|βββ | 17029/56168 [00:52<01:08, 569.07it/s]
Running loglikelihood requests: 30%|βββ | 17093/56168 [00:52<01:08, 569.72it/s]
Running loglikelihood requests: 31%|βββ | 17157/56168 [00:52<01:08, 569.27it/s]
Running loglikelihood requests: 31%|βββ | 17221/56168 [00:52<01:08, 570.19it/s]
Running loglikelihood requests: 31%|βββ | 17285/56168 [00:52<01:08, 570.32it/s]
Running loglikelihood requests: 31%|βββ | 17349/56168 [00:52<01:08, 569.54it/s]
Running loglikelihood requests: 31%|βββ | 17413/56168 [00:52<01:07, 570.74it/s]
Running loglikelihood requests: 31%|βββ | 17477/56168 [00:52<01:07, 570.90it/s]
Running loglikelihood requests: 31%|βββ | 17541/56168 [00:53<01:07, 571.46it/s]
Running loglikelihood requests: 31%|ββββ | 17605/56168 [00:53<01:07, 571.23it/s]
Running loglikelihood requests: 31%|ββββ | 17669/56168 [00:53<01:07, 570.47it/s]
Running loglikelihood requests: 32%|ββββ | 17733/56168 [00:53<01:07, 571.50it/s]
Running loglikelihood requests: 32%|ββββ | 17797/56168 [00:53<01:07, 571.23it/s]
Running loglikelihood requests: 32%|ββββ | 17861/56168 [00:53<01:07, 571.24it/s]
Running loglikelihood requests: 32%|ββββ | 17925/56168 [00:53<01:06, 571.91it/s]
Running loglikelihood requests: 32%|ββββ | 17989/56168 [00:53<01:06, 572.21it/s]
Running loglikelihood requests: 32%|ββββ | 18053/56168 [00:53<01:06, 573.09it/s]
Running loglikelihood requests: 32%|ββββ | 18117/56168 [00:54<01:06, 572.33it/s]
Running loglikelihood requests: 32%|ββββ | 18181/56168 [00:54<01:06, 572.68it/s]
Running loglikelihood requests: 32%|ββββ | 18245/56168 [00:54<01:06, 572.05it/s]
Running loglikelihood requests: 33%|ββββ | 18309/56168 [00:54<01:06, 571.79it/s]
Running loglikelihood requests: 33%|ββββ | 18373/56168 [00:54<01:06, 571.43it/s]
Running loglikelihood requests: 33%|ββββ | 18437/56168 [00:54<01:06, 571.12it/s]
Running loglikelihood requests: 33%|ββββ | 18501/56168 [00:54<01:05, 572.29it/s]
Running loglikelihood requests: 33%|ββββ | 18565/56168 [00:54<01:05, 572.12it/s]
Running loglikelihood requests: 33%|ββββ | 18629/56168 [00:54<01:05, 573.06it/s]
Running loglikelihood requests: 33%|ββββ | 18697/56168 [00:55<01:04, 585.09it/s]
Running loglikelihood requests: 33%|ββββ | 18761/56168 [00:55<01:04, 581.17it/s]
Running loglikelihood requests: 34%|ββββ | 18825/56168 [00:55<01:04, 579.75it/s]
Running loglikelihood requests: 34%|ββββ | 18889/56168 [00:55<01:04, 577.21it/s]
Running loglikelihood requests: 34%|ββββ | 18967/56168 [00:55<00:58, 633.48it/s]
Running loglikelihood requests: 34%|ββββ | 19043/56168 [00:55<00:55, 669.33it/s]
Running loglikelihood requests: 34%|ββββ | 19128/56168 [00:55<00:51, 721.43it/s]
Running loglikelihood requests: 34%|ββββ | 19209/56168 [00:55<01:03, 586.31it/s]
Running loglikelihood requests: 34%|ββββ | 19293/56168 [00:55<00:56, 648.82it/s]
Running loglikelihood requests: 35%|ββββ | 19382/56168 [00:56<00:51, 711.51it/s]
Running loglikelihood requests: 35%|ββββ | 19465/56168 [00:56<01:01, 593.81it/s]
Running loglikelihood requests: 35%|ββββ | 19551/56168 [00:56<00:55, 656.67it/s]
Running loglikelihood requests: 35%|ββββ | 19623/56168 [00:56<00:54, 672.52it/s]
Running loglikelihood requests: 35%|ββββ | 19698/56168 [00:56<00:52, 693.04it/s]
Running loglikelihood requests: 35%|ββββ | 19775/56168 [00:56<00:50, 714.13it/s]
Running loglikelihood requests: 35%|ββββ | 19853/56168 [00:56<01:02, 580.62it/s]
Running loglikelihood requests: 35%|ββββ | 19938/56168 [00:56<00:56, 645.71it/s]
Running loglikelihood requests: 36%|ββββ | 20040/56168 [00:57<00:48, 741.75it/s]
Running loglikelihood requests: 36%|ββββ | 20120/56168 [00:57<00:59, 608.55it/s]
Running loglikelihood requests: 36%|ββββ | 20215/56168 [00:57<00:52, 688.94it/s]
Running loglikelihood requests: 36%|ββββ | 20305/56168 [00:57<00:59, 601.15it/s]
Running loglikelihood requests: 36%|ββββ | 20386/56168 [00:57<00:55, 648.26it/s]
Running loglikelihood requests: 36%|ββββ | 20477/56168 [00:57<00:50, 712.00it/s]
Running loglikelihood requests: 37%|ββββ | 20561/56168 [00:57<00:59, 600.84it/s]
Running loglikelihood requests: 37%|ββββ | 20652/56168 [00:58<00:52, 671.71it/s]
Running loglikelihood requests: 37%|ββββ | 20750/56168 [00:58<00:47, 747.70it/s]
Running loglikelihood requests: 37%|ββββ | 20832/56168 [00:58<00:57, 618.11it/s]
Running loglikelihood requests: 37%|ββββ | 20931/56168 [00:58<00:50, 703.97it/s]
Running loglikelihood requests: 37%|ββββ | 21013/56168 [00:58<00:59, 595.77it/s]
Running loglikelihood requests: 38%|ββββ | 21099/56168 [00:58<00:53, 654.79it/s]
Running loglikelihood requests: 38%|ββββ | 21195/56168 [00:58<00:47, 728.67it/s]
Running loglikelihood requests: 38%|ββββ | 21276/56168 [00:59<00:57, 607.36it/s]
Running loglikelihood requests: 38%|ββββ | 21376/56168 [00:59<00:49, 697.36it/s]
Running loglikelihood requests: 38%|ββββ | 21469/56168 [00:59<00:56, 616.46it/s]
Running loglikelihood requests: 38%|ββββ | 21566/56168 [00:59<00:49, 695.40it/s]
Running loglikelihood requests: 39%|ββββ | 21661/56168 [00:59<00:55, 621.42it/s]
Running loglikelihood requests: 39%|ββββ | 21772/56168 [00:59<00:47, 729.41it/s]
Running loglikelihood requests: 39%|ββββ | 21857/56168 [00:59<00:55, 622.22it/s]
Running loglikelihood requests: 39%|ββββ | 21961/56168 [00:59<00:47, 713.81it/s]
Running loglikelihood requests: 39%|ββββ | 22049/56168 [01:00<00:54, 620.38it/s]
Running loglikelihood requests: 39%|ββββ | 22165/56168 [01:00<00:46, 738.62it/s]
Running loglikelihood requests: 40%|ββββ | 22250/56168 [01:00<00:53, 630.78it/s]
Running loglikelihood requests: 40%|ββββ | 22361/56168 [01:00<00:45, 735.76it/s]
Running loglikelihood requests: 40%|ββββ | 22445/56168 [01:00<00:53, 629.38it/s]
Running loglikelihood requests: 40%|ββββ | 22561/56168 [01:00<00:54, 622.07it/s]
Running loglikelihood requests: 40%|ββββ | 22681/56168 [01:01<00:45, 741.97it/s]
Running loglikelihood requests: 41%|ββββ | 22766/56168 [01:01<00:52, 638.74it/s]
Running loglikelihood requests: 41%|ββββ | 22881/56168 [01:01<00:53, 626.11it/s]
Running loglikelihood requests: 41%|ββββ | 23009/56168 [01:01<00:51, 641.57it/s]
Running loglikelihood requests: 41%|ββββ | 23137/56168 [01:01<00:50, 651.31it/s]
Running loglikelihood requests: 41%|βββββ | 23262/56168 [01:01<00:42, 766.84it/s]
Running loglikelihood requests: 42%|βββββ | 23348/56168 [01:02<00:49, 663.00it/s]
Running loglikelihood requests: 42%|βββββ | 23457/56168 [01:02<00:51, 634.07it/s]
Running loglikelihood requests: 42%|βββββ | 23585/56168 [01:02<00:50, 650.50it/s]
Running loglikelihood requests: 42%|βββββ | 23713/56168 [01:02<00:48, 664.96it/s]
Running loglikelihood requests: 42%|βββββ | 23841/56168 [01:02<00:47, 676.36it/s]
Running loglikelihood requests: 43%|βββββ | 23969/56168 [01:03<00:47, 684.46it/s]
Running loglikelihood requests: 43%|βββββ | 24097/56168 [01:03<00:46, 689.40it/s]
Running loglikelihood requests: 43%|βββββ | 24225/56168 [01:03<00:46, 693.95it/s]
Running loglikelihood requests: 43%|βββββ | 24353/56168 [01:03<00:45, 696.78it/s]
Running loglikelihood requests: 44%|βββββ | 24481/56168 [01:03<00:45, 699.35it/s]
Running loglikelihood requests: 44%|βββββ | 24609/56168 [01:03<00:44, 702.46it/s]
Running loglikelihood requests: 44%|βββββ | 24737/56168 [01:04<00:44, 704.30it/s]
Running loglikelihood requests: 44%|βββββ | 24865/56168 [01:04<00:44, 706.57it/s]
Running loglikelihood requests: 44%|βββββ | 24993/56168 [01:04<00:44, 707.46it/s]
Running loglikelihood requests: 45%|βββββ | 25125/56168 [01:04<00:43, 715.49it/s]
Running loglikelihood requests: 45%|βββββ | 25253/56168 [01:04<00:43, 715.16it/s]
Running loglikelihood requests: 45%|βββββ | 25381/56168 [01:04<00:43, 714.75it/s]
Running loglikelihood requests: 45%|βββββ | 25509/56168 [01:05<00:42, 715.00it/s]
Running loglikelihood requests: 46%|βββββ | 25637/56168 [01:05<00:42, 715.50it/s]
Running loglikelihood requests: 46%|βββββ | 25765/56168 [01:05<00:42, 717.99it/s]
Running loglikelihood requests: 46%|βββββ | 25893/56168 [01:05<00:42, 720.01it/s]
Running loglikelihood requests: 46%|βββββ | 26021/56168 [01:05<00:41, 720.99it/s]
Running loglikelihood requests: 47%|βββββ | 26149/56168 [01:06<00:41, 720.77it/s]
Running loglikelihood requests: 47%|βββββ | 26277/56168 [01:06<00:41, 719.89it/s]
Running loglikelihood requests: 47%|βββββ | 26405/56168 [01:06<00:41, 719.18it/s]
Running loglikelihood requests: 47%|βββββ | 26533/56168 [01:06<00:41, 719.47it/s]
Running loglikelihood requests: 47%|βββββ | 26661/56168 [01:06<00:40, 719.74it/s]
Running loglikelihood requests: 48%|βββββ | 26789/56168 [01:06<00:40, 719.75it/s]
Running loglikelihood requests: 48%|βββββ | 26917/56168 [01:07<00:40, 720.48it/s]
Running loglikelihood requests: 48%|βββββ | 27045/56168 [01:07<00:40, 721.68it/s]
Running loglikelihood requests: 48%|βββββ | 27173/56168 [01:07<00:40, 722.43it/s]
Running loglikelihood requests: 49%|βββββ | 27301/56168 [01:07<00:39, 722.81it/s]
Running loglikelihood requests: 49%|βββββ | 27429/56168 [01:07<00:39, 721.90it/s]
Running loglikelihood requests: 49%|βββββ | 27557/56168 [01:08<00:39, 723.27it/s]
Running loglikelihood requests: 49%|βββββ | 27685/56168 [01:08<00:39, 724.80it/s]
Running loglikelihood requests: 50%|βββββ | 27813/56168 [01:08<00:39, 726.03it/s]
Running loglikelihood requests: 50%|βββββ | 27941/56168 [01:08<00:38, 726.67it/s]
Running loglikelihood requests: 50%|βββββ | 28069/56168 [01:08<00:38, 726.60it/s]
Running loglikelihood requests: 50%|βββββ | 28197/56168 [01:08<00:38, 728.15it/s]
Running loglikelihood requests: 50%|βββββ | 28329/56168 [01:09<00:37, 736.07it/s]
Running loglikelihood requests: 51%|βββββ | 28457/56168 [01:09<00:37, 734.52it/s]
Running loglikelihood requests: 51%|βββββ | 28585/56168 [01:09<00:37, 732.34it/s]
Running loglikelihood requests: 51%|βββββ | 28713/56168 [01:09<00:37, 731.45it/s]
Running loglikelihood requests: 51%|ββββββ | 28841/56168 [01:09<00:37, 731.69it/s]
Running loglikelihood requests: 52%|ββββββ | 28969/56168 [01:09<00:37, 732.33it/s]
Running loglikelihood requests: 52%|ββββββ | 29097/56168 [01:10<00:36, 731.81it/s]
Running loglikelihood requests: 52%|ββββββ | 29225/56168 [01:10<00:36, 738.39it/s]
Running loglikelihood requests: 52%|ββββββ | 29353/56168 [01:10<00:35, 750.00it/s]
Running loglikelihood requests: 52%|ββββββ | 29485/56168 [01:10<00:34, 767.27it/s]
Running loglikelihood requests: 53%|ββββββ | 29613/56168 [01:10<00:34, 772.81it/s]
Running loglikelihood requests: 53%|ββββββ | 29741/56168 [01:10<00:34, 775.62it/s]
Running loglikelihood requests: 53%|ββββββ | 29869/56168 [01:11<00:33, 776.97it/s]
Running loglikelihood requests: 53%|ββββββ | 29997/56168 [01:11<00:33, 779.36it/s]
Running loglikelihood requests: 54%|ββββββ | 30125/56168 [01:11<00:33, 781.83it/s]
Running loglikelihood requests: 54%|ββββββ | 30253/56168 [01:11<00:33, 784.35it/s]
Running loglikelihood requests: 54%|ββββββ | 30381/56168 [01:11<00:32, 785.78it/s]
Running loglikelihood requests: 54%|ββββββ | 30509/56168 [01:11<00:32, 787.55it/s]
Running loglikelihood requests: 55%|ββββββ | 30637/56168 [01:12<00:32, 789.07it/s]
Running loglikelihood requests: 55%|ββββββ | 30765/56168 [01:12<00:32, 788.35it/s]
Running loglikelihood requests: 55%|ββββββ | 30897/56168 [01:12<00:31, 795.14it/s]
Running loglikelihood requests: 55%|ββββββ | 31025/56168 [01:12<00:31, 794.43it/s]
Running loglikelihood requests: 55%|ββββββ | 31153/56168 [01:12<00:31, 793.93it/s]
Running loglikelihood requests: 56%|ββββββ | 31281/56168 [01:12<00:31, 793.32it/s]
Running loglikelihood requests: 56%|ββββββ | 31409/56168 [01:13<00:31, 792.94it/s]
Running loglikelihood requests: 56%|ββββββ | 31537/56168 [01:13<00:31, 793.87it/s]
Running loglikelihood requests: 56%|ββββββ | 31665/56168 [01:13<00:30, 794.94it/s]
Running loglikelihood requests: 57%|ββββββ | 31793/56168 [01:13<00:30, 794.66it/s]
Running loglikelihood requests: 57%|ββββββ | 31925/56168 [01:13<00:30, 802.98it/s]
Running loglikelihood requests: 57%|ββββββ | 32053/56168 [01:13<00:30, 801.48it/s]
Running loglikelihood requests: 57%|ββββββ | 32181/56168 [01:14<00:29, 801.07it/s]
Running loglikelihood requests: 58%|ββββββ | 32309/56168 [01:14<00:29, 801.14it/s]
Running loglikelihood requests: 58%|ββββββ | 32437/56168 [01:14<00:29, 800.83it/s]
Running loglikelihood requests: 58%|ββββββ | 32569/56168 [01:14<00:29, 809.13it/s]
Running loglikelihood requests: 58%|ββββββ | 32697/56168 [01:14<00:29, 806.39it/s]
Running loglikelihood requests: 58%|ββββββ | 32829/56168 [01:14<00:28, 812.76it/s]
Running loglikelihood requests: 59%|ββββββ | 32957/56168 [01:14<00:28, 809.46it/s]
Running loglikelihood requests: 59%|ββββββ | 33085/56168 [01:15<00:28, 807.25it/s]
Running loglikelihood requests: 59%|ββββββ | 33213/56168 [01:15<00:28, 805.72it/s]
Running loglikelihood requests: 59%|ββββββ | 33341/56168 [01:15<00:28, 804.21it/s]
Running loglikelihood requests: 60%|ββββββ | 33469/56168 [01:15<00:28, 804.65it/s]
Running loglikelihood requests: 60%|ββββββ | 33597/56168 [01:15<00:28, 804.60it/s]
Running loglikelihood requests: 60%|ββββββ | 33725/56168 [01:15<00:27, 804.21it/s]
Running loglikelihood requests: 60%|ββββββ | 33853/56168 [01:16<00:27, 802.84it/s]
Running loglikelihood requests: 60%|ββββββ | 33981/56168 [01:16<00:27, 802.74it/s]
Running loglikelihood requests: 61%|ββββββ | 34109/56168 [01:16<00:27, 804.18it/s]
Running loglikelihood requests: 61%|ββββββ | 34237/56168 [01:16<00:27, 804.33it/s]
Running loglikelihood requests: 61%|ββββββ | 34365/56168 [01:16<00:27, 804.48it/s]
Running loglikelihood requests: 61%|βββββββ | 34493/56168 [01:16<00:26, 804.20it/s]
Running loglikelihood requests: 62%|βββββββ | 34621/56168 [01:17<00:26, 804.48it/s]
Running loglikelihood requests: 62%|βββββββ | 34749/56168 [01:17<00:26, 804.94it/s]
Running loglikelihood requests: 62%|βββββββ | 34877/56168 [01:17<00:26, 804.76it/s]
Running loglikelihood requests: 62%|βββββββ | 35005/56168 [01:17<00:26, 804.04it/s]
Running loglikelihood requests: 63%|βββββββ | 35133/56168 [01:17<00:26, 805.45it/s]
Running loglikelihood requests: 63%|βββββββ | 35261/56168 [01:17<00:25, 807.67it/s]
Running loglikelihood requests: 63%|βββββββ | 35389/56168 [01:17<00:25, 810.16it/s]
Running loglikelihood requests: 63%|βββββββ | 35517/56168 [01:18<00:25, 811.00it/s]
Running loglikelihood requests: 63%|βββββββ | 35645/56168 [01:18<00:25, 812.93it/s]
Running loglikelihood requests: 64%|βββββββ | 35773/56168 [01:18<00:25, 813.49it/s]
Running loglikelihood requests: 64%|βββββββ | 35901/56168 [01:18<00:24, 813.20it/s]
Running loglikelihood requests: 64%|βββββββ | 36029/56168 [01:18<00:24, 812.13it/s]
Running loglikelihood requests: 64%|βββββββ | 36157/56168 [01:18<00:24, 812.12it/s]
Running loglikelihood requests: 65%|βββββββ | 36285/56168 [01:19<00:24, 815.69it/s]
Running loglikelihood requests: 65%|βββββββ | 36413/56168 [01:19<00:24, 817.60it/s]
Running loglikelihood requests: 65%|βββββββ | 36541/56168 [01:19<00:23, 819.31it/s]
Running loglikelihood requests: 65%|βββββββ | 36669/56168 [01:19<00:23, 818.87it/s]
Running loglikelihood requests: 66%|βββββββ | 36797/56168 [01:19<00:23, 819.40it/s]
Running loglikelihood requests: 66%|βββββββ | 36925/56168 [01:19<00:23, 821.46it/s]
Running loglikelihood requests: 66%|βββββββ | 37057/56168 [01:20<00:23, 829.10it/s]
Running loglikelihood requests: 66%|βββββββ | 37185/56168 [01:20<00:22, 826.33it/s]
Running loglikelihood requests: 66%|βββββββ | 37313/56168 [01:20<00:22, 823.79it/s]
Running loglikelihood requests: 67%|βββββββ | 37441/56168 [01:20<00:22, 834.81it/s]
Running loglikelihood requests: 67%|βββββββ | 37569/56168 [01:20<00:21, 852.74it/s]
Running loglikelihood requests: 67%|βββββββ | 37697/56168 [01:20<00:21, 866.22it/s]
Running loglikelihood requests: 67%|βββββββ | 37825/56168 [01:20<00:20, 875.98it/s]
Running loglikelihood requests: 68%|βββββββ | 37953/56168 [01:21<00:20, 884.03it/s]
Running loglikelihood requests: 68%|βββββββ | 38081/56168 [01:21<00:20, 889.73it/s]
Running loglikelihood requests: 68%|βββββββ | 38209/56168 [01:21<00:20, 893.42it/s]
Running loglikelihood requests: 68%|βββββββ | 38337/56168 [01:21<00:19, 895.84it/s]
Running loglikelihood requests: 68%|βββββββ | 38465/56168 [01:21<00:19, 897.99it/s]
Running loglikelihood requests: 69%|βββββββ | 38593/56168 [01:21<00:19, 898.26it/s]
Running loglikelihood requests: 69%|βββββββ | 38721/56168 [01:21<00:19, 901.51it/s]
Running loglikelihood requests: 69%|βββββββ | 38853/56168 [01:22<00:19, 911.19it/s]
Running loglikelihood requests: 69%|βββββββ | 38981/56168 [01:22<00:18, 911.38it/s]
Running loglikelihood requests: 70%|βββββββ | 39109/56168 [01:22<00:18, 909.63it/s]
Running loglikelihood requests: 70%|βββββββ | 39237/56168 [01:22<00:18, 910.66it/s]
Running loglikelihood requests: 70%|βββββββ | 39365/56168 [01:22<00:18, 910.98it/s]
Running loglikelihood requests: 70%|βββββββ | 39493/56168 [01:22<00:18, 911.43it/s]
Running loglikelihood requests: 71%|βββββββ | 39621/56168 [01:22<00:18, 912.88it/s]
Running loglikelihood requests: 71%|βββββββ | 39749/56168 [01:23<00:17, 913.20it/s]
Running loglikelihood requests: 71%|βββββββ | 39877/56168 [01:23<00:17, 914.40it/s]
Running loglikelihood requests: 71%|βββββββ | 40005/56168 [01:23<00:17, 915.96it/s]
Running loglikelihood requests: 71%|ββββββββ | 40133/56168 [01:23<00:17, 916.98it/s]
Running loglikelihood requests: 72%|ββββββββ | 40261/56168 [01:23<00:17, 918.50it/s]
Running loglikelihood requests: 72%|ββββββββ | 40389/56168 [01:23<00:17, 919.36it/s]
Running loglikelihood requests: 72%|ββββββββ | 40517/56168 [01:23<00:16, 921.83it/s]
Running loglikelihood requests: 72%|ββββββββ | 40645/56168 [01:24<00:16, 923.59it/s]
Running loglikelihood requests: 73%|ββββββββ | 40773/56168 [01:24<00:16, 923.34it/s]
Running loglikelihood requests: 73%|ββββββββ | 40901/56168 [01:24<00:16, 923.74it/s]
Running loglikelihood requests: 73%|ββββββββ | 41029/56168 [01:24<00:16, 924.79it/s]
Running loglikelihood requests: 73%|ββββββββ | 41157/56168 [01:24<00:16, 926.98it/s]
Running loglikelihood requests: 74%|ββββββββ | 41285/56168 [01:24<00:16, 929.53it/s]
Running loglikelihood requests: 74%|ββββββββ | 41413/56168 [01:24<00:15, 929.00it/s]
Running loglikelihood requests: 74%|ββββββββ | 41545/56168 [01:24<00:15, 938.80it/s]
Running loglikelihood requests: 100%|██████████| 56168/56168 [01:38<00:00, 569.05it/s] |
| 2026-04-03:16:40:08,272 WARNING [huggingface.py:1344] Failed to get model SHA for /egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6 at revision main. Error: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6'. Use `repo_type` argument if needed. |
| 2026-04-03:16:40:13,621 INFO [evaluation_tracker.py:206] Saving results aggregated |
| 2026-04-03:16:40:13,641 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_abstract_algebra |
| 2026-04-03:16:40:13,711 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_anatomy |
| 2026-04-03:16:40:13,809 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_astronomy |
| 2026-04-03:16:40:13,927 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_business_ethics |
| 2026-04-03:16:40:14,011 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_clinical_knowledge |
| 2026-04-03:16:40:14,215 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_college_biology |
| 2026-04-03:16:40:14,331 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_college_chemistry |
| 2026-04-03:16:40:14,423 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_college_computer_science |
| 2026-04-03:16:40:14,517 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_college_mathematics |
| 2026-04-03:16:40:14,607 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_college_medicine |
| 2026-04-03:16:40:14,749 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_college_physics |
| 2026-04-03:16:40:14,828 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_computer_security |
| 2026-04-03:16:40:14,915 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_conceptual_physics |
| 2026-04-03:16:40:15,093 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_econometrics |
| 2026-04-03:16:40:15,183 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_electrical_engineering |
| 2026-04-03:16:40:15,308 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_elementary_mathematics |
| 2026-04-03:16:40:15,632 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_formal_logic |
| 2026-04-03:16:40:15,732 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_global_facts |
| 2026-04-03:16:40:15,808 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_biology |
| 2026-04-03:16:40:16,059 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_chemistry |
| 2026-04-03:16:40:16,266 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_computer_science |
| 2026-04-03:16:40:16,375 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_european_history |
| 2026-04-03:16:40:16,583 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_geography |
| 2026-04-03:16:40:16,806 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_government_and_politics |
| 2026-04-03:16:40:17,022 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_macroeconomics |
| 2026-04-03:16:40:17,385 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_mathematics |
| 2026-04-03:16:40:17,684 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_microeconomics |
| 2026-04-03:16:40:17,929 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_physics |
| 2026-04-03:16:40:18,109 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_psychology |
| 2026-04-03:16:40:18,727 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_statistics |
| 2026-04-03:16:40:18,908 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_us_history |
| 2026-04-03:16:40:19,537 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_high_school_world_history |
| 2026-04-03:16:40:19,726 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_human_aging |
| 2026-04-03:16:40:19,895 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_human_sexuality |
| 2026-04-03:16:40:20,007 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_international_law |
| 2026-04-03:16:40:20,097 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_jurisprudence |
| 2026-04-03:16:40:20,195 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_logical_fallacies |
| 2026-04-03:16:40:20,341 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_machine_learning |
| 2026-04-03:16:40:20,436 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_management |
| 2026-04-03:16:40:20,524 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_marketing |
| 2026-04-03:16:40:20,789 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_medical_genetics |
| 2026-04-03:16:40:20,897 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_miscellaneous |
| 2026-04-03:16:40:21,555 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_moral_disputes |
| 2026-04-03:16:40:21,850 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_moral_scenarios |
| 2026-04-03:16:40:22,676 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_nutrition |
| 2026-04-03:16:40:22,944 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_philosophy |
| 2026-04-03:16:40:23,198 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_prehistory |
| 2026-04-03:16:40:23,473 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_professional_accounting |
| 2026-04-03:16:40:23,719 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_professional_law |
| 2026-04-03:16:40:25,243 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_professional_medicine |
| 2026-04-03:16:40:25,522 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_professional_psychology |
| 2026-04-03:16:40:26,010 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_public_relations |
| 2026-04-03:16:40:26,094 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_security_studies |
| 2026-04-03:16:40:26,264 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_sociology |
| 2026-04-03:16:40:26,404 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_us_foreign_policy |
| 2026-04-03:16:40:26,475 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_virology |
| 2026-04-03:16:40:26,587 INFO [evaluation_tracker.py:287] Saving per-sample results for: mmlu_world_religions |
| hf (pretrained=/egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 16 |
| | Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr| |
| |-------|------:|------|-----:|------|---|-----:|---|-----:| |
| |mmlu | 2|none | |acc |↑ |0.5713|± |0.0040| |
| | - humanities | 2|none | |acc |↑ |0.5186|± |0.0069| |
| | - formal_logic | 1|none | 0|acc |↑ |0.3810|± |0.0434| |
| | - high_school_european_history | 1|none | 0|acc |↑ |0.6970|± |0.0359| |
| | - high_school_us_history | 1|none | 0|acc |↑ |0.7500|± |0.0304| |
| | - high_school_world_history | 1|none | 0|acc |↑ |0.7384|± |0.0286| |
| | - international_law | 1|none | 0|acc |↑ |0.7107|± |0.0414| |
| | - jurisprudence | 1|none | 0|acc |↑ |0.6944|± |0.0445| |
| | - logical_fallacies | 1|none | 0|acc |↑ |0.6810|± |0.0366| |
| | - moral_disputes | 1|none | 0|acc |↑ |0.6590|± |0.0255| |
| | - moral_scenarios | 1|none | 0|acc |↑ |0.2804|± |0.0150| |
| | - philosophy | 1|none | 0|acc |↑ |0.6463|± |0.0272| |
| | - prehistory | 1|none | 0|acc |↑ |0.6451|± |0.0266| |
| | - professional_law | 1|none | 0|acc |↑ |0.4231|± |0.0126| |
| | - world_religions | 1|none | 0|acc |↑ |0.8129|± |0.0299| |
| | - other | 2|none | |acc |↑ |0.6350|± |0.0082| |
| | - business_ethics | 1|none | 0|acc |↑ |0.5200|± |0.0502| |
| | - clinical_knowledge | 1|none | 0|acc |↑ |0.6491|± |0.0294| |
| | - college_medicine | 1|none | 0|acc |↑ |0.5896|± |0.0375| |
| | - global_facts | 1|none | 0|acc |↑ |0.3000|± |0.0461| |
| | - human_aging | 1|none | 0|acc |↑ |0.6368|± |0.0323| |
| | - management | 1|none | 0|acc |↑ |0.7476|± |0.0430| |
| | - marketing | 1|none | 0|acc |↑ |0.8248|± |0.0249| |
| | - medical_genetics | 1|none | 0|acc |↑ |0.6400|± |0.0482| |
| | - miscellaneous | 1|none | 0|acc |↑ |0.7778|± |0.0149| |
| | - nutrition | 1|none | 0|acc |↑ |0.6601|± |0.0271| |
| | - professional_accounting | 1|none | 0|acc |↑ |0.4149|± |0.0294| |
| | - professional_medicine | 1|none | 0|acc |↑ |0.5699|± |0.0301| |
| | - virology | 1|none | 0|acc |↑ |0.3494|± |0.0371| |
| | - social sciences | 2|none | |acc |↑ |0.6783|± |0.0082| |
| | - econometrics | 1|none | 0|acc |↑ |0.4298|± |0.0466| |
| | - high_school_geography | 1|none | 0|acc |↑ |0.7172|± |0.0321| |
| | - high_school_government_and_politics| 1|none | 0|acc |↑ |0.8290|± |0.0272| |
| | - high_school_macroeconomics | 1|none | 0|acc |↑ |0.5795|± |0.0250| |
| | - high_school_microeconomics | 1|none | 0|acc |↑ |0.6050|± |0.0318| |
| | - high_school_psychology | 1|none | 0|acc |↑ |0.7890|± |0.0175| |
| | - human_sexuality | 1|none | 0|acc |↑ |0.6565|± |0.0416| |
| | - professional_psychology | 1|none | 0|acc |↑ |0.5997|± |0.0198| |
| | - public_relations | 1|none | 0|acc |↑ |0.6545|± |0.0455| |
| | - security_studies | 1|none | 0|acc |↑ |0.6490|± |0.0306| |
| | - sociology | 1|none | 0|acc |↑ |0.8557|± |0.0248| |
| | - us_foreign_policy | 1|none | 0|acc |↑ |0.8000|± |0.0402| |
| | - stem | 2|none | |acc |↑ |0.4827|± |0.0087| |
| | - abstract_algebra | 1|none | 0|acc |↑ |0.3000|± |0.0461| |
| | - anatomy | 1|none | 0|acc |↑ |0.5630|± |0.0428| |
| | - astronomy | 1|none | 0|acc |↑ |0.5855|± |0.0401| |
| | - college_biology | 1|none | 0|acc |↑ |0.6389|± |0.0402| |
| | - college_chemistry | 1|none | 0|acc |↑ |0.4900|± |0.0502| |
| | - college_computer_science | 1|none | 0|acc |↑ |0.5100|± |0.0502| |
| | - college_mathematics | 1|none | 0|acc |↑ |0.3700|± |0.0485| |
| | - college_physics | 1|none | 0|acc |↑ |0.5294|± |0.0497| |
| | - computer_security | 1|none | 0|acc |↑ |0.4800|± |0.0502| |
| | - conceptual_physics | 1|none | 0|acc |↑ |0.4936|± |0.0327| |
| | - electrical_engineering | 1|none | 0|acc |↑ |0.5310|± |0.0416| |
| | - elementary_mathematics | 1|none | 0|acc |↑ |0.3995|± |0.0252| |
| | - high_school_biology | 1|none | 0|acc |↑ |0.6581|± |0.0270| |
| | - high_school_chemistry | 1|none | 0|acc |↑ |0.4828|± |0.0352| |
| | - high_school_computer_science | 1|none | 0|acc |↑ |0.5700|± |0.0498| |
| | - high_school_mathematics | 1|none | 0|acc |↑ |0.3370|± |0.0288| |
| | - high_school_physics | 1|none | 0|acc |↑ |0.2914|± |0.0371| |
| | - high_school_statistics | 1|none | 0|acc |↑ |0.4907|± |0.0341| |
| | - machine_learning | 1|none | 0|acc |↑ |0.4643|± |0.0473| |
|
|
| | Groups |Version|Filter|n-shot|Metric| |Value | |Stderr| |
| |--------|------:|------|-----:|------|---|-----:|---|-----:| |
| |mmlu | 2|none | |acc |↑ |0.5713|± |0.0040| |
| | - humanities | 2|none | |acc |↑ |0.5186|± |0.0069| |
| | - other | 2|none | |acc |↑ |0.6350|± |0.0082| |
| | - social sciences| 2|none | |acc |↑ |0.6783|± |0.0082| |
| | - stem | 2|none | |acc |↑ |0.4827|± |0.0087| |
|
|
|
|