| /egr/research-optml/wangc168/anaconda3/envs/SOUL/lib/python3.9/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead. |
| warnings.warn( |
| 2026-04-03:16:33:15,348 INFO [__main__.py:279] Verbosity set to INFO |
| 2026-04-03:16:33:16,808 INFO [__init__.py:491] `group` and `group_alias` keys in TaskConfigs are deprecated and will be removed in v0.4.5 of lm_eval. The new `tag` field will be used to allow for a shortcut to a group of tasks one does not wish to aggregate metrics across. `group`s which aggregate across subtasks must be only defined in a separate group config file, which will be the official way to create groups that support cross-task aggregation as in `mmlu`. Please see the v0.4.4 patch notes and our documentation: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md#advanced-group-configs for more information. |
| 2026-04-03:16:33:40,965 INFO [__main__.py:376] Selected Tasks: ['wmdp'] |
| 2026-04-03:16:33:40,998 INFO [evaluator.py:161] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 |
| 2026-04-03:16:33:40,998 INFO [evaluator.py:198] Initializing hf model, with arguments: {'pretrained': '/egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6'} |
| 2026-04-03:16:33:41,195 INFO [huggingface.py:130] Using device 'cuda:0' |
| 2026-04-03:16:33:41,721 INFO [huggingface.py:366] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:0'} |
|
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|ββββ | 1/3 [00:10<00:20, 10.37s/it]
Loading checkpoint shards: 67%|βββββββ | 2/3 [00:20<00:10, 10.51s/it]
Loading checkpoint shards: 100%|ββββββββββ| 3/3 [00:31<00:00, 10.46s/it]
Loading checkpoint shards: 100%|ββββββββββ| 3/3 [00:31<00:00, 10.46s/it] |
| 2026-04-03:16:34:15,990 WARNING [task.py:337] [Task: wmdp_cyber] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended. |
| 2026-04-03:16:34:15,990 WARNING [task.py:337] [Task: wmdp_cyber] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended. |
| 2026-04-03:16:34:17,327 WARNING [task.py:337] [Task: wmdp_chem] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended. |
| 2026-04-03:16:34:17,327 WARNING [task.py:337] [Task: wmdp_chem] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended. |
| 2026-04-03:16:34:18,114 WARNING [task.py:337] [Task: wmdp_bio] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended. |
| 2026-04-03:16:34:18,114 WARNING [task.py:337] [Task: wmdp_bio] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended. |
| 2026-04-03:16:34:18,191 INFO [evaluator.py:279] Setting fewshot random generator seed to 1234 |
| 2026-04-03:16:34:18,191 INFO [evaluator.py:279] Setting fewshot random generator seed to 1234 |
| 2026-04-03:16:34:18,191 INFO [evaluator.py:279] Setting fewshot random generator seed to 1234 |
| 2026-04-03:16:34:18,191 WARNING [model.py:422] model.chat_template was called with the chat_template set to False or None. Therefore no chat template will be applied. Make sure this is an intended behavior. |
| 2026-04-03:16:34:18,192 INFO [task.py:423] Building contexts for wmdp_bio on rank 0... |
|
0%| | 0/1273 [00:00<?, ?it/s]
8%|β | 104/1273 [00:00<00:01, 1039.82it/s]
18%|ββ | 223/1273 [00:00<00:00, 1123.37it/s]
27%|βββ | 342/1273 [00:00<00:00, 1150.04it/s]
36%|ββββ | 461/1273 [00:00<00:00, 1162.06it/s]
46%|βββββ | 580/1273 [00:00<00:00, 1170.58it/s]
55%|ββββββ | 699/1273 [00:00<00:00, 1175.36it/s]
64%|βββββββ | 818/1273 [00:00<00:00, 1176.99it/s]
74%|ββββββββ | 937/1273 [00:00<00:00, 1179.19it/s]
83%|βββββββββ | 1056/1273 [00:00<00:00, 1180.14it/s]
92%|ββββββββββ| 1175/1273 [00:01<00:00, 1182.95it/s]
100%|ββββββββββ| 1273/1273 [00:01<00:00, 1170.01it/s] |
| 2026-04-03:16:34:19,318 INFO [task.py:423] Building contexts for wmdp_chem on rank 0... |
|
0%| | 0/408 [00:00<?, ?it/s]
29%|βββ | 119/408 [00:00<00:00, 1187.88it/s]
58%|ββββββ | 238/408 [00:00<00:00, 1185.27it/s]
88%|βββββββββ | 357/408 [00:00<00:00, 1186.87it/s]
100%|ββββββββββ| 408/408 [00:00<00:00, 1186.07it/s] |
| 2026-04-03:16:34:19,671 INFO [task.py:423] Building contexts for wmdp_cyber on rank 0... |
|
0%| | 0/1987 [00:00<?, ?it/s]
6%|β | 118/1987 [00:00<00:01, 1172.69it/s]
12%|ββ | 237/1987 [00:00<00:01, 1178.36it/s]
18%|ββ | 355/1987 [00:00<00:01, 1176.40it/s]
24%|βββ | 473/1987 [00:00<00:01, 1177.06it/s]
30%|βββ | 591/1987 [00:00<00:01, 1178.02it/s]
36%|ββββ | 709/1987 [00:00<00:01, 1178.19it/s]
42%|βββββ | 828/1987 [00:00<00:00, 1180.62it/s]
48%|βββββ | 947/1987 [00:00<00:00, 1181.94it/s]
54%|ββββββ | 1066/1987 [00:00<00:00, 1182.56it/s]
60%|ββββββ | 1185/1987 [00:01<00:00, 1181.05it/s]
66%|βββββββ | 1304/1987 [00:01<00:00, 1180.09it/s]
72%|ββββββββ | 1423/1987 [00:01<00:00, 1180.01it/s]
78%|ββββββββ | 1542/1987 [00:01<00:00, 1180.03it/s]
84%|βββββββββ | 1661/1987 [00:01<00:00, 1180.34it/s]
90%|βββββββββ | 1780/1987 [00:01<00:00, 1180.22it/s]
96%|ββββββββββ| 1899/1987 [00:01<00:00, 1179.68it/s]
100%|ββββββββββ| 1987/1987 [00:01<00:00, 1179.97it/s] |
| 2026-04-03:16:34:21,395 INFO [evaluator.py:465] Running loglikelihood requests |
|
Running loglikelihood requests: 0%| | 0/14672 [00:00<?, ?it/s]We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache) |
|
Running loglikelihood requests: 0%| | 1/14672 [00:07<28:36:15, 7.02s/it]
Running loglikelihood requests: 0%| | 65/14672 [00:08<24:55, 9.77it/s]
Running loglikelihood requests: 1%| | 129/14672 [00:10<13:49, 17.52it/s]
Running loglikelihood requests: 1%|β | 193/14672 [00:11<10:01, 24.08it/s]
Running loglikelihood requests: 2%|β | 257/14672 [00:13<08:04, 29.77it/s]
Running loglikelihood requests: 2%|β | 321/14672 [00:14<06:53, 34.71it/s]
Running loglikelihood requests: 3%|β | 385/14672 [00:15<06:03, 39.31it/s]
Running loglikelihood requests: 3%|β | 449/14672 [00:16<05:27, 43.45it/s]
Running loglikelihood requests: 3%|β | 513/14672 [00:17<05:00, 47.14it/s]
Running loglikelihood requests: 4%|β | 577/14672 [00:19<04:39, 50.43it/s]
Running loglikelihood requests: 4%|β | 641/14672 [00:20<04:20, 53.83it/s]
Running loglikelihood requests: 5%|β | 705/14672 [00:20<04:04, 57.05it/s]
Running loglikelihood requests: 5%|β | 769/14672 [00:21<03:52, 59.81it/s]
Running loglikelihood requests: 6%|β | 833/14672 [00:22<03:40, 62.74it/s]
Running loglikelihood requests: 6%|β | 897/14672 [00:23<03:30, 65.36it/s]
Running loglikelihood requests: 7%|β | 961/14672 [00:24<03:22, 67.78it/s]
Running loglikelihood requests: 7%|β | 1025/14672 [00:25<03:15, 69.87it/s]
Running loglikelihood requests: 7%|β | 1089/14672 [00:26<03:08, 71.88it/s]
Running loglikelihood requests: 8%|β | 1153/14672 [00:27<03:00, 74.76it/s]
Running loglikelihood requests: 8%|β | 1217/14672 [00:27<02:53, 77.72it/s]
Running loglikelihood requests: 9%|β | 1281/14672 [00:28<02:44, 81.20it/s]
Running loglikelihood requests: 9%|β | 1345/14672 [00:29<02:37, 84.47it/s]
Running loglikelihood requests: 10%|β | 1409/14672 [00:29<02:31, 87.47it/s]
Running loglikelihood requests: 10%|β | 1473/14672 [00:30<02:26, 90.20it/s]
Running loglikelihood requests: 10%|β | 1537/14672 [00:31<02:22, 92.17it/s]
Running loglikelihood requests: 11%|β | 1601/14672 [00:31<02:19, 93.56it/s]
Running loglikelihood requests: 11%|ββ | 1665/14672 [00:32<02:14, 96.35it/s]
Running loglikelihood requests: 12%|ββ | 1729/14672 [00:33<02:11, 98.52it/s]
Running loglikelihood requests: 12%|ββ | 1793/14672 [00:33<02:05, 102.71it/s]
Running loglikelihood requests: 13%|ββ | 1857/14672 [00:34<02:00, 106.25it/s]
Running loglikelihood requests: 13%|ββ | 1921/14672 [00:34<01:55, 110.12it/s]
Running loglikelihood requests: 14%|ββ | 1985/14672 [00:35<01:51, 113.59it/s]
Running loglikelihood requests: 14%|ββ | 2049/14672 [00:35<01:47, 117.22it/s]
Running loglikelihood requests: 14%|ββ | 2113/14672 [00:36<01:44, 120.67it/s]
Running loglikelihood requests: 15%|ββ | 2177/14672 [00:36<01:39, 125.08it/s]
Running loglikelihood requests: 15%|ββ | 2241/14672 [00:37<01:36, 129.12it/s]
Running loglikelihood requests: 16%|ββ | 2305/14672 [00:37<01:32, 133.10it/s]
Running loglikelihood requests: 16%|ββ | 2369/14672 [00:38<01:29, 137.36it/s]
Running loglikelihood requests: 17%|ββ | 2433/14672 [00:38<01:26, 141.13it/s]
Running loglikelihood requests: 17%|ββ | 2497/14672 [00:38<01:23, 145.27it/s]
Running loglikelihood requests: 17%|ββ | 2561/14672 [00:39<01:20, 149.61it/s]
Running loglikelihood requests: 18%|ββ | 2625/14672 [00:39<01:17, 154.58it/s]
Running loglikelihood requests: 18%|ββ | 2689/14672 [00:40<01:15, 159.42it/s]
Running loglikelihood requests: 19%|ββ | 2753/14672 [00:40<01:12, 164.14it/s]
Running loglikelihood requests: 19%|ββ | 2817/14672 [00:40<01:10, 169.25it/s]
Running loglikelihood requests: 20%|ββ | 2881/14672 [00:41<01:07, 174.44it/s]
Running loglikelihood requests: 20%|ββ | 2945/14672 [00:41<01:05, 179.07it/s]
Running loglikelihood requests: 21%|ββ | 3009/14672 [00:41<01:02, 185.17it/s]
Running loglikelihood requests: 21%|ββ | 3073/14672 [00:42<01:00, 190.69it/s]
Running loglikelihood requests: 21%|βββ | 3137/14672 [00:42<00:58, 197.64it/s]
Running loglikelihood requests: 22%|βββ | 3201/14672 [00:42<00:55, 206.18it/s]
Running loglikelihood requests: 22%|βββ | 3265/14672 [00:42<00:53, 213.77it/s]
Running loglikelihood requests: 23%|βββ | 3329/14672 [00:43<00:51, 221.86it/s]
Running loglikelihood requests: 23%|βββ | 3393/14672 [00:43<00:49, 228.72it/s]
Running loglikelihood requests: 24%|βββ | 3457/14672 [00:43<00:47, 236.26it/s]
Running loglikelihood requests: 24%|βββ | 3521/14672 [00:43<00:45, 245.82it/s]
Running loglikelihood requests: 24%|βββ | 3585/14672 [00:44<00:43, 255.11it/s]
Running loglikelihood requests: 25%|βββ | 3649/14672 [00:44<00:41, 263.32it/s]
Running loglikelihood requests: 25%|βββ | 3713/14672 [00:44<00:40, 269.28it/s]
Running loglikelihood requests: 26%|βββ | 3777/14672 [00:44<00:39, 276.43it/s]
Running loglikelihood requests: 26%|βββ | 3841/14672 [00:45<00:38, 282.67it/s]
Running loglikelihood requests: 27%|βββ | 3905/14672 [00:45<00:37, 290.03it/s]
Running loglikelihood requests: 27%|βββ | 3969/14672 [00:45<00:36, 296.27it/s]
Running loglikelihood requests: 27%|βββ | 4033/14672 [00:45<00:35, 301.75it/s]
Running loglikelihood requests: 28%|βββ | 4097/14672 [00:45<00:33, 313.41it/s]
Running loglikelihood requests: 28%|βββ | 4161/14672 [00:46<00:32, 321.67it/s]
Running loglikelihood requests: 29%|βββ | 4225/14672 [00:46<00:31, 332.12it/s]
Running loglikelihood requests: 29%|βββ | 4289/14672 [00:46<00:30, 343.22it/s]
Running loglikelihood requests: 30%|βββ | 4353/14672 [00:46<00:29, 353.59it/s]
Running loglikelihood requests: 30%|βββ | 4417/14672 [00:46<00:28, 365.45it/s]
Running loglikelihood requests: 31%|βββ | 4481/14672 [00:46<00:27, 377.39it/s]
Running loglikelihood requests: 31%|βββ | 4545/14672 [00:46<00:25, 394.12it/s]
Running loglikelihood requests: 31%|ββββ | 4609/14672 [00:47<00:24, 407.45it/s]
Running loglikelihood requests: 32%|ββββ | 4673/14672 [00:47<00:23, 418.20it/s]
Running loglikelihood requests: 32%|ββββ | 4737/14672 [00:47<00:23, 426.88it/s]
Running loglikelihood requests: 33%|ββββ | 4801/14672 [00:47<00:22, 432.46it/s]
Running loglikelihood requests: 33%|ββββ | 4865/14672 [00:47<00:22, 441.18it/s]
Running loglikelihood requests: 34%|ββββ | 4929/14672 [00:47<00:21, 449.29it/s]
Running loglikelihood requests: 34%|ββββ | 4993/14672 [00:47<00:21, 457.18it/s]
Running loglikelihood requests: 34%|ββββ | 5057/14672 [00:48<00:20, 462.30it/s]
Running loglikelihood requests: 35%|ββββ | 5121/14672 [00:48<00:20, 466.40it/s]
Running loglikelihood requests: 35%|ββββ | 5185/14672 [00:48<00:20, 469.62it/s]
Running loglikelihood requests: 36%|ββββ | 5249/14672 [00:48<00:19, 479.28it/s]
Running loglikelihood requests: 36%|ββββ | 5313/14672 [00:48<00:19, 486.88it/s]
Running loglikelihood requests: 37%|ββββ | 5377/14672 [00:48<00:18, 491.95it/s]
Running loglikelihood requests: 37%|ββββ | 5441/14672 [00:48<00:18, 497.51it/s]
Running loglikelihood requests: 38%|ββββ | 5505/14672 [00:49<00:18, 500.18it/s]
Running loglikelihood requests: 38%|ββββ | 5569/14672 [00:49<00:18, 503.95it/s]
Running loglikelihood requests: 38%|ββββ | 5633/14672 [00:49<00:17, 508.41it/s]
Running loglikelihood requests: 39%|ββββ | 5697/14672 [00:49<00:17, 512.21it/s]
Running loglikelihood requests: 39%|ββββ | 5761/14672 [00:49<00:17, 515.34it/s]
Running loglikelihood requests: 40%|ββββ | 5825/14672 [00:49<00:17, 517.41it/s]
Running loglikelihood requests: 40%|ββββ | 5889/14672 [00:49<00:16, 528.06it/s]
Running loglikelihood requests: 41%|ββββ | 5953/14672 [00:49<00:16, 537.50it/s]
Running loglikelihood requests: 41%|ββββ | 6017/14672 [00:49<00:15, 544.64it/s]
Running loglikelihood requests: 41%|βββββ | 6081/14672 [00:50<00:15, 549.42it/s]
Running loglikelihood requests: 42%|βββββ | 6145/14672 [00:50<00:15, 553.12it/s]
Running loglikelihood requests: 42%|βββββ | 6209/14672 [00:50<00:15, 557.93it/s]
Running loglikelihood requests: 43%|βββββ | 6273/14672 [00:50<00:14, 560.26it/s]
Running loglikelihood requests: 43%|βββββ | 6337/14672 [00:50<00:14, 563.14it/s]
Running loglikelihood requests: 44%|βββββ | 6401/14672 [00:50<00:14, 565.75it/s]
Running loglikelihood requests: 44%|βββββ | 6465/14672 [00:50<00:14, 567.08it/s]
Running loglikelihood requests: 44%|βββββ | 6529/14672 [00:50<00:14, 569.49it/s]
Running loglikelihood requests: 45%|βββββ | 6593/14672 [00:50<00:14, 569.81it/s]
Running loglikelihood requests: 45%|βββββ | 6657/14672 [00:51<00:14, 568.82it/s]
Running loglikelihood requests: 46%|βββββ | 6721/14672 [00:51<00:13, 569.32it/s]
Running loglikelihood requests: 46%|βββββ | 6785/14672 [00:51<00:13, 570.94it/s]
Running loglikelihood requests: 47%|βββββ | 6872/14672 [00:51<00:11, 653.60it/s]
Running loglikelihood requests: 47%|βββββ | 6967/14672 [00:51<00:10, 737.51it/s]
Running loglikelihood requests: 48%|βββββ | 7042/14672 [00:51<00:13, 586.29it/s]
Running loglikelihood requests: 49%|βββββ | 7140/14672 [00:51<00:11, 682.88it/s]
Running loglikelihood requests: 49%|βββββ | 7233/14672 [00:52<00:12, 602.75it/s]
Running loglikelihood requests: 50%|βββββ | 7310/14672 [00:52<00:11, 641.27it/s]
Running loglikelihood requests: 50%|βββββ | 7394/14672 [00:52<00:10, 690.46it/s]
Running loglikelihood requests: 51%|βββββ | 7486/14672 [00:52<00:09, 750.40it/s]
Running loglikelihood requests: 52%|ββββββ | 7566/14672 [00:52<00:11, 613.66it/s]
Running loglikelihood requests: 52%|ββββββ | 7681/14672 [00:52<00:11, 609.26it/s]
Running loglikelihood requests: 53%|ββββββ | 7792/14672 [00:52<00:09, 718.48it/s]
Running loglikelihood requests: 54%|ββββββ | 7873/14672 [00:52<00:11, 613.85it/s]
Running loglikelihood requests: 55%|ββββββ | 8001/14672 [00:53<00:10, 634.32it/s]
Running loglikelihood requests: 55%|ββββββ | 8129/14672 [00:53<00:10, 647.81it/s]
Running loglikelihood requests: 56%|ββββββ | 8255/14672 [00:53<00:08, 770.20it/s]
Running loglikelihood requests: 57%|ββββββ | 8341/14672 [00:53<00:09, 671.09it/s]
Running loglikelihood requests: 58%|ββββββ | 8449/14672 [00:53<00:09, 647.92it/s]
Running loglikelihood requests: 58%|ββββββ | 8577/14672 [00:53<00:09, 668.34it/s]
Running loglikelihood requests: 59%|ββββββ | 8705/14672 [00:54<00:08, 682.42it/s]
Running loglikelihood requests: 60%|ββββββ | 8833/14672 [00:54<00:08, 694.59it/s]
Running loglikelihood requests: 61%|ββββββ | 8961/14672 [00:54<00:08, 703.87it/s]
Running loglikelihood requests: 62%|βββββββ | 9089/14672 [00:54<00:07, 709.22it/s]
Running loglikelihood requests: 63%|βββββββ | 9217/14672 [00:54<00:07, 712.80it/s]
Running loglikelihood requests: 64%|βββββββ | 9345/14672 [00:55<00:07, 716.78it/s]
Running loglikelihood requests: 65%|βββββββ | 9473/14672 [00:55<00:07, 720.24it/s]
Running loglikelihood requests: 65%|βββββββ | 9601/14672 [00:55<00:07, 723.19it/s]
Running loglikelihood requests: 66%|βββββββ | 9729/14672 [00:55<00:06, 733.11it/s]
Running loglikelihood requests: 67%|βββββββ | 9857/14672 [00:55<00:06, 749.42it/s]
Running loglikelihood requests: 68%|βββββββ | 9985/14672 [00:55<00:06, 761.63it/s]
Running loglikelihood requests: 69%|βββββββ | 10113/14672 [00:56<00:05, 771.50it/s]
Running loglikelihood requests: 70%|βββββββ | 10241/14672 [00:56<00:05, 779.97it/s]
Running loglikelihood requests: 71%|βββββββ | 10369/14672 [00:56<00:05, 784.02it/s]
Running loglikelihood requests: 72%|ββββββββ | 10497/14672 [00:56<00:05, 789.79it/s]
Running loglikelihood requests: 72%|ββββββββ | 10625/14672 [00:56<00:05, 794.82it/s]
Running loglikelihood requests: 73%|ββββββββ | 10753/14672 [00:56<00:04, 797.44it/s]
Running loglikelihood requests: 74%|ββββββββ | 10881/14672 [00:57<00:04, 801.15it/s]
Running loglikelihood requests: 75%|ββββββββ | 11009/14672 [00:57<00:04, 803.77it/s]
Running loglikelihood requests: 76%|ββββββββ | 11137/14672 [00:57<00:04, 805.88it/s]
Running loglikelihood requests: 77%|ββββββββ | 11265/14672 [00:57<00:04, 808.79it/s]
Running loglikelihood requests: 78%|ββββββββ | 11393/14672 [00:57<00:04, 810.42it/s]
Running loglikelihood requests: 79%|ββββββββ | 11521/14672 [00:57<00:03, 812.74it/s]
Running loglikelihood requests: 79%|ββββββββ | 11649/14672 [00:57<00:03, 816.03it/s]
Running loglikelihood requests: 80%|ββββββββ | 11777/14672 [00:58<00:03, 839.68it/s]
Running loglikelihood requests: 81%|ββββββββ | 11905/14672 [00:58<00:03, 858.12it/s]
Running loglikelihood requests: 82%|βββββββββ | 12033/14672 [00:58<00:03, 873.49it/s]
Running loglikelihood requests: 83%|βββββββββ | 12161/14672 [00:58<00:02, 886.31it/s]
Running loglikelihood requests: 84%|βββββββββ | 12289/14672 [00:58<00:02, 896.07it/s]
Running loglikelihood requests: 85%|βββββββββ | 12417/14672 [00:58<00:02, 904.78it/s]
Running loglikelihood requests: 86%|βββββββββ | 12545/14672 [00:58<00:02, 911.92it/s]
Running loglikelihood requests: 86%|βββββββββ | 12673/14672 [00:59<00:02, 919.28it/s]
Running loglikelihood requests: 87%|βββββββββ | 12801/14672 [00:59<00:02, 925.84it/s]
Running loglikelihood requests: 88%|βββββββββ | 12929/14672 [00:59<00:01, 932.72it/s]
Running loglikelihood requests: 89%|βββββββββ | 13057/14672 [00:59<00:01, 936.23it/s]
Running loglikelihood requests: 90%|βββββββββ | 13185/14672 [00:59<00:01, 942.82it/s]
Running loglikelihood requests: 91%|βββββββββ | 13313/14672 [00:59<00:01, 948.53it/s]
Running loglikelihood requests: 92%|ββββββββββ| 13441/14672 [00:59<00:01, 951.85it/s]
Running loglikelihood requests: 92%|ββββββββββ| 13569/14672 [01:00<00:01, 956.08it/s]
Running loglikelihood requests: 93%|ββββββββββ| 13697/14672 [01:00<00:01, 961.39it/s]
Running loglikelihood requests: 94%|ββββββββββ| 13825/14672 [01:00<00:00, 983.74it/s]
Running loglikelihood requests: 95%|ββββββββββ| 13953/14672 [01:00<00:00, 1022.58it/s]
Running loglikelihood requests: 96%|ββββββββββ| 14081/14672 [01:00<00:00, 1057.84it/s]
Running loglikelihood requests: 97%|ββββββββββ| 14209/14672 [01:00<00:00, 1082.58it/s]
Running loglikelihood requests: 98%|ββββββββββ| 14337/14672 [01:00<00:00, 1106.97it/s]
Running loglikelihood requests: 99%|ββββββββββ| 14465/14672 [01:00<00:00, 1127.11it/s]
Running loglikelihood requests: 99%|ββββββββββ| 14593/14672 [01:00<00:00, 1144.64it/s]
Running loglikelihood requests: 100%|ββββββββββ| 14672/14672 [01:00<00:00, 240.60it/s] |
| 2026-04-03:16:35:29,404 WARNING [huggingface.py:1344] Failed to get model SHA for /egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6 at revision main. Error: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6'. Use `repo_type` argument if needed. |
| 2026-04-03:16:35:44,231 INFO [evaluation_tracker.py:206] Saving results aggregated |
| 2026-04-03:16:35:44,240 INFO [evaluation_tracker.py:287] Saving per-sample results for: wmdp_bio |
| 2026-04-03:16:35:45,155 INFO [evaluation_tracker.py:287] Saving per-sample results for: wmdp_chem |
| 2026-04-03:16:35:45,449 INFO [evaluation_tracker.py:287] Saving per-sample results for: wmdp_cyber |
| hf (pretrained=/egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 16 |
| | Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr| |
| | |
| |wmdp | 1|none | |acc |β |0.3132|Β± |0.0076| |
| | - wmdp_bio | 1|none | 0|acc |β |0.3009|Β± |0.0129| |
| | - wmdp_chem | 1|none | 0|acc |β |0.4583|Β± |0.0247| |
| | - wmdp_cyber| 1|none | 0|acc |β |0.2914|Β± |0.0102| |
|
|
| |Groups|Version|Filter|n-shot|Metric| |Value | |Stderr| |
| | |
| |wmdp | 1|none | |acc |β |0.3132|Β± |0.0076| |
|
|
|
|