wcs2024's picture
Upload folder using huggingface_hub
adb8cd1 verified
/egr/research-optml/wangc168/anaconda3/envs/SOUL/lib/python3.9/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
2026-04-03:16:33:15,348 INFO [__main__.py:279] Verbosity set to INFO
2026-04-03:16:33:16,808 INFO [__init__.py:491] `group` and `group_alias` keys in TaskConfigs are deprecated and will be removed in v0.4.5 of lm_eval. The new `tag` field will be used to allow for a shortcut to a group of tasks one does not wish to aggregate metrics across. `group`s which aggregate across subtasks must be only defined in a separate group config file, which will be the official way to create groups that support cross-task aggregation as in `mmlu`. Please see the v0.4.4 patch notes and our documentation: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md#advanced-group-configs for more information.
2026-04-03:16:33:40,965 INFO [__main__.py:376] Selected Tasks: ['wmdp']
2026-04-03:16:33:40,998 INFO [evaluator.py:161] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2026-04-03:16:33:40,998 INFO [evaluator.py:198] Initializing hf model, with arguments: {'pretrained': '/egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6'}
2026-04-03:16:33:41,195 INFO [huggingface.py:130] Using device 'cuda:0'
2026-04-03:16:33:41,721 INFO [huggingface.py:366] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:0'}
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s] Loading checkpoint shards: 33%|β–ˆβ–ˆβ–ˆβ–Ž | 1/3 [00:10<00:20, 10.37s/it] Loading checkpoint shards: 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 2/3 [00:20<00:10, 10.51s/it] Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3/3 [00:31<00:00, 10.46s/it] Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3/3 [00:31<00:00, 10.46s/it]
2026-04-03:16:34:15,990 WARNING [task.py:337] [Task: wmdp_cyber] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
2026-04-03:16:34:15,990 WARNING [task.py:337] [Task: wmdp_cyber] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
2026-04-03:16:34:17,327 WARNING [task.py:337] [Task: wmdp_chem] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
2026-04-03:16:34:17,327 WARNING [task.py:337] [Task: wmdp_chem] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
2026-04-03:16:34:18,114 WARNING [task.py:337] [Task: wmdp_bio] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
2026-04-03:16:34:18,114 WARNING [task.py:337] [Task: wmdp_bio] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
2026-04-03:16:34:18,191 INFO [evaluator.py:279] Setting fewshot random generator seed to 1234
2026-04-03:16:34:18,191 INFO [evaluator.py:279] Setting fewshot random generator seed to 1234
2026-04-03:16:34:18,191 INFO [evaluator.py:279] Setting fewshot random generator seed to 1234
2026-04-03:16:34:18,191 WARNING [model.py:422] model.chat_template was called with the chat_template set to False or None. Therefore no chat template will be applied. Make sure this is an intended behavior.
2026-04-03:16:34:18,192 INFO [task.py:423] Building contexts for wmdp_bio on rank 0...
0%| | 0/1273 [00:00<?, ?it/s] 8%|β–Š | 104/1273 [00:00<00:01, 1039.82it/s] 18%|β–ˆβ–Š | 223/1273 [00:00<00:00, 1123.37it/s] 27%|β–ˆβ–ˆβ–‹ | 342/1273 [00:00<00:00, 1150.04it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 461/1273 [00:00<00:00, 1162.06it/s] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 580/1273 [00:00<00:00, 1170.58it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 699/1273 [00:00<00:00, 1175.36it/s] 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 818/1273 [00:00<00:00, 1176.99it/s] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 937/1273 [00:00<00:00, 1179.19it/s] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1056/1273 [00:00<00:00, 1180.14it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1175/1273 [00:01<00:00, 1182.95it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1273/1273 [00:01<00:00, 1170.01it/s]
2026-04-03:16:34:19,318 INFO [task.py:423] Building contexts for wmdp_chem on rank 0...
0%| | 0/408 [00:00<?, ?it/s] 29%|β–ˆβ–ˆβ–‰ | 119/408 [00:00<00:00, 1187.88it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 238/408 [00:00<00:00, 1185.27it/s] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 357/408 [00:00<00:00, 1186.87it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 408/408 [00:00<00:00, 1186.07it/s]
2026-04-03:16:34:19,671 INFO [task.py:423] Building contexts for wmdp_cyber on rank 0...
0%| | 0/1987 [00:00<?, ?it/s] 6%|β–Œ | 118/1987 [00:00<00:01, 1172.69it/s] 12%|β–ˆβ– | 237/1987 [00:00<00:01, 1178.36it/s] 18%|β–ˆβ–Š | 355/1987 [00:00<00:01, 1176.40it/s] 24%|β–ˆβ–ˆβ– | 473/1987 [00:00<00:01, 1177.06it/s] 30%|β–ˆβ–ˆβ–‰ | 591/1987 [00:00<00:01, 1178.02it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 709/1987 [00:00<00:01, 1178.19it/s] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 828/1987 [00:00<00:00, 1180.62it/s] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 947/1987 [00:00<00:00, 1181.94it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1066/1987 [00:00<00:00, 1182.56it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1185/1987 [00:01<00:00, 1181.05it/s] 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1304/1987 [00:01<00:00, 1180.09it/s] 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1423/1987 [00:01<00:00, 1180.01it/s] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1542/1987 [00:01<00:00, 1180.03it/s] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1661/1987 [00:01<00:00, 1180.34it/s] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1780/1987 [00:01<00:00, 1180.22it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 1899/1987 [00:01<00:00, 1179.68it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1987/1987 [00:01<00:00, 1179.97it/s]
2026-04-03:16:34:21,395 INFO [evaluator.py:465] Running loglikelihood requests
Running loglikelihood requests: 0%| | 0/14672 [00:00<?, ?it/s]We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
Running loglikelihood requests: 0%| | 1/14672 [00:07<28:36:15, 7.02s/it] Running loglikelihood requests: 0%| | 65/14672 [00:08<24:55, 9.77it/s] Running loglikelihood requests: 1%| | 129/14672 [00:10<13:49, 17.52it/s] Running loglikelihood requests: 1%|▏ | 193/14672 [00:11<10:01, 24.08it/s] Running loglikelihood requests: 2%|▏ | 257/14672 [00:13<08:04, 29.77it/s] Running loglikelihood requests: 2%|▏ | 321/14672 [00:14<06:53, 34.71it/s] Running loglikelihood requests: 3%|β–Ž | 385/14672 [00:15<06:03, 39.31it/s] Running loglikelihood requests: 3%|β–Ž | 449/14672 [00:16<05:27, 43.45it/s] Running loglikelihood requests: 3%|β–Ž | 513/14672 [00:17<05:00, 47.14it/s] Running loglikelihood requests: 4%|▍ | 577/14672 [00:19<04:39, 50.43it/s] Running loglikelihood requests: 4%|▍ | 641/14672 [00:20<04:20, 53.83it/s] Running loglikelihood requests: 5%|▍ | 705/14672 [00:20<04:04, 57.05it/s] Running loglikelihood requests: 5%|β–Œ | 769/14672 [00:21<03:52, 59.81it/s] Running loglikelihood requests: 6%|β–Œ | 833/14672 [00:22<03:40, 62.74it/s] Running loglikelihood requests: 6%|β–Œ | 897/14672 [00:23<03:30, 65.36it/s] Running loglikelihood requests: 7%|β–‹ | 961/14672 [00:24<03:22, 67.78it/s] Running loglikelihood requests: 7%|β–‹ | 1025/14672 [00:25<03:15, 69.87it/s] Running loglikelihood requests: 7%|β–‹ | 1089/14672 [00:26<03:08, 71.88it/s] Running loglikelihood requests: 8%|β–Š | 1153/14672 [00:27<03:00, 74.76it/s] Running loglikelihood requests: 8%|β–Š | 1217/14672 [00:27<02:53, 77.72it/s] Running loglikelihood requests: 9%|β–Š | 1281/14672 [00:28<02:44, 81.20it/s] Running loglikelihood requests: 9%|β–‰ | 1345/14672 [00:29<02:37, 84.47it/s] Running loglikelihood requests: 10%|β–‰ | 1409/14672 [00:29<02:31, 87.47it/s] Running loglikelihood requests: 10%|β–ˆ | 1473/14672 [00:30<02:26, 90.20it/s] Running loglikelihood requests: 10%|β–ˆ | 1537/14672 [00:31<02:22, 92.17it/s] Running loglikelihood requests: 11%|β–ˆ | 1601/14672 [00:31<02:19, 93.56it/s] Running loglikelihood requests: 11%|β–ˆβ– | 1665/14672 [00:32<02:14, 96.35it/s] Running loglikelihood requests: 12%|β–ˆβ– | 1729/14672 [00:33<02:11, 98.52it/s] Running loglikelihood requests: 12%|β–ˆβ– | 1793/14672 [00:33<02:05, 102.71it/s] Running loglikelihood requests: 13%|β–ˆβ–Ž | 1857/14672 [00:34<02:00, 106.25it/s] Running loglikelihood requests: 13%|β–ˆβ–Ž | 1921/14672 [00:34<01:55, 110.12it/s] Running loglikelihood requests: 14%|β–ˆβ–Ž | 1985/14672 [00:35<01:51, 113.59it/s] Running loglikelihood requests: 14%|β–ˆβ– | 2049/14672 [00:35<01:47, 117.22it/s] Running loglikelihood requests: 14%|β–ˆβ– | 2113/14672 [00:36<01:44, 120.67it/s] Running loglikelihood requests: 15%|β–ˆβ– | 2177/14672 [00:36<01:39, 125.08it/s] Running loglikelihood requests: 15%|β–ˆβ–Œ | 2241/14672 [00:37<01:36, 129.12it/s] Running loglikelihood requests: 16%|β–ˆβ–Œ | 2305/14672 [00:37<01:32, 133.10it/s] Running loglikelihood requests: 16%|β–ˆβ–Œ | 2369/14672 [00:38<01:29, 137.36it/s] Running loglikelihood requests: 17%|β–ˆβ–‹ | 2433/14672 [00:38<01:26, 141.13it/s] Running loglikelihood requests: 17%|β–ˆβ–‹ | 2497/14672 [00:38<01:23, 145.27it/s] Running loglikelihood requests: 17%|β–ˆβ–‹ | 2561/14672 [00:39<01:20, 149.61it/s] Running loglikelihood requests: 18%|β–ˆβ–Š | 2625/14672 [00:39<01:17, 154.58it/s] Running loglikelihood requests: 18%|β–ˆβ–Š | 2689/14672 [00:40<01:15, 159.42it/s] Running loglikelihood requests: 19%|β–ˆβ–‰ | 2753/14672 [00:40<01:12, 164.14it/s] Running loglikelihood requests: 19%|β–ˆβ–‰ | 2817/14672 [00:40<01:10, 169.25it/s] Running loglikelihood requests: 20%|β–ˆβ–‰ | 2881/14672 [00:41<01:07, 174.44it/s] Running loglikelihood requests: 20%|β–ˆβ–ˆ | 2945/14672 [00:41<01:05, 179.07it/s] Running loglikelihood requests: 21%|β–ˆβ–ˆ | 3009/14672 [00:41<01:02, 185.17it/s] Running loglikelihood requests: 21%|β–ˆβ–ˆ | 3073/14672 [00:42<01:00, 190.69it/s] Running loglikelihood requests: 21%|β–ˆβ–ˆβ– | 3137/14672 [00:42<00:58, 197.64it/s] Running loglikelihood requests: 22%|β–ˆβ–ˆβ– | 3201/14672 [00:42<00:55, 206.18it/s] Running loglikelihood requests: 22%|β–ˆβ–ˆβ– | 3265/14672 [00:42<00:53, 213.77it/s] Running loglikelihood requests: 23%|β–ˆβ–ˆβ–Ž | 3329/14672 [00:43<00:51, 221.86it/s] Running loglikelihood requests: 23%|β–ˆβ–ˆβ–Ž | 3393/14672 [00:43<00:49, 228.72it/s] Running loglikelihood requests: 24%|β–ˆβ–ˆβ–Ž | 3457/14672 [00:43<00:47, 236.26it/s] Running loglikelihood requests: 24%|β–ˆβ–ˆβ– | 3521/14672 [00:43<00:45, 245.82it/s] Running loglikelihood requests: 24%|β–ˆβ–ˆβ– | 3585/14672 [00:44<00:43, 255.11it/s] Running loglikelihood requests: 25%|β–ˆβ–ˆβ– | 3649/14672 [00:44<00:41, 263.32it/s] Running loglikelihood requests: 25%|β–ˆβ–ˆβ–Œ | 3713/14672 [00:44<00:40, 269.28it/s] Running loglikelihood requests: 26%|β–ˆβ–ˆβ–Œ | 3777/14672 [00:44<00:39, 276.43it/s] Running loglikelihood requests: 26%|β–ˆβ–ˆβ–Œ | 3841/14672 [00:45<00:38, 282.67it/s] Running loglikelihood requests: 27%|β–ˆβ–ˆβ–‹ | 3905/14672 [00:45<00:37, 290.03it/s] Running loglikelihood requests: 27%|β–ˆβ–ˆβ–‹ | 3969/14672 [00:45<00:36, 296.27it/s] Running loglikelihood requests: 27%|β–ˆβ–ˆβ–‹ | 4033/14672 [00:45<00:35, 301.75it/s] Running loglikelihood requests: 28%|β–ˆβ–ˆβ–Š | 4097/14672 [00:45<00:33, 313.41it/s] Running loglikelihood requests: 28%|β–ˆβ–ˆβ–Š | 4161/14672 [00:46<00:32, 321.67it/s] Running loglikelihood requests: 29%|β–ˆβ–ˆβ–‰ | 4225/14672 [00:46<00:31, 332.12it/s] Running loglikelihood requests: 29%|β–ˆβ–ˆβ–‰ | 4289/14672 [00:46<00:30, 343.22it/s] Running loglikelihood requests: 30%|β–ˆβ–ˆβ–‰ | 4353/14672 [00:46<00:29, 353.59it/s] Running loglikelihood requests: 30%|β–ˆβ–ˆβ–ˆ | 4417/14672 [00:46<00:28, 365.45it/s] Running loglikelihood requests: 31%|β–ˆβ–ˆβ–ˆ | 4481/14672 [00:46<00:27, 377.39it/s] Running loglikelihood requests: 31%|β–ˆβ–ˆβ–ˆ | 4545/14672 [00:46<00:25, 394.12it/s] Running loglikelihood requests: 31%|β–ˆβ–ˆβ–ˆβ– | 4609/14672 [00:47<00:24, 407.45it/s] Running loglikelihood requests: 32%|β–ˆβ–ˆβ–ˆβ– | 4673/14672 [00:47<00:23, 418.20it/s] Running loglikelihood requests: 32%|β–ˆβ–ˆβ–ˆβ– | 4737/14672 [00:47<00:23, 426.88it/s] Running loglikelihood requests: 33%|β–ˆβ–ˆβ–ˆβ–Ž | 4801/14672 [00:47<00:22, 432.46it/s] Running loglikelihood requests: 33%|β–ˆβ–ˆβ–ˆβ–Ž | 4865/14672 [00:47<00:22, 441.18it/s] Running loglikelihood requests: 34%|β–ˆβ–ˆβ–ˆβ–Ž | 4929/14672 [00:47<00:21, 449.29it/s] Running loglikelihood requests: 34%|β–ˆβ–ˆβ–ˆβ– | 4993/14672 [00:47<00:21, 457.18it/s] Running loglikelihood requests: 34%|β–ˆβ–ˆβ–ˆβ– | 5057/14672 [00:48<00:20, 462.30it/s] Running loglikelihood requests: 35%|β–ˆβ–ˆβ–ˆβ– | 5121/14672 [00:48<00:20, 466.40it/s] Running loglikelihood requests: 35%|β–ˆβ–ˆβ–ˆβ–Œ | 5185/14672 [00:48<00:20, 469.62it/s] Running loglikelihood requests: 36%|β–ˆβ–ˆβ–ˆβ–Œ | 5249/14672 [00:48<00:19, 479.28it/s] Running loglikelihood requests: 36%|β–ˆβ–ˆβ–ˆβ–Œ | 5313/14672 [00:48<00:19, 486.88it/s] Running loglikelihood requests: 37%|β–ˆβ–ˆβ–ˆβ–‹ | 5377/14672 [00:48<00:18, 491.95it/s] Running loglikelihood requests: 37%|β–ˆβ–ˆβ–ˆβ–‹ | 5441/14672 [00:48<00:18, 497.51it/s] Running loglikelihood requests: 38%|β–ˆβ–ˆβ–ˆβ–Š | 5505/14672 [00:49<00:18, 500.18it/s] Running loglikelihood requests: 38%|β–ˆβ–ˆβ–ˆβ–Š | 5569/14672 [00:49<00:18, 503.95it/s] Running loglikelihood requests: 38%|β–ˆβ–ˆβ–ˆβ–Š | 5633/14672 [00:49<00:17, 508.41it/s] Running loglikelihood requests: 39%|β–ˆβ–ˆβ–ˆβ–‰ | 5697/14672 [00:49<00:17, 512.21it/s] Running loglikelihood requests: 39%|β–ˆβ–ˆβ–ˆβ–‰ | 5761/14672 [00:49<00:17, 515.34it/s] Running loglikelihood requests: 40%|β–ˆβ–ˆβ–ˆβ–‰ | 5825/14672 [00:49<00:17, 517.41it/s] Running loglikelihood requests: 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 5889/14672 [00:49<00:16, 528.06it/s] Running loglikelihood requests: 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 5953/14672 [00:49<00:16, 537.50it/s] Running loglikelihood requests: 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 6017/14672 [00:49<00:15, 544.64it/s] Running loglikelihood requests: 41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 6081/14672 [00:50<00:15, 549.42it/s] Running loglikelihood requests: 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 6145/14672 [00:50<00:15, 553.12it/s] Running loglikelihood requests: 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 6209/14672 [00:50<00:15, 557.93it/s] Running loglikelihood requests: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 6273/14672 [00:50<00:14, 560.26it/s] Running loglikelihood requests: 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 6337/14672 [00:50<00:14, 563.14it/s] Running loglikelihood requests: 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 6401/14672 [00:50<00:14, 565.75it/s] Running loglikelihood requests: 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 6465/14672 [00:50<00:14, 567.08it/s] Running loglikelihood requests: 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 6529/14672 [00:50<00:14, 569.49it/s] Running loglikelihood requests: 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 6593/14672 [00:50<00:14, 569.81it/s] Running loglikelihood requests: 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6657/14672 [00:51<00:14, 568.82it/s] Running loglikelihood requests: 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6721/14672 [00:51<00:13, 569.32it/s] Running loglikelihood requests: 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 6785/14672 [00:51<00:13, 570.94it/s] Running loglikelihood requests: 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 6872/14672 [00:51<00:11, 653.60it/s] Running loglikelihood requests: 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 6967/14672 [00:51<00:10, 737.51it/s] Running loglikelihood requests: 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 7042/14672 [00:51<00:13, 586.29it/s] Running loglikelihood requests: 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 7140/14672 [00:51<00:11, 682.88it/s] Running loglikelihood requests: 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 7233/14672 [00:52<00:12, 602.75it/s] Running loglikelihood requests: 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 7310/14672 [00:52<00:11, 641.27it/s] Running loglikelihood requests: 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 7394/14672 [00:52<00:10, 690.46it/s] Running loglikelihood requests: 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 7486/14672 [00:52<00:09, 750.40it/s] Running loglikelihood requests: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 7566/14672 [00:52<00:11, 613.66it/s] Running loglikelihood requests: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 7681/14672 [00:52<00:11, 609.26it/s] Running loglikelihood requests: 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 7792/14672 [00:52<00:09, 718.48it/s] Running loglikelihood requests: 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 7873/14672 [00:52<00:11, 613.85it/s] Running loglikelihood requests: 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 8001/14672 [00:53<00:10, 634.32it/s] Running loglikelihood requests: 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 8129/14672 [00:53<00:10, 647.81it/s] Running loglikelihood requests: 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 8255/14672 [00:53<00:08, 770.20it/s] Running loglikelihood requests: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 8341/14672 [00:53<00:09, 671.09it/s] Running loglikelihood requests: 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 8449/14672 [00:53<00:09, 647.92it/s] Running loglikelihood requests: 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 8577/14672 [00:53<00:09, 668.34it/s] Running loglikelihood requests: 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 8705/14672 [00:54<00:08, 682.42it/s] Running loglikelihood requests: 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 8833/14672 [00:54<00:08, 694.59it/s] Running loglikelihood requests: 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 8961/14672 [00:54<00:08, 703.87it/s] Running loglikelihood requests: 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 9089/14672 [00:54<00:07, 709.22it/s] Running loglikelihood requests: 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 9217/14672 [00:54<00:07, 712.80it/s] Running loglikelihood requests: 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 9345/14672 [00:55<00:07, 716.78it/s] Running loglikelihood requests: 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 9473/14672 [00:55<00:07, 720.24it/s] Running loglikelihood requests: 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 9601/14672 [00:55<00:07, 723.19it/s] Running loglikelihood requests: 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 9729/14672 [00:55<00:06, 733.11it/s] Running loglikelihood requests: 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 9857/14672 [00:55<00:06, 749.42it/s] Running loglikelihood requests: 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 9985/14672 [00:55<00:06, 761.63it/s] Running loglikelihood requests: 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 10113/14672 [00:56<00:05, 771.50it/s] Running loglikelihood requests: 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 10241/14672 [00:56<00:05, 779.97it/s] Running loglikelihood requests: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 10369/14672 [00:56<00:05, 784.02it/s] Running loglikelihood requests: 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 10497/14672 [00:56<00:05, 789.79it/s] Running loglikelihood requests: 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 10625/14672 [00:56<00:05, 794.82it/s] Running loglikelihood requests: 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 10753/14672 [00:56<00:04, 797.44it/s] Running loglikelihood requests: 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 10881/14672 [00:57<00:04, 801.15it/s] Running loglikelihood requests: 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 11009/14672 [00:57<00:04, 803.77it/s] Running loglikelihood requests: 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 11137/14672 [00:57<00:04, 805.88it/s] Running loglikelihood requests: 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 11265/14672 [00:57<00:04, 808.79it/s] Running loglikelihood requests: 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 11393/14672 [00:57<00:04, 810.42it/s] Running loglikelihood requests: 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 11521/14672 [00:57<00:03, 812.74it/s] Running loglikelihood requests: 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 11649/14672 [00:57<00:03, 816.03it/s] Running loglikelihood requests: 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 11777/14672 [00:58<00:03, 839.68it/s] Running loglikelihood requests: 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 11905/14672 [00:58<00:03, 858.12it/s] Running loglikelihood requests: 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 12033/14672 [00:58<00:03, 873.49it/s] Running loglikelihood requests: 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 12161/14672 [00:58<00:02, 886.31it/s] Running loglikelihood requests: 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 12289/14672 [00:58<00:02, 896.07it/s] Running loglikelihood requests: 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 12417/14672 [00:58<00:02, 904.78it/s] Running loglikelihood requests: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 12545/14672 [00:58<00:02, 911.92it/s] Running loglikelihood requests: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 12673/14672 [00:59<00:02, 919.28it/s] Running loglikelihood requests: 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 12801/14672 [00:59<00:02, 925.84it/s] Running loglikelihood requests: 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 12929/14672 [00:59<00:01, 932.72it/s] Running loglikelihood requests: 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 13057/14672 [00:59<00:01, 936.23it/s] Running loglikelihood requests: 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 13185/14672 [00:59<00:01, 942.82it/s] Running loglikelihood requests: 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 13313/14672 [00:59<00:01, 948.53it/s] Running loglikelihood requests: 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 13441/14672 [00:59<00:01, 951.85it/s] Running loglikelihood requests: 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 13569/14672 [01:00<00:01, 956.08it/s] Running loglikelihood requests: 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 13697/14672 [01:00<00:01, 961.39it/s] Running loglikelihood requests: 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 13825/14672 [01:00<00:00, 983.74it/s] Running loglikelihood requests: 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 13953/14672 [01:00<00:00, 1022.58it/s] Running loglikelihood requests: 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 14081/14672 [01:00<00:00, 1057.84it/s] Running loglikelihood requests: 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 14209/14672 [01:00<00:00, 1082.58it/s] Running loglikelihood requests: 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 14337/14672 [01:00<00:00, 1106.97it/s] Running loglikelihood requests: 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 14465/14672 [01:00<00:00, 1127.11it/s] Running loglikelihood requests: 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 14593/14672 [01:00<00:00, 1144.64it/s] Running loglikelihood requests: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 14672/14672 [01:00<00:00, 240.60it/s]
2026-04-03:16:35:29,404 WARNING [huggingface.py:1344] Failed to get model SHA for /egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6 at revision main. Error: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6'. Use `repo_type` argument if needed.
2026-04-03:16:35:44,231 INFO [evaluation_tracker.py:206] Saving results aggregated
2026-04-03:16:35:44,240 INFO [evaluation_tracker.py:287] Saving per-sample results for: wmdp_bio
2026-04-03:16:35:45,155 INFO [evaluation_tracker.py:287] Saving per-sample results for: wmdp_chem
2026-04-03:16:35:45,449 INFO [evaluation_tracker.py:287] Saving per-sample results for: wmdp_cyber
hf (pretrained=/egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 16
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|-------------|------:|------|-----:|------|---|-----:|---|-----:|
|wmdp | 1|none | |acc |↑ |0.3132|Β± |0.0076|
| - wmdp_bio | 1|none | 0|acc |↑ |0.3009|Β± |0.0129|
| - wmdp_chem | 1|none | 0|acc |↑ |0.4583|Β± |0.0247|
| - wmdp_cyber| 1|none | 0|acc |↑ |0.2914|Β± |0.0102|
|Groups|Version|Filter|n-shot|Metric| |Value | |Stderr|
|------|------:|------|------|------|---|-----:|---|-----:|
|wmdp | 1|none | |acc |↑ |0.3132|Β± |0.0076|