| /egr/research-optml/wangc168/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead. |
| warnings.warn( |
| Downloading shards: 100%|██████████| 8/8 [00:00<00:00, 2715.42it/s] |
| Loading checkpoint shards: 100%|██████████| 8/8 [00:03<00:00, 2.64it/s] |
| Downloading shards: 100%|██████████| 8/8 [00:00<00:00, 3622.02it/s] |
| Loading checkpoint shards: 100%|██████████| 8/8 [00:02<00:00, 2.72it/s] |
| wandb: Currently logged in as: wangc168 (wcs23187) to https://api.wandb.ai. Use `wandb login |
| wandb: setting up run kstsc84q |
| wandb: Tracking run with wandb version 0.23.1 |
| wandb: Run data is saved locally in /egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/wandb/run-20260403_161129-kstsc84q |
| wandb: Run `wandb offline` to turn off syncing. |
| wandb: Syncing run muon-v1-rank-models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6 |
| wandb: ⭐️ View project at https://wandb.ai/wcs23187/Muon-constraint |
| wandb: 🚀 View run at https://wandb.ai/wcs23187/Muon-constraint/runs/kstsc84q |
| ====rmu Config==== |
| model_name_or_path=HuggingFaceH4/zephyr-7b-beta |
| module_str={model_name}.model.layers[{layer_id}] |
| output_dir=models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6 |
| retain_corpora=['wikitext', 'wikitext'] |
| forget_corpora=['bio_remove_dataset', 'cyber-forget-corpus'] |
| alpha=[1200.0, 1200.0] |
| steering_coeffs=15,15 |
| muon_lr=0.0026 |
| muon_momentum=0.95 |
| muon_weight_decay=0.1 |
| muon_ns_steps=5 |
| muon_nesterov=True |
| adam_emb_lr=5e-05 |
| adam_scalar_lr=5e-05 |
| enable_muon_svd_logging=False |
| M_stats_path= |
| M_check_T=0 |
| V1_K=64 |
| min_len=0 |
| max_len=2000 |
| batch_size=4 |
| max_num_batches=150 |
| layer_id=7 |
| layer_ids=[5, 6, 7] |
| param_ids=[6] |
| seed=42 |
| verbose=False |
| steering_coeff_list=[15.0, 15.0] |
| ===== |
| MUON model.layers.5.mlp.down_proj.weight (4096, 14336) |
| MUON model.layers.6.mlp.down_proj.weight (4096, 14336) |
| MUON model.layers.7.mlp.down_proj.weight (4096, 14336) |
| ======= Epoch 0 ======= |
| 0%|          | 0/150 [00:00<?, ?it/s] |
| We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache) |
| /egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/rmu_wb/unlearn_muon_v1.py:201: UserWarning: Using a target size (torch.Size([1, 1, 4096])) that is different to the input size (torch.Size([4, 512, 4096])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size. |
| unlearn_loss = torch.nn.functional.mse_loss( |
| 1%|          | 1/150 [00:04<12:20, 4.97s/it] |
| /egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/rmu_wb/unlearn_muon_v1.py:201: UserWarning: Using a target size (torch.Size([1, 1, 4096])) that is different to the input size (torch.Size([4, 768, 4096])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size. |
| unlearn_loss = torch.nn.functional.mse_loss( |
| 41%|████      | 61/150 [08:02<11:52, 8.00s/it] |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1377 | unlearn_loss: 0.1377 | retain_loss: 0 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.0791 | unlearn_loss: 0.0791 | retain_loss: 4.816e-05 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1426 | unlearn_loss: 0.1416 | retain_loss: 0.001022 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07959 | unlearn_loss: 0.0791 | retain_loss: 0.0006828 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1387 | unlearn_loss: 0.1377 | retain_loss: 0.0008354 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07959 | unlearn_loss: 0.07861 | retain_loss: 0.0009842 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1357 | unlearn_loss: 0.1357 | retain_loss: 0.0004139 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07959 | unlearn_loss: 0.07861 | retain_loss: 0.0009918 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1416 | unlearn_loss: 0.1406 | retain_loss: 0.0009003 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.0791 | unlearn_loss: 0.07861 | retain_loss: 0.0004692 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1377 | unlearn_loss: 0.1367 | retain_loss: 0.001373 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.0791 | unlearn_loss: 0.07861 | retain_loss: 0.0004501 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1377 | unlearn_loss: 0.1367 | retain_loss: 0.0005684 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.0791 | unlearn_loss: 0.07861 | retain_loss: 0.0007057 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1436 | unlearn_loss: 0.1396 | retain_loss: 0.003998 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.08008 | unlearn_loss: 0.07861 | retain_loss: 0.001457 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1377 | unlearn_loss: 0.1367 | retain_loss: 0.001244 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.08008 | unlearn_loss: 0.07861 | retain_loss: 0.001556 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1396 | unlearn_loss: 0.1387 | retain_loss: 0.001022 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.0791 | unlearn_loss: 0.07812 | retain_loss: 0.0009079 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1406 | unlearn_loss: 0.1377 | retain_loss: 0.003036 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07959 | unlearn_loss: 0.07812 | retain_loss: 0.001228 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1406 | unlearn_loss: 0.1387 | retain_loss: 0.001526 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.0791 | unlearn_loss: 0.07812 | retain_loss: 0.0009193 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1406 | unlearn_loss: 0.1396 | retain_loss: 0.001152 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.0791 | unlearn_loss: 0.07812 | retain_loss: 0.001007 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1387 | unlearn_loss: 0.1357 | retain_loss: 0.003036 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.08008 | unlearn_loss: 0.07812 | retain_loss: 0.001801 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1396 | unlearn_loss: 0.1367 | retain_loss: 0.002579 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.0791 | unlearn_loss: 0.07764 | retain_loss: 0.001518 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1377 | unlearn_loss: 0.1367 | retain_loss: 0.001404 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07861 | unlearn_loss: 0.07764 | retain_loss: 0.001099 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1387 | unlearn_loss: 0.1357 | retain_loss: 0.002899 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.08008 | unlearn_loss: 0.07812 | retain_loss: 0.002121 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1406 | unlearn_loss: 0.1387 | retain_loss: 0.001907 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07959 | unlearn_loss: 0.07812 | retain_loss: 0.001358 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1416 | unlearn_loss: 0.1367 | retain_loss: 0.004425 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.08057 | unlearn_loss: 0.07812 | retain_loss: 0.002625 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1426 | unlearn_loss: 0.1396 | retain_loss: 0.002457 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07959 | unlearn_loss: 0.07764 | retain_loss: 0.001823 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1416 | unlearn_loss: 0.1367 | retain_loss: 0.004456 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.08008 | unlearn_loss: 0.07715 | retain_loss: 0.002975 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1455 | unlearn_loss: 0.1416 | retain_loss: 0.003616 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07959 | unlearn_loss: 0.07715 | retain_loss: 0.002594 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1416 | unlearn_loss: 0.1387 | retain_loss: 0.002579 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.0791 | unlearn_loss: 0.07715 | retain_loss: 0.002182 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1445 | unlearn_loss: 0.1396 | retain_loss: 0.004578 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.08057 | unlearn_loss: 0.07764 | retain_loss: 0.002914 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1416 | unlearn_loss: 0.1377 | retain_loss: 0.00351 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07959 | unlearn_loss: 0.07715 | retain_loss: 0.002396 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1396 | unlearn_loss: 0.1367 | retain_loss: 0.003021 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07861 | unlearn_loss: 0.07666 | retain_loss: 0.001945 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1377 | unlearn_loss: 0.1348 | retain_loss: 0.002563 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.0791 | unlearn_loss: 0.07715 | retain_loss: 0.001762 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1416 | unlearn_loss: 0.1338 | retain_loss: 0.00824 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.08252 | unlearn_loss: 0.07617 | retain_loss: 0.006256 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1279 | unlearn_loss: 0.1226 | retain_loss: 0.005066 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.08057 | unlearn_loss: 0.07617 | retain_loss: 0.004272 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1406 | unlearn_loss: 0.1309 | retain_loss: 0.009583 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.08447 | unlearn_loss: 0.07617 | retain_loss: 0.008179 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1396 | unlearn_loss: 0.1309 | retain_loss: 0.008667 | param_change: 0 |
| now we use the mode .. v1 |
| 53%|█████▎    | 79/150 [10:26<09:24, 7.95s/it] |
| /egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/rmu_wb/unlearn_muon_v1.py:201: UserWarning: Using a target size (torch.Size([1, 1, 4096])) that is different to the input size (torch.Size([4, 679, 4096])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size. |
| unlearn_loss = torch.nn.functional.mse_loss( |
| 81%|████████▏ | 122/150 [16:09<03:43, 7.98s/it] |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.08398 | unlearn_loss: 0.07666 | retain_loss: 0.007538 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1416 | unlearn_loss: 0.1328 | retain_loss: 0.008362 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.08301 | unlearn_loss: 0.07568 | retain_loss: 0.007507 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1426 | unlearn_loss: 0.1377 | retain_loss: 0.004974 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07959 | unlearn_loss: 0.0752 | retain_loss: 0.004547 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1367 | unlearn_loss: 0.1318 | retain_loss: 0.004517 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07812 | unlearn_loss: 0.07422 | retain_loss: 0.00386 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1318 | unlearn_loss: 0.127 | retain_loss: 0.004547 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07861 | unlearn_loss: 0.07471 | retain_loss: 0.003769 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1309 | unlearn_loss: 0.127 | retain_loss: 0.00415 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07764 | unlearn_loss: 0.07373 | retain_loss: 0.003708 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1289 | unlearn_loss: 0.125 | retain_loss: 0.004395 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07666 | unlearn_loss: 0.07275 | retain_loss: 0.003891 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1357 | unlearn_loss: 0.1279 | retain_loss: 0.007477 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07715 | unlearn_loss: 0.07178 | retain_loss: 0.005463 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1318 | unlearn_loss: 0.126 | retain_loss: 0.005585 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07568 | unlearn_loss: 0.07129 | retain_loss: 0.004608 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1289 | unlearn_loss: 0.1245 | retain_loss: 0.004852 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07764 | unlearn_loss: 0.07324 | retain_loss: 0.004211 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.124 | unlearn_loss: 0.1196 | retain_loss: 0.004608 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07471 | unlearn_loss: 0.07031 | retain_loss: 0.004242 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.126 | unlearn_loss: 0.1196 | retain_loss: 0.006012 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07422 | unlearn_loss: 0.06934 | retain_loss: 0.004913 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1309 | unlearn_loss: 0.127 | retain_loss: 0.004333 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07275 | unlearn_loss: 0.06885 | retain_loss: 0.003906 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1216 | unlearn_loss: 0.1177 | retain_loss: 0.003967 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07129 | unlearn_loss: 0.06787 | retain_loss: 0.003448 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1245 | unlearn_loss: 0.1211 | retain_loss: 0.003632 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07129 | unlearn_loss: 0.06787 | retain_loss: 0.003189 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1162 | unlearn_loss: 0.1128 | retain_loss: 0.003662 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07178 | unlearn_loss: 0.06885 | retain_loss: 0.003113 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.125 | unlearn_loss: 0.1191 | retain_loss: 0.006256 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.0708 | unlearn_loss: 0.06592 | retain_loss: 0.004822 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1172 | unlearn_loss: 0.1113 | retain_loss: 0.005798 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07324 | unlearn_loss: 0.06836 | retain_loss: 0.004913 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.126 | unlearn_loss: 0.1206 | retain_loss: 0.005646 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.07178 | unlearn_loss: 0.06689 | retain_loss: 0.004944 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.124 | unlearn_loss: 0.1196 | retain_loss: 0.004517 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.06787 | unlearn_loss: 0.06396 | retain_loss: 0.00386 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1191 | unlearn_loss: 0.1128 | retain_loss: 0.006439 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.06982 | unlearn_loss: 0.06445 | retain_loss: 0.005249 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1167 | unlearn_loss: 0.1123 | retain_loss: 0.004242 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.06592 | unlearn_loss: 0.0625 | retain_loss: 0.003296 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1104 | unlearn_loss: 0.1055 | retain_loss: 0.0047 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.06689 | unlearn_loss: 0.06299 | retain_loss: 0.003937 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.106 | unlearn_loss: 0.1021 | retain_loss: 0.004059 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.06177 | unlearn_loss: 0.05811 | retain_loss: 0.003555 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1157 | unlearn_loss: 0.1113 | retain_loss: 0.004517 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.06079 | unlearn_loss: 0.05688 | retain_loss: 0.00383 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.105 | unlearn_loss: 0.1011 | retain_loss: 0.00412 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.06055 | unlearn_loss: 0.05688 | retain_loss: 0.00354 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1094 | unlearn_loss: 0.105 | retain_loss: 0.004242 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.06006 | unlearn_loss: 0.05615 | retain_loss: 0.003815 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.104 | unlearn_loss: 0.1001 | retain_loss: 0.003906 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.05859 | unlearn_loss: 0.05518 | retain_loss: 0.003357 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1128 | unlearn_loss: 0.1084 | retain_loss: 0.004456 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.05664 | unlearn_loss: 0.05298 | retain_loss: 0.003769 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1123 | unlearn_loss: 0.1084 | retain_loss: 0.003708 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.05615 | unlearn_loss: 0.05298 | retain_loss: 0.003235 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.124 | unlearn_loss: 0.1191 | retain_loss: 0.004852 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.05591 | unlearn_loss: 0.05176 | retain_loss: 0.004211 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| 100%|██████████| 150/150 [19:52<00:00, 7.95s/it] |
| wandb: updating run metadata |
| wandb: uploading config.yaml |
| wandb: |
| wandb: Run history: |
| wandb: loss, param_change, retain_loss, step, unlearn_loss [sparklines omitted] |
| wandb: |
| wandb: Run summary: |
| wandb: loss 0.06348 |
| wandb: param_change 0 |
| wandb: retain_loss 0.0249 |
| wandb: step 149 |
| wandb: unlearn_loss 0.03882 |
| wandb: |
| wandb: 🚀 View run muon-v1-rank-models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6 at: https://wandb.ai/wcs23187/Muon-constraint/runs/kstsc84q |
| wandb: ⭐️ View project at: https://wandb.ai/wcs23187/Muon-constraint |
| wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s) |
| wandb: Find logs at: ./wandb/run-20260403_161129-kstsc84q/logs |
| loss: 0.103 | unlearn_loss: 0.09912 | retain_loss: 0.003784 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.05322 | unlearn_loss: 0.05005 | retain_loss: 0.003296 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1099 | unlearn_loss: 0.1055 | retain_loss: 0.004242 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.05298 | unlearn_loss: 0.04956 | retain_loss: 0.003494 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1064 | unlearn_loss: 0.1025 | retain_loss: 0.003967 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.05298 | unlearn_loss: 0.04956 | retain_loss: 0.003494 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.09619 | unlearn_loss: 0.0918 | retain_loss: 0.004456 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.05127 | unlearn_loss: 0.04736 | retain_loss: 0.003891 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1006 | unlearn_loss: 0.09619 | retain_loss: 0.004608 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.05029 | unlearn_loss: 0.04663 | retain_loss: 0.003723 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.09863 | unlearn_loss: 0.09473 | retain_loss: 0.003815 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.05273 | unlearn_loss: 0.04956 | retain_loss: 0.00325 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.09961 | unlearn_loss: 0.09619 | retain_loss: 0.003571 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.04468 | unlearn_loss: 0.04175 | retain_loss: 0.002975 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.09668 | unlearn_loss: 0.09229 | retain_loss: 0.004425 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.04419 | unlearn_loss: 0.04053 | retain_loss: 0.003708 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1025 | unlearn_loss: 0.09863 | retain_loss: 0.003754 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.04395 | unlearn_loss: 0.04077 | retain_loss: 0.003204 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.08496 | unlearn_loss: 0.08105 | retain_loss: 0.00386 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.04517 | unlearn_loss: 0.04199 | retain_loss: 0.003098 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.09717 | unlearn_loss: 0.09326 | retain_loss: 0.004089 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.04492 | unlearn_loss: 0.0415 | retain_loss: 0.003464 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1064 | unlearn_loss: 0.1011 | retain_loss: 0.005493 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.04321 | unlearn_loss: 0.03882 | retain_loss: 0.004303 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.1064 | unlearn_loss: 0.1006 | retain_loss: 0.005829 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.04443 | unlearn_loss: 0.03955 | retain_loss: 0.004913 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.125 | unlearn_loss: 0.09082 | retain_loss: 0.03394 | param_change: 0 |
| now we use the mode .. v1 |
| k_eff 64 |
| k_eff 64 |
| k_eff 64 |
| loss: 0.06348 | unlearn_loss: 0.03882 | retain_loss: 0.0249 | param_change: 0 |
| Saved model to models/muon_v1_test_V1_K64_batches150_bs4_alpha1200-1200_steer15-15_muonlr2.6e-3_muonmom0.95_muonwd0.1_adamlr5e-5_seed42_layer7_layers5-6-7_params6 |
| Saved model path to /egr/research-optml/wangc168/Muon_wmdp/rmu_wmdp/model_dir.txt with index 298 |