NeNet/goodlogs: debug_downsample_20251221_2142/logs/train.debug_downsample_20251221_2142.log (uploaded by HaiwenXia with huggingface_hub, commit 5eb0aae)
2025-12-21T21:42:50.888577+0800 | INFO | Log file: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/logs/train.debug_downsample_20251221_2142.log
2025-12-21T21:42:50.889841+0800 | INFO | Random seed set to 42
2025-12-21T21:42:51.052332+0800 | INFO | Loaded 1000 comparison IDs from ['/data/yrb/musicarena/Haiwen/offline_data/cmi-arena/CMI-Training/cleaned_qwen_2025-12-11_15-23-32.json'] via inner join
2025-12-21T21:43:28.675286+0800 | INFO | Created PreferenceDataset with 992 samples, duration=30s
2025-12-21T21:43:37.449931+0800 | INFO | Created RewardAttentionModel with attention_mode=CA
2025-12-21T21:43:37.450286+0800 | INFO | Created PreferenceLoss with filter_ties=True
2025-12-21T21:43:37.811279+0800 | INFO | ✓ Gradient checkpointing enabled
2025-12-21T21:43:37.845048+0800 | INFO | ✓ EMA enabled with decay=0.9999, update_every=1 (CPU offload)
2025-12-21T21:43:37.849737+0800 | INFO | Using lr_schedule=linear_cosine warmup_steps=300 total_steps=None
2025-12-21T21:43:37.849948+0800 | INFO | Training with random split dataset as valid set
2025-12-21T21:43:37.850456+0800 | INFO | Training: 892 samples, Validation: 100 samples
2025-12-21T21:43:37.851368+0800 | INFO | Train batch_size: 8, Valid batch_size: 8
2025-12-21T21:43:37.856999+0800 | INFO | Parameters: 705.657M total, 43.080M trainable
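The setup reports an EMA of the weights with decay=0.9999 and update_every=1. This sketch assumes the usual update rule, shadow = decay * shadow + (1 - decay) * param, because the trainer's code is not shown. Under that rule a decay this close to 1 averages over roughly 1 / (1 - decay) = 10,000 updates. After the 300 steps in this log, about 0.9999^300 ≈ 0.97 of the EMA would still come from the initial weights. The log does not say whether validation runs on the EMA copy. If it does, that could largely explain why the validation loss later in this log stays near 0.69 while the training loss falls. The function names are illustrative.

```python
def ema_update(shadow, params, decay=0.9999):
    """One EMA step on flat lists of floats: shadow <- decay*shadow + (1-decay)*param."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


def init_weight_fraction(decay, steps):
    """Share of the EMA that is still the initial weights after `steps` updates.

    Assumes the shadow starts as a copy of the initial weights (no bias correction).
    """
    return decay ** steps


def effective_horizon(decay):
    """Approximate averaging window of an EMA, in updates."""
    return 1.0 / (1.0 - decay)


if __name__ == "__main__":
    print(round(effective_horizon(0.9999)))             # about 10000 updates
    print(round(init_weight_fraction(0.9999, 300), 3))  # initial-weight share at step 300
```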
2025-12-21T21:43:38.417392+0800 | INFO | ============================================================
2025-12-21T21:43:38.417558+0800 | INFO | Ready to start training
2025-12-21T21:43:38.417602+0800 | INFO | ============================================================
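The setup logs lr_schedule=linear_cosine with warmup_steps=300 and total_steps=None. A common reading of that name is a linear warmup to the peak rate followed by cosine decay. This sketch also makes two further assumptions that are not taken from the trainer: the (step + 1) warmup convention, and holding the peak rate after warmup when total_steps is None. Either way, steps 0 to 299 fall inside the 300-step warmup, so the learning rate is still ramping for essentially the whole log. The base rate itself is not logged.

```python
import math
from typing import Optional


def linear_cosine_lr(step: int, base_lr: float, warmup_steps: int = 300,
                     total_steps: Optional[int] = None, min_lr: float = 0.0) -> float:
    """Linear warmup, then cosine decay from base_lr to min_lr (illustrative)."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly; (step + 1) avoids a zero learning rate at step 0.
        return base_lr * (step + 1) / warmup_steps
    if total_steps is None:
        # Assumption: with no known horizon, hold the peak rate after warmup.
        return base_lr
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```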
2025-12-21T21:43:38.417683+0800 | INFO | Starting training from step 0
2025-12-21T21:43:38.417728+0800 | INFO | ===== Accelerator / CUDA Debug Info =====
2025-12-21T21:43:38.417781+0800 | INFO | accelerator.device = cuda:0
2025-12-21T21:43:38.417830+0800 | INFO | distributed_type = MULTI_GPU
2025-12-21T21:43:38.417869+0800 | INFO | num_processes = 1
2025-12-21T21:43:38.417905+0800 | INFO | process_index = 0
2025-12-21T21:43:38.417947+0800 | INFO | is_main_process = True
2025-12-21T21:43:38.417996+0800 | INFO | torch.cuda.is_available() = True
2025-12-21T21:43:38.418212+0800 | INFO | torch.cuda.device_count() = 1
2025-12-21T21:43:38.418272+0800 | INFO | current_device = 0
2025-12-21T21:43:38.418329+0800 | INFO | device_name = NVIDIA GeForce RTX 4090
2025-12-21T21:43:38.418407+0800 | INFO | model parameter device = cuda:0
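The split lines report 892 training and 100 validation samples, with seed 42. These come from the 992 samples built from 1,000 comparison IDs. The log does not show how the trainer makes the split. This sketch assumes a simple seeded shuffle of indices with Python's random module (split_indices is a hypothetical helper). That is enough to make the partition reproducible for a fixed dataset order and seed.

```python
import random
from typing import List, Tuple


def split_indices(n: int, n_valid: int, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_idx, valid_idx): a seeded, disjoint random partition of range(n)."""
    if not 0 < n_valid < n:
        raise ValueError("n_valid must be between 1 and n - 1")
    order = list(range(n))
    random.Random(seed).shuffle(order)  # local RNG: leaves the global random state alone
    valid_idx = sorted(order[:n_valid])
    train_idx = sorted(order[n_valid:])
    return train_idx, valid_idx


if __name__ == "__main__":
    train_idx, valid_idx = split_indices(992, 100, seed=42)
    print(len(train_idx), len(valid_idx))
```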
2025-12-21T21:43:46.609960+0800 | INFO | Step 0: loss=0.6856, acc=0.479 (IF=0.333, MQ=0.625)
2025-12-21T21:43:59.335816+0800 | INFO |
============================================================
Validation Results (took 12.70s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.4483
Quality Acc: 0.6000
Average Acc: 0.5241
Total Loss: 0.6916
Instruction Loss: 0.7008
Quality Loss: 0.6824
============================================================
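PreferenceLoss is created with filter_ties=True. This validation block counts 58 instruction and 80 quality pairs out of 100, which is consistent with tied pairs being dropped separately for each dimension. The loss formula is not shown. This sketch assumes the Bradley-Terry loss -log sigmoid(r_chosen - r_rejected), a common choice for reward models; preference_loss and the +1/-1/0 label encoding are hypothetical. Under that loss, a model that gives both clips the same reward scores exactly ln 2 ≈ 0.6931. That means the step-0 validation losses (0.6824 to 0.7008) sit at chance level.

```python
import math
from typing import Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_loss(r_a: Sequence[float], r_b: Sequence[float],
                    labels: Sequence[int]) -> Tuple[float, float, int]:
    """Bradley-Terry loss and accuracy over non-tied pairs.

    labels: +1 if A is preferred, -1 if B is preferred, 0 for a tie.
    Returns (mean_loss, accuracy, n_pairs_used).
    """
    total, correct, n = 0.0, 0, 0
    for a, b, y in zip(r_a, r_b, labels):
        if y == 0:
            continue  # tie filtering: ties contribute to neither loss nor accuracy
        margin = (a - b) if y > 0 else (b - a)  # chosen reward minus rejected reward
        total += softplus(-margin)  # equals -log(sigmoid(margin))
        if margin > 0:
            correct += 1
        n += 1
    if n == 0:
        return 0.0, 0.0, 0
    return total / n, correct / n, n
```

In this sketch a zero margin counts as incorrect. The log does not say how the trainer scores exactly equal rewards.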
2025-12-21T21:44:02.045288+0800 | INFO | Best 1 checkpoints:
2025-12-21T21:44:02.045625+0800 | INFO | 1. Step 0: acc=0.5241 (reward_model.best_0.pt)
2025-12-21T21:44:04.377088+0800 | INFO | Step 0: Saved to /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.0.pt
2025-12-21T21:44:07.391702+0800 | INFO | Step 1: loss=0.7071, acc=0.500 (IF=0.500, MQ=0.500)
2025-12-21T21:44:09.503304+0800 | INFO | Step 2: loss=0.6900, acc=0.562 (IF=0.500, MQ=0.625)
2025-12-21T21:44:11.085308+0800 | INFO | Step 3: loss=0.6565, acc=0.719 (IF=0.750, MQ=0.688)
2025-12-21T21:44:13.342690+0800 | INFO | Step 4: loss=0.7126, acc=0.481 (IF=0.400, MQ=0.562)
2025-12-21T21:44:14.792108+0800 | INFO | Step 5: loss=0.6705, acc=0.662 (IF=0.700, MQ=0.625)
2025-12-21T21:44:16.259123+0800 | INFO | Step 6: loss=0.6749, acc=0.750 (IF=0.750, MQ=0.750)
2025-12-21T21:44:17.778168+0800 | INFO | Step 7: loss=0.6823, acc=0.552 (IF=0.667, MQ=0.438)
2025-12-21T21:44:19.281737+0800 | INFO | Step 8: loss=0.6878, acc=0.469 (IF=0.375, MQ=0.562)
2025-12-21T21:44:20.737703+0800 | INFO | Step 9: loss=0.6968, acc=0.482 (IF=0.714, MQ=0.250)
2025-12-21T21:44:22.225643+0800 | INFO | Step 10: loss=0.6795, acc=0.582 (IF=0.538, MQ=0.625)
2025-12-21T21:44:23.691584+0800 | INFO | Step 11: loss=0.7057, acc=0.550 (IF=0.538, MQ=0.562)
2025-12-21T21:44:25.288818+0800 | INFO | Step 12: loss=0.6852, acc=0.528 (IF=0.556, MQ=0.500)
2025-12-21T21:44:26.775742+0800 | INFO | Step 13: loss=0.7075, acc=0.494 (IF=0.300, MQ=0.688)
2025-12-21T21:44:28.245050+0800 | INFO | Step 14: loss=0.7153, acc=0.373 (IF=0.308, MQ=0.438)
2025-12-21T21:44:29.727332+0800 | INFO | Step 15: loss=0.6680, acc=0.614 (IF=0.727, MQ=0.500)
2025-12-21T21:44:31.204247+0800 | INFO | Step 16: loss=0.7432, acc=0.400 (IF=0.300, MQ=0.500)
2025-12-21T21:44:32.668165+0800 | INFO | Step 17: loss=0.6703, acc=0.490 (IF=0.417, MQ=0.562)
2025-12-21T21:44:34.152104+0800 | INFO | Step 18: loss=0.6692, acc=0.644 (IF=0.600, MQ=0.688)
2025-12-21T21:44:35.634101+0800 | INFO | Step 19: loss=0.6860, acc=0.571 (IF=0.455, MQ=0.688)
2025-12-21T21:44:37.102086+0800 | INFO | Step 20: loss=0.6888, acc=0.539 (IF=0.545, MQ=0.533)
2025-12-21T21:44:38.578393+0800 | INFO | Step 21: loss=0.6630, acc=0.631 (IF=0.636, MQ=0.625)
2025-12-21T21:44:40.030475+0800 | INFO | Step 22: loss=0.6662, acc=0.575 (IF=0.462, MQ=0.688)
2025-12-21T21:44:41.467971+0800 | INFO | Step 23: loss=0.6760, acc=0.656 (IF=0.778, MQ=0.533)
2025-12-21T21:44:42.957799+0800 | INFO | Step 24: loss=0.6765, acc=0.604 (IF=0.583, MQ=0.625)
2025-12-21T21:44:44.441300+0800 | INFO | Step 25: loss=0.6981, acc=0.528 (IF=0.556, MQ=0.500)
2025-12-21T21:44:45.917051+0800 | INFO | Step 26: loss=0.6688, acc=0.574 (IF=0.273, MQ=0.875)
2025-12-21T21:44:47.406045+0800 | INFO | Step 27: loss=0.7138, acc=0.306 (IF=0.111, MQ=0.500)
2025-12-21T21:44:48.913194+0800 | INFO | Step 28: loss=0.6935, acc=0.512 (IF=0.462, MQ=0.562)
2025-12-21T21:44:50.392461+0800 | INFO | Step 29: loss=0.6786, acc=0.536 (IF=0.385, MQ=0.688)
2025-12-21T21:44:51.884356+0800 | INFO | Step 30: loss=0.6513, acc=0.753 (IF=0.818, MQ=0.688)
2025-12-21T21:44:53.443844+0800 | INFO | Step 31: loss=0.6469, acc=0.635 (IF=0.769, MQ=0.500)
2025-12-21T21:44:54.985142+0800 | INFO | Step 32: loss=0.6734, acc=0.600 (IF=0.600, MQ=0.600)
2025-12-21T21:44:56.455087+0800 | INFO | Step 33: loss=0.6799, acc=0.458 (IF=0.167, MQ=0.750)
2025-12-21T21:44:57.990529+0800 | INFO | Step 34: loss=0.6797, acc=0.571 (IF=0.455, MQ=0.688)
2025-12-21T21:44:59.520155+0800 | INFO | Step 35: loss=0.6388, acc=0.812 (IF=0.750, MQ=0.875)
2025-12-21T21:45:01.000661+0800 | INFO | Step 36: loss=0.6994, acc=0.503 (IF=0.444, MQ=0.562)
2025-12-21T21:45:02.460783+0800 | INFO | Step 37: loss=0.6746, acc=0.666 (IF=0.769, MQ=0.562)
2025-12-21T21:45:03.927822+0800 | INFO | Step 38: loss=0.6634, acc=0.613 (IF=0.600, MQ=0.625)
2025-12-21T21:45:05.392229+0800 | INFO | Step 39: loss=0.6645, acc=0.652 (IF=0.429, MQ=0.875)
2025-12-21T21:45:06.950720+0800 | INFO | Step 40: loss=0.6992, acc=0.401 (IF=0.364, MQ=0.438)
2025-12-21T21:45:08.568760+0800 | INFO | Step 41: loss=0.6945, acc=0.442 (IF=0.385, MQ=0.500)
2025-12-21T21:45:10.059441+0800 | INFO | Step 42: loss=0.6675, acc=0.656 (IF=0.750, MQ=0.562)
2025-12-21T21:45:11.586130+0800 | INFO | Step 43: loss=0.6634, acc=0.531 (IF=0.375, MQ=0.688)
2025-12-21T21:45:13.216303+0800 | INFO | Step 44: loss=0.6409, acc=0.799 (IF=0.786, MQ=0.812)
2025-12-21T21:45:14.312382+0800 | INFO | Step 45: loss=0.6516, acc=0.638 (IF=0.714, MQ=0.562)
2025-12-21T21:45:15.880405+0800 | INFO | Step 46: loss=0.6406, acc=0.768 (IF=0.786, MQ=0.750)
2025-12-21T21:45:16.971577+0800 | INFO | Step 47: loss=0.6939, acc=0.432 (IF=0.364, MQ=0.500)
2025-12-21T21:45:18.635623+0800 | INFO | Step 48: loss=0.6645, acc=0.707 (IF=0.727, MQ=0.688)
2025-12-21T21:45:20.100504+0800 | INFO | Step 49: loss=0.6439, acc=0.659 (IF=0.692, MQ=0.625)
2025-12-21T21:45:21.596239+0800 | INFO | Step 50: loss=0.6470, acc=0.634 (IF=0.455, MQ=0.812)
2025-12-21T21:45:23.252915+0800 | INFO | Step 51: loss=0.6385, acc=0.631 (IF=0.700, MQ=0.562)
2025-12-21T21:45:24.721527+0800 | INFO | Step 52: loss=0.6854, acc=0.589 (IF=0.615, MQ=0.562)
2025-12-21T21:45:26.305632+0800 | INFO | Step 53: loss=0.6979, acc=0.481 (IF=0.400, MQ=0.562)
2025-12-21T21:45:27.845697+0800 | INFO | Step 54: loss=0.6860, acc=0.540 (IF=0.455, MQ=0.625)
2025-12-21T21:45:30.658365+0800 | INFO | Step 55: loss=0.6530, acc=0.688 (IF=0.500, MQ=0.875)
2025-12-21T21:45:32.124129+0800 | INFO | Step 56: loss=0.6809, acc=0.625 (IF=0.500, MQ=0.750)
2025-12-21T21:45:33.610620+0800 | INFO | Step 57: loss=0.6547, acc=0.733 (IF=0.778, MQ=0.688)
2025-12-21T21:45:35.084171+0800 | INFO | Step 58: loss=0.6386, acc=0.729 (IF=0.583, MQ=0.875)
2025-12-21T21:45:36.541233+0800 | INFO | Step 59: loss=0.6169, acc=0.714 (IF=0.571, MQ=0.857)
2025-12-21T21:45:38.029820+0800 | INFO | Step 60: loss=0.6530, acc=0.544 (IF=0.400, MQ=0.688)
2025-12-21T21:45:39.481504+0800 | INFO | Step 61: loss=0.6683, acc=0.554 (IF=0.545, MQ=0.562)
2025-12-21T21:45:40.989189+0800 | INFO | Step 62: loss=0.6622, acc=0.694 (IF=0.889, MQ=0.500)
2025-12-21T21:45:42.087132+0800 | INFO | Step 63: loss=0.6659, acc=0.531 (IF=0.500, MQ=0.562)
2025-12-21T21:45:43.542097+0800 | INFO | Step 64: loss=0.6538, acc=0.677 (IF=0.667, MQ=0.688)
2025-12-21T21:45:45.019856+0800 | INFO | Step 65: loss=0.6185, acc=0.724 (IF=0.636, MQ=0.812)
2025-12-21T21:45:46.507139+0800 | INFO | Step 66: loss=0.6261, acc=0.825 (IF=0.900, MQ=0.750)
2025-12-21T21:45:47.996554+0800 | INFO | Step 67: loss=0.6483, acc=0.725 (IF=0.700, MQ=0.750)
2025-12-21T21:45:49.483959+0800 | INFO | Step 68: loss=0.6565, acc=0.573 (IF=0.333, MQ=0.812)
2025-12-21T21:45:50.938016+0800 | INFO | Step 69: loss=0.6241, acc=0.792 (IF=0.833, MQ=0.750)
2025-12-21T21:45:52.448848+0800 | INFO | Step 70: loss=0.6708, acc=0.646 (IF=0.667, MQ=0.625)
2025-12-21T21:45:53.923804+0800 | INFO | Step 71: loss=0.6309, acc=0.675 (IF=0.600, MQ=0.750)
2025-12-21T21:45:55.394915+0800 | INFO | Step 72: loss=0.6137, acc=0.707 (IF=0.727, MQ=0.688)
2025-12-21T21:45:56.874640+0800 | INFO | Step 73: loss=0.6238, acc=0.763 (IF=0.714, MQ=0.812)
2025-12-21T21:45:58.387706+0800 | INFO | Step 74: loss=0.6896, acc=0.599 (IF=0.636, MQ=0.562)
2025-12-21T21:45:59.867422+0800 | INFO | Step 75: loss=0.6340, acc=0.742 (IF=0.818, MQ=0.667)
2025-12-21T21:46:01.324888+0800 | INFO | Step 76: loss=0.6440, acc=0.585 (IF=0.545, MQ=0.625)
2025-12-21T21:46:02.848784+0800 | INFO | Step 77: loss=0.6605, acc=0.581 (IF=0.600, MQ=0.562)
2025-12-21T21:46:03.898980+0800 | INFO | Step 78: loss=0.6685, acc=0.535 (IF=0.444, MQ=0.625)
2025-12-21T21:46:05.382278+0800 | INFO | Step 79: loss=0.7122, acc=0.477 (IF=0.455, MQ=0.500)
2025-12-21T21:46:06.854242+0800 | INFO | Step 80: loss=0.6524, acc=0.675 (IF=0.600, MQ=0.750)
2025-12-21T21:46:08.346575+0800 | INFO | Step 81: loss=0.6050, acc=0.829 (IF=0.846, MQ=0.812)
2025-12-21T21:46:09.815017+0800 | INFO | Step 82: loss=0.6189, acc=0.692 (IF=0.571, MQ=0.812)
2025-12-21T21:46:11.275705+0800 | INFO | Step 83: loss=0.6237, acc=0.833 (IF=0.917, MQ=0.750)
2025-12-21T21:46:12.737802+0800 | INFO | Step 84: loss=0.6460, acc=0.771 (IF=0.667, MQ=0.875)
2025-12-21T21:46:14.223942+0800 | INFO | Step 85: loss=0.6180, acc=0.688 (IF=0.625, MQ=0.750)
2025-12-21T21:46:15.705464+0800 | INFO | Step 86: loss=0.6311, acc=0.708 (IF=0.667, MQ=0.750)
2025-12-21T21:46:16.773904+0800 | INFO | Step 87: loss=0.6123, acc=0.708 (IF=0.667, MQ=0.750)
2025-12-21T21:46:18.245488+0800 | INFO | Step 88: loss=0.6149, acc=0.861 (IF=0.909, MQ=0.812)
2025-12-21T21:46:19.727404+0800 | INFO | Step 89: loss=0.6025, acc=0.775 (IF=0.800, MQ=0.750)
2025-12-21T21:46:21.193859+0800 | INFO | Step 90: loss=0.7144, acc=0.519 (IF=0.600, MQ=0.438)
2025-12-21T21:46:22.782381+0800 | INFO | Step 91: loss=0.5917, acc=0.835 (IF=0.857, MQ=0.812)
2025-12-21T21:46:24.254752+0800 | INFO | Step 92: loss=0.6107, acc=0.781 (IF=0.750, MQ=0.812)
2025-12-21T21:46:25.755822+0800 | INFO | Step 93: loss=0.6435, acc=0.604 (IF=0.583, MQ=0.625)
2025-12-21T21:46:27.246577+0800 | INFO | Step 94: loss=0.6437, acc=0.566 (IF=0.444, MQ=0.688)
2025-12-21T21:46:28.701574+0800 | INFO | Step 95: loss=0.5713, acc=0.812 (IF=0.750, MQ=0.875)
2025-12-21T21:46:30.143004+0800 | INFO | Step 96: loss=0.5779, acc=0.756 (IF=0.700, MQ=0.812)
2025-12-21T21:46:30.840306+0800 | INFO | Step 97: loss=0.5668, acc=0.819 (IF=0.889, MQ=0.750)
2025-12-21T21:46:32.325851+0800 | INFO | Step 98: loss=0.5755, acc=0.775 (IF=0.800, MQ=0.750)
2025-12-21T21:46:33.787027+0800 | INFO | Step 99: loss=0.6304, acc=0.771 (IF=0.917, MQ=0.625)
2025-12-21T21:46:35.272291+0800 | INFO | Step 100: loss=0.5989, acc=0.744 (IF=0.800, MQ=0.688)
2025-12-21T21:46:43.010467+0800 | INFO |
============================================================
Validation Results (took 7.71s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.4138
Quality Acc: 0.6500
Average Acc: 0.5319
Total Loss: 0.6889
Instruction Loss: 0.6961
Quality Loss: 0.6817
============================================================
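The averaged figures in these blocks are unweighted means of the two dimensions. Accuracy is (0.4138 + 0.6500) / 2 = 0.5319, and loss is (0.6961 + 0.6817) / 2 = 0.6889. Per-step training acc follows the same rule; at step 0, (0.333 + 0.625) / 2 ≈ 0.479. Because the two dimensions have different counts (58 vs 80), a pooled average would give a different number. With this few pairs the sampling noise is also large. At p = 0.5 the standard error of the macro average is about 0.043, far larger than the 0.0078 gap between the step-0 and step-100 checkpoints. The helper names below are illustrative.

```python
import math
from typing import Dict


def macro_average(acc_by_dim: Dict[str, float]) -> float:
    """Unweighted mean over dimensions: each dimension counts equally."""
    return sum(acc_by_dim.values()) / len(acc_by_dim)


def micro_average(acc_by_dim: Dict[str, float], n_by_dim: Dict[str, int]) -> float:
    """Pooled accuracy: each scored pair counts equally."""
    correct = sum(acc_by_dim[k] * n_by_dim[k] for k in acc_by_dim)
    return correct / sum(n_by_dim[k] for k in acc_by_dim)


def macro_std_error(n_by_dim: Dict[str, int], p: float = 0.5) -> float:
    """Binomial standard error of the macro average, assuming accuracy p in every dimension."""
    k = len(n_by_dim)
    variance = sum(p * (1.0 - p) / n for n in n_by_dim.values()) / k ** 2
    return math.sqrt(variance)


if __name__ == "__main__":
    acc = {"instruction": 0.4138, "quality": 0.6500}  # step-100 validation block
    counts = {"instruction": 58, "quality": 80}
    print(f"macro={macro_average(acc):.4f} micro={micro_average(acc, counts):.4f}")
    print(f"std error ~ {macro_std_error(counts):.3f}")
```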
2025-12-21T21:46:45.400224+0800 | INFO | Best 2 checkpoints:
2025-12-21T21:46:45.400735+0800 | INFO | 1. Step 100: acc=0.5319 (reward_model.best_100.pt)
2025-12-21T21:46:45.400805+0800 | INFO | 2. Step 0: acc=0.5241 (reward_model.best_0.pt)
2025-12-21T21:46:46.891768+0800 | INFO | Step 101: loss=0.6405, acc=0.619 (IF=0.800, MQ=0.438)
2025-12-21T21:46:48.372348+0800 | INFO | Step 102: loss=0.6238, acc=0.760 (IF=0.769, MQ=0.750)
2025-12-21T21:46:49.832051+0800 | INFO | Step 103: loss=0.5848, acc=0.753 (IF=0.818, MQ=0.688)
2025-12-21T21:46:51.278723+0800 | INFO | Step 104: loss=0.5940, acc=0.693 (IF=0.636, MQ=0.750)
2025-12-21T21:46:52.737797+0800 | INFO | Step 105: loss=0.6180, acc=0.662 (IF=0.636, MQ=0.688)
2025-12-21T21:46:54.181301+0800 | INFO | Step 106: loss=0.6318, acc=0.599 (IF=0.636, MQ=0.562)
2025-12-21T21:46:55.650478+0800 | INFO | Step 107: loss=0.5766, acc=0.736 (IF=0.846, MQ=0.625)
2025-12-21T21:46:57.106416+0800 | INFO | Step 108: loss=0.5177, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T21:46:58.565152+0800 | INFO | Step 109: loss=0.6653, acc=0.620 (IF=0.615, MQ=0.625)
2025-12-21T21:47:00.010498+0800 | INFO | Step 110: loss=0.6805, acc=0.527 (IF=0.429, MQ=0.625)
2025-12-21T21:47:02.884198+0800 | INFO | Step 111: loss=0.6241, acc=0.739 (IF=0.727, MQ=0.750)
2025-12-21T21:47:04.359824+0800 | INFO | Step 112: loss=0.5924, acc=0.707 (IF=0.727, MQ=0.688)
2025-12-21T21:47:05.808888+0800 | INFO | Step 113: loss=0.5519, acc=0.833 (IF=0.917, MQ=0.750)
2025-12-21T21:47:07.270434+0800 | INFO | Step 114: loss=0.5400, acc=0.871 (IF=0.867, MQ=0.875)
2025-12-21T21:47:08.752733+0800 | INFO | Step 115: loss=0.6030, acc=0.662 (IF=0.636, MQ=0.688)
2025-12-21T21:47:10.190555+0800 | INFO | Step 116: loss=0.6605, acc=0.662 (IF=0.700, MQ=0.625)
2025-12-21T21:47:11.673740+0800 | INFO | Step 117: loss=0.6219, acc=0.659 (IF=0.692, MQ=0.625)
2025-12-21T21:47:13.124060+0800 | INFO | Step 118: loss=0.5496, acc=0.808 (IF=0.750, MQ=0.867)
2025-12-21T21:47:14.601486+0800 | INFO | Step 119: loss=0.6460, acc=0.688 (IF=0.750, MQ=0.625)
2025-12-21T21:47:16.064272+0800 | INFO | Step 120: loss=0.5852, acc=0.776 (IF=0.818, MQ=0.733)
2025-12-21T21:47:17.555601+0800 | INFO | Step 121: loss=0.5760, acc=0.725 (IF=0.700, MQ=0.750)
2025-12-21T21:47:19.036044+0800 | INFO | Step 122: loss=0.5613, acc=0.724 (IF=0.636, MQ=0.812)
2025-12-21T21:47:20.493934+0800 | INFO | Step 123: loss=0.5078, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T21:47:21.946485+0800 | INFO | Step 124: loss=0.6357, acc=0.542 (IF=0.333, MQ=0.750)
2025-12-21T21:47:23.417747+0800 | INFO | Step 125: loss=0.5350, acc=0.823 (IF=0.833, MQ=0.812)
2025-12-21T21:47:24.853005+0800 | INFO | Step 126: loss=0.5765, acc=0.719 (IF=0.750, MQ=0.688)
2025-12-21T21:47:26.319347+0800 | INFO | Step 127: loss=0.5771, acc=0.795 (IF=0.778, MQ=0.812)
2025-12-21T21:47:27.420780+0800 | INFO | Step 128: loss=0.4979, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:47:28.868291+0800 | INFO | Step 129: loss=0.5023, acc=0.791 (IF=0.769, MQ=0.812)
2025-12-21T21:47:30.305772+0800 | INFO | Step 130: loss=0.5179, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T21:47:31.773319+0800 | INFO | Step 131: loss=0.5442, acc=0.753 (IF=0.818, MQ=0.688)
2025-12-21T21:47:32.821935+0800 | INFO | Step 132: loss=0.5985, acc=0.772 (IF=0.857, MQ=0.688)
2025-12-21T21:47:34.282972+0800 | INFO | Step 133: loss=0.5504, acc=0.851 (IF=0.889, MQ=0.812)
2025-12-21T21:47:35.735332+0800 | INFO | Step 134: loss=0.5075, acc=0.781 (IF=0.750, MQ=0.812)
2025-12-21T21:47:37.228414+0800 | INFO | Step 135: loss=0.6055, acc=0.667 (IF=0.667, MQ=0.667)
2025-12-21T21:47:38.700120+0800 | INFO | Step 136: loss=0.6250, acc=0.651 (IF=0.615, MQ=0.688)
2025-12-21T21:47:40.177604+0800 | INFO | Step 137: loss=0.5086, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T21:47:41.624259+0800 | INFO | Step 138: loss=0.4895, acc=0.844 (IF=1.000, MQ=0.688)
2025-12-21T21:47:43.079648+0800 | INFO | Step 139: loss=0.5363, acc=0.767 (IF=0.909, MQ=0.625)
2025-12-21T21:47:44.605911+0800 | INFO | Step 140: loss=0.6206, acc=0.737 (IF=0.786, MQ=0.688)
2025-12-21T21:47:46.059899+0800 | INFO | Step 141: loss=0.5442, acc=0.760 (IF=0.833, MQ=0.688)
2025-12-21T21:47:47.520808+0800 | INFO | Step 142: loss=0.5735, acc=0.760 (IF=0.833, MQ=0.688)
2025-12-21T21:47:48.993918+0800 | INFO | Step 143: loss=0.5731, acc=0.769 (IF=0.600, MQ=0.938)
2025-12-21T21:47:50.446772+0800 | INFO | Step 144: loss=0.5780, acc=0.764 (IF=0.778, MQ=0.750)
2025-12-21T21:47:51.934228+0800 | INFO | Step 145: loss=0.5072, acc=0.752 (IF=0.692, MQ=0.812)
2025-12-21T21:47:53.388409+0800 | INFO | Step 146: loss=0.4479, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T21:47:54.830122+0800 | INFO | Step 147: loss=0.5932, acc=0.750 (IF=0.750, MQ=0.750)
2025-12-21T21:47:56.268078+0800 | INFO | Step 148: loss=0.4564, acc=0.861 (IF=0.909, MQ=0.812)
2025-12-21T21:47:57.727024+0800 | INFO | Step 149: loss=0.4376, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T21:47:59.175226+0800 | INFO | Step 150: loss=0.5014, acc=0.775 (IF=0.800, MQ=0.750)
2025-12-21T21:48:00.611208+0800 | INFO | Step 151: loss=0.5473, acc=0.731 (IF=0.900, MQ=0.562)
2025-12-21T21:48:02.062113+0800 | INFO | Step 152: loss=0.6113, acc=0.732 (IF=0.714, MQ=0.750)
2025-12-21T21:48:03.504157+0800 | INFO | Step 153: loss=0.4291, acc=0.861 (IF=0.909, MQ=0.812)
2025-12-21T21:48:04.951357+0800 | INFO | Step 154: loss=0.5752, acc=0.631 (IF=0.636, MQ=0.625)
2025-12-21T21:48:06.402033+0800 | INFO | Step 155: loss=0.5415, acc=0.775 (IF=0.800, MQ=0.750)
2025-12-21T21:48:07.458812+0800 | INFO | Step 156: loss=0.5142, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T21:48:08.903937+0800 | INFO | Step 157: loss=0.5468, acc=0.771 (IF=0.667, MQ=0.875)
2025-12-21T21:48:10.370716+0800 | INFO | Step 158: loss=0.5328, acc=0.770 (IF=0.727, MQ=0.812)
2025-12-21T21:48:11.817398+0800 | INFO | Step 159: loss=0.5802, acc=0.728 (IF=0.769, MQ=0.688)
2025-12-21T21:48:13.278884+0800 | INFO | Step 160: loss=0.5036, acc=0.835 (IF=0.857, MQ=0.812)
2025-12-21T21:48:14.730013+0800 | INFO | Step 161: loss=0.5355, acc=0.688 (IF=0.750, MQ=0.625)
2025-12-21T21:48:15.761756+0800 | INFO | Step 162: loss=0.5347, acc=0.760 (IF=0.833, MQ=0.688)
2025-12-21T21:48:17.198494+0800 | INFO | Step 163: loss=0.4242, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T21:48:18.647236+0800 | INFO | Step 164: loss=0.5410, acc=0.815 (IF=0.818, MQ=0.812)
2025-12-21T21:48:20.099886+0800 | INFO | Step 165: loss=0.5340, acc=0.688 (IF=0.750, MQ=0.625)
2025-12-21T21:48:23.214208+0800 | INFO | Step 166: loss=0.4249, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T21:48:24.677371+0800 | INFO | Step 167: loss=0.3910, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T21:48:26.196217+0800 | INFO | Step 168: loss=0.4801, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T21:48:27.642418+0800 | INFO | Step 169: loss=0.5817, acc=0.726 (IF=0.889, MQ=0.562)
2025-12-21T21:48:29.113504+0800 | INFO | Step 170: loss=0.3772, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:48:30.581355+0800 | INFO | Step 171: loss=0.4903, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T21:48:32.052527+0800 | INFO | Step 172: loss=0.6416, acc=0.713 (IF=0.800, MQ=0.625)
2025-12-21T21:48:33.528137+0800 | INFO | Step 173: loss=0.5245, acc=0.708 (IF=0.667, MQ=0.750)
2025-12-21T21:48:34.997172+0800 | INFO | Step 174: loss=0.4038, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T21:48:36.477881+0800 | INFO | Step 175: loss=0.4516, acc=0.861 (IF=0.909, MQ=0.812)
2025-12-21T21:48:37.949654+0800 | INFO | Step 176: loss=0.4404, acc=0.781 (IF=0.750, MQ=0.812)
2025-12-21T21:48:39.430084+0800 | INFO | Step 177: loss=0.6558, acc=0.677 (IF=0.667, MQ=0.688)
2025-12-21T21:48:40.894831+0800 | INFO | Step 178: loss=0.5918, acc=0.717 (IF=0.700, MQ=0.733)
2025-12-21T21:48:42.354970+0800 | INFO | Step 179: loss=0.3637, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:48:43.796650+0800 | INFO | Step 180: loss=0.3800, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:48:45.246563+0800 | INFO | Step 181: loss=0.5277, acc=0.719 (IF=0.750, MQ=0.688)
2025-12-21T21:48:46.677013+0800 | INFO | Step 182: loss=0.4383, acc=0.812 (IF=1.000, MQ=0.625)
2025-12-21T21:48:48.121649+0800 | INFO | Step 183: loss=0.5939, acc=0.627 (IF=0.692, MQ=0.562)
2025-12-21T21:48:49.594872+0800 | INFO | Step 184: loss=0.4479, acc=0.812 (IF=0.750, MQ=0.875)
2025-12-21T21:48:51.054495+0800 | INFO | Step 185: loss=0.6328, acc=0.851 (IF=0.889, MQ=0.812)
2025-12-21T21:48:52.526666+0800 | INFO | Step 186: loss=0.4281, acc=0.844 (IF=1.000, MQ=0.688)
2025-12-21T21:48:53.987719+0800 | INFO | Step 187: loss=0.4306, acc=0.788 (IF=0.889, MQ=0.688)
2025-12-21T21:48:55.451725+0800 | INFO | Step 188: loss=0.5039, acc=0.806 (IF=0.800, MQ=0.812)
2025-12-21T21:48:56.905357+0800 | INFO | Step 189: loss=0.4007, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T21:48:58.362479+0800 | INFO | Step 190: loss=0.4336, acc=0.858 (IF=0.917, MQ=0.800)
2025-12-21T21:48:59.817873+0800 | INFO | Step 191: loss=0.3570, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T21:49:01.274078+0800 | INFO | Step 192: loss=0.4250, acc=0.868 (IF=0.923, MQ=0.812)
2025-12-21T21:49:02.731121+0800 | INFO | Step 193: loss=0.5240, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T21:49:04.188607+0800 | INFO | Step 194: loss=0.4250, acc=0.806 (IF=0.800, MQ=0.812)
2025-12-21T21:49:05.638462+0800 | INFO | Step 195: loss=0.5335, acc=0.750 (IF=0.750, MQ=0.750)
2025-12-21T21:49:07.097800+0800 | INFO | Step 196: loss=0.4114, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T21:49:08.558939+0800 | INFO | Step 197: loss=0.5387, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T21:49:10.041964+0800 | INFO | Step 198: loss=0.4652, acc=0.732 (IF=0.714, MQ=0.750)
2025-12-21T21:49:11.514371+0800 | INFO | Step 199: loss=0.7640, acc=0.519 (IF=0.538, MQ=0.500)
2025-12-21T21:49:12.986477+0800 | INFO | Step 200: loss=0.5817, acc=0.756 (IF=0.700, MQ=0.812)
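The step timestamps show long gaps of about 2.8 to 3.1 s, against a typical 1.45 s, before steps 55, 111 and 166, and later before 222 and 277. The per-step MQ accuracies come in sixteenths (0.438, 0.562, 0.938) even though the batch size is 8. That suggests 16 pairs per logged step, for example two micro-batches of 8; this is an inference, not something the log states. Under that assumption, with the last partial batch dropped, each pass over the 892 training pairs takes 111 micro-batches. The first micro-batch of each new epoch then lands exactly on those steps. One hypothesis consistent with the timestamps is that the slow steps are data-loader restarts at epoch boundaries. On that reading, the model has seen the training set about five times by step 300. epoch_start_steps is a hypothetical helper.

```python
from typing import List


def epoch_start_steps(n_train: int, micro_batch: int, micro_per_step: int,
                      n_epochs: int, drop_last: bool = True) -> List[int]:
    """Logged step holding the first micro-batch of epochs 1..n_epochs (0-based steps).

    Assumes micro-batches are counted continuously across epochs and grouped
    micro_per_step at a time into one logged step.
    """
    if drop_last:
        batches_per_epoch = n_train // micro_batch
    else:
        batches_per_epoch = -(-n_train // micro_batch)  # ceiling division
    return [(e * batches_per_epoch) // micro_per_step for e in range(1, n_epochs + 1)]


if __name__ == "__main__":
    print(epoch_start_steps(892, 8, 2, 5))
    print(epoch_start_steps(892, 8, 2, 5, drop_last=False))
```

Keeping the last partial batch (drop_last=False) would shift the boundaries to steps 56, 112, 168, 224 and 280, which do not match the observed gaps.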
2025-12-21T21:49:20.385984+0800 | INFO |
============================================================
Validation Results (took 7.37s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.4310
Quality Acc: 0.6875
Average Acc: 0.5593
Total Loss: 0.6873
Instruction Loss: 0.6946
Quality Loss: 0.6800
============================================================
2025-12-21T21:49:22.946117+0800 | INFO | Best 3 checkpoints:
2025-12-21T21:49:22.946530+0800 | INFO | 1. Step 200: acc=0.5593 (reward_model.best_200.pt)
2025-12-21T21:49:22.946600+0800 | INFO | 2. Step 100: acc=0.5319 (reward_model.best_100.pt)
2025-12-21T21:49:22.946654+0800 | INFO | 3. Step 0: acc=0.5241 (reward_model.best_0.pt)
2025-12-21T21:49:24.451134+0800 | INFO | Step 201: loss=0.4273, acc=0.862 (IF=0.786, MQ=0.938)
2025-12-21T21:49:25.894079+0800 | INFO | Step 202: loss=0.3927, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T21:49:27.321637+0800 | INFO | Step 203: loss=0.4405, acc=0.802 (IF=0.667, MQ=0.938)
2025-12-21T21:49:28.770924+0800 | INFO | Step 204: loss=0.5154, acc=0.701 (IF=0.714, MQ=0.688)
2025-12-21T21:49:30.232553+0800 | INFO | Step 205: loss=0.4881, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T21:49:31.710661+0800 | INFO | Step 206: loss=0.5722, acc=0.666 (IF=0.769, MQ=0.562)
2025-12-21T21:49:33.161429+0800 | INFO | Step 207: loss=0.3652, acc=0.933 (IF=1.000, MQ=0.867)
2025-12-21T21:49:34.610359+0800 | INFO | Step 208: loss=0.6468, acc=0.662 (IF=0.700, MQ=0.625)
2025-12-21T21:49:36.109631+0800 | INFO | Step 209: loss=0.6136, acc=0.644 (IF=0.600, MQ=0.688)
2025-12-21T21:49:37.623265+0800 | INFO | Step 210: loss=0.6500, acc=0.544 (IF=0.400, MQ=0.688)
2025-12-21T21:49:39.091011+0800 | INFO | Step 211: loss=0.5799, acc=0.788 (IF=0.889, MQ=0.688)
2025-12-21T21:49:40.564435+0800 | INFO | Step 212: loss=0.4256, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T21:49:42.027093+0800 | INFO | Step 213: loss=0.4250, acc=0.826 (IF=0.714, MQ=0.938)
2025-12-21T21:49:43.481213+0800 | INFO | Step 214: loss=0.4937, acc=0.753 (IF=0.818, MQ=0.688)
2025-12-21T21:49:44.928178+0800 | INFO | Step 215: loss=0.5024, acc=0.662 (IF=0.700, MQ=0.625)
2025-12-21T21:49:46.381547+0800 | INFO | Step 216: loss=0.3995, acc=0.823 (IF=0.833, MQ=0.812)
2025-12-21T21:49:47.833342+0800 | INFO | Step 217: loss=0.3990, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T21:49:49.282499+0800 | INFO | Step 218: loss=0.3695, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T21:49:50.727848+0800 | INFO | Step 219: loss=0.4177, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T21:49:52.182819+0800 | INFO | Step 220: loss=0.4477, acc=0.756 (IF=0.636, MQ=0.875)
2025-12-21T21:49:53.623291+0800 | INFO | Step 221: loss=0.3676, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:49:56.498254+0800 | INFO | Step 222: loss=0.4336, acc=0.868 (IF=0.923, MQ=0.812)
2025-12-21T21:49:57.970480+0800 | INFO | Step 223: loss=0.4353, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T21:49:59.467212+0800 | INFO | Step 224: loss=0.4753, acc=0.729 (IF=0.833, MQ=0.625)
2025-12-21T21:50:00.923521+0800 | INFO | Step 225: loss=0.4124, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T21:50:02.400836+0800 | INFO | Step 226: loss=0.3900, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T21:50:03.884611+0800 | INFO | Step 227: loss=0.5296, acc=0.713 (IF=0.800, MQ=0.625)
2025-12-21T21:50:05.350218+0800 | INFO | Step 228: loss=0.3508, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:50:06.786447+0800 | INFO | Step 229: loss=0.4232, acc=0.791 (IF=0.769, MQ=0.812)
2025-12-21T21:50:08.285335+0800 | INFO | Step 230: loss=0.4504, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T21:50:09.725485+0800 | INFO | Step 231: loss=0.4522, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T21:50:11.177007+0800 | INFO | Step 232: loss=0.5326, acc=0.708 (IF=0.667, MQ=0.750)
2025-12-21T21:50:12.627184+0800 | INFO | Step 233: loss=0.3758, acc=0.806 (IF=0.800, MQ=0.812)
2025-12-21T21:50:14.075915+0800 | INFO | Step 234: loss=0.4258, acc=0.805 (IF=0.923, MQ=0.688)
2025-12-21T21:50:15.534968+0800 | INFO | Step 235: loss=0.6260, acc=0.677 (IF=0.667, MQ=0.688)
2025-12-21T21:50:16.995574+0800 | INFO | Step 236: loss=0.4408, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T21:50:18.450796+0800 | INFO | Step 237: loss=0.6442, acc=0.673 (IF=0.545, MQ=0.800)
2025-12-21T21:50:19.900814+0800 | INFO | Step 238: loss=0.5241, acc=0.792 (IF=0.833, MQ=0.750)
2025-12-21T21:50:21.394757+0800 | INFO | Step 239: loss=0.4515, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T21:50:22.462006+0800 | INFO | Step 240: loss=0.3702, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T21:50:23.908394+0800 | INFO | Step 241: loss=0.3951, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T21:50:25.348448+0800 | INFO | Step 242: loss=0.3037, acc=0.830 (IF=0.786, MQ=0.875)
2025-12-21T21:50:26.812563+0800 | INFO | Step 243: loss=0.4792, acc=0.733 (IF=0.778, MQ=0.688)
2025-12-21T21:50:28.303161+0800 | INFO | Step 244: loss=0.4467, acc=0.798 (IF=0.909, MQ=0.688)
2025-12-21T21:50:29.782187+0800 | INFO | Step 245: loss=0.4109, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T21:50:30.433724+0800 | INFO | Step 246: loss=0.4044, acc=0.802 (IF=0.889, MQ=0.714)
2025-12-21T21:50:31.504024+0800 | INFO | Step 247: loss=0.4339, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T21:50:32.951713+0800 | INFO | Step 248: loss=0.5420, acc=0.770 (IF=0.727, MQ=0.812)
2025-12-21T21:50:34.409191+0800 | INFO | Step 249: loss=0.3413, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T21:50:35.878527+0800 | INFO | Step 250: loss=0.4159, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:50:37.380736+0800 | INFO | Step 251: loss=0.3711, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T21:50:38.841249+0800 | INFO | Step 252: loss=0.4622, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T21:50:40.322467+0800 | INFO | Step 253: loss=0.4072, acc=0.826 (IF=0.714, MQ=0.938)
2025-12-21T21:50:41.758514+0800 | INFO | Step 254: loss=0.4130, acc=0.804 (IF=0.857, MQ=0.750)
2025-12-21T21:50:43.208704+0800 | INFO | Step 255: loss=0.4843, acc=0.684 (IF=0.556, MQ=0.812)
2025-12-21T21:50:44.654852+0800 | INFO | Step 256: loss=0.5675, acc=0.719 (IF=0.625, MQ=0.812)
2025-12-21T21:50:46.097716+0800 | INFO | Step 257: loss=0.4128, acc=0.784 (IF=0.818, MQ=0.750)
2025-12-21T21:50:47.561516+0800 | INFO | Step 258: loss=0.4511, acc=0.830 (IF=0.786, MQ=0.875)
2025-12-21T21:50:49.032397+0800 | INFO | Step 259: loss=0.3089, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T21:50:50.482340+0800 | INFO | Step 260: loss=0.3216, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T21:50:51.963039+0800 | INFO | Step 261: loss=0.2836, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T21:50:53.411030+0800 | INFO | Step 262: loss=0.4405, acc=0.819 (IF=0.889, MQ=0.750)
2025-12-21T21:50:54.848322+0800 | INFO | Step 263: loss=0.3900, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T21:50:56.291808+0800 | INFO | Step 264: loss=0.2892, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:50:57.742594+0800 | INFO | Step 265: loss=0.3319, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T21:50:59.212723+0800 | INFO | Step 266: loss=0.2991, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T21:51:00.672877+0800 | INFO | Step 267: loss=0.2841, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:51:02.119792+0800 | INFO | Step 268: loss=0.4925, acc=0.799 (IF=0.786, MQ=0.812)
2025-12-21T21:51:03.580325+0800 | INFO | Step 269: loss=0.5107, acc=0.688 (IF=0.625, MQ=0.750)
2025-12-21T21:51:05.025551+0800 | INFO | Step 270: loss=0.3527, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T21:51:06.477767+0800 | INFO | Step 271: loss=0.3614, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T21:51:07.925865+0800 | INFO | Step 272: loss=0.4371, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T21:51:09.399757+0800 | INFO | Step 273: loss=0.5703, acc=0.739 (IF=0.727, MQ=0.750)
2025-12-21T21:51:10.866302+0800 | INFO | Step 274: loss=0.2743, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:51:12.321575+0800 | INFO | Step 275: loss=0.5001, acc=0.760 (IF=0.769, MQ=0.750)
2025-12-21T21:51:13.776902+0800 | INFO | Step 276: loss=0.3300, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T21:51:16.535630+0800 | INFO | Step 277: loss=0.3202, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T21:51:18.053874+0800 | INFO | Step 278: loss=0.2819, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:51:19.543073+0800 | INFO | Step 279: loss=0.3603, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T21:51:20.989578+0800 | INFO | Step 280: loss=0.3330, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T21:51:22.451387+0800 | INFO | Step 281: loss=0.5532, acc=0.719 (IF=0.750, MQ=0.688)
2025-12-21T21:51:23.887998+0800 | INFO | Step 282: loss=0.3004, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T21:51:25.358401+0800 | INFO | Step 283: loss=0.4181, acc=0.771 (IF=0.667, MQ=0.875)
2025-12-21T21:51:26.374233+0800 | INFO | Step 284: loss=0.4484, acc=0.781 (IF=0.750, MQ=0.812)
2025-12-21T21:51:27.847657+0800 | INFO | Step 285: loss=0.3266, acc=0.815 (IF=0.692, MQ=0.938)
2025-12-21T21:51:29.313671+0800 | INFO | Step 286: loss=0.4069, acc=0.750 (IF=0.750, MQ=0.750)
2025-12-21T21:51:30.779232+0800 | INFO | Step 287: loss=0.4225, acc=0.822 (IF=0.769, MQ=0.875)
2025-12-21T21:51:32.223716+0800 | INFO | Step 288: loss=0.3986, acc=0.815 (IF=0.818, MQ=0.812)
2025-12-21T21:51:33.697699+0800 | INFO | Step 289: loss=0.4497, acc=0.787 (IF=0.700, MQ=0.875)
2025-12-21T21:51:35.136133+0800 | INFO | Step 290: loss=0.3591, acc=0.815 (IF=0.818, MQ=0.812)
2025-12-21T21:51:36.586834+0800 | INFO | Step 291: loss=0.5017, acc=0.784 (IF=0.818, MQ=0.750)
2025-12-21T21:51:38.029938+0800 | INFO | Step 292: loss=0.3662, acc=0.812 (IF=0.750, MQ=0.875)
2025-12-21T21:51:39.464045+0800 | INFO | Step 293: loss=0.5417, acc=0.701 (IF=0.714, MQ=0.688)
2025-12-21T21:51:40.943187+0800 | INFO | Step 294: loss=0.3496, acc=0.833 (IF=0.917, MQ=0.750)
2025-12-21T21:51:42.433369+0800 | INFO | Step 295: loss=0.3274, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T21:51:43.886086+0800 | INFO | Step 296: loss=0.3851, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:51:45.339512+0800 | INFO | Step 297: loss=0.2428, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:51:46.786808+0800 | INFO | Step 298: loss=0.3672, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T21:51:48.270766+0800 | INFO | Step 299: loss=0.4615, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T21:51:49.723450+0800 | INFO | Step 300: loss=0.4511, acc=0.838 (IF=0.875, MQ=0.800)
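In the per-step lines, `acc` appears to be the unweighted mean of the IF and MQ accuracies, printed to three decimals. For step 300, (0.875 + 0.800) / 2 = 0.8375, which is logged as 0.838. Below is a minimal Python sketch for extracting these metrics from the log and checking that relationship. The regex and the helpers `parse_step` and `acc_is_task_mean` are hypothetical. They are written against the line format seen here, not taken from the training code.

```python
import re

# Matches the per-step training lines in this log, e.g.
# "... | INFO | Step 300: loss=0.4511, acc=0.838 (IF=0.875, MQ=0.800)"
STEP_RE = re.compile(
    r"Step (?P<step>\d+): loss=(?P<loss>[\d.]+), acc=(?P<acc>[\d.]+) "
    r"\(IF=(?P<if_acc>[\d.]+), MQ=(?P<mq_acc>[\d.]+)\)"
)


def parse_step(line):
    """Return the metrics on a training-step line as a dict, or None if no match."""
    m = STEP_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    record = {"step": int(fields.pop("step"))}
    record.update({name: float(value) for name, value in fields.items()})
    return record


def acc_is_task_mean(record, tol=1.5e-3):
    """True if acc equals the mean of IF and MQ accuracy, up to 3-decimal rounding."""
    return abs(record["acc"] - (record["if_acc"] + record["mq_acc"]) / 2) <= tol


lines = [
    "2025-12-21T21:51:03.580325+0800 | INFO | Step 269: loss=0.5107, acc=0.688 (IF=0.625, MQ=0.750)",
    "2025-12-21T21:51:49.723450+0800 | INFO | Step 300: loss=0.4511, acc=0.838 (IF=0.875, MQ=0.800)",
    # Checkpoint-summary lines have no loss= field and are skipped
    "2025-12-21T21:51:59.937777+0800 | INFO | 1. Step 200: acc=0.5593 (reward_model.best_200.pt)",
]
records = [r for r in (parse_step(line) for line in lines) if r is not None]
print([r["step"] for r in records])               # [269, 300]
print(all(acc_is_task_mean(r) for r in records))  # True
```

The tolerance is 1.5e-3 rather than 5e-4. Both the components and `acc` are rounded to three decimals, so the recomputed mean can legitimately differ from the printed value by up to about 0.001.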
2025-12-21T21:51:57.341747+0800 | INFO |
============================================================
Validation Results (took 7.59s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.4483
Quality Acc: 0.6500
Average Acc: 0.5491
Total Loss: 0.6860
Instruction Loss: 0.6922
Quality Loss: 0.6797
============================================================
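Each validation block in this section satisfies the same two relations:

- `Average Acc` is the unweighted mean of `Instruction Acc` and `Quality Acc`. For step 300, (0.4483 + 0.6500) / 2 = 0.54915.
- `Total Loss` is the mean of the two task losses. For step 300, (0.6922 + 0.6797) / 2 = 0.68595.

This is inferred from the numbers, not from the evaluation code. The sketch below recomputes the derived fields and the integer pair counts behind each accuracy. For contrast, it also computes a pooled accuracy over all 138 comparisons, which this log does not report. `summarize_validation` is a hypothetical helper.

```python
def summarize_validation(n_if, n_mq, acc_if, acc_mq, loss_if, loss_mq):
    """Recompute the derived fields of a validation block from the per-task values."""
    # Integer number of correctly ranked pairs implied by each reported accuracy
    correct_if = round(acc_if * n_if)
    correct_mq = round(acc_mq * n_mq)
    return {
        "correct_if": correct_if,
        "correct_mq": correct_mq,
        # Unweighted mean of the two tasks; matches "Average Acc" in this log
        "average_acc": (acc_if + acc_mq) / 2,
        # Mean of the two task losses; matches "Total Loss" in this log
        "total_loss": (loss_if + loss_mq) / 2,
        # Accuracy pooled over all comparisons, for contrast (not logged)
        "pooled_acc": (correct_if + correct_mq) / (n_if + n_mq),
    }


# Values from the step-300 validation block
s = summarize_validation(58, 80, 0.4483, 0.6500, 0.6922, 0.6797)
print(s["correct_if"], s["correct_mq"])  # 26 52
print(f"{s['pooled_acc']:.4f}")           # 0.5652
```

The unweighted mean gives the 58 instruction pairs the same weight as the 80 quality pairs. For this reason, a pooled figure would sit closer to the (higher) quality accuracy.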
2025-12-21T21:51:59.937216+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_0.pt
2025-12-21T21:51:59.937687+0800 | INFO | Best 3 checkpoints:
2025-12-21T21:51:59.937777+0800 | INFO | 1. Step 200: acc=0.5593 (reward_model.best_200.pt)
2025-12-21T21:51:59.937827+0800 | INFO | 2. Step 300: acc=0.5491 (reward_model.best_300.pt)
2025-12-21T21:51:59.937872+0800 | INFO | 3. Step 100: acc=0.5319 (reward_model.best_100.pt)
2025-12-21T21:52:01.415061+0800 | INFO | Step 301: loss=0.3013, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:52:02.879655+0800 | INFO | Step 302: loss=0.2331, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:52:04.357425+0800 | INFO | Step 303: loss=0.3252, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:52:05.811743+0800 | INFO | Step 304: loss=0.7245, acc=0.616 (IF=0.545, MQ=0.688)
2025-12-21T21:52:07.323922+0800 | INFO | Step 305: loss=0.3611, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T21:52:08.805740+0800 | INFO | Step 306: loss=0.5027, acc=0.753 (IF=0.818, MQ=0.688)
2025-12-21T21:52:10.275739+0800 | INFO | Step 307: loss=0.4292, acc=0.806 (IF=0.800, MQ=0.812)
2025-12-21T21:52:11.736208+0800 | INFO | Step 308: loss=0.4288, acc=0.830 (IF=0.909, MQ=0.750)
2025-12-21T21:52:13.178205+0800 | INFO | Step 309: loss=0.3322, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T21:52:14.652262+0800 | INFO | Step 310: loss=0.5287, acc=0.679 (IF=0.545, MQ=0.812)
2025-12-21T21:52:16.191857+0800 | INFO | Step 311: loss=0.4994, acc=0.739 (IF=0.727, MQ=0.750)
2025-12-21T21:52:17.728627+0800 | INFO | Step 312: loss=0.4541, acc=0.825 (IF=0.900, MQ=0.750)
2025-12-21T21:52:19.274650+0800 | INFO | Step 313: loss=0.4435, acc=0.787 (IF=0.700, MQ=0.875)
2025-12-21T21:52:20.729866+0800 | INFO | Step 314: loss=0.2848, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T21:52:22.192181+0800 | INFO | Step 315: loss=0.3137, acc=0.871 (IF=0.867, MQ=0.875)
2025-12-21T21:52:23.633264+0800 | INFO | Step 316: loss=0.3794, acc=0.801 (IF=0.727, MQ=0.875)
2025-12-21T21:52:25.065585+0800 | INFO | Step 317: loss=0.3155, acc=0.844 (IF=0.875, MQ=0.812)
2025-12-21T21:52:26.499835+0800 | INFO | Step 318: loss=0.3973, acc=0.806 (IF=0.800, MQ=0.812)
2025-12-21T21:52:27.945077+0800 | INFO | Step 319: loss=0.4710, acc=0.760 (IF=0.833, MQ=0.688)
2025-12-21T21:52:29.392918+0800 | INFO | Step 320: loss=0.2611, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:52:30.841721+0800 | INFO | Step 321: loss=0.3763, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T21:52:32.283496+0800 | INFO | Step 322: loss=0.2531, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T21:52:33.794785+0800 | INFO | Step 323: loss=0.2171, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:52:35.239739+0800 | INFO | Step 324: loss=0.3147, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T21:52:36.690316+0800 | INFO | Step 325: loss=0.2835, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T21:52:38.128102+0800 | INFO | Step 326: loss=0.2166, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:52:39.561926+0800 | INFO | Step 327: loss=0.4596, acc=0.685 (IF=0.636, MQ=0.733)
2025-12-21T21:52:40.986093+0800 | INFO | Step 328: loss=0.3882, acc=0.823 (IF=0.833, MQ=0.812)
2025-12-21T21:52:42.419378+0800 | INFO | Step 329: loss=0.5846, acc=0.763 (IF=0.714, MQ=0.812)
2025-12-21T21:52:43.841926+0800 | INFO | Step 330: loss=0.5136, acc=0.825 (IF=0.900, MQ=0.750)
2025-12-21T21:52:45.275858+0800 | INFO | Step 331: loss=0.3430, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T21:52:46.709561+0800 | INFO | Step 332: loss=0.3637, acc=0.781 (IF=0.750, MQ=0.812)
2025-12-21T21:52:49.273088+0800 | INFO | Step 333: loss=0.3642, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T21:52:50.766481+0800 | INFO | Step 334: loss=0.3003, acc=0.829 (IF=0.846, MQ=0.812)
2025-12-21T21:52:52.226742+0800 | INFO | Step 335: loss=0.2313, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:52:53.290966+0800 | INFO | Step 336: loss=0.2643, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:52:54.769135+0800 | INFO | Step 337: loss=0.3171, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T21:52:56.211370+0800 | INFO | Step 338: loss=0.3697, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T21:52:57.682786+0800 | INFO | Step 339: loss=0.3764, acc=0.798 (IF=0.846, MQ=0.750)
2025-12-21T21:52:59.146140+0800 | INFO | Step 340: loss=0.3599, acc=0.871 (IF=0.929, MQ=0.812)
2025-12-21T21:53:00.635198+0800 | INFO | Step 341: loss=0.3575, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T21:53:02.077533+0800 | INFO | Step 342: loss=0.4132, acc=0.842 (IF=0.818, MQ=0.867)
2025-12-21T21:53:03.558373+0800 | INFO | Step 343: loss=0.4106, acc=0.771 (IF=0.667, MQ=0.875)
2025-12-21T21:53:05.037438+0800 | INFO | Step 344: loss=0.4942, acc=0.775 (IF=0.800, MQ=0.750)
2025-12-21T21:53:06.497546+0800 | INFO | Step 345: loss=0.3797, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T21:53:07.933295+0800 | INFO | Step 346: loss=0.2921, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T21:53:09.378413+0800 | INFO | Step 347: loss=0.2708, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:53:10.812394+0800 | INFO | Step 348: loss=0.2185, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:53:12.259326+0800 | INFO | Step 349: loss=0.3280, acc=0.830 (IF=0.909, MQ=0.750)
2025-12-21T21:53:13.729375+0800 | INFO | Step 350: loss=0.3041, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:53:15.209674+0800 | INFO | Step 351: loss=0.3289, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T21:53:16.678545+0800 | INFO | Step 352: loss=0.2379, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T21:53:18.127828+0800 | INFO | Step 353: loss=0.1619, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:53:19.588282+0800 | INFO | Step 354: loss=0.5492, acc=0.784 (IF=0.818, MQ=0.750)
2025-12-21T21:53:21.046172+0800 | INFO | Step 355: loss=0.2356, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T21:53:22.500663+0800 | INFO | Step 356: loss=0.3211, acc=0.835 (IF=0.857, MQ=0.812)
2025-12-21T21:53:23.988381+0800 | INFO | Step 357: loss=0.2564, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:53:25.021410+0800 | INFO | Step 358: loss=0.2253, acc=0.904 (IF=0.875, MQ=0.933)
2025-12-21T21:53:26.453193+0800 | INFO | Step 359: loss=0.2899, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:53:27.900430+0800 | INFO | Step 360: loss=0.1824, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:53:29.346335+0800 | INFO | Step 361: loss=0.4360, acc=0.795 (IF=0.778, MQ=0.812)
2025-12-21T21:53:30.406677+0800 | INFO | Step 362: loss=0.5158, acc=0.762 (IF=0.857, MQ=0.667)
2025-12-21T21:53:31.916255+0800 | INFO | Step 363: loss=0.2919, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T21:53:33.355551+0800 | INFO | Step 364: loss=0.3196, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T21:53:34.794689+0800 | INFO | Step 365: loss=0.3640, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T21:53:36.237338+0800 | INFO | Step 366: loss=0.2430, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T21:53:37.317764+0800 | INFO | Step 367: loss=0.2705, acc=0.808 (IF=0.867, MQ=0.750)
2025-12-21T21:53:38.764005+0800 | INFO | Step 368: loss=0.3322, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T21:53:40.227812+0800 | INFO | Step 369: loss=0.2869, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T21:53:41.729453+0800 | INFO | Step 370: loss=0.4876, acc=0.851 (IF=0.889, MQ=0.812)
2025-12-21T21:53:43.170519+0800 | INFO | Step 371: loss=0.2562, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T21:53:44.612210+0800 | INFO | Step 372: loss=0.3207, acc=0.815 (IF=0.818, MQ=0.812)
2025-12-21T21:53:46.060456+0800 | INFO | Step 373: loss=0.4618, acc=0.806 (IF=0.800, MQ=0.812)
2025-12-21T21:53:47.497233+0800 | INFO | Step 374: loss=0.2891, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T21:53:48.949123+0800 | INFO | Step 375: loss=0.3120, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T21:53:50.398410+0800 | INFO | Step 376: loss=0.1758, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:53:51.907241+0800 | INFO | Step 377: loss=0.4612, acc=0.830 (IF=0.909, MQ=0.750)
2025-12-21T21:53:53.361644+0800 | INFO | Step 378: loss=0.3088, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T21:53:54.854213+0800 | INFO | Step 379: loss=0.5233, acc=0.756 (IF=0.700, MQ=0.812)
2025-12-21T21:53:56.313114+0800 | INFO | Step 380: loss=0.4435, acc=0.750 (IF=0.750, MQ=0.750)
2025-12-21T21:53:57.735758+0800 | INFO | Step 381: loss=0.3185, acc=0.826 (IF=0.714, MQ=0.938)
2025-12-21T21:53:59.191990+0800 | INFO | Step 382: loss=0.4669, acc=0.868 (IF=0.923, MQ=0.812)
2025-12-21T21:54:00.637287+0800 | INFO | Step 383: loss=0.3559, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T21:54:02.079987+0800 | INFO | Step 384: loss=0.4414, acc=0.725 (IF=0.700, MQ=0.750)
2025-12-21T21:54:03.532385+0800 | INFO | Step 385: loss=0.4961, acc=0.635 (IF=0.583, MQ=0.688)
2025-12-21T21:54:04.978328+0800 | INFO | Step 386: loss=0.4521, acc=0.818 (IF=0.636, MQ=1.000)
2025-12-21T21:54:06.432538+0800 | INFO | Step 387: loss=0.3855, acc=0.868 (IF=0.923, MQ=0.812)
2025-12-21T21:54:09.494197+0800 | INFO | Step 388: loss=0.4319, acc=0.819 (IF=0.889, MQ=0.750)
2025-12-21T21:54:10.961120+0800 | INFO | Step 389: loss=0.2567, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T21:54:12.425995+0800 | INFO | Step 390: loss=0.3343, acc=0.812 (IF=0.750, MQ=0.875)
2025-12-21T21:54:13.885544+0800 | INFO | Step 391: loss=0.3018, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T21:54:15.355295+0800 | INFO | Step 392: loss=0.4819, acc=0.706 (IF=0.600, MQ=0.812)
2025-12-21T21:54:16.781713+0800 | INFO | Step 393: loss=0.3668, acc=0.799 (IF=0.786, MQ=0.812)
2025-12-21T21:54:18.228857+0800 | INFO | Step 394: loss=0.3045, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T21:54:19.663554+0800 | INFO | Step 395: loss=0.1710, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T21:54:21.103720+0800 | INFO | Step 396: loss=0.3064, acc=0.844 (IF=0.750, MQ=0.938)
2025-12-21T21:54:22.587032+0800 | INFO | Step 397: loss=0.2401, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T21:54:24.058834+0800 | INFO | Step 398: loss=0.2315, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T21:54:25.493659+0800 | INFO | Step 399: loss=0.3853, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T21:54:26.930183+0800 | INFO | Step 400: loss=0.3289, acc=0.835 (IF=0.733, MQ=0.938)
2025-12-21T21:54:34.460130+0800 | INFO |
============================================================
Validation Results (took 7.51s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.4310
Quality Acc: 0.6875
Average Acc: 0.5593
Total Loss: 0.6861
Instruction Loss: 0.6941
Quality Loss: 0.6781
============================================================
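Every per-task validation loss in this section lies between 0.6760 and 0.6941, close to ln 2 ≈ 0.6931. Over the same span, many per-step training losses fall to the 0.1 to 0.3 range. Suppose `PreferenceLoss` is the common logistic (Bradley-Terry) loss −log σ(r_preferred − r_rejected); the log does not confirm this. Under that assumption, ln 2 is exactly the loss of a model that assigns equal reward to both clips. The sketch below inverts the assumed loss to express the step-400 values as an approximate reward margin. `bt_loss` and `margin_for_loss` are hypothetical names.

```python
import math


def bt_loss(margin):
    """Assumed logistic preference loss -log(sigmoid(margin)) = log(1 + exp(-margin)).

    margin = reward(preferred clip) - reward(rejected clip); hypothetical form,
    not confirmed by the log.
    """
    return math.log1p(math.exp(-margin))


def margin_for_loss(loss):
    """Invert bt_loss: the single constant margin that would give this loss.

    A mean loss over many pairs is not the loss of the mean margin, so this is
    only a rough indicator of scale.
    """
    return -math.log(math.expm1(loss))


chance = bt_loss(0.0)
print(f"{chance:.4f}")  # 0.6931

# Step-400 validation losses
for name, loss in [("instruction", 0.6941), ("quality", 0.6781)]:
    print(name, f"{margin_for_loss(loss):+.3f}")
```

Under this assumption, the instruction loss slightly above ln 2 corresponds to a small negative effective margin, and the quality loss to a small positive one.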
2025-12-21T21:54:37.031317+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_100.pt
2025-12-21T21:54:37.032374+0800 | INFO | Best 3 checkpoints:
2025-12-21T21:54:37.032565+0800 | INFO | 1. Step 200: acc=0.5593 (reward_model.best_200.pt)
2025-12-21T21:54:37.032619+0800 | INFO | 2. Step 400: acc=0.5593 (reward_model.best_400.pt)
2025-12-21T21:54:37.032672+0800 | INFO | 3. Step 300: acc=0.5491 (reward_model.best_300.pt)
2025-12-21T21:54:37.742434+0800 | INFO | Step 401: loss=0.5161, acc=0.732 (IF=0.714, MQ=0.750)
2025-12-21T21:54:38.847217+0800 | INFO | Step 402: loss=0.3809, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T21:54:40.317057+0800 | INFO | Step 403: loss=0.2868, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T21:54:41.765656+0800 | INFO | Step 404: loss=0.2302, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:54:43.944973+0800 | INFO | Step 405: loss=0.1460, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:54:45.409242+0800 | INFO | Step 406: loss=0.3662, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T21:54:46.844249+0800 | INFO | Step 407: loss=0.3711, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T21:54:48.292527+0800 | INFO | Step 408: loss=0.3138, acc=0.823 (IF=0.833, MQ=0.812)
2025-12-21T21:54:49.731339+0800 | INFO | Step 409: loss=0.2407, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:54:51.175775+0800 | INFO | Step 410: loss=0.2166, acc=0.876 (IF=0.818, MQ=0.933)
2025-12-21T21:54:52.655293+0800 | INFO | Step 411: loss=0.3953, acc=0.798 (IF=0.909, MQ=0.688)
2025-12-21T21:54:54.092533+0800 | INFO | Step 412: loss=0.2436, acc=0.897 (IF=0.857, MQ=0.938)
2025-12-21T21:54:55.530978+0800 | INFO | Step 413: loss=0.3594, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T21:54:56.971058+0800 | INFO | Step 414: loss=0.3320, acc=0.819 (IF=0.700, MQ=0.938)
2025-12-21T21:54:58.414781+0800 | INFO | Step 415: loss=0.2635, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T21:54:59.439588+0800 | INFO | Step 416: loss=0.2041, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T21:55:00.881466+0800 | INFO | Step 417: loss=0.4453, acc=0.806 (IF=0.800, MQ=0.812)
2025-12-21T21:55:02.347908+0800 | INFO | Step 418: loss=0.2094, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:55:03.797950+0800 | INFO | Step 419: loss=0.3781, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T21:55:05.245659+0800 | INFO | Step 420: loss=0.2197, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:55:06.695122+0800 | INFO | Step 421: loss=0.2466, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:55:08.151396+0800 | INFO | Step 422: loss=0.4072, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T21:55:09.221531+0800 | INFO | Step 423: loss=0.5681, acc=0.740 (IF=0.667, MQ=0.812)
2025-12-21T21:55:10.664987+0800 | INFO | Step 424: loss=0.2271, acc=0.925 (IF=0.917, MQ=0.933)
2025-12-21T21:55:12.129038+0800 | INFO | Step 425: loss=0.2655, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T21:55:13.568246+0800 | INFO | Step 426: loss=0.3873, acc=0.753 (IF=0.818, MQ=0.688)
2025-12-21T21:55:15.009976+0800 | INFO | Step 427: loss=0.2576, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T21:55:16.464800+0800 | INFO | Step 428: loss=0.1747, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:55:17.917827+0800 | INFO | Step 429: loss=0.4211, acc=0.798 (IF=0.846, MQ=0.750)
2025-12-21T21:55:19.377233+0800 | INFO | Step 430: loss=0.2690, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T21:55:20.819476+0800 | INFO | Step 431: loss=0.3877, acc=0.806 (IF=0.800, MQ=0.812)
2025-12-21T21:55:22.267551+0800 | INFO | Step 432: loss=0.1855, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:55:23.745362+0800 | INFO | Step 433: loss=0.3248, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T21:55:25.225676+0800 | INFO | Step 434: loss=0.3541, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T21:55:26.655065+0800 | INFO | Step 435: loss=0.3153, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T21:55:28.077890+0800 | INFO | Step 436: loss=0.4293, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T21:55:29.486835+0800 | INFO | Step 437: loss=0.3867, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T21:55:30.904805+0800 | INFO | Step 438: loss=0.3524, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T21:55:32.318104+0800 | INFO | Step 439: loss=0.3194, acc=0.853 (IF=0.769, MQ=0.938)
2025-12-21T21:55:33.734037+0800 | INFO | Step 440: loss=0.3241, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T21:55:35.195334+0800 | INFO | Step 441: loss=0.3011, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T21:55:36.612326+0800 | INFO | Step 442: loss=0.3747, acc=0.817 (IF=0.833, MQ=0.800)
2025-12-21T21:55:38.037493+0800 | INFO | Step 443: loss=0.2896, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T21:55:40.684518+0800 | INFO | Step 444: loss=0.3278, acc=0.844 (IF=0.889, MQ=0.800)
2025-12-21T21:55:42.305473+0800 | INFO | Step 445: loss=0.1492, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:55:43.853693+0800 | INFO | Step 446: loss=0.3787, acc=0.817 (IF=0.833, MQ=0.800)
2025-12-21T21:55:45.412830+0800 | INFO | Step 447: loss=0.3900, acc=0.776 (IF=0.615, MQ=0.938)
2025-12-21T21:55:46.880531+0800 | INFO | Step 448: loss=0.1805, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:55:48.372610+0800 | INFO | Step 449: loss=0.2311, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T21:55:49.837457+0800 | INFO | Step 450: loss=0.4490, acc=0.804 (IF=0.857, MQ=0.750)
2025-12-21T21:55:51.318046+0800 | INFO | Step 451: loss=0.3757, acc=0.844 (IF=0.875, MQ=0.812)
2025-12-21T21:55:52.772978+0800 | INFO | Step 452: loss=0.1701, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:55:54.223871+0800 | INFO | Step 453: loss=0.4249, acc=0.781 (IF=0.750, MQ=0.812)
2025-12-21T21:55:55.694661+0800 | INFO | Step 454: loss=0.2609, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T21:55:57.347684+0800 | INFO | Step 455: loss=0.2660, acc=0.861 (IF=0.909, MQ=0.812)
2025-12-21T21:55:58.845897+0800 | INFO | Step 456: loss=0.2570, acc=0.893 (IF=0.786, MQ=1.000)
2025-12-21T21:56:00.327754+0800 | INFO | Step 457: loss=0.1881, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T21:56:01.769642+0800 | INFO | Step 458: loss=0.2818, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T21:56:03.208032+0800 | INFO | Step 459: loss=0.3102, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T21:56:04.655692+0800 | INFO | Step 460: loss=0.2267, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T21:56:06.108333+0800 | INFO | Step 461: loss=0.2603, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T21:56:07.562340+0800 | INFO | Step 462: loss=0.1827, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T21:56:09.021692+0800 | INFO | Step 463: loss=0.2560, acc=0.871 (IF=0.929, MQ=0.812)
2025-12-21T21:56:10.470343+0800 | INFO | Step 464: loss=0.2322, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T21:56:11.892202+0800 | INFO | Step 465: loss=0.1948, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T21:56:13.334477+0800 | INFO | Step 466: loss=0.3850, acc=0.832 (IF=0.727, MQ=0.938)
2025-12-21T21:56:14.366781+0800 | INFO | Step 467: loss=0.3759, acc=0.795 (IF=0.714, MQ=0.875)
2025-12-21T21:56:15.804589+0800 | INFO | Step 468: loss=0.3477, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T21:56:17.270976+0800 | INFO | Step 469: loss=0.3348, acc=0.858 (IF=0.778, MQ=0.938)
2025-12-21T21:56:18.734683+0800 | INFO | Step 470: loss=0.1455, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:56:20.183407+0800 | INFO | Step 471: loss=0.2242, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T21:56:21.631305+0800 | INFO | Step 472: loss=0.2362, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T21:56:23.076028+0800 | INFO | Step 473: loss=0.2915, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T21:56:24.523765+0800 | INFO | Step 474: loss=0.1862, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:56:25.982709+0800 | INFO | Step 475: loss=0.1046, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:56:27.432626+0800 | INFO | Step 476: loss=0.3963, acc=0.745 (IF=0.615, MQ=0.875)
2025-12-21T21:56:28.875761+0800 | INFO | Step 477: loss=0.6981, acc=0.739 (IF=0.727, MQ=0.750)
2025-12-21T21:56:29.918110+0800 | INFO | Step 478: loss=0.3692, acc=0.801 (IF=0.727, MQ=0.875)
2025-12-21T21:56:31.355309+0800 | INFO | Step 479: loss=0.2944, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:56:32.385437+0800 | INFO | Step 480: loss=0.4399, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T21:56:33.879283+0800 | INFO | Step 481: loss=0.3964, acc=0.795 (IF=0.778, MQ=0.812)
2025-12-21T21:56:35.336312+0800 | INFO | Step 482: loss=0.2686, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T21:56:36.785015+0800 | INFO | Step 483: loss=0.2420, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T21:56:38.235996+0800 | INFO | Step 484: loss=0.2610, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T21:56:39.716969+0800 | INFO | Step 485: loss=0.4676, acc=0.812 (IF=0.750, MQ=0.875)
2025-12-21T21:56:41.194666+0800 | INFO | Step 486: loss=0.2695, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T21:56:42.668149+0800 | INFO | Step 487: loss=0.1413, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:56:44.124017+0800 | INFO | Step 488: loss=0.2504, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:56:45.568865+0800 | INFO | Step 489: loss=0.5143, acc=0.756 (IF=0.636, MQ=0.875)
2025-12-21T21:56:47.019720+0800 | INFO | Step 490: loss=0.1180, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:56:48.462820+0800 | INFO | Step 491: loss=0.2834, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T21:56:49.528034+0800 | INFO | Step 492: loss=0.2674, acc=0.871 (IF=0.875, MQ=0.867)
2025-12-21T21:56:50.972354+0800 | INFO | Step 493: loss=0.2889, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:56:52.422269+0800 | INFO | Step 494: loss=0.1873, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:56:53.866006+0800 | INFO | Step 495: loss=0.3315, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T21:56:55.302354+0800 | INFO | Step 496: loss=0.2788, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T21:56:56.769609+0800 | INFO | Step 497: loss=0.3019, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T21:56:58.213777+0800 | INFO | Step 498: loss=0.3424, acc=0.819 (IF=0.889, MQ=0.750)
2025-12-21T21:57:00.968513+0800 | INFO | Step 499: loss=0.3581, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T21:57:02.601611+0800 | INFO | Step 500: loss=0.2081, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:57:10.053896+0800 | INFO |
============================================================
Validation Results (took 7.43s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.4138
Quality Acc: 0.6875
Average Acc: 0.5506
Total Loss: 0.6844
Instruction Loss: 0.6928
Quality Loss: 0.6760
============================================================
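The validation split has only 58 instruction and 80 quality comparisons, so checkpoint differences amount to a handful of pairs. Across steps 300, 400, 500, and 600, instruction accuracy corresponds to 26, 25, 24, and 26 correct pairs, and quality accuracy to 52, 55, 55, and 52. The sketch below uses a normal approximation against a chance rate of 0.5. The 0.5 rate is an assumption for two-way preferences with ties filtered out. It also converts a single flipped instruction pair into its effect on `Average Acc`. `z_vs_chance` is a hypothetical helper.

```python
import math


def z_vs_chance(acc, n, p0=0.5):
    """Normal-approximation z-score of an observed accuracy against chance p0."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (acc - p0) / se


N_IF, N_MQ = 58, 80  # comparison counts from the "Samples:" line

# Step-500 validation accuracies
print(f"instruction z = {z_vs_chance(0.4138, N_IF):+.2f}")  # -1.31
print(f"quality     z = {z_vs_chance(0.6875, N_MQ):+.2f}")  # +3.35

# Flipping one instruction pair moves Average Acc by (1/58)/2, about 0.0086,
# which is roughly the gap between steps 400 (0.5593) and 500 (0.5506)
one_pair = (1 / N_IF) / 2
print(f"{one_pair:.4f}")  # 0.0086
```

Read this way, quality accuracy is clearly above chance. The below-chance instruction accuracy is within about 1.3 standard errors of 0.5, so it is not clearly distinguishable from chance at this sample size.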
2025-12-21T21:57:12.642503+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_300.pt
2025-12-21T21:57:12.643100+0800 | INFO | Best 3 checkpoints:
2025-12-21T21:57:12.643239+0800 | INFO | 1. Step 200: acc=0.5593 (reward_model.best_200.pt)
2025-12-21T21:57:12.643292+0800 | INFO | 2. Step 400: acc=0.5593 (reward_model.best_400.pt)
2025-12-21T21:57:12.643336+0800 | INFO | 3. Step 500: acc=0.5506 (reward_model.best_500.pt)
2025-12-21T21:57:14.114270+0800 | INFO | Step 501: loss=0.3129, acc=0.829 (IF=0.846, MQ=0.812)
2025-12-21T21:57:15.552205+0800 | INFO | Step 502: loss=0.2747, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:57:16.998382+0800 | INFO | Step 503: loss=0.3776, acc=0.830 (IF=0.786, MQ=0.875)
2025-12-21T21:57:18.484236+0800 | INFO | Step 504: loss=0.2865, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T21:57:19.929919+0800 | INFO | Step 505: loss=0.2185, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:57:20.997909+0800 | INFO | Step 506: loss=0.4606, acc=0.795 (IF=0.778, MQ=0.812)
2025-12-21T21:57:22.429520+0800 | INFO | Step 507: loss=0.2114, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T21:57:23.864290+0800 | INFO | Step 508: loss=0.3220, acc=0.819 (IF=0.700, MQ=0.938)
2025-12-21T21:57:25.305529+0800 | INFO | Step 509: loss=0.3163, acc=0.851 (IF=0.889, MQ=0.812)
2025-12-21T21:57:26.753095+0800 | INFO | Step 510: loss=0.1965, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T21:57:28.238146+0800 | INFO | Step 511: loss=0.1211, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:57:29.695236+0800 | INFO | Step 512: loss=0.2987, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T21:57:31.150410+0800 | INFO | Step 513: loss=0.2533, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T21:57:32.602626+0800 | INFO | Step 514: loss=0.4390, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T21:57:34.053608+0800 | INFO | Step 515: loss=0.2437, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T21:57:35.489376+0800 | INFO | Step 516: loss=0.1895, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:57:36.943838+0800 | INFO | Step 517: loss=0.2353, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T21:57:38.388630+0800 | INFO | Step 518: loss=0.2278, acc=0.858 (IF=0.778, MQ=0.938)
2025-12-21T21:57:39.871444+0800 | INFO | Step 519: loss=0.1780, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T21:57:40.923191+0800 | INFO | Step 520: loss=0.3445, acc=0.795 (IF=0.778, MQ=0.812)
2025-12-21T21:57:42.371255+0800 | INFO | Step 521: loss=0.3156, acc=0.921 (IF=0.909, MQ=0.933)
2025-12-21T21:57:43.820577+0800 | INFO | Step 522: loss=0.2678, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:57:45.273116+0800 | INFO | Step 523: loss=0.3265, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T21:57:46.319229+0800 | INFO | Step 524: loss=0.1924, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T21:57:47.750590+0800 | INFO | Step 525: loss=0.1114, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:57:48.819216+0800 | INFO | Step 526: loss=0.4485, acc=0.806 (IF=0.800, MQ=0.812)
2025-12-21T21:57:50.270427+0800 | INFO | Step 527: loss=0.2084, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T21:57:51.711861+0800 | INFO | Step 528: loss=0.1268, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:57:53.157845+0800 | INFO | Step 529: loss=0.5183, acc=0.697 (IF=0.769, MQ=0.625)
2025-12-21T21:57:54.586633+0800 | INFO | Step 530: loss=0.2188, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:57:56.014975+0800 | INFO | Step 531: loss=0.1892, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:57:57.443046+0800 | INFO | Step 532: loss=0.2601, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T21:57:58.875625+0800 | INFO | Step 533: loss=0.3787, acc=0.893 (IF=0.786, MQ=1.000)
2025-12-21T21:58:00.305667+0800 | INFO | Step 534: loss=0.1287, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T21:58:01.735845+0800 | INFO | Step 535: loss=0.2017, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:58:03.164741+0800 | INFO | Step 536: loss=0.2010, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:58:04.612467+0800 | INFO | Step 537: loss=0.3170, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T21:58:06.058130+0800 | INFO | Step 538: loss=0.2852, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T21:58:07.498078+0800 | INFO | Step 539: loss=0.1267, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:58:08.943998+0800 | INFO | Step 540: loss=0.2541, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T21:58:10.399061+0800 | INFO | Step 541: loss=0.1912, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T21:58:11.449169+0800 | INFO | Step 542: loss=0.2929, acc=0.844 (IF=0.889, MQ=0.800)
2025-12-21T21:58:12.894866+0800 | INFO | Step 543: loss=0.2182, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:58:14.338084+0800 | INFO | Step 544: loss=0.2465, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T21:58:15.789555+0800 | INFO | Step 545: loss=0.3791, acc=0.738 (IF=0.600, MQ=0.875)
2025-12-21T21:58:17.237995+0800 | INFO | Step 546: loss=0.0840, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:58:18.687651+0800 | INFO | Step 547: loss=0.2522, acc=0.851 (IF=0.889, MQ=0.812)
2025-12-21T21:58:20.126555+0800 | INFO | Step 548: loss=0.2191, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:58:21.576135+0800 | INFO | Step 549: loss=0.1724, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T21:58:23.019794+0800 | INFO | Step 550: loss=0.2054, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T21:58:24.472746+0800 | INFO | Step 551: loss=0.3462, acc=0.835 (IF=0.857, MQ=0.812)
2025-12-21T21:58:25.910885+0800 | INFO | Step 552: loss=0.2387, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T21:58:27.364423+0800 | INFO | Step 553: loss=0.2147, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:58:28.804006+0800 | INFO | Step 554: loss=0.4355, acc=0.798 (IF=0.846, MQ=0.750)
2025-12-21T21:58:31.641834+0800 | INFO | Step 555: loss=0.1535, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:58:33.088805+0800 | INFO | Step 556: loss=0.3392, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T21:58:34.133425+0800 | INFO | Step 557: loss=0.3070, acc=0.815 (IF=0.818, MQ=0.812)
2025-12-21T21:58:35.606185+0800 | INFO | Step 558: loss=0.1177, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:58:37.052949+0800 | INFO | Step 559: loss=0.3389, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T21:58:38.557648+0800 | INFO | Step 560: loss=0.1484, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:58:40.014567+0800 | INFO | Step 561: loss=0.1605, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T21:58:41.470174+0800 | INFO | Step 562: loss=0.1519, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T21:58:42.914124+0800 | INFO | Step 563: loss=0.2848, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T21:58:44.364721+0800 | INFO | Step 564: loss=0.1872, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:58:45.839548+0800 | INFO | Step 565: loss=0.1898, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T21:58:47.296496+0800 | INFO | Step 566: loss=0.2493, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:58:48.738939+0800 | INFO | Step 567: loss=0.1616, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T21:58:50.189543+0800 | INFO | Step 568: loss=0.1127, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:58:51.641204+0800 | INFO | Step 569: loss=0.3156, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T21:58:52.701110+0800 | INFO | Step 570: loss=0.4232, acc=0.756 (IF=0.700, MQ=0.812)
2025-12-21T21:58:54.147142+0800 | INFO | Step 571: loss=0.1194, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:58:55.612878+0800 | INFO | Step 572: loss=0.1821, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T21:58:57.070036+0800 | INFO | Step 573: loss=0.1943, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T21:58:58.521199+0800 | INFO | Step 574: loss=0.1242, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:58:59.555603+0800 | INFO | Step 575: loss=0.2346, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T21:59:01.001910+0800 | INFO | Step 576: loss=0.1709, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T21:59:02.451702+0800 | INFO | Step 577: loss=0.2727, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T21:59:03.898330+0800 | INFO | Step 578: loss=0.1342, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:59:05.349145+0800 | INFO | Step 579: loss=0.3674, acc=0.876 (IF=0.818, MQ=0.933)
2025-12-21T21:59:06.796077+0800 | INFO | Step 580: loss=0.4956, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T21:59:08.239560+0800 | INFO | Step 581: loss=0.3288, acc=0.762 (IF=0.900, MQ=0.625)
2025-12-21T21:59:09.687623+0800 | INFO | Step 582: loss=0.3159, acc=0.844 (IF=0.875, MQ=0.812)
2025-12-21T21:59:11.127856+0800 | INFO | Step 583: loss=0.1105, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T21:59:12.572225+0800 | INFO | Step 584: loss=0.3684, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T21:59:14.012910+0800 | INFO | Step 585: loss=0.2718, acc=0.878 (IF=0.889, MQ=0.867)
2025-12-21T21:59:15.461950+0800 | INFO | Step 586: loss=0.2603, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T21:59:16.907803+0800 | INFO | Step 587: loss=0.2607, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T21:59:18.398370+0800 | INFO | Step 588: loss=0.1882, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T21:59:19.560785+0800 | INFO | Step 589: loss=0.6553, acc=0.750 (IF=0.625, MQ=0.875)
2025-12-21T21:59:21.175620+0800 | INFO | Step 590: loss=0.2421, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T21:59:22.674540+0800 | INFO | Step 591: loss=0.1869, acc=0.897 (IF=0.857, MQ=0.938)
2025-12-21T21:59:24.293780+0800 | INFO | Step 592: loss=0.2072, acc=0.858 (IF=0.778, MQ=0.938)
2025-12-21T21:59:25.821443+0800 | INFO | Step 593: loss=0.1530, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:59:27.255994+0800 | INFO | Step 594: loss=0.3038, acc=0.861 (IF=0.909, MQ=0.812)
2025-12-21T21:59:28.717379+0800 | INFO | Step 595: loss=0.2668, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T21:59:30.182368+0800 | INFO | Step 596: loss=0.2415, acc=0.897 (IF=0.857, MQ=0.938)
2025-12-21T21:59:31.629894+0800 | INFO | Step 597: loss=0.1749, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:59:33.096078+0800 | INFO | Step 598: loss=0.1811, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T21:59:34.555764+0800 | INFO | Step 599: loss=0.3480, acc=0.835 (IF=0.857, MQ=0.812)
2025-12-21T21:59:36.094587+0800 | INFO | Step 600: loss=0.1901, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:59:43.266777+0800 | INFO |
============================================================
Validation Results (took 7.15s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.4483
Quality Acc: 0.6500
Average Acc: 0.5491
Total Loss: 0.6841
Instruction Loss: 0.6918
Quality Loss: 0.6764
============================================================
2025-12-21T21:59:45.694479+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_600.pt
2025-12-21T21:59:45.694938+0800 | INFO | Best 3 checkpoints:
2025-12-21T21:59:45.695029+0800 | INFO | 1. Step 200: acc=0.5593 (reward_model.best_200.pt)
2025-12-21T21:59:45.695082+0800 | INFO | 2. Step 400: acc=0.5593 (reward_model.best_400.pt)
2025-12-21T21:59:45.695127+0800 | INFO | 3. Step 500: acc=0.5506 (reward_model.best_500.pt)
2025-12-21T21:59:47.154026+0800 | INFO | Step 601: loss=0.3905, acc=0.883 (IF=0.900, MQ=0.867)
2025-12-21T21:59:48.594971+0800 | INFO | Step 602: loss=0.3158, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T21:59:50.033262+0800 | INFO | Step 603: loss=0.1531, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T21:59:51.482337+0800 | INFO | Step 604: loss=0.3268, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T21:59:52.930291+0800 | INFO | Step 605: loss=0.1172, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T21:59:54.378522+0800 | INFO | Step 606: loss=0.1241, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T21:59:55.819112+0800 | INFO | Step 607: loss=0.2375, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T21:59:57.291177+0800 | INFO | Step 608: loss=0.4373, acc=0.784 (IF=0.818, MQ=0.750)
2025-12-21T21:59:58.753646+0800 | INFO | Step 609: loss=0.1347, acc=0.967 (IF=0.933, MQ=1.000)
2025-12-21T22:00:01.462541+0800 | INFO | Step 610: loss=0.6465, acc=0.629 (IF=0.571, MQ=0.688)
2025-12-21T22:00:02.920023+0800 | INFO | Step 611: loss=0.2783, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T22:00:04.364255+0800 | INFO | Step 612: loss=0.1082, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:00:05.465527+0800 | INFO | Step 613: loss=0.2576, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:00:06.931892+0800 | INFO | Step 614: loss=0.0613, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:00:08.373949+0800 | INFO | Step 615: loss=0.0984, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:00:09.815850+0800 | INFO | Step 616: loss=0.2421, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:00:11.259878+0800 | INFO | Step 617: loss=0.3015, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T22:00:12.707549+0800 | INFO | Step 618: loss=0.3275, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:00:14.182790+0800 | INFO | Step 619: loss=0.2428, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T22:00:15.638481+0800 | INFO | Step 620: loss=0.1641, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:00:16.714193+0800 | INFO | Step 621: loss=0.6061, acc=0.740 (IF=0.667, MQ=0.812)
2025-12-21T22:00:18.184937+0800 | INFO | Step 622: loss=0.4147, acc=0.771 (IF=0.667, MQ=0.875)
2025-12-21T22:00:19.230419+0800 | INFO | Step 623: loss=0.1909, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:00:20.685880+0800 | INFO | Step 624: loss=0.2008, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:00:22.173889+0800 | INFO | Step 625: loss=0.2885, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:00:23.671672+0800 | INFO | Step 626: loss=0.1071, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:00:25.169741+0800 | INFO | Step 627: loss=0.2773, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:00:26.666468+0800 | INFO | Step 628: loss=0.2440, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:00:28.123472+0800 | INFO | Step 629: loss=0.3445, acc=0.862 (IF=0.786, MQ=0.938)
2025-12-21T22:00:29.575225+0800 | INFO | Step 630: loss=0.1438, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:00:31.026691+0800 | INFO | Step 631: loss=0.2752, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T22:00:32.476572+0800 | INFO | Step 632: loss=0.1826, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:00:33.926986+0800 | INFO | Step 633: loss=0.2695, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:00:35.382328+0800 | INFO | Step 634: loss=0.2560, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:00:36.829322+0800 | INFO | Step 635: loss=0.2592, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:00:38.281965+0800 | INFO | Step 636: loss=0.1056, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:00:39.740218+0800 | INFO | Step 637: loss=0.3372, acc=0.801 (IF=0.727, MQ=0.875)
2025-12-21T22:00:41.196854+0800 | INFO | Step 638: loss=0.2256, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:00:42.710500+0800 | INFO | Step 639: loss=0.1894, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:00:44.164969+0800 | INFO | Step 640: loss=0.1669, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:00:45.615188+0800 | INFO | Step 641: loss=0.1607, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:00:47.098956+0800 | INFO | Step 642: loss=0.1402, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:00:48.544412+0800 | INFO | Step 643: loss=0.0812, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:00:50.005708+0800 | INFO | Step 644: loss=0.2950, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T22:00:51.465284+0800 | INFO | Step 645: loss=0.1883, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:00:52.914057+0800 | INFO | Step 646: loss=0.2838, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:00:54.356013+0800 | INFO | Step 647: loss=0.2144, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T22:00:55.801306+0800 | INFO | Step 648: loss=0.2458, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T22:00:57.255172+0800 | INFO | Step 649: loss=0.0298, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:00:58.697900+0800 | INFO | Step 650: loss=0.2953, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T22:01:00.131815+0800 | INFO | Step 651: loss=0.1553, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:01:01.574463+0800 | INFO | Step 652: loss=0.3355, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:01:03.014741+0800 | INFO | Step 653: loss=0.2557, acc=0.835 (IF=0.857, MQ=0.812)
2025-12-21T22:01:04.454572+0800 | INFO | Step 654: loss=0.3121, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T22:01:05.888047+0800 | INFO | Step 655: loss=0.1357, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:01:07.330586+0800 | INFO | Step 656: loss=0.2775, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T22:01:08.763452+0800 | INFO | Step 657: loss=0.0963, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:01:10.195336+0800 | INFO | Step 658: loss=0.2753, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T22:01:11.607044+0800 | INFO | Step 659: loss=0.2397, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:01:12.624711+0800 | INFO | Step 660: loss=0.1937, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T22:01:14.064416+0800 | INFO | Step 661: loss=0.1672, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:01:15.505407+0800 | INFO | Step 662: loss=0.4438, acc=0.725 (IF=0.700, MQ=0.750)
2025-12-21T22:01:16.952647+0800 | INFO | Step 663: loss=0.4310, acc=0.781 (IF=0.750, MQ=0.812)
2025-12-21T22:01:18.400847+0800 | INFO | Step 664: loss=0.3440, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:01:19.839391+0800 | INFO | Step 665: loss=0.3404, acc=0.823 (IF=0.833, MQ=0.812)
2025-12-21T22:01:22.498215+0800 | INFO | Step 666: loss=0.1359, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:01:23.982524+0800 | INFO | Step 667: loss=0.1416, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:01:25.438044+0800 | INFO | Step 668: loss=0.2208, acc=0.917 (IF=0.900, MQ=0.933)
2025-12-21T22:01:26.977347+0800 | INFO | Step 669: loss=0.0261, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:01:28.413598+0800 | INFO | Step 670: loss=0.1684, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T22:01:29.861205+0800 | INFO | Step 671: loss=0.1802, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:01:31.293769+0800 | INFO | Step 672: loss=0.3802, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:01:32.741350+0800 | INFO | Step 673: loss=0.2582, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:01:34.162086+0800 | INFO | Step 674: loss=0.1624, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:01:35.598479+0800 | INFO | Step 675: loss=0.1826, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:01:36.631813+0800 | INFO | Step 676: loss=0.2014, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:01:38.113937+0800 | INFO | Step 677: loss=0.1447, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:01:39.542553+0800 | INFO | Step 678: loss=0.3904, acc=0.858 (IF=0.778, MQ=0.938)
2025-12-21T22:01:40.980253+0800 | INFO | Step 679: loss=0.1693, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:01:42.477451+0800 | INFO | Step 680: loss=0.1509, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:01:43.914913+0800 | INFO | Step 681: loss=0.3461, acc=0.868 (IF=0.923, MQ=0.812)
2025-12-21T22:01:45.352035+0800 | INFO | Step 682: loss=0.2987, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:01:46.811040+0800 | INFO | Step 683: loss=0.1429, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:01:48.274391+0800 | INFO | Step 684: loss=0.1612, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:01:49.726635+0800 | INFO | Step 685: loss=0.2151, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:01:51.195816+0800 | INFO | Step 686: loss=0.3189, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:01:52.653268+0800 | INFO | Step 687: loss=0.1912, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T22:01:53.698121+0800 | INFO | Step 688: loss=0.1050, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:01:55.151637+0800 | INFO | Step 689: loss=0.1909, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:01:56.220311+0800 | INFO | Step 690: loss=0.9839, acc=0.622 (IF=0.556, MQ=0.688)
2025-12-21T22:01:57.648934+0800 | INFO | Step 691: loss=0.3456, acc=0.829 (IF=0.846, MQ=0.812)
2025-12-21T22:01:59.084697+0800 | INFO | Step 692: loss=0.2873, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:02:00.529837+0800 | INFO | Step 693: loss=0.1781, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:02:01.972800+0800 | INFO | Step 694: loss=0.1082, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:02:03.422615+0800 | INFO | Step 695: loss=0.1262, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:02:04.871793+0800 | INFO | Step 696: loss=0.3863, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T22:02:06.326240+0800 | INFO | Step 697: loss=0.1600, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:02:07.775217+0800 | INFO | Step 698: loss=0.1868, acc=0.902 (IF=0.867, MQ=0.938)
2025-12-21T22:02:09.227302+0800 | INFO | Step 699: loss=0.1452, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:02:10.675416+0800 | INFO | Step 700: loss=0.4430, acc=0.799 (IF=0.786, MQ=0.812)
2025-12-21T22:02:17.934383+0800 | INFO |
============================================================
Validation Results (took 7.23s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.4310
Quality Acc: 0.6750
Average Acc: 0.5530
Total Loss: 0.6825
Instruction Loss: 0.6908
Quality Loss: 0.6741
============================================================
2025-12-21T22:02:20.548871+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_500.pt
2025-12-21T22:02:20.549385+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:02:20.549496+0800 | INFO | 1. Step 200: acc=0.5593 (reward_model.best_200.pt)
2025-12-21T22:02:20.549563+0800 | INFO | 2. Step 400: acc=0.5593 (reward_model.best_400.pt)
2025-12-21T22:02:20.549618+0800 | INFO | 3. Step 700: acc=0.5530 (reward_model.best_700.pt)
2025-12-21T22:02:22.024759+0800 | INFO | Step 701: loss=0.1755, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:02:23.476078+0800 | INFO | Step 702: loss=0.3572, acc=0.823 (IF=0.833, MQ=0.812)
2025-12-21T22:02:24.914286+0800 | INFO | Step 703: loss=0.2341, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:02:26.391377+0800 | INFO | Step 704: loss=0.0935, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:02:27.844464+0800 | INFO | Step 705: loss=0.2413, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T22:02:29.295587+0800 | INFO | Step 706: loss=0.1247, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:02:30.336582+0800 | INFO | Step 707: loss=0.3077, acc=0.837 (IF=0.923, MQ=0.750)
2025-12-21T22:02:31.795537+0800 | INFO | Step 708: loss=0.1698, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:02:33.246601+0800 | INFO | Step 709: loss=0.2986, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:02:34.690181+0800 | INFO | Step 710: loss=0.2063, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:02:36.138805+0800 | INFO | Step 711: loss=0.1479, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:02:37.210145+0800 | INFO | Step 712: loss=0.1627, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:02:38.679250+0800 | INFO | Step 713: loss=0.3026, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:02:40.130793+0800 | INFO | Step 714: loss=0.3398, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:02:41.573207+0800 | INFO | Step 715: loss=0.2780, acc=0.897 (IF=0.857, MQ=0.938)
2025-12-21T22:02:43.026997+0800 | INFO | Step 716: loss=0.2472, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:02:44.472506+0800 | INFO | Step 717: loss=0.3958, acc=0.812 (IF=0.750, MQ=0.875)
2025-12-21T22:02:45.968741+0800 | INFO | Step 718: loss=0.2028, acc=0.864 (IF=0.727, MQ=1.000)
2025-12-21T22:02:47.408507+0800 | INFO | Step 719: loss=0.2895, acc=0.902 (IF=0.929, MQ=0.875)
2025-12-21T22:02:48.856074+0800 | INFO | Step 720: loss=0.0870, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:02:51.832700+0800 | INFO | Step 721: loss=0.3138, acc=0.822 (IF=0.769, MQ=0.875)
2025-12-21T22:02:53.300911+0800 | INFO | Step 722: loss=0.0533, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:02:54.767576+0800 | INFO | Step 723: loss=0.0877, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:02:56.238223+0800 | INFO | Step 724: loss=0.2901, acc=0.897 (IF=0.857, MQ=0.938)
2025-12-21T22:02:57.709403+0800 | INFO | Step 725: loss=0.2495, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:02:59.141666+0800 | INFO | Step 726: loss=0.1327, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:03:00.591190+0800 | INFO | Step 727: loss=0.3291, acc=0.893 (IF=1.000, MQ=0.786)
2025-12-21T22:03:02.025240+0800 | INFO | Step 728: loss=0.2033, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:03:03.082498+0800 | INFO | Step 729: loss=0.0926, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:03:04.116858+0800 | INFO | Step 730: loss=0.2586, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:03:05.566599+0800 | INFO | Step 731: loss=0.0928, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:03:07.045210+0800 | INFO | Step 732: loss=0.3858, acc=0.835 (IF=0.857, MQ=0.812)
2025-12-21T22:03:08.511260+0800 | INFO | Step 733: loss=0.1209, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:03:09.956029+0800 | INFO | Step 734: loss=0.5360, acc=0.706 (IF=0.600, MQ=0.812)
2025-12-21T22:03:11.401344+0800 | INFO | Step 735: loss=0.2288, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T22:03:12.847642+0800 | INFO | Step 736: loss=0.1445, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:03:14.307330+0800 | INFO | Step 737: loss=0.1205, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:03:15.747762+0800 | INFO | Step 738: loss=0.3004, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:03:17.195716+0800 | INFO | Step 739: loss=0.1460, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:03:18.661565+0800 | INFO | Step 740: loss=0.1915, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:03:20.090034+0800 | INFO | Step 741: loss=0.2306, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T22:03:21.544044+0800 | INFO | Step 742: loss=0.2990, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T22:03:22.992574+0800 | INFO | Step 743: loss=0.2863, acc=0.871 (IF=0.867, MQ=0.875)
2025-12-21T22:03:24.430986+0800 | INFO | Step 744: loss=0.2053, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:03:25.888165+0800 | INFO | Step 745: loss=0.0717, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:03:27.330016+0800 | INFO | Step 746: loss=0.2168, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:03:28.781082+0800 | INFO | Step 747: loss=0.1886, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:03:30.223167+0800 | INFO | Step 748: loss=0.1641, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:03:31.651829+0800 | INFO | Step 749: loss=0.1827, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:03:33.102730+0800 | INFO | Step 750: loss=0.3462, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T22:03:34.530861+0800 | INFO | Step 751: loss=0.2724, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:03:35.972762+0800 | INFO | Step 752: loss=0.1447, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:03:37.014932+0800 | INFO | Step 753: loss=0.2401, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:03:38.433812+0800 | INFO | Step 754: loss=0.1333, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:03:39.872816+0800 | INFO | Step 755: loss=0.2416, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T22:03:41.308126+0800 | INFO | Step 756: loss=0.2486, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T22:03:42.749382+0800 | INFO | Step 757: loss=0.3209, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T22:03:44.231811+0800 | INFO | Step 758: loss=0.1214, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:03:45.696442+0800 | INFO | Step 759: loss=0.1685, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:03:47.137699+0800 | INFO | Step 760: loss=0.1208, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:03:48.584553+0800 | INFO | Step 761: loss=0.2183, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:03:50.042469+0800 | INFO | Step 762: loss=0.1860, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:03:51.493220+0800 | INFO | Step 763: loss=0.0970, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:03:52.948228+0800 | INFO | Step 764: loss=0.2928, acc=0.902 (IF=0.929, MQ=0.875)
2025-12-21T22:03:54.395814+0800 | INFO | Step 765: loss=0.1650, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:03:55.834388+0800 | INFO | Step 766: loss=0.2759, acc=0.868 (IF=0.923, MQ=0.812)
2025-12-21T22:03:57.262245+0800 | INFO | Step 767: loss=0.1647, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:03:58.695242+0800 | INFO | Step 768: loss=0.0922, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:04:00.132509+0800 | INFO | Step 769: loss=0.5547, acc=0.825 (IF=0.900, MQ=0.750)
2025-12-21T22:04:01.168996+0800 | INFO | Step 770: loss=0.4709, acc=0.825 (IF=0.900, MQ=0.750)
2025-12-21T22:04:02.213123+0800 | INFO | Step 771: loss=0.2576, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:04:03.651372+0800 | INFO | Step 772: loss=0.1816, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:04:05.098190+0800 | INFO | Step 773: loss=0.1918, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:04:06.541103+0800 | INFO | Step 774: loss=0.1215, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:04:07.978781+0800 | INFO | Step 775: loss=0.1553, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:04:09.416236+0800 | INFO | Step 776: loss=0.2866, acc=0.850 (IF=0.833, MQ=0.867)
2025-12-21T22:04:12.149500+0800 | INFO | Step 777: loss=0.1003, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:04:13.590393+0800 | INFO | Step 778: loss=0.2528, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:04:15.025910+0800 | INFO | Step 779: loss=0.0898, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:04:16.508595+0800 | INFO | Step 780: loss=0.1401, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:04:17.955912+0800 | INFO | Step 781: loss=0.0727, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:04:19.042585+0800 | INFO | Step 782: loss=0.1888, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:04:20.518148+0800 | INFO | Step 783: loss=0.1412, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:04:21.963596+0800 | INFO | Step 784: loss=0.3198, acc=0.858 (IF=0.917, MQ=0.800)
2025-12-21T22:04:23.412862+0800 | INFO | Step 785: loss=0.1428, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:04:24.852894+0800 | INFO | Step 786: loss=0.2863, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:04:26.302834+0800 | INFO | Step 787: loss=0.0996, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:04:27.758028+0800 | INFO | Step 788: loss=0.2997, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T22:04:29.213729+0800 | INFO | Step 789: loss=0.2416, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:04:30.637802+0800 | INFO | Step 790: loss=0.1749, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T22:04:32.070643+0800 | INFO | Step 791: loss=0.1239, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:04:33.504338+0800 | INFO | Step 792: loss=0.1021, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:04:34.940219+0800 | INFO | Step 793: loss=0.3588, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T22:04:36.374200+0800 | INFO | Step 794: loss=0.3742, acc=0.861 (IF=0.909, MQ=0.812)
2025-12-21T22:04:37.808688+0800 | INFO | Step 795: loss=0.0537, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:04:39.240034+0800 | INFO | Step 796: loss=0.0764, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:04:40.728585+0800 | INFO | Step 797: loss=0.1555, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:04:42.162527+0800 | INFO | Step 798: loss=0.0865, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:04:43.593015+0800 | INFO | Step 799: loss=0.2348, acc=0.858 (IF=0.778, MQ=0.938)
2025-12-21T22:04:45.021943+0800 | INFO | Step 800: loss=0.1622, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:04:52.344467+0800 | INFO |
============================================================
Validation Results (took 7.30s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.5172
Quality Acc: 0.6750
Average Acc: 0.5961
Total Loss: 0.6794
Instruction Loss: 0.6869
Quality Loss: 0.6719
============================================================
2025-12-21T22:04:54.941101+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_700.pt
2025-12-21T22:04:54.941603+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:04:54.941699+0800 | INFO | 1. Step 800: acc=0.5961 (reward_model.best_800.pt)
2025-12-21T22:04:54.941762+0800 | INFO | 2. Step 200: acc=0.5593 (reward_model.best_200.pt)
2025-12-21T22:04:54.941814+0800 | INFO | 3. Step 400: acc=0.5593 (reward_model.best_400.pt)
2025-12-21T22:04:56.413125+0800 | INFO | Step 801: loss=0.1469, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:04:57.482385+0800 | INFO | Step 802: loss=0.1647, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:04:58.947523+0800 | INFO | Step 803: loss=0.2062, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:05:00.391239+0800 | INFO | Step 804: loss=0.2885, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:05:01.879205+0800 | INFO | Step 805: loss=0.1632, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:05:03.295916+0800 | INFO | Step 806: loss=0.1877, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T22:05:04.767371+0800 | INFO | Step 807: loss=0.1903, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:05:06.219496+0800 | INFO | Step 808: loss=0.3224, acc=0.802 (IF=0.667, MQ=0.938)
2025-12-21T22:05:07.666785+0800 | INFO | Step 809: loss=0.2535, acc=0.851 (IF=0.889, MQ=0.812)
2025-12-21T22:05:09.117457+0800 | INFO | Step 810: loss=0.1522, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:05:10.553536+0800 | INFO | Step 811: loss=0.1764, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:05:12.012733+0800 | INFO | Step 812: loss=0.1437, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:05:13.461668+0800 | INFO | Step 813: loss=0.2204, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:05:14.913582+0800 | INFO | Step 814: loss=0.1936, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T22:05:16.355206+0800 | INFO | Step 815: loss=0.1418, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T22:05:17.817688+0800 | INFO | Step 816: loss=0.1043, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:05:19.249926+0800 | INFO | Step 817: loss=0.1223, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:05:20.689496+0800 | INFO | Step 818: loss=0.2495, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:05:22.180671+0800 | INFO | Step 819: loss=0.2289, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:05:23.607560+0800 | INFO | Step 820: loss=0.3986, acc=0.822 (IF=0.769, MQ=0.875)
2025-12-21T22:05:25.041682+0800 | INFO | Step 821: loss=0.2453, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:05:26.535801+0800 | INFO | Step 822: loss=0.2869, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T22:05:27.997571+0800 | INFO | Step 823: loss=0.1070, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:05:29.441971+0800 | INFO | Step 824: loss=0.2250, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:05:30.969955+0800 | INFO | Step 825: loss=0.2209, acc=0.892 (IF=0.917, MQ=0.867)
2025-12-21T22:05:32.453353+0800 | INFO | Step 826: loss=0.1034, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:05:33.939771+0800 | INFO | Step 827: loss=0.2222, acc=0.935 (IF=0.933, MQ=0.938)
2025-12-21T22:05:35.379311+0800 | INFO | Step 828: loss=0.1776, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:05:36.832308+0800 | INFO | Step 829: loss=0.1761, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:05:38.318820+0800 | INFO | Step 830: loss=0.1376, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:05:39.757566+0800 | INFO | Step 831: loss=0.1993, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:05:42.341128+0800 | INFO | Step 832: loss=0.1066, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:05:43.861626+0800 | INFO | Step 833: loss=0.1763, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:05:45.344270+0800 | INFO | Step 834: loss=0.3534, acc=0.767 (IF=0.667, MQ=0.867)
2025-12-21T22:05:46.830444+0800 | INFO | Step 835: loss=0.2453, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:05:48.279735+0800 | INFO | Step 836: loss=0.1451, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:05:49.737214+0800 | INFO | Step 837: loss=0.0646, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:05:51.192520+0800 | INFO | Step 838: loss=0.1856, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:05:52.626008+0800 | INFO | Step 839: loss=0.3604, acc=0.792 (IF=0.833, MQ=0.750)
2025-12-21T22:05:54.063180+0800 | INFO | Step 840: loss=0.1690, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:05:55.504962+0800 | INFO | Step 841: loss=0.0997, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:05:56.960077+0800 | INFO | Step 842: loss=0.1789, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T22:05:58.429257+0800 | INFO | Step 843: loss=0.2993, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:05:59.887379+0800 | INFO | Step 844: loss=0.2283, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:06:01.341506+0800 | INFO | Step 845: loss=0.2531, acc=0.844 (IF=0.875, MQ=0.812)
2025-12-21T22:06:02.820444+0800 | INFO | Step 846: loss=0.2240, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:06:04.262262+0800 | INFO | Step 847: loss=0.4194, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T22:06:05.726552+0800 | INFO | Step 848: loss=0.1635, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:06:07.179540+0800 | INFO | Step 849: loss=0.2332, acc=0.833 (IF=0.917, MQ=0.750)
2025-12-21T22:06:08.256459+0800 | INFO | Step 850: loss=0.2878, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:06:09.715404+0800 | INFO | Step 851: loss=0.2125, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:06:11.166571+0800 | INFO | Step 852: loss=0.1968, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:06:12.635510+0800 | INFO | Step 853: loss=0.2657, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:06:14.118822+0800 | INFO | Step 854: loss=0.0720, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:06:15.570158+0800 | INFO | Step 855: loss=0.2077, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:06:17.056990+0800 | INFO | Step 856: loss=0.1180, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:06:18.525539+0800 | INFO | Step 857: loss=0.2545, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:06:19.969682+0800 | INFO | Step 858: loss=0.1368, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:06:21.431395+0800 | INFO | Step 859: loss=0.0972, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:06:22.885520+0800 | INFO | Step 860: loss=0.1539, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:06:24.331233+0800 | INFO | Step 861: loss=0.0995, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:06:25.800902+0800 | INFO | Step 862: loss=0.1983, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:06:27.267838+0800 | INFO | Step 863: loss=0.1995, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:06:28.696577+0800 | INFO | Step 864: loss=0.1316, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:06:30.131553+0800 | INFO | Step 865: loss=0.0632, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:06:31.572793+0800 | INFO | Step 866: loss=0.1277, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:06:33.019041+0800 | INFO | Step 867: loss=0.1126, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:06:34.485944+0800 | INFO | Step 868: loss=0.1982, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:06:35.939701+0800 | INFO | Step 869: loss=0.1623, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:06:37.369846+0800 | INFO | Step 870: loss=0.1317, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:06:38.808841+0800 | INFO | Step 871: loss=0.0886, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:06:39.876226+0800 | INFO | Step 872: loss=0.0957, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:06:41.308865+0800 | INFO | Step 873: loss=0.1782, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:06:42.408596+0800 | INFO | Step 874: loss=0.1789, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:06:43.864134+0800 | INFO | Step 875: loss=0.1668, acc=0.900 (IF=0.800, MQ=1.000)
2025-12-21T22:06:45.289885+0800 | INFO | Step 876: loss=0.1078, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:06:46.744009+0800 | INFO | Step 877: loss=0.1677, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:06:48.190293+0800 | INFO | Step 878: loss=0.1799, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:06:49.632301+0800 | INFO | Step 879: loss=0.1634, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:06:51.071113+0800 | INFO | Step 880: loss=0.2894, acc=0.851 (IF=0.889, MQ=0.812)
2025-12-21T22:06:52.517039+0800 | INFO | Step 881: loss=0.1495, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:06:53.958966+0800 | INFO | Step 882: loss=0.2749, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:06:55.442532+0800 | INFO | Step 883: loss=0.1604, acc=0.900 (IF=1.000, MQ=0.800)
2025-12-21T22:06:56.909486+0800 | INFO | Step 884: loss=0.1319, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:06:58.351775+0800 | INFO | Step 885: loss=0.3728, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:06:59.806480+0800 | INFO | Step 886: loss=0.1168, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:07:01.238041+0800 | INFO | Step 887: loss=0.0919, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:07:03.905939+0800 | INFO | Step 888: loss=0.4050, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:07:05.420924+0800 | INFO | Step 889: loss=0.0438, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:07:06.890746+0800 | INFO | Step 890: loss=0.2091, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:07:08.357517+0800 | INFO | Step 891: loss=0.0972, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:07:09.811380+0800 | INFO | Step 892: loss=0.1388, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:07:11.268029+0800 | INFO | Step 893: loss=0.2191, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:07:12.721250+0800 | INFO | Step 894: loss=0.4305, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:07:14.199788+0800 | INFO | Step 895: loss=0.1818, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:07:15.291826+0800 | INFO | Step 896: loss=0.0405, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:07:16.732882+0800 | INFO | Step 897: loss=0.1244, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:07:18.216059+0800 | INFO | Step 898: loss=0.0441, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:07:19.669015+0800 | INFO | Step 899: loss=0.1159, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:07:21.141363+0800 | INFO | Step 900: loss=0.2849, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:07:28.501350+0800 | INFO |
============================================================
Validation Results (took 7.33s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.5000
Quality Acc: 0.6750
Average Acc: 0.5875
Total Loss: 0.6791
Instruction Loss: 0.6864
Quality Loss: 0.6718
============================================================
2025-12-21T22:07:31.052419+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_400.pt
2025-12-21T22:07:31.052894+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:07:31.053471+0800 | INFO | 1. Step 800: acc=0.5961 (reward_model.best_800.pt)
2025-12-21T22:07:31.053557+0800 | INFO | 2. Step 900: acc=0.5875 (reward_model.best_900.pt)
2025-12-21T22:07:31.053607+0800 | INFO | 3. Step 200: acc=0.5593 (reward_model.best_200.pt)
2025-12-21T22:07:32.515060+0800 | INFO | Step 901: loss=0.0587, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:07:33.979151+0800 | INFO | Step 902: loss=0.2023, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:07:35.410117+0800 | INFO | Step 903: loss=0.1028, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:07:36.838478+0800 | INFO | Step 904: loss=0.0725, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:07:37.859830+0800 | INFO | Step 905: loss=0.1323, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:07:39.304480+0800 | INFO | Step 906: loss=0.1008, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:07:40.751166+0800 | INFO | Step 907: loss=0.2920, acc=0.883 (IF=0.833, MQ=0.933)
2025-12-21T22:07:42.214748+0800 | INFO | Step 908: loss=0.2295, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T22:07:43.658687+0800 | INFO | Step 909: loss=0.1644, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T22:07:45.094226+0800 | INFO | Step 910: loss=0.2126, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:07:46.558517+0800 | INFO | Step 911: loss=0.0968, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:07:48.017772+0800 | INFO | Step 912: loss=0.1648, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:07:49.592975+0800 | INFO | Step 913: loss=0.1552, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:07:50.680967+0800 | INFO | Step 914: loss=0.2341, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:07:52.144536+0800 | INFO | Step 915: loss=0.3916, acc=0.861 (IF=0.909, MQ=0.812)
2025-12-21T22:07:53.593835+0800 | INFO | Step 916: loss=0.1850, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:07:55.106611+0800 | INFO | Step 917: loss=0.2137, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T22:07:56.545601+0800 | INFO | Step 918: loss=0.0680, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:07:57.990438+0800 | INFO | Step 919: loss=0.4874, acc=0.829 (IF=0.846, MQ=0.812)
2025-12-21T22:07:59.484838+0800 | INFO | Step 920: loss=0.2947, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:08:00.935390+0800 | INFO | Step 921: loss=0.2044, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T22:08:02.416428+0800 | INFO | Step 922: loss=0.1188, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:08:03.845089+0800 | INFO | Step 923: loss=0.1052, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:08:05.278706+0800 | INFO | Step 924: loss=0.1356, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:08:06.737798+0800 | INFO | Step 925: loss=0.2341, acc=0.885 (IF=0.769, MQ=1.000)
2025-12-21T22:08:08.234696+0800 | INFO | Step 926: loss=0.1206, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:08:09.681720+0800 | INFO | Step 927: loss=0.1837, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:08:11.127114+0800 | INFO | Step 928: loss=0.2442, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:08:12.583493+0800 | INFO | Step 929: loss=0.0659, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:08:14.066797+0800 | INFO | Step 930: loss=0.3253, acc=0.861 (IF=0.909, MQ=0.812)
2025-12-21T22:08:15.515813+0800 | INFO | Step 931: loss=0.5107, acc=0.771 (IF=0.667, MQ=0.875)
2025-12-21T22:08:16.975312+0800 | INFO | Step 932: loss=0.1376, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T22:08:18.410531+0800 | INFO | Step 933: loss=0.1324, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:08:19.862108+0800 | INFO | Step 934: loss=0.1222, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:08:21.301983+0800 | INFO | Step 935: loss=0.3214, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:08:22.742972+0800 | INFO | Step 936: loss=0.2794, acc=0.806 (IF=0.800, MQ=0.812)
2025-12-21T22:08:24.183617+0800 | INFO | Step 937: loss=0.2157, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:08:25.624954+0800 | INFO | Step 938: loss=0.1285, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:08:26.672605+0800 | INFO | Step 939: loss=0.1223, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:08:28.113609+0800 | INFO | Step 940: loss=0.0829, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:08:29.564201+0800 | INFO | Step 941: loss=0.0975, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:08:31.024763+0800 | INFO | Step 942: loss=0.0771, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:08:33.690723+0800 | INFO | Step 943: loss=0.0861, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:08:35.134733+0800 | INFO | Step 944: loss=0.1560, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:08:36.594631+0800 | INFO | Step 945: loss=0.1522, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:08:38.026939+0800 | INFO | Step 946: loss=0.1159, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:08:39.473624+0800 | INFO | Step 947: loss=0.1258, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:08:40.915309+0800 | INFO | Step 948: loss=0.3628, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:08:42.341940+0800 | INFO | Step 949: loss=0.0577, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:08:43.769097+0800 | INFO | Step 950: loss=0.2013, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:08:45.210172+0800 | INFO | Step 951: loss=0.1502, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:08:46.674314+0800 | INFO | Step 952: loss=0.2006, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:08:47.701739+0800 | INFO | Step 953: loss=0.0450, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:08:49.143847+0800 | INFO | Step 954: loss=0.0838, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:08:50.578990+0800 | INFO | Step 955: loss=0.0521, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:08:52.024894+0800 | INFO | Step 956: loss=0.0959, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:08:53.456684+0800 | INFO | Step 957: loss=0.1297, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:08:54.894997+0800 | INFO | Step 958: loss=0.3347, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T22:08:56.343476+0800 | INFO | Step 959: loss=0.0949, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:08:57.802808+0800 | INFO | Step 960: loss=0.3105, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:08:59.251612+0800 | INFO | Step 961: loss=0.1674, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:09:00.714187+0800 | INFO | Step 962: loss=0.2411, acc=0.868 (IF=0.923, MQ=0.812)
2025-12-21T22:09:02.161234+0800 | INFO | Step 963: loss=0.0480, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:09:03.614896+0800 | INFO | Step 964: loss=0.2352, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:09:05.066443+0800 | INFO | Step 965: loss=0.1285, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:09:06.115093+0800 | INFO | Step 966: loss=0.2118, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:09:07.542450+0800 | INFO | Step 967: loss=0.0960, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:09:08.978244+0800 | INFO | Step 968: loss=0.1377, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:09:10.447709+0800 | INFO | Step 969: loss=0.3394, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:09:11.919398+0800 | INFO | Step 970: loss=0.0451, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:09:13.341619+0800 | INFO | Step 971: loss=0.1377, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:09:14.779221+0800 | INFO | Step 972: loss=0.1588, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:09:16.223157+0800 | INFO | Step 973: loss=0.0598, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:09:17.658640+0800 | INFO | Step 974: loss=0.1310, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:09:19.100563+0800 | INFO | Step 975: loss=0.1613, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:09:20.594025+0800 | INFO | Step 976: loss=0.2183, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:09:22.051328+0800 | INFO | Step 977: loss=0.0793, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:09:23.517843+0800 | INFO | Step 978: loss=0.1987, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:09:24.982310+0800 | INFO | Step 979: loss=0.3023, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:09:26.427030+0800 | INFO | Step 980: loss=0.4393, acc=0.850 (IF=0.900, MQ=0.800)
2025-12-21T22:09:27.891877+0800 | INFO | Step 981: loss=0.1584, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:09:29.396594+0800 | INFO | Step 982: loss=0.5053, acc=0.760 (IF=0.769, MQ=0.750)
2025-12-21T22:09:30.825808+0800 | INFO | Step 983: loss=0.0899, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:09:32.275791+0800 | INFO | Step 984: loss=0.1692, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T22:09:33.724580+0800 | INFO | Step 985: loss=0.3674, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T22:09:35.193048+0800 | INFO | Step 986: loss=0.1639, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:09:36.650168+0800 | INFO | Step 987: loss=0.2331, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T22:09:38.146568+0800 | INFO | Step 988: loss=0.2297, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:09:39.593183+0800 | INFO | Step 989: loss=0.0856, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:09:41.065965+0800 | INFO | Step 990: loss=0.2132, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:09:42.486939+0800 | INFO | Step 991: loss=0.2839, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T22:09:43.952765+0800 | INFO | Step 992: loss=0.1689, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:09:45.420675+0800 | INFO | Step 993: loss=0.1365, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:09:46.866999+0800 | INFO | Step 994: loss=0.2269, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:09:48.320022+0800 | INFO | Step 995: loss=0.1587, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:09:49.358285+0800 | INFO | Step 996: loss=0.3668, acc=0.812 (IF=0.750, MQ=0.875)
2025-12-21T22:09:50.820660+0800 | INFO | Step 997: loss=0.2201, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:09:52.258304+0800 | INFO | Step 998: loss=0.2631, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:09:55.156877+0800 | INFO | Step 999: loss=0.1566, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:09:56.632952+0800 | INFO | Step 1000: loss=0.1040, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:10:03.889861+0800 | INFO |
============================================================
Validation Results (took 7.23s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.5862
Quality Acc: 0.6875
Average Acc: 0.6369
Total Loss: 0.6769
Instruction Loss: 0.6837
Quality Loss: 0.6701
============================================================
2025-12-21T22:10:06.530471+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_200.pt
2025-12-21T22:10:06.530962+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:10:06.531055+0800 | INFO | 1. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:10:06.531117+0800 | INFO | 2. Step 800: acc=0.5961 (reward_model.best_800.pt)
2025-12-21T22:10:06.531167+0800 | INFO | 3. Step 900: acc=0.5875 (reward_model.best_900.pt)
2025-12-21T22:10:09.192514+0800 | INFO | Step 1000: Saved to /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.1000.pt
2025-12-21T22:10:10.674099+0800 | INFO | Step 1001: loss=0.1991, acc=0.921 (IF=0.909, MQ=0.933)
2025-12-21T22:10:12.135200+0800 | INFO | Step 1002: loss=0.1198, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:10:13.575320+0800 | INFO | Step 1003: loss=0.1862, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:10:15.018970+0800 | INFO | Step 1004: loss=0.1488, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T22:10:16.460503+0800 | INFO | Step 1005: loss=0.0440, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:10:17.920393+0800 | INFO | Step 1006: loss=0.1442, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:10:19.357986+0800 | INFO | Step 1007: loss=0.1245, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:10:20.803693+0800 | INFO | Step 1008: loss=0.0609, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:10:22.243146+0800 | INFO | Step 1009: loss=0.1153, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:10:23.717949+0800 | INFO | Step 1010: loss=0.2743, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T22:10:24.745446+0800 | INFO | Step 1011: loss=0.1852, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T22:10:25.791290+0800 | INFO | Step 1012: loss=0.1232, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:10:27.232963+0800 | INFO | Step 1013: loss=0.2152, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:10:28.686961+0800 | INFO | Step 1014: loss=0.1557, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:10:30.119836+0800 | INFO | Step 1015: loss=0.1543, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:10:31.561341+0800 | INFO | Step 1016: loss=0.1001, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:10:33.014035+0800 | INFO | Step 1017: loss=0.0723, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:10:34.453863+0800 | INFO | Step 1018: loss=0.1535, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:10:35.895927+0800 | INFO | Step 1019: loss=0.0953, acc=0.900 (IF=0.800, MQ=1.000)
2025-12-21T22:10:37.369553+0800 | INFO | Step 1020: loss=0.2482, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:10:38.822184+0800 | INFO | Step 1021: loss=0.1323, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:10:40.251116+0800 | INFO | Step 1022: loss=0.1451, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:10:41.678826+0800 | INFO | Step 1023: loss=0.2916, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:10:43.131315+0800 | INFO | Step 1024: loss=0.1935, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:10:44.579663+0800 | INFO | Step 1025: loss=0.1209, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:10:46.053707+0800 | INFO | Step 1026: loss=0.3311, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:10:47.494416+0800 | INFO | Step 1027: loss=0.1665, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:10:48.933942+0800 | INFO | Step 1028: loss=0.1007, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:10:50.374785+0800 | INFO | Step 1029: loss=0.3052, acc=0.806 (IF=0.800, MQ=0.812)
2025-12-21T22:10:51.869219+0800 | INFO | Step 1030: loss=0.0371, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:10:52.911796+0800 | INFO | Step 1031: loss=0.0884, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:10:54.363615+0800 | INFO | Step 1032: loss=0.3610, acc=0.861 (IF=0.909, MQ=0.812)
2025-12-21T22:10:55.805540+0800 | INFO | Step 1033: loss=0.1047, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:10:57.251184+0800 | INFO | Step 1034: loss=0.1052, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:10:58.686778+0800 | INFO | Step 1035: loss=0.0730, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:11:00.138476+0800 | INFO | Step 1036: loss=0.1545, acc=0.900 (IF=0.800, MQ=1.000)
2025-12-21T22:11:01.580562+0800 | INFO | Step 1037: loss=0.3698, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:11:02.651323+0800 | INFO | Step 1038: loss=0.3707, acc=0.864 (IF=0.727, MQ=1.000)
2025-12-21T22:11:04.089149+0800 | INFO | Step 1039: loss=0.0297, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:11:05.552425+0800 | INFO | Step 1040: loss=0.1109, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:11:06.999889+0800 | INFO | Step 1041: loss=0.1254, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:11:08.447569+0800 | INFO | Step 1042: loss=0.2515, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:11:09.890315+0800 | INFO | Step 1043: loss=0.1035, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:11:11.340387+0800 | INFO | Step 1044: loss=0.2328, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:11:12.788976+0800 | INFO | Step 1045: loss=0.1415, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:11:14.227276+0800 | INFO | Step 1046: loss=0.1400, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:11:15.674634+0800 | INFO | Step 1047: loss=0.0640, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:11:17.115843+0800 | INFO | Step 1048: loss=0.1160, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:11:18.557144+0800 | INFO | Step 1049: loss=0.2113, acc=0.897 (IF=0.857, MQ=0.938)
2025-12-21T22:11:20.008832+0800 | INFO | Step 1050: loss=0.3489, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T22:11:21.446331+0800 | INFO | Step 1051: loss=0.0512, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:11:22.892027+0800 | INFO | Step 1052: loss=0.0819, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:11:24.362042+0800 | INFO | Step 1053: loss=0.2837, acc=0.795 (IF=0.778, MQ=0.812)
2025-12-21T22:11:27.100815+0800 | INFO | Step 1054: loss=0.1485, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:11:28.601437+0800 | INFO | Step 1055: loss=0.2650, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:11:30.101409+0800 | INFO | Step 1056: loss=0.1717, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:11:31.150035+0800 | INFO | Step 1057: loss=0.0793, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:11:32.633476+0800 | INFO | Step 1058: loss=0.1131, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:11:34.095195+0800 | INFO | Step 1059: loss=0.1523, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:11:35.182494+0800 | INFO | Step 1060: loss=0.1697, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:11:36.634263+0800 | INFO | Step 1061: loss=0.2400, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:11:38.073616+0800 | INFO | Step 1062: loss=0.1660, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:11:39.522720+0800 | INFO | Step 1063: loss=0.1852, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:11:40.971920+0800 | INFO | Step 1064: loss=0.1154, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:11:42.426971+0800 | INFO | Step 1065: loss=0.1436, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:11:43.904600+0800 | INFO | Step 1066: loss=0.3283, acc=0.898 (IF=0.929, MQ=0.867)
2025-12-21T22:11:45.356167+0800 | INFO | Step 1067: loss=0.1533, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T22:11:46.806111+0800 | INFO | Step 1068: loss=0.2681, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T22:11:48.259012+0800 | INFO | Step 1069: loss=0.0450, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:11:49.780810+0800 | INFO | Step 1070: loss=0.2293, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T22:11:51.210283+0800 | INFO | Step 1071: loss=0.1175, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:11:52.674884+0800 | INFO | Step 1072: loss=0.2728, acc=0.802 (IF=0.667, MQ=0.938)
2025-12-21T22:11:54.100933+0800 | INFO | Step 1073: loss=0.1698, acc=0.921 (IF=0.909, MQ=0.933)
2025-12-21T22:11:55.561907+0800 | INFO | Step 1074: loss=0.0879, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:11:57.018764+0800 | INFO | Step 1075: loss=0.0355, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:11:58.532748+0800 | INFO | Step 1076: loss=0.1213, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:12:00.120743+0800 | INFO | Step 1077: loss=0.1538, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:12:01.590568+0800 | INFO | Step 1078: loss=0.1543, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:12:03.031659+0800 | INFO | Step 1079: loss=0.1050, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:12:04.099637+0800 | INFO | Step 1080: loss=0.1051, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:12:05.551305+0800 | INFO | Step 1081: loss=0.3401, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:12:06.991371+0800 | INFO | Step 1082: loss=0.2947, acc=0.861 (IF=0.909, MQ=0.812)
2025-12-21T22:12:08.437974+0800 | INFO | Step 1083: loss=0.1302, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:12:10.016513+0800 | INFO | Step 1084: loss=0.0719, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:12:11.484801+0800 | INFO | Step 1085: loss=0.1716, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:12:12.927293+0800 | INFO | Step 1086: loss=0.0460, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:12:14.390619+0800 | INFO | Step 1087: loss=0.0705, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:12:15.897604+0800 | INFO | Step 1088: loss=0.1484, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:12:17.395637+0800 | INFO | Step 1089: loss=0.2050, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:12:18.876977+0800 | INFO | Step 1090: loss=0.0569, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:12:20.323405+0800 | INFO | Step 1091: loss=0.4019, acc=0.833 (IF=0.917, MQ=0.750)
2025-12-21T22:12:21.796698+0800 | INFO | Step 1092: loss=0.0985, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:12:23.237381+0800 | INFO | Step 1093: loss=0.0677, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:12:24.688545+0800 | INFO | Step 1094: loss=0.4149, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:12:26.119457+0800 | INFO | Step 1095: loss=0.3377, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:12:27.566683+0800 | INFO | Step 1096: loss=0.2552, acc=0.878 (IF=0.889, MQ=0.867)
2025-12-21T22:12:29.009458+0800 | INFO | Step 1097: loss=0.2008, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:12:30.486802+0800 | INFO | Step 1098: loss=0.0854, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:12:31.943540+0800 | INFO | Step 1099: loss=0.1901, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:12:33.387786+0800 | INFO | Step 1100: loss=0.2578, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:12:40.976739+0800 | INFO |
============================================================
Validation Results (took 7.56s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.4828
Quality Acc: 0.6875
Average Acc: 0.5851
Total Loss: 0.6769
Instruction Loss: 0.6872
Quality Loss: 0.6665
============================================================
2025-12-21T22:12:43.678970+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_1100.pt
2025-12-21T22:12:43.679482+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:12:43.679585+0800 | INFO | 1. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:12:43.679650+0800 | INFO | 2. Step 800: acc=0.5961 (reward_model.best_800.pt)
2025-12-21T22:12:43.679705+0800 | INFO | 3. Step 900: acc=0.5875 (reward_model.best_900.pt)
2025-12-21T22:12:45.162560+0800 | INFO | Step 1101: loss=0.0577, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:12:46.603967+0800 | INFO | Step 1102: loss=0.2133, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:12:48.054017+0800 | INFO | Step 1103: loss=0.0903, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:12:49.502130+0800 | INFO | Step 1104: loss=0.0613, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:12:50.946380+0800 | INFO | Step 1105: loss=0.1051, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:12:52.375810+0800 | INFO | Step 1106: loss=0.1240, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:12:53.462066+0800 | INFO | Step 1107: loss=0.2768, acc=0.844 (IF=0.750, MQ=0.938)
2025-12-21T22:12:54.900069+0800 | INFO | Step 1108: loss=0.4138, acc=0.861 (IF=0.909, MQ=0.812)
2025-12-21T22:12:56.358336+0800 | INFO | Step 1109: loss=0.0580, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:12:58.943138+0800 | INFO | Step 1110: loss=0.0614, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:13:00.378794+0800 | INFO | Step 1111: loss=0.2006, acc=0.889 (IF=0.778, MQ=1.000)
2025-12-21T22:13:01.811569+0800 | INFO | Step 1112: loss=0.0401, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:13:03.377740+0800 | INFO | Step 1113: loss=0.0705, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:13:04.832043+0800 | INFO | Step 1114: loss=0.1199, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:13:06.270604+0800 | INFO | Step 1115: loss=0.1595, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:13:07.709991+0800 | INFO | Step 1116: loss=0.0613, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:13:09.140363+0800 | INFO | Step 1117: loss=0.2183, acc=0.904 (IF=0.933, MQ=0.875)
2025-12-21T22:13:10.592732+0800 | INFO | Step 1118: loss=0.3954, acc=0.822 (IF=0.778, MQ=0.867)
2025-12-21T22:13:12.095935+0800 | INFO | Step 1119: loss=0.0656, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:13:13.541754+0800 | INFO | Step 1120: loss=0.1109, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:13:14.986424+0800 | INFO | Step 1121: loss=0.0809, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:13:16.436558+0800 | INFO | Step 1122: loss=0.1737, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T22:13:17.885447+0800 | INFO | Step 1123: loss=0.2877, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T22:13:19.332355+0800 | INFO | Step 1124: loss=0.1139, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:13:20.783119+0800 | INFO | Step 1125: loss=0.0980, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:13:22.236726+0800 | INFO | Step 1126: loss=0.0401, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:13:23.713209+0800 | INFO | Step 1127: loss=0.0772, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:13:25.157689+0800 | INFO | Step 1128: loss=0.2778, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T22:13:26.583240+0800 | INFO | Step 1129: loss=0.0454, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:13:28.024765+0800 | INFO | Step 1130: loss=0.0659, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:13:29.468582+0800 | INFO | Step 1131: loss=0.1058, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:13:30.916180+0800 | INFO | Step 1132: loss=0.2275, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:13:32.361237+0800 | INFO | Step 1133: loss=0.2395, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:13:33.792557+0800 | INFO | Step 1134: loss=0.1689, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:13:35.229378+0800 | INFO | Step 1135: loss=0.2924, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:13:36.654889+0800 | INFO | Step 1136: loss=0.1559, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:13:38.086866+0800 | INFO | Step 1137: loss=0.2986, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:13:39.520175+0800 | INFO | Step 1138: loss=0.2948, acc=0.851 (IF=0.889, MQ=0.812)
2025-12-21T22:13:40.956511+0800 | INFO | Step 1139: loss=0.2669, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T22:13:42.413131+0800 | INFO | Step 1140: loss=0.1065, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:13:43.858947+0800 | INFO | Step 1141: loss=0.0851, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:13:45.299662+0800 | INFO | Step 1142: loss=0.1478, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:13:46.753210+0800 | INFO | Step 1143: loss=0.1416, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:13:48.188361+0800 | INFO | Step 1144: loss=0.1184, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:13:49.220423+0800 | INFO | Step 1145: loss=0.1417, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:13:50.665538+0800 | INFO | Step 1146: loss=0.1742, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:13:52.103309+0800 | INFO | Step 1147: loss=0.1637, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:13:53.575959+0800 | INFO | Step 1148: loss=0.0973, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:13:55.003503+0800 | INFO | Step 1149: loss=0.1142, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:13:56.449260+0800 | INFO | Step 1150: loss=0.3576, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T22:13:57.893867+0800 | INFO | Step 1151: loss=0.2023, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:13:59.338147+0800 | INFO | Step 1152: loss=0.1957, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:14:00.766503+0800 | INFO | Step 1153: loss=0.1034, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:14:02.204910+0800 | INFO | Step 1154: loss=0.2985, acc=0.851 (IF=0.889, MQ=0.812)
2025-12-21T22:14:03.647468+0800 | INFO | Step 1155: loss=0.0783, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:14:05.087223+0800 | INFO | Step 1156: loss=0.0990, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:14:06.128512+0800 | INFO | Step 1157: loss=0.0771, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:14:07.571688+0800 | INFO | Step 1158: loss=0.4123, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:14:09.034700+0800 | INFO | Step 1159: loss=0.1031, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:14:10.476472+0800 | INFO | Step 1160: loss=0.2221, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:14:11.896068+0800 | INFO | Step 1161: loss=0.2232, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:14:13.323629+0800 | INFO | Step 1162: loss=0.1066, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:14:14.759434+0800 | INFO | Step 1163: loss=0.1325, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:14:16.204536+0800 | INFO | Step 1164: loss=0.1107, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:14:18.792115+0800 | INFO | Step 1165: loss=0.0638, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:14:20.241037+0800 | INFO | Step 1166: loss=0.2913, acc=0.876 (IF=0.818, MQ=0.933)
2025-12-21T22:14:21.711207+0800 | INFO | Step 1167: loss=0.1007, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:14:23.160813+0800 | INFO | Step 1168: loss=0.1662, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:14:24.668583+0800 | INFO | Step 1169: loss=0.1651, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:14:26.132667+0800 | INFO | Step 1170: loss=0.1916, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:14:27.615320+0800 | INFO | Step 1171: loss=0.1155, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:14:29.079158+0800 | INFO | Step 1172: loss=0.0787, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:14:30.526358+0800 | INFO | Step 1173: loss=0.0887, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:14:31.989176+0800 | INFO | Step 1174: loss=0.1508, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:14:33.452319+0800 | INFO | Step 1175: loss=0.1743, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:14:34.907171+0800 | INFO | Step 1176: loss=0.1946, acc=0.921 (IF=0.909, MQ=0.933)
2025-12-21T22:14:36.353073+0800 | INFO | Step 1177: loss=0.2558, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:14:37.782297+0800 | INFO | Step 1178: loss=0.0634, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:14:39.230020+0800 | INFO | Step 1179: loss=0.1802, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T22:14:40.698516+0800 | INFO | Step 1180: loss=0.0700, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:14:42.135540+0800 | INFO | Step 1181: loss=0.1525, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:14:43.170506+0800 | INFO | Step 1182: loss=0.3163, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T22:14:44.615304+0800 | INFO | Step 1183: loss=0.2581, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:14:46.057496+0800 | INFO | Step 1184: loss=0.0815, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:14:47.495488+0800 | INFO | Step 1185: loss=0.1251, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:14:48.917202+0800 | INFO | Step 1186: loss=0.0823, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:14:50.381198+0800 | INFO | Step 1187: loss=0.0528, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:14:51.812697+0800 | INFO | Step 1188: loss=0.3734, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:14:53.266211+0800 | INFO | Step 1189: loss=0.0472, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:14:54.765172+0800 | INFO | Step 1190: loss=0.2274, acc=0.893 (IF=0.786, MQ=1.000)
2025-12-21T22:14:56.395476+0800 | INFO | Step 1191: loss=0.0758, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:14:58.003480+0800 | INFO | Step 1192: loss=0.0965, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:14:59.609390+0800 | INFO | Step 1193: loss=0.2488, acc=0.867 (IF=1.000, MQ=0.733)
2025-12-21T22:15:01.077223+0800 | INFO | Step 1194: loss=0.0460, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:15:02.526847+0800 | INFO | Step 1195: loss=0.2509, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:15:03.975007+0800 | INFO | Step 1196: loss=0.3277, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:15:05.462292+0800 | INFO | Step 1197: loss=0.1043, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:15:06.900101+0800 | INFO | Step 1198: loss=0.0454, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:15:08.373659+0800 | INFO | Step 1199: loss=0.0788, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:15:09.821437+0800 | INFO | Step 1200: loss=0.1604, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:15:17.423640+0800 | INFO |
============================================================
Validation Results (took 7.57s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.5690
Quality Acc: 0.6625
Average Acc: 0.6157
Total Loss: 0.6746
Instruction Loss: 0.6822
Quality Loss: 0.6669
============================================================
2025-12-21T22:15:19.861602+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_900.pt
2025-12-21T22:15:19.862061+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:15:19.862160+0800 | INFO | 1. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:15:19.862221+0800 | INFO | 2. Step 1200: acc=0.6157 (reward_model.best_1200.pt)
2025-12-21T22:15:19.862271+0800 | INFO | 3. Step 800: acc=0.5961 (reward_model.best_800.pt)
2025-12-21T22:15:21.310424+0800 | INFO | Step 1201: loss=0.2985, acc=0.844 (IF=0.750, MQ=0.938)
2025-12-21T22:15:22.747302+0800 | INFO | Step 1202: loss=0.1370, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:15:24.216805+0800 | INFO | Step 1203: loss=0.2257, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:15:25.666690+0800 | INFO | Step 1204: loss=0.0950, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:15:26.708772+0800 | INFO | Step 1205: loss=0.3528, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:15:28.156583+0800 | INFO | Step 1206: loss=0.0647, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:15:29.605928+0800 | INFO | Step 1207: loss=0.1342, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:15:31.060803+0800 | INFO | Step 1208: loss=0.1142, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:15:32.509142+0800 | INFO | Step 1209: loss=0.0545, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:15:33.948127+0800 | INFO | Step 1210: loss=0.1885, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:15:35.397308+0800 | INFO | Step 1211: loss=0.2675, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:15:36.861087+0800 | INFO | Step 1212: loss=0.1730, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:15:38.336645+0800 | INFO | Step 1213: loss=0.1566, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:15:39.791332+0800 | INFO | Step 1214: loss=0.0216, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:15:41.249195+0800 | INFO | Step 1215: loss=0.1402, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:15:42.698585+0800 | INFO | Step 1216: loss=0.1661, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:15:44.137373+0800 | INFO | Step 1217: loss=0.0658, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:15:45.580822+0800 | INFO | Step 1218: loss=0.1310, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:15:47.049734+0800 | INFO | Step 1219: loss=0.0978, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:15:48.460600+0800 | INFO | Step 1220: loss=0.0885, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:15:51.424701+0800 | INFO | Step 1221: loss=0.2896, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:15:52.895925+0800 | INFO | Step 1222: loss=0.0634, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:15:54.340518+0800 | INFO | Step 1223: loss=0.1229, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:15:55.458057+0800 | INFO | Step 1224: loss=0.1032, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:15:56.904831+0800 | INFO | Step 1225: loss=0.0443, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:15:58.126232+0800 | INFO | Step 1226: loss=0.1069, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:15:59.573728+0800 | INFO | Step 1227: loss=0.1679, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:16:01.015314+0800 | INFO | Step 1228: loss=0.2494, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:16:02.458656+0800 | INFO | Step 1229: loss=0.0735, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:16:03.906560+0800 | INFO | Step 1230: loss=0.0661, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:16:05.349900+0800 | INFO | Step 1231: loss=0.1219, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:16:06.814164+0800 | INFO | Step 1232: loss=0.0331, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:16:08.272701+0800 | INFO | Step 1233: loss=0.2255, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:16:09.714542+0800 | INFO | Step 1234: loss=0.0858, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:16:11.147806+0800 | INFO | Step 1235: loss=0.1549, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:16:12.181677+0800 | INFO | Step 1236: loss=0.1234, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:16:13.633489+0800 | INFO | Step 1237: loss=0.0503, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:16:15.073365+0800 | INFO | Step 1238: loss=0.0488, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:16:16.490412+0800 | INFO | Step 1239: loss=0.0612, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:16:17.933519+0800 | INFO | Step 1240: loss=0.0623, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:16:19.409522+0800 | INFO | Step 1241: loss=0.0968, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:16:20.859308+0800 | INFO | Step 1242: loss=0.4731, acc=0.784 (IF=0.692, MQ=0.875)
2025-12-21T22:16:22.312525+0800 | INFO | Step 1243: loss=0.0784, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:16:23.740267+0800 | INFO | Step 1244: loss=0.3220, acc=0.835 (IF=0.857, MQ=0.812)
2025-12-21T22:16:25.169206+0800 | INFO | Step 1245: loss=0.1643, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:16:26.607974+0800 | INFO | Step 1246: loss=0.0625, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:16:28.053281+0800 | INFO | Step 1247: loss=0.2809, acc=0.819 (IF=0.700, MQ=0.938)
2025-12-21T22:16:29.502664+0800 | INFO | Step 1248: loss=0.2426, acc=0.911 (IF=0.889, MQ=0.933)
2025-12-21T22:16:30.953537+0800 | INFO | Step 1249: loss=0.0875, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:16:32.400792+0800 | INFO | Step 1250: loss=0.0264, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:16:33.851623+0800 | INFO | Step 1251: loss=0.1834, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:16:35.298574+0800 | INFO | Step 1252: loss=0.0689, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:16:36.761980+0800 | INFO | Step 1253: loss=0.0870, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:16:38.228027+0800 | INFO | Step 1254: loss=0.3154, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:16:39.715735+0800 | INFO | Step 1255: loss=0.3364, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T22:16:41.188386+0800 | INFO | Step 1256: loss=0.0374, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:16:42.663730+0800 | INFO | Step 1257: loss=0.1431, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:16:44.150379+0800 | INFO | Step 1258: loss=0.1432, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:16:45.603044+0800 | INFO | Step 1259: loss=0.1022, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:16:47.042051+0800 | INFO | Step 1260: loss=0.0810, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:16:48.104727+0800 | INFO | Step 1261: loss=0.1066, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:16:49.563702+0800 | INFO | Step 1262: loss=0.2879, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:16:51.008575+0800 | INFO | Step 1263: loss=0.1838, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:16:52.453908+0800 | INFO | Step 1264: loss=0.2421, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:16:53.943787+0800 | INFO | Step 1265: loss=0.3881, acc=0.837 (IF=0.923, MQ=0.750)
2025-12-21T22:16:55.067462+0800 | INFO | Step 1266: loss=0.0498, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:16:56.680594+0800 | INFO | Step 1267: loss=0.1167, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:16:58.202662+0800 | INFO | Step 1268: loss=0.2002, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:16:59.747462+0800 | INFO | Step 1269: loss=0.3585, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T22:17:01.194452+0800 | INFO | Step 1270: loss=0.1727, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:17:02.645504+0800 | INFO | Step 1271: loss=0.0513, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:17:04.110119+0800 | INFO | Step 1272: loss=0.2772, acc=0.868 (IF=0.923, MQ=0.812)
2025-12-21T22:17:05.588657+0800 | INFO | Step 1273: loss=0.0646, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:17:07.059655+0800 | INFO | Step 1274: loss=0.3070, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T22:17:08.498887+0800 | INFO | Step 1275: loss=0.2768, acc=0.833 (IF=0.917, MQ=0.750)
2025-12-21T22:17:11.673036+0800 | INFO | Step 1276: loss=0.1839, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:17:13.211831+0800 | INFO | Step 1277: loss=0.0874, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:17:14.725366+0800 | INFO | Step 1278: loss=0.1091, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:17:16.199315+0800 | INFO | Step 1279: loss=0.1849, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:17:17.702502+0800 | INFO | Step 1280: loss=0.0474, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:17:19.141411+0800 | INFO | Step 1281: loss=0.0612, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:17:20.577773+0800 | INFO | Step 1282: loss=0.1024, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:17:22.007785+0800 | INFO | Step 1283: loss=0.1204, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:17:23.448814+0800 | INFO | Step 1284: loss=0.2613, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:17:24.923186+0800 | INFO | Step 1285: loss=0.1407, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:17:26.353936+0800 | INFO | Step 1286: loss=0.0362, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:17:27.788068+0800 | INFO | Step 1287: loss=0.2938, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:17:29.227659+0800 | INFO | Step 1288: loss=0.1257, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:17:30.682365+0800 | INFO | Step 1289: loss=0.2125, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:17:32.140201+0800 | INFO | Step 1290: loss=0.1101, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:17:33.587113+0800 | INFO | Step 1291: loss=0.1754, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:17:35.037457+0800 | INFO | Step 1292: loss=0.2518, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:17:36.492175+0800 | INFO | Step 1293: loss=0.4050, acc=0.822 (IF=0.769, MQ=0.875)
2025-12-21T22:17:37.991670+0800 | INFO | Step 1294: loss=0.1245, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:17:39.472825+0800 | INFO | Step 1295: loss=0.0803, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:17:40.913503+0800 | INFO | Step 1296: loss=0.0192, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:17:42.364366+0800 | INFO | Step 1297: loss=0.0286, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:17:43.823363+0800 | INFO | Step 1298: loss=0.1397, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:17:45.270965+0800 | INFO | Step 1299: loss=0.1835, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:17:46.716594+0800 | INFO | Step 1300: loss=0.1227, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:17:54.106454+0800 | INFO |
============================================================
Validation Results (took 7.37s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.5517
Quality Acc: 0.6500
Average Acc: 0.6009
Total Loss: 0.6725
Instruction Loss: 0.6809
Quality Loss: 0.6641
============================================================
2025-12-21T22:17:56.775572+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_800.pt
2025-12-21T22:17:56.776051+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:17:56.776145+0800 | INFO | 1. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:17:56.776203+0800 | INFO | 2. Step 1200: acc=0.6157 (reward_model.best_1200.pt)
2025-12-21T22:17:56.776253+0800 | INFO | 3. Step 1300: acc=0.6009 (reward_model.best_1300.pt)
2025-12-21T22:17:58.268164+0800 | INFO | Step 1301: loss=0.2415, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:17:59.722425+0800 | INFO | Step 1302: loss=0.0676, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:18:01.168053+0800 | INFO | Step 1303: loss=0.0800, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:18:02.612370+0800 | INFO | Step 1304: loss=0.1417, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:18:04.119472+0800 | INFO | Step 1305: loss=0.1886, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:18:05.568302+0800 | INFO | Step 1306: loss=0.1750, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:18:07.023837+0800 | INFO | Step 1307: loss=0.0516, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:18:08.487604+0800 | INFO | Step 1308: loss=0.1531, acc=0.900 (IF=0.800, MQ=1.000)
2025-12-21T22:18:09.540356+0800 | INFO | Step 1309: loss=0.2643, acc=0.795 (IF=0.778, MQ=0.812)
2025-12-21T22:18:10.989313+0800 | INFO | Step 1310: loss=0.3415, acc=0.787 (IF=0.700, MQ=0.875)
2025-12-21T22:18:12.472928+0800 | INFO | Step 1311: loss=0.0541, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:18:13.921587+0800 | INFO | Step 1312: loss=0.1316, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:18:15.366325+0800 | INFO | Step 1313: loss=0.3855, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T22:18:16.815029+0800 | INFO | Step 1314: loss=0.2324, acc=0.925 (IF=0.917, MQ=0.933)
2025-12-21T22:18:18.266740+0800 | INFO | Step 1315: loss=0.1738, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:18:19.736626+0800 | INFO | Step 1316: loss=0.2080, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T22:18:20.776631+0800 | INFO | Step 1317: loss=0.4764, acc=0.781 (IF=0.750, MQ=0.812)
2025-12-21T22:18:22.240407+0800 | INFO | Step 1318: loss=0.1264, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:18:23.699476+0800 | INFO | Step 1319: loss=0.1558, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:18:25.143927+0800 | INFO | Step 1320: loss=0.3341, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:18:26.602239+0800 | INFO | Step 1321: loss=0.0969, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:18:28.077237+0800 | INFO | Step 1322: loss=0.1631, acc=0.823 (IF=0.833, MQ=0.812)
2025-12-21T22:18:29.563755+0800 | INFO | Step 1323: loss=0.0584, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:18:31.034572+0800 | INFO | Step 1324: loss=0.0754, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:18:32.522839+0800 | INFO | Step 1325: loss=0.1984, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T22:18:34.038590+0800 | INFO | Step 1326: loss=0.1131, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:18:35.498420+0800 | INFO | Step 1327: loss=0.1720, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:18:36.925340+0800 | INFO | Step 1328: loss=0.0341, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:18:38.359993+0800 | INFO | Step 1329: loss=0.2903, acc=0.823 (IF=0.833, MQ=0.812)
2025-12-21T22:18:39.805599+0800 | INFO | Step 1330: loss=0.1163, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:18:41.225390+0800 | INFO | Step 1331: loss=0.1257, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:18:43.991971+0800 | INFO | Step 1332: loss=0.3476, acc=0.829 (IF=0.857, MQ=0.800)
2025-12-21T22:18:45.464974+0800 | INFO | Step 1333: loss=0.1785, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:18:46.899087+0800 | INFO | Step 1334: loss=0.4461, acc=0.770 (IF=0.727, MQ=0.812)
2025-12-21T22:18:48.368074+0800 | INFO | Step 1335: loss=0.1253, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:18:49.800532+0800 | INFO | Step 1336: loss=0.1709, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:18:51.232873+0800 | INFO | Step 1337: loss=0.0545, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:18:52.672687+0800 | INFO | Step 1338: loss=0.0916, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:18:54.115787+0800 | INFO | Step 1339: loss=0.0772, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:18:55.575610+0800 | INFO | Step 1340: loss=0.1309, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:18:57.040038+0800 | INFO | Step 1341: loss=0.0354, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:18:58.467756+0800 | INFO | Step 1342: loss=0.0892, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:18:59.895273+0800 | INFO | Step 1343: loss=0.1166, acc=0.933 (IF=1.000, MQ=0.867)
2025-12-21T22:19:01.325205+0800 | INFO | Step 1344: loss=0.0952, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:19:02.806260+0800 | INFO | Step 1345: loss=0.0417, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:19:04.238658+0800 | INFO | Step 1346: loss=0.3309, acc=0.781 (IF=0.750, MQ=0.812)
2025-12-21T22:19:05.692019+0800 | INFO | Step 1347: loss=0.0449, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:19:07.164920+0800 | INFO | Step 1348: loss=0.1569, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:19:08.603015+0800 | INFO | Step 1349: loss=0.1097, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:19:10.064966+0800 | INFO | Step 1350: loss=0.0329, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:19:11.502938+0800 | INFO | Step 1351: loss=0.0470, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:19:12.943670+0800 | INFO | Step 1352: loss=0.2715, acc=0.844 (IF=0.875, MQ=0.812)
2025-12-21T22:19:14.393848+0800 | INFO | Step 1353: loss=0.0147, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:19:15.444220+0800 | INFO | Step 1354: loss=0.0977, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:19:16.881621+0800 | INFO | Step 1355: loss=0.1554, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:19:18.332108+0800 | INFO | Step 1356: loss=0.0359, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:19:19.786630+0800 | INFO | Step 1357: loss=0.1014, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:19:21.220436+0800 | INFO | Step 1358: loss=0.1339, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:19:22.663213+0800 | INFO | Step 1359: loss=0.1031, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:19:24.098826+0800 | INFO | Step 1360: loss=0.0524, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:19:25.545379+0800 | INFO | Step 1361: loss=0.1536, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:19:26.990641+0800 | INFO | Step 1362: loss=0.4022, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T22:19:28.464484+0800 | INFO | Step 1363: loss=0.0389, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:19:29.946095+0800 | INFO | Step 1364: loss=0.1710, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:19:31.016502+0800 | INFO | Step 1365: loss=0.0521, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:19:32.456055+0800 | INFO | Step 1366: loss=0.3891, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T22:19:33.884462+0800 | INFO | Step 1367: loss=0.0383, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:19:35.312831+0800 | INFO | Step 1368: loss=0.3753, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:19:36.752079+0800 | INFO | Step 1369: loss=0.1050, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:19:38.199534+0800 | INFO | Step 1370: loss=0.1581, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:19:39.640584+0800 | INFO | Step 1371: loss=0.0534, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:19:41.087730+0800 | INFO | Step 1372: loss=0.0941, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:19:42.516257+0800 | INFO | Step 1373: loss=0.3235, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:19:43.958013+0800 | INFO | Step 1374: loss=0.2236, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:19:45.392393+0800 | INFO | Step 1375: loss=0.1661, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:19:46.838697+0800 | INFO | Step 1376: loss=0.3377, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:19:48.288886+0800 | INFO | Step 1377: loss=0.1969, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:19:49.735203+0800 | INFO | Step 1378: loss=0.1523, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:19:51.185109+0800 | INFO | Step 1379: loss=0.1112, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:19:52.224908+0800 | INFO | Step 1380: loss=0.1680, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:19:53.683041+0800 | INFO | Step 1381: loss=0.2247, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:19:55.130853+0800 | INFO | Step 1382: loss=0.2106, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:19:56.587635+0800 | INFO | Step 1383: loss=0.3021, acc=0.895 (IF=0.857, MQ=0.933)
2025-12-21T22:19:58.093431+0800 | INFO | Step 1384: loss=0.2177, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:19:59.545085+0800 | INFO | Step 1385: loss=0.1154, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:20:01.015041+0800 | INFO | Step 1386: loss=0.0738, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:20:03.975400+0800 | INFO | Step 1387: loss=0.1745, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:20:05.424742+0800 | INFO | Step 1388: loss=0.1555, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:20:06.896169+0800 | INFO | Step 1389: loss=0.0299, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:20:08.344402+0800 | INFO | Step 1390: loss=0.2026, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:20:09.807313+0800 | INFO | Step 1391: loss=0.0215, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:20:11.239977+0800 | INFO | Step 1392: loss=0.1603, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:20:12.680308+0800 | INFO | Step 1393: loss=0.1308, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:20:14.124744+0800 | INFO | Step 1394: loss=0.1381, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:20:15.767439+0800 | INFO | Step 1395: loss=0.0886, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:20:17.256228+0800 | INFO | Step 1396: loss=0.1764, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:20:18.894631+0800 | INFO | Step 1397: loss=0.1669, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:20:20.354599+0800 | INFO | Step 1398: loss=0.1356, acc=0.925 (IF=0.917, MQ=0.933)
2025-12-21T22:20:21.801591+0800 | INFO | Step 1399: loss=0.0682, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:20:23.250080+0800 | INFO | Step 1400: loss=0.0266, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:20:30.537335+0800 | INFO |
============================================================
Validation Results (took 7.26s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.5517
Quality Acc: 0.6625
Average Acc: 0.6071
Total Loss: 0.6724
Instruction Loss: 0.6810
Quality Loss: 0.6639
============================================================
2025-12-21T22:20:33.190682+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_1300.pt
2025-12-21T22:20:33.191179+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:20:33.191277+0800 | INFO | 1. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:20:33.191333+0800 | INFO | 2. Step 1200: acc=0.6157 (reward_model.best_1200.pt)
2025-12-21T22:20:33.191380+0800 | INFO | 3. Step 1400: acc=0.6071 (reward_model.best_1400.pt)
2025-12-21T22:20:34.701922+0800 | INFO | Step 1401: loss=0.0957, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:20:36.154844+0800 | INFO | Step 1402: loss=0.2028, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:20:37.205004+0800 | INFO | Step 1403: loss=0.1926, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T22:20:38.634664+0800 | INFO | Step 1404: loss=0.1111, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:20:40.093122+0800 | INFO | Step 1405: loss=0.1444, acc=0.900 (IF=0.800, MQ=1.000)
2025-12-21T22:20:41.528620+0800 | INFO | Step 1406: loss=0.0330, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:20:42.974951+0800 | INFO | Step 1407: loss=0.2832, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:20:44.398319+0800 | INFO | Step 1408: loss=0.2328, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:20:45.854966+0800 | INFO | Step 1409: loss=0.0445, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:20:47.271803+0800 | INFO | Step 1410: loss=0.0987, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:20:48.735298+0800 | INFO | Step 1411: loss=0.1336, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:20:50.157702+0800 | INFO | Step 1412: loss=0.2413, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T22:20:51.594068+0800 | INFO | Step 1413: loss=0.0637, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:20:53.023560+0800 | INFO | Step 1414: loss=0.0311, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:20:54.456419+0800 | INFO | Step 1415: loss=0.0792, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:20:55.890334+0800 | INFO | Step 1416: loss=0.1902, acc=0.875 (IF=0.938, MQ=0.812)
2025-12-21T22:20:57.327502+0800 | INFO | Step 1417: loss=0.0453, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:20:58.761312+0800 | INFO | Step 1418: loss=0.2859, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:21:00.204190+0800 | INFO | Step 1419: loss=0.0641, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:21:01.667003+0800 | INFO | Step 1420: loss=0.0155, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:21:03.123603+0800 | INFO | Step 1421: loss=0.3485, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T22:21:04.573959+0800 | INFO | Step 1422: loss=0.1627, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:21:05.619522+0800 | INFO | Step 1423: loss=0.1129, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:21:07.098519+0800 | INFO | Step 1424: loss=0.1340, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:21:08.572992+0800 | INFO | Step 1425: loss=0.0362, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:21:10.068670+0800 | INFO | Step 1426: loss=0.1179, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:21:11.526916+0800 | INFO | Step 1427: loss=0.2155, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:21:13.008238+0800 | INFO | Step 1428: loss=0.1093, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:21:14.465666+0800 | INFO | Step 1429: loss=0.0349, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:21:15.944235+0800 | INFO | Step 1430: loss=0.2510, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:21:17.388071+0800 | INFO | Step 1431: loss=0.1535, acc=0.928 (IF=0.923, MQ=0.933)
2025-12-21T22:21:18.835308+0800 | INFO | Step 1432: loss=0.1217, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:21:19.915621+0800 | INFO | Step 1433: loss=0.1993, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:21:21.364158+0800 | INFO | Step 1434: loss=0.1112, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:21:22.806501+0800 | INFO | Step 1435: loss=0.2914, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:21:24.276355+0800 | INFO | Step 1436: loss=0.1778, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:21:25.728674+0800 | INFO | Step 1437: loss=0.3392, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:21:27.166707+0800 | INFO | Step 1438: loss=0.2101, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:21:28.599498+0800 | INFO | Step 1439: loss=0.2045, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:21:29.657533+0800 | INFO | Step 1440: loss=0.1213, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:21:31.107361+0800 | INFO | Step 1441: loss=0.0584, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:21:32.537087+0800 | INFO | Step 1442: loss=0.1725, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:21:35.141597+0800 | INFO | Step 1443: loss=0.0471, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:21:36.581324+0800 | INFO | Step 1444: loss=0.0609, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:21:37.999870+0800 | INFO | Step 1445: loss=0.0763, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:21:39.539313+0800 | INFO | Step 1446: loss=0.0786, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:21:40.973079+0800 | INFO | Step 1447: loss=0.0312, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:21:42.397032+0800 | INFO | Step 1448: loss=0.1583, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:21:43.823708+0800 | INFO | Step 1449: loss=0.0537, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:21:45.262800+0800 | INFO | Step 1450: loss=0.0685, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:21:46.685047+0800 | INFO | Step 1451: loss=0.0753, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:21:48.118628+0800 | INFO | Step 1452: loss=0.2559, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T22:21:49.542408+0800 | INFO | Step 1453: loss=0.0868, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:21:50.980208+0800 | INFO | Step 1454: loss=0.1195, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:21:52.413804+0800 | INFO | Step 1455: loss=0.2012, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:21:53.858172+0800 | INFO | Step 1456: loss=0.1241, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:21:55.275206+0800 | INFO | Step 1457: loss=0.1987, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:21:56.723153+0800 | INFO | Step 1458: loss=0.1145, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:21:58.163624+0800 | INFO | Step 1459: loss=0.0395, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:21:59.616311+0800 | INFO | Step 1460: loss=0.2625, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:22:01.057501+0800 | INFO | Step 1461: loss=0.2559, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:22:02.086685+0800 | INFO | Step 1462: loss=0.3166, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:22:03.584298+0800 | INFO | Step 1463: loss=0.1206, acc=0.935 (IF=0.933, MQ=0.938)
2025-12-21T22:22:05.032814+0800 | INFO | Step 1464: loss=0.0448, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:22:06.477413+0800 | INFO | Step 1465: loss=0.1016, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:22:07.919628+0800 | INFO | Step 1466: loss=0.0338, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:22:09.365549+0800 | INFO | Step 1467: loss=0.2110, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:22:10.849534+0800 | INFO | Step 1468: loss=0.1426, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:22:11.875100+0800 | INFO | Step 1469: loss=0.3355, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T22:22:13.330685+0800 | INFO | Step 1470: loss=0.0510, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:22:14.764393+0800 | INFO | Step 1471: loss=0.0840, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:22:16.234049+0800 | INFO | Step 1472: loss=0.0584, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:22:17.665539+0800 | INFO | Step 1473: loss=0.0913, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:22:19.097496+0800 | INFO | Step 1474: loss=0.2019, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:22:20.540755+0800 | INFO | Step 1475: loss=0.1175, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:22:21.979099+0800 | INFO | Step 1476: loss=0.1824, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:22:23.421599+0800 | INFO | Step 1477: loss=0.1522, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:22:24.895371+0800 | INFO | Step 1478: loss=0.2057, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T22:22:26.370794+0800 | INFO | Step 1479: loss=0.1521, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:22:27.825514+0800 | INFO | Step 1480: loss=0.0623, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:22:29.269673+0800 | INFO | Step 1481: loss=0.0800, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:22:30.811479+0800 | INFO | Step 1482: loss=0.0729, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:22:32.305679+0800 | INFO | Step 1483: loss=0.1335, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:22:33.767744+0800 | INFO | Step 1484: loss=0.1378, acc=0.925 (IF=0.917, MQ=0.933)
2025-12-21T22:22:35.200316+0800 | INFO | Step 1485: loss=0.2630, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:22:36.630972+0800 | INFO | Step 1486: loss=0.1306, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:22:38.070357+0800 | INFO | Step 1487: loss=0.0968, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:22:39.521595+0800 | INFO | Step 1488: loss=0.3577, acc=0.799 (IF=0.786, MQ=0.812)
2025-12-21T22:22:40.979726+0800 | INFO | Step 1489: loss=0.4576, acc=0.708 (IF=0.667, MQ=0.750)
2025-12-21T22:22:42.414069+0800 | INFO | Step 1490: loss=0.1975, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:22:43.847223+0800 | INFO | Step 1491: loss=0.0835, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:22:45.287433+0800 | INFO | Step 1492: loss=0.1138, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:22:46.724229+0800 | INFO | Step 1493: loss=0.1161, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:22:48.166292+0800 | INFO | Step 1494: loss=0.2559, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:22:49.623442+0800 | INFO | Step 1495: loss=0.0759, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:22:51.072688+0800 | INFO | Step 1496: loss=0.0412, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:22:52.531614+0800 | INFO | Step 1497: loss=0.2765, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:22:55.087057+0800 | INFO | Step 1498: loss=0.1287, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:22:56.512634+0800 | INFO | Step 1499: loss=0.0245, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:22:57.566632+0800 | INFO | Step 1500: loss=0.3397, acc=0.897 (IF=0.857, MQ=0.938)
2025-12-21T22:23:04.914179+0800 | INFO |
============================================================
Validation Results (took 7.32s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.5345
Quality Acc: 0.6625
Average Acc: 0.5985
Total Loss: 0.6710
Instruction Loss: 0.6816
Quality Loss: 0.6604
============================================================
2025-12-21T22:23:07.607769+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_1500.pt
2025-12-21T22:23:07.608277+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:23:07.608377+0800 | INFO | 1. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:23:07.608436+0800 | INFO | 2. Step 1200: acc=0.6157 (reward_model.best_1200.pt)
2025-12-21T22:23:07.608486+0800 | INFO | 3. Step 1400: acc=0.6071 (reward_model.best_1400.pt)
2025-12-21T22:23:09.099673+0800 | INFO | Step 1501: loss=0.0381, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:23:10.585508+0800 | INFO | Step 1502: loss=0.1322, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:23:12.024136+0800 | INFO | Step 1503: loss=0.0988, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:23:13.462005+0800 | INFO | Step 1504: loss=0.0270, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:23:14.902323+0800 | INFO | Step 1505: loss=0.1578, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:23:16.376246+0800 | INFO | Step 1506: loss=0.0696, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:23:17.829985+0800 | INFO | Step 1507: loss=0.0661, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:23:19.274582+0800 | INFO | Step 1508: loss=0.1027, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:23:20.725139+0800 | INFO | Step 1509: loss=0.2848, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:23:22.172843+0800 | INFO | Step 1510: loss=0.1798, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:23:23.606090+0800 | INFO | Step 1511: loss=0.1563, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:23:24.621757+0800 | INFO | Step 1512: loss=0.2385, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:23:26.058680+0800 | INFO | Step 1513: loss=0.0700, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:23:27.502583+0800 | INFO | Step 1514: loss=0.2061, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T22:23:28.966173+0800 | INFO | Step 1515: loss=0.1328, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:23:30.411437+0800 | INFO | Step 1516: loss=0.1400, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:23:31.869563+0800 | INFO | Step 1517: loss=0.0639, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:23:33.330468+0800 | INFO | Step 1518: loss=0.2079, acc=0.871 (IF=0.875, MQ=0.867)
2025-12-21T22:23:34.764183+0800 | INFO | Step 1519: loss=0.2307, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:23:36.210588+0800 | INFO | Step 1520: loss=0.0430, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:23:37.649618+0800 | INFO | Step 1521: loss=0.3663, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T22:23:39.079377+0800 | INFO | Step 1522: loss=0.1348, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:23:40.513499+0800 | INFO | Step 1523: loss=0.1389, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:23:41.953319+0800 | INFO | Step 1524: loss=0.0176, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:23:43.391378+0800 | INFO | Step 1525: loss=0.1451, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:23:44.838824+0800 | INFO | Step 1526: loss=0.1755, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:23:46.288265+0800 | INFO | Step 1527: loss=0.0437, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:23:47.734569+0800 | INFO | Step 1528: loss=0.2561, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:23:49.201894+0800 | INFO | Step 1529: loss=0.1143, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:23:50.647140+0800 | INFO | Step 1530: loss=0.1482, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:23:52.108612+0800 | INFO | Step 1531: loss=0.1578, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:23:53.554156+0800 | INFO | Step 1532: loss=0.3616, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:23:54.984239+0800 | INFO | Step 1533: loss=0.0731, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:23:56.439584+0800 | INFO | Step 1534: loss=0.2128, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:23:57.865021+0800 | INFO | Step 1535: loss=0.0767, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:23:59.307759+0800 | INFO | Step 1536: loss=0.1423, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:24:00.744907+0800 | INFO | Step 1537: loss=0.1088, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:24:02.197229+0800 | INFO | Step 1538: loss=0.0193, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:24:03.640023+0800 | INFO | Step 1539: loss=0.1517, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:24:05.067928+0800 | INFO | Step 1540: loss=0.7620, acc=0.770 (IF=0.727, MQ=0.812)
2025-12-21T22:24:06.509542+0800 | INFO | Step 1541: loss=0.1241, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:24:07.945049+0800 | INFO | Step 1542: loss=0.0794, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:24:09.451616+0800 | INFO | Step 1543: loss=0.1291, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:24:10.886377+0800 | INFO | Step 1544: loss=0.1702, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:24:12.316937+0800 | INFO | Step 1545: loss=0.1481, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:24:13.740938+0800 | INFO | Step 1546: loss=0.0872, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:24:15.173776+0800 | INFO | Step 1547: loss=0.3244, acc=0.825 (IF=0.900, MQ=0.750)
2025-12-21T22:24:16.197841+0800 | INFO | Step 1548: loss=0.2415, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T22:24:17.225110+0800 | INFO | Step 1549: loss=0.0716, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:24:18.662248+0800 | INFO | Step 1550: loss=0.1000, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:24:20.101928+0800 | INFO | Step 1551: loss=0.0337, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:24:21.549544+0800 | INFO | Step 1552: loss=0.1834, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:24:23.000613+0800 | INFO | Step 1553: loss=0.1331, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:24:25.759296+0800 | INFO | Step 1554: loss=0.1253, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:24:27.206580+0800 | INFO | Step 1555: loss=0.1169, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:24:28.681020+0800 | INFO | Step 1556: loss=0.0459, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:24:29.760533+0800 | INFO | Step 1557: loss=0.1709, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:24:31.197426+0800 | INFO | Step 1558: loss=0.0502, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:24:32.638713+0800 | INFO | Step 1559: loss=0.1425, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T22:24:34.082070+0800 | INFO | Step 1560: loss=0.1071, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:24:35.529002+0800 | INFO | Step 1561: loss=0.1323, acc=0.933 (IF=1.000, MQ=0.867)
2025-12-21T22:24:37.001733+0800 | INFO | Step 1562: loss=0.4381, acc=0.839 (IF=0.929, MQ=0.750)
2025-12-21T22:24:38.462160+0800 | INFO | Step 1563: loss=0.0554, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:24:39.901767+0800 | INFO | Step 1564: loss=0.0640, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:24:41.327075+0800 | INFO | Step 1565: loss=0.1279, acc=0.897 (IF=0.857, MQ=0.938)
2025-12-21T22:24:42.757600+0800 | INFO | Step 1566: loss=0.0675, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:24:44.192214+0800 | INFO | Step 1567: loss=0.1321, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:24:45.631661+0800 | INFO | Step 1568: loss=0.0650, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:24:47.063704+0800 | INFO | Step 1569: loss=0.0258, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:24:48.495131+0800 | INFO | Step 1570: loss=0.1672, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T22:24:49.957426+0800 | INFO | Step 1571: loss=0.3793, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:24:51.405574+0800 | INFO | Step 1572: loss=0.4824, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T22:24:52.836695+0800 | INFO | Step 1573: loss=0.1273, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:24:54.269954+0800 | INFO | Step 1574: loss=0.0150, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:24:55.744077+0800 | INFO | Step 1575: loss=0.1191, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:24:57.217135+0800 | INFO | Step 1576: loss=0.0921, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:24:58.656203+0800 | INFO | Step 1577: loss=0.0812, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:25:00.088230+0800 | INFO | Step 1578: loss=0.2051, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:25:01.547429+0800 | INFO | Step 1579: loss=0.0717, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:25:02.993832+0800 | INFO | Step 1580: loss=0.1026, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:25:04.424031+0800 | INFO | Step 1581: loss=0.1132, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:25:05.881717+0800 | INFO | Step 1582: loss=0.0636, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:25:07.328021+0800 | INFO | Step 1583: loss=0.0153, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:25:08.816532+0800 | INFO | Step 1584: loss=0.0966, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:25:10.280465+0800 | INFO | Step 1585: loss=0.2074, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:25:11.723941+0800 | INFO | Step 1586: loss=0.1572, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:25:13.180516+0800 | INFO | Step 1587: loss=0.0839, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:25:14.625347+0800 | INFO | Step 1588: loss=0.1490, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:25:16.070503+0800 | INFO | Step 1589: loss=0.0354, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:25:17.134419+0800 | INFO | Step 1590: loss=0.3603, acc=0.772 (IF=0.857, MQ=0.688)
2025-12-21T22:25:18.584913+0800 | INFO | Step 1591: loss=0.0947, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:25:20.056281+0800 | INFO | Step 1592: loss=0.2751, acc=0.833 (IF=0.917, MQ=0.750)
2025-12-21T22:25:21.500675+0800 | INFO | Step 1593: loss=0.0914, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:25:22.956383+0800 | INFO | Step 1594: loss=0.0980, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:25:24.414439+0800 | INFO | Step 1595: loss=0.1564, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:25:25.859573+0800 | INFO | Step 1596: loss=0.0380, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:25:27.296541+0800 | INFO | Step 1597: loss=0.2061, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:25:28.743348+0800 | INFO | Step 1598: loss=0.3611, acc=0.823 (IF=0.833, MQ=0.812)
2025-12-21T22:25:30.191475+0800 | INFO | Step 1599: loss=0.0234, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:25:31.644492+0800 | INFO | Step 1600: loss=0.0923, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:25:38.925305+0800 | INFO |
============================================================
Validation Results (took 7.26s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.5862
Quality Acc: 0.6750
Average Acc: 0.6306
Total Loss: 0.6677
Instruction Loss: 0.6790
Quality Loss: 0.6564
============================================================
2025-12-21T22:25:41.599840+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_1400.pt
2025-12-21T22:25:41.600400+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:25:41.600509+0800 | INFO | 1. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:25:41.600566+0800 | INFO | 2. Step 1600: acc=0.6306 (reward_model.best_1600.pt)
2025-12-21T22:25:41.600618+0800 | INFO | 3. Step 1200: acc=0.6157 (reward_model.best_1200.pt)
2025-12-21T22:25:43.097484+0800 | INFO | Step 1601: loss=0.0485, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:25:44.548104+0800 | INFO | Step 1602: loss=0.0756, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:25:45.998578+0800 | INFO | Step 1603: loss=0.0264, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:25:47.473225+0800 | INFO | Step 1604: loss=0.0867, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:25:48.924515+0800 | INFO | Step 1605: loss=0.0175, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:25:50.371892+0800 | INFO | Step 1606: loss=0.1264, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:25:51.822703+0800 | INFO | Step 1607: loss=0.2350, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:25:53.260219+0800 | INFO | Step 1608: loss=0.6144, acc=0.825 (IF=0.900, MQ=0.750)
2025-12-21T22:25:56.031185+0800 | INFO | Step 1609: loss=0.2228, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:25:57.546800+0800 | INFO | Step 1610: loss=0.1247, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:25:59.047848+0800 | INFO | Step 1611: loss=0.0911, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:26:00.529876+0800 | INFO | Step 1612: loss=0.0661, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:26:01.973642+0800 | INFO | Step 1613: loss=0.0117, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:03.407334+0800 | INFO | Step 1614: loss=0.1490, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:26:04.862767+0800 | INFO | Step 1615: loss=0.0200, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:06.309424+0800 | INFO | Step 1616: loss=0.0673, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:26:07.753431+0800 | INFO | Step 1617: loss=0.0178, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:08.828162+0800 | INFO | Step 1618: loss=0.2694, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:26:10.277713+0800 | INFO | Step 1619: loss=0.0793, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:11.740105+0800 | INFO | Step 1620: loss=0.1635, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:26:13.247707+0800 | INFO | Step 1621: loss=0.0486, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:14.715586+0800 | INFO | Step 1622: loss=0.1129, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:26:16.159426+0800 | INFO | Step 1623: loss=0.1209, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:26:17.631029+0800 | INFO | Step 1624: loss=0.0709, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:26:19.095686+0800 | INFO | Step 1625: loss=0.0708, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:20.536377+0800 | INFO | Step 1626: loss=0.0436, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:21.990802+0800 | INFO | Step 1627: loss=0.0939, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:26:23.427378+0800 | INFO | Step 1628: loss=0.1050, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:26:24.856101+0800 | INFO | Step 1629: loss=0.1248, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:26:26.293547+0800 | INFO | Step 1630: loss=0.2824, acc=0.800 (IF=0.600, MQ=1.000)
2025-12-21T22:26:27.761833+0800 | INFO | Step 1631: loss=0.3226, acc=0.795 (IF=0.778, MQ=0.812)
2025-12-21T22:26:29.208203+0800 | INFO | Step 1632: loss=0.2348, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T22:26:30.223616+0800 | INFO | Step 1633: loss=0.1858, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:26:31.663228+0800 | INFO | Step 1634: loss=0.0417, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:33.110568+0800 | INFO | Step 1635: loss=0.0400, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:34.545675+0800 | INFO | Step 1636: loss=0.0398, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:26:35.968803+0800 | INFO | Step 1637: loss=0.1470, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:26:37.409618+0800 | INFO | Step 1638: loss=0.0555, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:26:38.834540+0800 | INFO | Step 1639: loss=0.3396, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:26:40.264130+0800 | INFO | Step 1640: loss=0.2497, acc=0.868 (IF=0.923, MQ=0.812)
2025-12-21T22:26:41.693084+0800 | INFO | Step 1641: loss=0.0347, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:42.747306+0800 | INFO | Step 1642: loss=0.0955, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:26:44.199905+0800 | INFO | Step 1643: loss=0.0673, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:45.634494+0800 | INFO | Step 1644: loss=0.0881, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:26:47.087221+0800 | INFO | Step 1645: loss=0.0488, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:48.532231+0800 | INFO | Step 1646: loss=0.3027, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T22:26:49.974662+0800 | INFO | Step 1647: loss=0.1194, acc=0.900 (IF=0.800, MQ=1.000)
2025-12-21T22:26:51.431263+0800 | INFO | Step 1648: loss=0.0484, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:52.870411+0800 | INFO | Step 1649: loss=0.2764, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:26:54.311160+0800 | INFO | Step 1650: loss=0.1055, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:26:55.745843+0800 | INFO | Step 1651: loss=0.0942, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:26:57.177080+0800 | INFO | Step 1652: loss=0.0198, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:26:58.633166+0800 | INFO | Step 1653: loss=0.1735, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T22:27:00.088982+0800 | INFO | Step 1654: loss=0.1050, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:27:01.537605+0800 | INFO | Step 1655: loss=0.2406, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:27:02.977884+0800 | INFO | Step 1656: loss=0.1292, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:27:04.421204+0800 | INFO | Step 1657: loss=0.0137, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:27:05.910519+0800 | INFO | Step 1658: loss=0.0994, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:27:07.362806+0800 | INFO | Step 1659: loss=0.3503, acc=0.883 (IF=0.900, MQ=0.867)
2025-12-21T22:27:08.834351+0800 | INFO | Step 1660: loss=0.0406, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:27:10.255650+0800 | INFO | Step 1661: loss=0.1283, acc=0.967 (IF=0.933, MQ=1.000)
2025-12-21T22:27:11.680173+0800 | INFO | Step 1662: loss=0.1909, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:27:13.118340+0800 | INFO | Step 1663: loss=0.3125, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T22:27:14.578539+0800 | INFO | Step 1664: loss=0.4659, acc=0.823 (IF=0.833, MQ=0.812)
2025-12-21T22:27:17.278610+0800 | INFO | Step 1665: loss=0.1258, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:27:18.787587+0800 | INFO | Step 1666: loss=0.1554, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:27:20.238487+0800 | INFO | Step 1667: loss=0.2357, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:27:21.726954+0800 | INFO | Step 1668: loss=0.0257, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:27:23.175266+0800 | INFO | Step 1669: loss=0.3241, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:27:24.608355+0800 | INFO | Step 1670: loss=0.0802, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:27:26.039030+0800 | INFO | Step 1671: loss=0.0742, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:27:27.472612+0800 | INFO | Step 1672: loss=0.0240, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:27:28.913397+0800 | INFO | Step 1673: loss=0.0266, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:27:30.357404+0800 | INFO | Step 1674: loss=0.1271, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:27:31.805031+0800 | INFO | Step 1675: loss=0.1434, acc=0.928 (IF=0.923, MQ=0.933)
2025-12-21T22:27:33.274196+0800 | INFO | Step 1676: loss=0.0871, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:27:34.734147+0800 | INFO | Step 1677: loss=0.0773, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:27:36.187052+0800 | INFO | Step 1678: loss=0.0445, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:27:37.616468+0800 | INFO | Step 1679: loss=0.1230, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T22:27:39.060789+0800 | INFO | Step 1680: loss=0.1224, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:27:40.525212+0800 | INFO | Step 1681: loss=0.5858, acc=0.812 (IF=0.750, MQ=0.875)
2025-12-21T22:27:41.985369+0800 | INFO | Step 1682: loss=0.1160, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:27:43.448413+0800 | INFO | Step 1683: loss=0.2094, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:27:44.887740+0800 | INFO | Step 1684: loss=0.0813, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:27:46.343586+0800 | INFO | Step 1685: loss=0.1133, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:27:47.403285+0800 | INFO | Step 1686: loss=0.2177, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:27:48.843772+0800 | INFO | Step 1687: loss=0.0098, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:27:50.283308+0800 | INFO | Step 1688: loss=0.1052, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:27:51.725431+0800 | INFO | Step 1689: loss=0.1123, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:27:53.166718+0800 | INFO | Step 1690: loss=0.3235, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T22:27:54.631324+0800 | INFO | Step 1691: loss=0.0482, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:27:56.054064+0800 | INFO | Step 1692: loss=0.1699, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:27:57.491031+0800 | INFO | Step 1693: loss=0.0870, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:27:58.933317+0800 | INFO | Step 1694: loss=0.2597, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T22:28:00.376599+0800 | INFO | Step 1695: loss=0.1187, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:28:01.817691+0800 | INFO | Step 1696: loss=0.0707, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:28:03.256958+0800 | INFO | Step 1697: loss=0.0856, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:28:04.697785+0800 | INFO | Step 1698: loss=0.1828, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:28:06.129210+0800 | INFO | Step 1699: loss=0.2961, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T22:28:07.630070+0800 | INFO | Step 1700: loss=0.0366, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:28:15.112616+0800 | INFO |
============================================================
Validation Results (took 7.46s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.5690
Quality Acc: 0.6750
Average Acc: 0.6220
Total Loss: 0.6669
Instruction Loss: 0.6778
Quality Loss: 0.6560
============================================================
2025-12-21T22:28:18.027715+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_1200.pt
2025-12-21T22:28:18.028358+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:28:18.028483+0800 | INFO | 1. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:28:18.028559+0800 | INFO | 2. Step 1600: acc=0.6306 (reward_model.best_1600.pt)
2025-12-21T22:28:18.028624+0800 | INFO | 3. Step 1700: acc=0.6220 (reward_model.best_1700.pt)
2025-12-21T22:28:19.509448+0800 | INFO | Step 1701: loss=0.1496, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:28:20.569755+0800 | INFO | Step 1702: loss=0.1771, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:28:21.996511+0800 | INFO | Step 1703: loss=0.0506, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:28:23.435235+0800 | INFO | Step 1704: loss=0.3210, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T22:28:24.881686+0800 | INFO | Step 1705: loss=0.1591, acc=0.851 (IF=0.889, MQ=0.812)
2025-12-21T22:28:26.337212+0800 | INFO | Step 1706: loss=0.1733, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:28:27.416759+0800 | INFO | Step 1707: loss=0.3970, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:28:28.859694+0800 | INFO | Step 1708: loss=0.1292, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:28:30.305079+0800 | INFO | Step 1709: loss=0.0399, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:28:31.773221+0800 | INFO | Step 1710: loss=0.1659, acc=0.917 (IF=0.900, MQ=0.933)
2025-12-21T22:28:33.242518+0800 | INFO | Step 1711: loss=0.0286, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:28:34.704730+0800 | INFO | Step 1712: loss=0.0757, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:28:36.150887+0800 | INFO | Step 1713: loss=0.0565, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:28:36.819985+0800 | INFO | Step 1714: loss=0.0701, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:28:38.286241+0800 | INFO | Step 1715: loss=0.3195, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:28:39.719635+0800 | INFO | Step 1716: loss=0.1385, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:28:41.155531+0800 | INFO | Step 1717: loss=0.1189, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:28:42.595639+0800 | INFO | Step 1718: loss=0.0778, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:28:44.057967+0800 | INFO | Step 1719: loss=0.1262, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:28:46.819435+0800 | INFO | Step 1720: loss=0.0488, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:28:48.259411+0800 | INFO | Step 1721: loss=0.1724, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:28:49.736276+0800 | INFO | Step 1722: loss=0.0755, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:28:51.216897+0800 | INFO | Step 1723: loss=0.1129, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:28:52.661166+0800 | INFO | Step 1724: loss=0.2142, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:28:54.104133+0800 | INFO | Step 1725: loss=0.0295, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:28:55.572464+0800 | INFO | Step 1726: loss=0.0406, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:28:57.028004+0800 | INFO | Step 1727: loss=0.0202, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:28:58.495631+0800 | INFO | Step 1728: loss=0.0490, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:28:59.954277+0800 | INFO | Step 1729: loss=0.0492, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:29:01.409191+0800 | INFO | Step 1730: loss=0.0942, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:29:02.899531+0800 | INFO | Step 1731: loss=0.1098, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:29:04.359148+0800 | INFO | Step 1732: loss=0.0754, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:29:05.793919+0800 | INFO | Step 1733: loss=0.0623, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:29:07.230487+0800 | INFO | Step 1734: loss=0.1191, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:29:08.292595+0800 | INFO | Step 1735: loss=0.0860, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:29:09.741112+0800 | INFO | Step 1736: loss=0.0092, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:29:10.818116+0800 | INFO | Step 1737: loss=0.0522, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:29:12.243293+0800 | INFO | Step 1738: loss=0.2837, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T22:29:13.726245+0800 | INFO | Step 1739: loss=0.5153, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T22:29:15.165142+0800 | INFO | Step 1740: loss=0.0854, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:29:16.633329+0800 | INFO | Step 1741: loss=0.1811, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:29:18.103951+0800 | INFO | Step 1742: loss=0.1095, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:29:19.541109+0800 | INFO | Step 1743: loss=0.1288, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:29:20.988694+0800 | INFO | Step 1744: loss=0.0259, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:29:22.409163+0800 | INFO | Step 1745: loss=0.0128, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:29:23.851105+0800 | INFO | Step 1746: loss=0.1247, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:29:25.294846+0800 | INFO | Step 1747: loss=0.0993, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:29:26.741585+0800 | INFO | Step 1748: loss=0.0969, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:29:28.211175+0800 | INFO | Step 1749: loss=0.2264, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:29:29.657386+0800 | INFO | Step 1750: loss=0.0562, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:29:31.109079+0800 | INFO | Step 1751: loss=0.2096, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:29:32.557615+0800 | INFO | Step 1752: loss=0.2007, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T22:29:33.984789+0800 | INFO | Step 1753: loss=0.1086, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:29:35.431084+0800 | INFO | Step 1754: loss=0.1853, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:29:36.853267+0800 | INFO | Step 1755: loss=0.0629, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:29:38.310097+0800 | INFO | Step 1756: loss=0.0329, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:29:39.750259+0800 | INFO | Step 1757: loss=0.4213, acc=0.819 (IF=0.889, MQ=0.750)
2025-12-21T22:29:41.203022+0800 | INFO | Step 1758: loss=0.1132, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:29:42.663111+0800 | INFO | Step 1759: loss=0.0478, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:29:44.115310+0800 | INFO | Step 1760: loss=0.0866, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:29:45.558669+0800 | INFO | Step 1761: loss=0.0442, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:29:46.603364+0800 | INFO | Step 1762: loss=0.1683, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:29:48.045676+0800 | INFO | Step 1763: loss=0.2050, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:29:49.491188+0800 | INFO | Step 1764: loss=0.0793, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:29:50.937142+0800 | INFO | Step 1765: loss=0.1937, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:29:52.385741+0800 | INFO | Step 1766: loss=0.1183, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:29:53.831565+0800 | INFO | Step 1767: loss=0.0293, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:29:55.292412+0800 | INFO | Step 1768: loss=0.0582, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:29:56.323301+0800 | INFO | Step 1769: loss=0.1371, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:29:57.762458+0800 | INFO | Step 1770: loss=0.2299, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:29:59.204696+0800 | INFO | Step 1771: loss=0.1649, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:30:00.645485+0800 | INFO | Step 1772: loss=0.0649, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:30:02.105763+0800 | INFO | Step 1773: loss=0.0197, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:30:03.558474+0800 | INFO | Step 1774: loss=0.1403, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:30:04.974930+0800 | INFO | Step 1775: loss=0.2310, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T22:30:08.070529+0800 | INFO | Step 1776: loss=0.0905, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:30:09.605477+0800 | INFO | Step 1777: loss=0.1440, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:30:11.073681+0800 | INFO | Step 1778: loss=0.1020, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:30:12.642207+0800 | INFO | Step 1779: loss=0.1492, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:30:14.073233+0800 | INFO | Step 1780: loss=0.1158, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:30:15.512525+0800 | INFO | Step 1781: loss=0.1701, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:30:16.965765+0800 | INFO | Step 1782: loss=0.2059, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:30:18.401074+0800 | INFO | Step 1783: loss=0.0453, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:30:19.862342+0800 | INFO | Step 1784: loss=0.1923, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:30:21.326385+0800 | INFO | Step 1785: loss=0.1617, acc=0.844 (IF=0.750, MQ=0.938)
2025-12-21T22:30:22.773793+0800 | INFO | Step 1786: loss=0.0606, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:30:24.203580+0800 | INFO | Step 1787: loss=0.3403, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T22:30:25.639124+0800 | INFO | Step 1788: loss=0.0443, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:30:27.087317+0800 | INFO | Step 1789: loss=0.0382, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:30:28.510927+0800 | INFO | Step 1790: loss=0.0469, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:30:29.943701+0800 | INFO | Step 1791: loss=0.0666, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:30:31.381420+0800 | INFO | Step 1792: loss=0.0502, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:30:32.837325+0800 | INFO | Step 1793: loss=0.1340, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:30:34.311073+0800 | INFO | Step 1794: loss=0.0298, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:30:35.748852+0800 | INFO | Step 1795: loss=0.1055, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:30:37.183784+0800 | INFO | Step 1796: loss=0.3200, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:30:38.630974+0800 | INFO | Step 1797: loss=0.0994, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:30:40.079966+0800 | INFO | Step 1798: loss=0.0288, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:30:41.509999+0800 | INFO | Step 1799: loss=0.0815, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:30:42.937435+0800 | INFO | Step 1800: loss=0.1548, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T22:30:50.835356+0800 | INFO |
============================================================
Validation Results (took 7.87s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6034
Quality Acc: 0.6125
Average Acc: 0.6080
Total Loss: 0.6655
Instruction Loss: 0.6732
Quality Loss: 0.6578
============================================================
2025-12-21T22:30:53.989040+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_1800.pt
2025-12-21T22:30:53.989682+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:30:53.989830+0800 | INFO | 1. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:30:53.989922+0800 | INFO | 2. Step 1600: acc=0.6306 (reward_model.best_1600.pt)
2025-12-21T22:30:53.990004+0800 | INFO | 3. Step 1700: acc=0.6220 (reward_model.best_1700.pt)
2025-12-21T22:30:55.111461+0800 | INFO | Step 1801: loss=0.2163, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:30:56.558603+0800 | INFO | Step 1802: loss=0.1211, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:30:57.992437+0800 | INFO | Step 1803: loss=0.0990, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:30:59.427900+0800 | INFO | Step 1804: loss=0.3692, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T22:31:00.868877+0800 | INFO | Step 1805: loss=0.0700, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:31:02.313289+0800 | INFO | Step 1806: loss=0.1632, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:31:03.385430+0800 | INFO | Step 1807: loss=0.0934, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:31:04.820425+0800 | INFO | Step 1808: loss=0.3511, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:31:06.261765+0800 | INFO | Step 1809: loss=0.0201, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:31:07.698473+0800 | INFO | Step 1810: loss=0.1036, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:31:09.163985+0800 | INFO | Step 1811: loss=0.0796, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:31:10.617964+0800 | INFO | Step 1812: loss=0.0668, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:31:12.071867+0800 | INFO | Step 1813: loss=0.4598, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:31:13.526846+0800 | INFO | Step 1814: loss=0.1278, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:31:14.982645+0800 | INFO | Step 1815: loss=0.0231, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:31:16.408220+0800 | INFO | Step 1816: loss=0.1269, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:31:17.850826+0800 | INFO | Step 1817: loss=0.0243, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:31:19.283265+0800 | INFO | Step 1818: loss=0.0206, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:31:20.728507+0800 | INFO | Step 1819: loss=0.2008, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:31:22.232646+0800 | INFO | Step 1820: loss=0.0516, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:31:23.681829+0800 | INFO | Step 1821: loss=0.2754, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:31:25.119333+0800 | INFO | Step 1822: loss=0.5187, acc=0.787 (IF=0.700, MQ=0.875)
2025-12-21T22:31:26.554884+0800 | INFO | Step 1823: loss=0.2125, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:31:27.989866+0800 | INFO | Step 1824: loss=0.0125, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:31:29.429427+0800 | INFO | Step 1825: loss=0.0578, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:31:30.872660+0800 | INFO | Step 1826: loss=0.1324, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:31:32.340697+0800 | INFO | Step 1827: loss=0.0895, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:31:33.789160+0800 | INFO | Step 1828: loss=0.0828, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:31:35.249374+0800 | INFO | Step 1829: loss=0.1721, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:31:36.693432+0800 | INFO | Step 1830: loss=0.1096, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:31:39.833864+0800 | INFO | Step 1831: loss=0.0949, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:31:41.273360+0800 | INFO | Step 1832: loss=0.0483, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:31:42.703396+0800 | INFO | Step 1833: loss=0.0863, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:31:44.163496+0800 | INFO | Step 1834: loss=0.1888, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:31:45.638443+0800 | INFO | Step 1835: loss=0.0800, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:31:47.078052+0800 | INFO | Step 1836: loss=0.1117, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:31:48.535606+0800 | INFO | Step 1837: loss=0.1637, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:31:49.987361+0800 | INFO | Step 1838: loss=0.0427, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:31:51.424896+0800 | INFO | Step 1839: loss=0.0371, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:31:52.853094+0800 | INFO | Step 1840: loss=0.0214, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:31:54.292564+0800 | INFO | Step 1841: loss=0.1931, acc=0.933 (IF=1.000, MQ=0.867)
2025-12-21T22:31:55.732153+0800 | INFO | Step 1842: loss=0.0302, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:31:57.164221+0800 | INFO | Step 1843: loss=0.1245, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:31:58.593725+0800 | INFO | Step 1844: loss=0.4694, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T22:32:00.029214+0800 | INFO | Step 1845: loss=0.0260, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:01.489480+0800 | INFO | Step 1846: loss=0.0182, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:02.925193+0800 | INFO | Step 1847: loss=0.0566, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:04.363559+0800 | INFO | Step 1848: loss=0.0181, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:05.801520+0800 | INFO | Step 1849: loss=0.1559, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:32:07.225981+0800 | INFO | Step 1850: loss=0.0495, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:32:08.651692+0800 | INFO | Step 1851: loss=0.0278, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:10.096220+0800 | INFO | Step 1852: loss=0.0551, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:32:11.597656+0800 | INFO | Step 1853: loss=0.0331, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:13.076311+0800 | INFO | Step 1854: loss=0.0504, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:14.516635+0800 | INFO | Step 1855: loss=0.0975, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:32:15.957824+0800 | INFO | Step 1856: loss=0.2509, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:32:17.434274+0800 | INFO | Step 1857: loss=0.0164, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:18.880102+0800 | INFO | Step 1858: loss=0.1430, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:32:20.386207+0800 | INFO | Step 1859: loss=0.3730, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:32:21.807332+0800 | INFO | Step 1860: loss=0.3611, acc=0.851 (IF=0.889, MQ=0.812)
2025-12-21T22:32:23.249557+0800 | INFO | Step 1861: loss=0.0099, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:24.686212+0800 | INFO | Step 1862: loss=0.0259, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:26.166345+0800 | INFO | Step 1863: loss=0.0627, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:32:27.603732+0800 | INFO | Step 1864: loss=0.0944, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:32:29.038697+0800 | INFO | Step 1865: loss=0.0671, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:30.476435+0800 | INFO | Step 1866: loss=0.2466, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:32:31.506698+0800 | INFO | Step 1867: loss=0.1894, acc=0.897 (IF=0.857, MQ=0.938)
2025-12-21T22:32:32.937497+0800 | INFO | Step 1868: loss=0.1279, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:32:34.394448+0800 | INFO | Step 1869: loss=0.0621, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:32:35.840542+0800 | INFO | Step 1870: loss=0.1845, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:32:37.267059+0800 | INFO | Step 1871: loss=0.3020, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:32:38.710030+0800 | INFO | Step 1872: loss=0.0487, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:40.157047+0800 | INFO | Step 1873: loss=0.0904, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:32:41.603056+0800 | INFO | Step 1874: loss=0.0820, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:32:43.043116+0800 | INFO | Step 1875: loss=0.0344, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:44.474721+0800 | INFO | Step 1876: loss=0.0691, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:32:45.915913+0800 | INFO | Step 1877: loss=0.2777, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:32:47.350337+0800 | INFO | Step 1878: loss=0.1629, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:32:48.789440+0800 | INFO | Step 1879: loss=0.6849, acc=0.812 (IF=0.750, MQ=0.875)
2025-12-21T22:32:50.219308+0800 | INFO | Step 1880: loss=0.2341, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:32:51.667491+0800 | INFO | Step 1881: loss=0.0571, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:32:53.106424+0800 | INFO | Step 1882: loss=0.1760, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:32:54.548946+0800 | INFO | Step 1883: loss=0.0097, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:55.992938+0800 | INFO | Step 1884: loss=0.3675, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T22:32:57.439990+0800 | INFO | Step 1885: loss=0.0786, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:32:58.872358+0800 | INFO | Step 1886: loss=0.0544, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:33:01.740962+0800 | INFO | Step 1887: loss=0.2419, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:33:03.239589+0800 | INFO | Step 1888: loss=0.1118, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:33:04.673461+0800 | INFO | Step 1889: loss=0.1232, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:33:06.143707+0800 | INFO | Step 1890: loss=0.1289, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:33:07.596596+0800 | INFO | Step 1891: loss=0.1397, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:33:09.051998+0800 | INFO | Step 1892: loss=0.0629, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:33:10.505863+0800 | INFO | Step 1893: loss=0.1804, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:33:11.956612+0800 | INFO | Step 1894: loss=0.1436, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:33:13.404119+0800 | INFO | Step 1895: loss=0.0204, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:33:14.860902+0800 | INFO | Step 1896: loss=0.2063, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:33:16.310882+0800 | INFO | Step 1897: loss=0.0213, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:33:17.750845+0800 | INFO | Step 1898: loss=0.0341, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:33:19.235700+0800 | INFO | Step 1899: loss=0.0161, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:33:20.686355+0800 | INFO | Step 1900: loss=0.1194, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:33:28.069451+0800 | INFO |
============================================================
Validation Results (took 7.36s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.5690
Quality Acc: 0.6750
Average Acc: 0.6220
Total Loss: 0.6661
Instruction Loss: 0.6787
Quality Loss: 0.6534
============================================================
2025-12-21T22:33:30.728700+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_1900.pt
2025-12-21T22:33:30.729219+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:33:30.729323+0800 | INFO | 1. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:33:30.729383+0800 | INFO | 2. Step 1600: acc=0.6306 (reward_model.best_1600.pt)
2025-12-21T22:33:30.729436+0800 | INFO | 3. Step 1700: acc=0.6220 (reward_model.best_1700.pt)
2025-12-21T22:33:32.200519+0800 | INFO | Step 1901: loss=0.0053, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:33:33.639468+0800 | INFO | Step 1902: loss=0.1040, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:33:35.072694+0800 | INFO | Step 1903: loss=0.1862, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:33:36.539384+0800 | INFO | Step 1904: loss=0.0633, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:33:37.985249+0800 | INFO | Step 1905: loss=0.2611, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T22:33:39.431567+0800 | INFO | Step 1906: loss=0.0733, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:33:40.878210+0800 | INFO | Step 1907: loss=0.1090, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:33:42.306435+0800 | INFO | Step 1908: loss=0.0873, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:33:43.740207+0800 | INFO | Step 1909: loss=0.1424, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:33:45.203582+0800 | INFO | Step 1910: loss=0.0473, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:33:46.642001+0800 | INFO | Step 1911: loss=0.2992, acc=0.858 (IF=0.778, MQ=0.938)
2025-12-21T22:33:48.090042+0800 | INFO | Step 1912: loss=0.1345, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:33:49.535335+0800 | INFO | Step 1913: loss=0.2153, acc=0.850 (IF=0.900, MQ=0.800)
2025-12-21T22:33:50.980354+0800 | INFO | Step 1914: loss=0.0305, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:33:52.404433+0800 | INFO | Step 1915: loss=0.2134, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:33:53.840372+0800 | INFO | Step 1916: loss=0.1597, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:33:55.284275+0800 | INFO | Step 1917: loss=0.1750, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:33:56.727500+0800 | INFO | Step 1918: loss=0.1372, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:33:58.181637+0800 | INFO | Step 1919: loss=0.0432, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:33:59.640598+0800 | INFO | Step 1920: loss=0.2398, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:34:01.125572+0800 | INFO | Step 1921: loss=0.0577, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:34:02.590042+0800 | INFO | Step 1922: loss=0.1494, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:34:04.038375+0800 | INFO | Step 1923: loss=0.3786, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T22:34:05.480162+0800 | INFO | Step 1924: loss=0.0851, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:34:06.935777+0800 | INFO | Step 1925: loss=0.0648, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:34:08.394740+0800 | INFO | Step 1926: loss=0.1169, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:34:09.844202+0800 | INFO | Step 1927: loss=0.1675, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:34:11.293376+0800 | INFO | Step 1928: loss=0.0725, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:34:12.771374+0800 | INFO | Step 1929: loss=0.0871, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:34:14.224936+0800 | INFO | Step 1930: loss=0.0365, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:34:15.672135+0800 | INFO | Step 1931: loss=0.2002, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:34:17.132481+0800 | INFO | Step 1932: loss=0.0646, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:34:18.602023+0800 | INFO | Step 1933: loss=0.0313, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:34:20.046355+0800 | INFO | Step 1934: loss=0.0875, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:34:21.111660+0800 | INFO | Step 1935: loss=0.2277, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T22:34:22.605274+0800 | INFO | Step 1936: loss=0.1783, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:34:23.821509+0800 | INFO | Step 1937: loss=0.1693, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T22:34:25.279525+0800 | INFO | Step 1938: loss=0.0088, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:34:26.722585+0800 | INFO | Step 1939: loss=0.0924, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:34:28.179577+0800 | INFO | Step 1940: loss=0.1893, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:34:29.629517+0800 | INFO | Step 1941: loss=0.2575, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:34:31.827961+0800 | INFO | Step 1942: loss=0.0315, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:34:33.346080+0800 | INFO | Step 1943: loss=0.0404, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:34:34.829554+0800 | INFO | Step 1944: loss=0.0754, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:34:36.264367+0800 | INFO | Step 1945: loss=0.0633, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:34:37.693376+0800 | INFO | Step 1946: loss=0.2591, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:34:38.721264+0800 | INFO | Step 1947: loss=0.2002, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:34:40.172899+0800 | INFO | Step 1948: loss=0.0294, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:34:41.616423+0800 | INFO | Step 1949: loss=0.0270, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:34:42.635886+0800 | INFO | Step 1950: loss=0.2696, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:34:44.078343+0800 | INFO | Step 1951: loss=0.0758, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:34:45.529916+0800 | INFO | Step 1952: loss=0.0202, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:34:46.580567+0800 | INFO | Step 1953: loss=0.1722, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T22:34:48.036021+0800 | INFO | Step 1954: loss=0.0131, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:34:49.473414+0800 | INFO | Step 1955: loss=0.2476, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:34:50.911238+0800 | INFO | Step 1956: loss=0.0299, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:34:52.362290+0800 | INFO | Step 1957: loss=0.3217, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:34:53.782954+0800 | INFO | Step 1958: loss=0.1392, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:34:55.215925+0800 | INFO | Step 1959: loss=0.0428, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:34:56.683668+0800 | INFO | Step 1960: loss=0.1205, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:34:58.125039+0800 | INFO | Step 1961: loss=0.0823, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:34:59.570023+0800 | INFO | Step 1962: loss=0.1065, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:35:01.011293+0800 | INFO | Step 1963: loss=0.1989, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:35:02.456894+0800 | INFO | Step 1964: loss=0.0671, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:35:03.896995+0800 | INFO | Step 1965: loss=0.2299, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:35:05.321416+0800 | INFO | Step 1966: loss=0.1157, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:35:06.749583+0800 | INFO | Step 1967: loss=0.1205, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:35:08.196133+0800 | INFO | Step 1968: loss=0.1370, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:35:09.654886+0800 | INFO | Step 1969: loss=0.1877, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T22:35:11.092135+0800 | INFO | Step 1970: loss=0.1917, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T22:35:12.115473+0800 | INFO | Step 1971: loss=0.0514, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:35:13.567411+0800 | INFO | Step 1972: loss=0.0066, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:35:15.015964+0800 | INFO | Step 1973: loss=0.0249, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:35:16.469653+0800 | INFO | Step 1974: loss=0.1329, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:35:17.507120+0800 | INFO | Step 1975: loss=0.0897, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:35:19.005099+0800 | INFO | Step 1976: loss=0.2260, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:35:20.454462+0800 | INFO | Step 1977: loss=0.2618, acc=0.900 (IF=0.800, MQ=1.000)
2025-12-21T22:35:21.901164+0800 | INFO | Step 1978: loss=0.0205, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:35:23.338426+0800 | INFO | Step 1979: loss=0.0874, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:35:24.786604+0800 | INFO | Step 1980: loss=0.1501, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:35:25.844223+0800 | INFO | Step 1981: loss=0.0656, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:35:27.272466+0800 | INFO | Step 1982: loss=0.1134, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:35:28.718394+0800 | INFO | Step 1983: loss=0.0357, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:35:30.166562+0800 | INFO | Step 1984: loss=0.2971, acc=0.844 (IF=0.750, MQ=0.938)
2025-12-21T22:35:31.602206+0800 | INFO | Step 1985: loss=0.4251, acc=0.828 (IF=0.923, MQ=0.733)
2025-12-21T22:35:33.029703+0800 | INFO | Step 1986: loss=0.1135, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:35:34.481586+0800 | INFO | Step 1987: loss=0.0561, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:35:35.920836+0800 | INFO | Step 1988: loss=0.1077, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:35:37.355896+0800 | INFO | Step 1989: loss=0.3820, acc=0.878 (IF=0.889, MQ=0.867)
2025-12-21T22:35:38.790376+0800 | INFO | Step 1990: loss=0.1533, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:35:40.227240+0800 | INFO | Step 1991: loss=0.1517, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:35:41.669735+0800 | INFO | Step 1992: loss=0.2707, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:35:43.120382+0800 | INFO | Step 1993: loss=0.0742, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:35:44.562350+0800 | INFO | Step 1994: loss=0.0812, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:35:45.981102+0800 | INFO | Step 1995: loss=0.0944, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:35:47.415680+0800 | INFO | Step 1996: loss=0.1263, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:35:48.843710+0800 | INFO | Step 1997: loss=0.0203, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:35:51.537299+0800 | INFO | Step 1998: loss=0.0270, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:35:52.655169+0800 | INFO | Step 1999: loss=0.0273, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:35:54.142607+0800 | INFO | Step 2000: loss=0.0704, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:36:01.532823+0800 | INFO |
============================================================
Validation Results (took 7.36s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6034
Quality Acc: 0.6625
Average Acc: 0.6330
Total Loss: 0.6629
Instruction Loss: 0.6725
Quality Loss: 0.6532
============================================================
2025-12-21T22:36:04.208185+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_1700.pt
2025-12-21T22:36:04.208776+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:36:04.208875+0800 | INFO | 1. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:36:04.208931+0800 | INFO | 2. Step 2000: acc=0.6330 (reward_model.best_2000.pt)
2025-12-21T22:36:04.208987+0800 | INFO | 3. Step 1600: acc=0.6306 (reward_model.best_1600.pt)
2025-12-21T22:36:06.568878+0800 | INFO | Step 2000: Saved to /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.2000.pt
2025-12-21T22:36:08.018642+0800 | INFO | Step 2001: loss=0.0869, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:36:09.465844+0800 | INFO | Step 2002: loss=0.0406, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:36:10.893371+0800 | INFO | Step 2003: loss=0.0588, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:36:12.320023+0800 | INFO | Step 2004: loss=0.0384, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:36:13.772616+0800 | INFO | Step 2005: loss=0.0079, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:36:15.209726+0800 | INFO | Step 2006: loss=0.0463, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:36:16.655481+0800 | INFO | Step 2007: loss=0.1737, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:36:18.100184+0800 | INFO | Step 2008: loss=0.1811, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:36:19.552231+0800 | INFO | Step 2009: loss=0.0200, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:36:20.599643+0800 | INFO | Step 2010: loss=0.0677, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:36:22.037703+0800 | INFO | Step 2011: loss=0.0506, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:36:23.459422+0800 | INFO | Step 2012: loss=0.2713, acc=0.844 (IF=0.875, MQ=0.812)
2025-12-21T22:36:24.949120+0800 | INFO | Step 2013: loss=0.0191, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:36:26.397938+0800 | INFO | Step 2014: loss=0.2306, acc=0.829 (IF=0.846, MQ=0.812)
2025-12-21T22:36:27.842246+0800 | INFO | Step 2015: loss=0.0178, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:36:29.278483+0800 | INFO | Step 2016: loss=0.1485, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:36:30.714873+0800 | INFO | Step 2017: loss=0.0070, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:36:32.168491+0800 | INFO | Step 2018: loss=0.3977, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:36:33.617420+0800 | INFO | Step 2019: loss=0.0551, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:36:35.086695+0800 | INFO | Step 2020: loss=0.0078, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:36:36.533792+0800 | INFO | Step 2021: loss=0.0976, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:36:37.988362+0800 | INFO | Step 2022: loss=0.0232, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:36:39.444975+0800 | INFO | Step 2023: loss=0.0682, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:36:40.915467+0800 | INFO | Step 2024: loss=0.2197, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:36:42.364957+0800 | INFO | Step 2025: loss=0.1340, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:36:43.816181+0800 | INFO | Step 2026: loss=0.1595, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:36:45.289874+0800 | INFO | Step 2027: loss=0.0865, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:36:46.741445+0800 | INFO | Step 2028: loss=0.1525, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:36:48.186326+0800 | INFO | Step 2029: loss=0.2014, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T22:36:49.614128+0800 | INFO | Step 2030: loss=0.1836, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:36:51.071788+0800 | INFO | Step 2031: loss=0.6743, acc=0.693 (IF=0.636, MQ=0.750)
2025-12-21T22:36:52.500416+0800 | INFO | Step 2032: loss=0.1701, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:36:53.936838+0800 | INFO | Step 2033: loss=0.1283, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:36:55.373065+0800 | INFO | Step 2034: loss=0.1694, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:36:56.818567+0800 | INFO | Step 2035: loss=0.0646, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:36:58.265340+0800 | INFO | Step 2036: loss=0.1732, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T22:36:59.712492+0800 | INFO | Step 2037: loss=0.0465, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:37:01.171009+0800 | INFO | Step 2038: loss=0.1314, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:37:02.674429+0800 | INFO | Step 2039: loss=0.1252, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:37:04.183682+0800 | INFO | Step 2040: loss=0.0833, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:37:05.640875+0800 | INFO | Step 2041: loss=0.0771, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:37:07.110556+0800 | INFO | Step 2042: loss=0.0794, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:37:08.607788+0800 | INFO | Step 2043: loss=0.0765, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:37:09.631380+0800 | INFO | Step 2044: loss=0.1747, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:37:11.066631+0800 | INFO | Step 2045: loss=0.1188, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:37:12.495623+0800 | INFO | Step 2046: loss=0.3780, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:37:13.939360+0800 | INFO | Step 2047: loss=0.1572, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:37:15.383955+0800 | INFO | Step 2048: loss=0.1326, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:37:16.822932+0800 | INFO | Step 2049: loss=0.0463, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:37:18.261122+0800 | INFO | Step 2050: loss=0.1537, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:37:19.736621+0800 | INFO | Step 2051: loss=0.2634, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:37:21.185296+0800 | INFO | Step 2052: loss=0.2248, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T22:37:23.936371+0800 | INFO | Step 2053: loss=0.1093, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:37:25.400191+0800 | INFO | Step 2054: loss=0.0697, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:37:26.922843+0800 | INFO | Step 2055: loss=0.0864, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:37:28.398005+0800 | INFO | Step 2056: loss=0.0288, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:37:29.847346+0800 | INFO | Step 2057: loss=0.1019, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:37:31.287660+0800 | INFO | Step 2058: loss=0.1145, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:37:32.740299+0800 | INFO | Step 2059: loss=0.0433, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:37:34.193124+0800 | INFO | Step 2060: loss=0.0042, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:37:35.663497+0800 | INFO | Step 2061: loss=0.0392, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:37:37.128607+0800 | INFO | Step 2062: loss=0.0563, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:37:38.584799+0800 | INFO | Step 2063: loss=0.1046, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:37:40.028827+0800 | INFO | Step 2064: loss=0.2036, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:37:41.490890+0800 | INFO | Step 2065: loss=0.0343, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:37:42.960452+0800 | INFO | Step 2066: loss=0.0292, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:37:44.445913+0800 | INFO | Step 2067: loss=0.2643, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:37:45.957440+0800 | INFO | Step 2068: loss=0.0969, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:37:47.435462+0800 | INFO | Step 2069: loss=0.1132, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:37:48.887122+0800 | INFO | Step 2070: loss=0.1147, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:37:50.380453+0800 | INFO | Step 2071: loss=0.1329, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:37:51.812039+0800 | INFO | Step 2072: loss=0.0441, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:37:53.264982+0800 | INFO | Step 2073: loss=0.0445, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:37:54.716665+0800 | INFO | Step 2074: loss=0.3609, acc=0.904 (IF=0.875, MQ=0.933)
2025-12-21T22:37:56.163189+0800 | INFO | Step 2075: loss=0.0993, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:37:57.622499+0800 | INFO | Step 2076: loss=0.0901, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:37:59.113730+0800 | INFO | Step 2077: loss=0.1671, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:38:00.573454+0800 | INFO | Step 2078: loss=0.0259, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:38:02.009717+0800 | INFO | Step 2079: loss=0.1190, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:38:03.450249+0800 | INFO | Step 2080: loss=0.0503, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:38:04.894252+0800 | INFO | Step 2081: loss=0.1396, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:38:06.327978+0800 | INFO | Step 2082: loss=0.0360, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:38:07.773302+0800 | INFO | Step 2083: loss=0.1187, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:38:09.222617+0800 | INFO | Step 2084: loss=0.0462, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:38:10.686071+0800 | INFO | Step 2085: loss=0.1415, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:38:12.153967+0800 | INFO | Step 2086: loss=0.0338, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:38:13.623872+0800 | INFO | Step 2087: loss=0.0562, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:38:15.069060+0800 | INFO | Step 2088: loss=0.0337, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:38:16.516499+0800 | INFO | Step 2089: loss=0.0048, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:38:17.967904+0800 | INFO | Step 2090: loss=0.6838, acc=0.781 (IF=0.625, MQ=0.938)
2025-12-21T22:38:19.431591+0800 | INFO | Step 2091: loss=0.1327, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:38:20.905933+0800 | INFO | Step 2092: loss=0.0844, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:38:22.354287+0800 | INFO | Step 2093: loss=0.0232, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:38:23.831944+0800 | INFO | Step 2094: loss=0.2087, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:38:25.269375+0800 | INFO | Step 2095: loss=0.4782, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T22:38:26.704069+0800 | INFO | Step 2096: loss=0.1832, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:38:28.142766+0800 | INFO | Step 2097: loss=0.2511, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:38:29.577658+0800 | INFO | Step 2098: loss=0.0129, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:38:31.028575+0800 | INFO | Step 2099: loss=0.3587, acc=0.833 (IF=0.667, MQ=1.000)
2025-12-21T22:38:32.467438+0800 | INFO | Step 2100: loss=0.1122, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:38:40.142469+0800 | INFO |
============================================================
Validation Results (took 7.65s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6034
Quality Acc: 0.6250
Average Acc: 0.6142
Total Loss: 0.6591
Instruction Loss: 0.6699
Quality Loss: 0.6483
============================================================
2025-12-21T22:38:42.880587+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_2100.pt
2025-12-21T22:38:42.881254+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:38:42.881360+0800 | INFO | 1. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:38:42.881419+0800 | INFO | 2. Step 2000: acc=0.6330 (reward_model.best_2000.pt)
2025-12-21T22:38:42.881468+0800 | INFO | 3. Step 1600: acc=0.6306 (reward_model.best_1600.pt)
2025-12-21T22:38:44.348902+0800 | INFO | Step 2101: loss=0.1600, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:38:45.775810+0800 | INFO | Step 2102: loss=0.0625, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:38:47.225588+0800 | INFO | Step 2103: loss=0.0632, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:38:48.282775+0800 | INFO | Step 2104: loss=0.0654, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:38:49.728837+0800 | INFO | Step 2105: loss=0.0735, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:38:50.783097+0800 | INFO | Step 2106: loss=0.2804, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:38:52.226173+0800 | INFO | Step 2107: loss=0.0339, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:38:53.664110+0800 | INFO | Step 2108: loss=0.2733, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:38:56.400366+0800 | INFO | Step 2109: loss=0.0607, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:38:57.855394+0800 | INFO | Step 2110: loss=0.0682, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:38:59.375874+0800 | INFO | Step 2111: loss=0.1065, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:39:00.864119+0800 | INFO | Step 2112: loss=0.0358, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:02.289438+0800 | INFO | Step 2113: loss=0.0751, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:03.723477+0800 | INFO | Step 2114: loss=0.0345, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:05.153046+0800 | INFO | Step 2115: loss=0.0716, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:39:06.598232+0800 | INFO | Step 2116: loss=0.0568, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:39:08.040665+0800 | INFO | Step 2117: loss=0.2028, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T22:39:09.503959+0800 | INFO | Step 2118: loss=0.2744, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:39:10.948368+0800 | INFO | Step 2119: loss=0.2583, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:39:12.373584+0800 | INFO | Step 2120: loss=0.1473, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:39:13.798726+0800 | INFO | Step 2121: loss=0.0176, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:15.231758+0800 | INFO | Step 2122: loss=0.0252, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:16.267431+0800 | INFO | Step 2123: loss=0.0399, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:39:17.708201+0800 | INFO | Step 2124: loss=0.1874, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:39:19.149492+0800 | INFO | Step 2125: loss=0.0300, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:20.609196+0800 | INFO | Step 2126: loss=0.2033, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T22:39:22.056541+0800 | INFO | Step 2127: loss=0.0210, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:23.503906+0800 | INFO | Step 2128: loss=0.0919, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:39:24.957685+0800 | INFO | Step 2129: loss=0.2537, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:39:26.398918+0800 | INFO | Step 2130: loss=0.0940, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:39:27.834803+0800 | INFO | Step 2131: loss=0.0793, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:29.348859+0800 | INFO | Step 2132: loss=0.0537, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:39:30.778582+0800 | INFO | Step 2133: loss=0.0892, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:39:32.222373+0800 | INFO | Step 2134: loss=0.1152, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T22:39:33.663340+0800 | INFO | Step 2135: loss=0.0737, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:39:35.109527+0800 | INFO | Step 2136: loss=0.1888, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:39:36.563150+0800 | INFO | Step 2137: loss=0.0120, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:38.017167+0800 | INFO | Step 2138: loss=0.1231, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:39:39.467177+0800 | INFO | Step 2139: loss=0.0669, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:39:40.922075+0800 | INFO | Step 2140: loss=0.3195, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T22:39:42.355900+0800 | INFO | Step 2141: loss=0.1415, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:39:43.796693+0800 | INFO | Step 2142: loss=0.0278, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:45.268885+0800 | INFO | Step 2143: loss=0.3029, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:39:46.717724+0800 | INFO | Step 2144: loss=0.0049, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:47.754435+0800 | INFO | Step 2145: loss=0.1445, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:39:49.184180+0800 | INFO | Step 2146: loss=0.0598, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:39:50.619952+0800 | INFO | Step 2147: loss=0.1750, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:39:52.064881+0800 | INFO | Step 2148: loss=0.0499, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:39:53.520335+0800 | INFO | Step 2149: loss=0.1361, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:39:54.954398+0800 | INFO | Step 2150: loss=0.0652, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:56.387973+0800 | INFO | Step 2151: loss=0.0258, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:57.828427+0800 | INFO | Step 2152: loss=0.0227, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:39:59.268574+0800 | INFO | Step 2153: loss=0.0557, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:40:00.736931+0800 | INFO | Step 2154: loss=0.0924, acc=0.969 (IF=0.938, MQ=1.000)
2025-12-21T22:40:02.173354+0800 | INFO | Step 2155: loss=0.1795, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:40:03.603466+0800 | INFO | Step 2156: loss=0.1642, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:40:05.041255+0800 | INFO | Step 2157: loss=0.3293, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:40:06.474264+0800 | INFO | Step 2158: loss=0.1730, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T22:40:07.899341+0800 | INFO | Step 2159: loss=0.2965, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:40:09.330713+0800 | INFO | Step 2160: loss=0.0728, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:40:10.759490+0800 | INFO | Step 2161: loss=0.0418, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:12.181190+0800 | INFO | Step 2162: loss=0.1058, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:40:13.605949+0800 | INFO | Step 2163: loss=0.1977, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:40:16.400176+0800 | INFO | Step 2164: loss=0.0334, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:17.852559+0800 | INFO | Step 2165: loss=0.0324, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:19.327902+0800 | INFO | Step 2166: loss=0.0132, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:20.363178+0800 | INFO | Step 2167: loss=0.0357, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:21.415134+0800 | INFO | Step 2168: loss=0.0332, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:22.870837+0800 | INFO | Step 2169: loss=0.0323, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:24.338578+0800 | INFO | Step 2170: loss=0.0544, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:25.431483+0800 | INFO | Step 2171: loss=0.5247, acc=0.771 (IF=0.667, MQ=0.875)
2025-12-21T22:40:26.920308+0800 | INFO | Step 2172: loss=0.0497, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:40:28.371832+0800 | INFO | Step 2173: loss=0.0425, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:29.825993+0800 | INFO | Step 2174: loss=0.0310, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:31.301002+0800 | INFO | Step 2175: loss=0.0109, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:32.748993+0800 | INFO | Step 2176: loss=0.0782, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:40:34.197087+0800 | INFO | Step 2177: loss=0.0694, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:40:35.649992+0800 | INFO | Step 2178: loss=0.0041, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:37.107791+0800 | INFO | Step 2179: loss=0.1212, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:40:38.573701+0800 | INFO | Step 2180: loss=0.0430, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:40:40.011572+0800 | INFO | Step 2181: loss=0.1614, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:40:41.448271+0800 | INFO | Step 2182: loss=0.0382, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:42.892742+0800 | INFO | Step 2183: loss=0.1718, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:40:44.333439+0800 | INFO | Step 2184: loss=0.0318, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:45.779837+0800 | INFO | Step 2185: loss=0.0302, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:47.231669+0800 | INFO | Step 2186: loss=0.0928, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:40:48.292680+0800 | INFO | Step 2187: loss=0.1919, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:40:49.742386+0800 | INFO | Step 2188: loss=0.2247, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:40:51.217429+0800 | INFO | Step 2189: loss=0.0478, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:52.667925+0800 | INFO | Step 2190: loss=0.0646, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:54.127401+0800 | INFO | Step 2191: loss=0.0519, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:40:55.555696+0800 | INFO | Step 2192: loss=0.1101, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:40:56.994184+0800 | INFO | Step 2193: loss=0.0091, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:40:58.439809+0800 | INFO | Step 2194: loss=0.1753, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:40:59.862144+0800 | INFO | Step 2195: loss=0.0766, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:41:01.307535+0800 | INFO | Step 2196: loss=0.0841, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:41:02.743528+0800 | INFO | Step 2197: loss=0.1260, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:41:03.796787+0800 | INFO | Step 2198: loss=0.0225, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:41:05.227901+0800 | INFO | Step 2199: loss=0.1332, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:41:06.671400+0800 | INFO | Step 2200: loss=0.2086, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:41:14.342685+0800 | INFO |
============================================================
Validation Results (took 7.65s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.5862
Quality Acc: 0.6375
Average Acc: 0.6119
Total Loss: 0.6616
Instruction Loss: 0.6753
Quality Loss: 0.6478
============================================================
2025-12-21T22:41:16.955867+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_2200.pt
2025-12-21T22:41:16.956499+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:41:16.956617+0800 | INFO | 1. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:41:16.956679+0800 | INFO | 2. Step 2000: acc=0.6330 (reward_model.best_2000.pt)
2025-12-21T22:41:16.956740+0800 | INFO | 3. Step 1600: acc=0.6306 (reward_model.best_1600.pt)
2025-12-21T22:41:18.438184+0800 | INFO | Step 2201: loss=0.0980, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:41:19.883758+0800 | INFO | Step 2202: loss=0.0112, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:41:21.365323+0800 | INFO | Step 2203: loss=0.1945, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:41:22.818661+0800 | INFO | Step 2204: loss=0.1194, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:41:24.257888+0800 | INFO | Step 2205: loss=0.3722, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T22:41:25.712175+0800 | INFO | Step 2206: loss=0.0961, acc=0.933 (IF=1.000, MQ=0.867)
2025-12-21T22:41:27.168615+0800 | INFO | Step 2207: loss=0.2509, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:41:28.616513+0800 | INFO | Step 2208: loss=0.2765, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:41:29.702862+0800 | INFO | Step 2209: loss=0.0968, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:41:30.910269+0800 | INFO | Step 2210: loss=0.0460, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:41:32.356542+0800 | INFO | Step 2211: loss=0.1033, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:41:33.811223+0800 | INFO | Step 2212: loss=0.0226, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:41:35.261429+0800 | INFO | Step 2213: loss=0.6453, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:41:36.698812+0800 | INFO | Step 2214: loss=0.3229, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:41:38.137170+0800 | INFO | Step 2215: loss=0.1014, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:41:39.587531+0800 | INFO | Step 2216: loss=0.0262, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:41:41.066115+0800 | INFO | Step 2217: loss=0.3112, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:41:42.523691+0800 | INFO | Step 2218: loss=0.1845, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T22:41:44.032202+0800 | INFO | Step 2219: loss=0.2304, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:41:46.666530+0800 | INFO | Step 2220: loss=0.1596, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:41:48.136936+0800 | INFO | Step 2221: loss=0.1243, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:41:49.595829+0800 | INFO | Step 2222: loss=0.0362, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:41:51.073354+0800 | INFO | Step 2223: loss=0.1261, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:41:52.534616+0800 | INFO | Step 2224: loss=0.3369, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:41:53.567353+0800 | INFO | Step 2225: loss=0.5572, acc=0.770 (IF=0.727, MQ=0.812)
2025-12-21T22:41:55.038984+0800 | INFO | Step 2226: loss=0.0298, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:41:56.507669+0800 | INFO | Step 2227: loss=0.2248, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:41:57.952037+0800 | INFO | Step 2228: loss=0.0778, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:41:59.401793+0800 | INFO | Step 2229: loss=0.0640, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:42:00.909838+0800 | INFO | Step 2230: loss=0.0170, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:02.409806+0800 | INFO | Step 2231: loss=0.0975, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:03.855767+0800 | INFO | Step 2232: loss=0.0814, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:42:05.300495+0800 | INFO | Step 2233: loss=0.1933, acc=0.935 (IF=0.933, MQ=0.938)
2025-12-21T22:42:06.776827+0800 | INFO | Step 2234: loss=0.0153, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:08.209701+0800 | INFO | Step 2235: loss=0.2754, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:42:09.652395+0800 | INFO | Step 2236: loss=0.0228, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:11.121134+0800 | INFO | Step 2237: loss=0.0261, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:12.570088+0800 | INFO | Step 2238: loss=0.1040, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:42:14.023622+0800 | INFO | Step 2239: loss=0.1285, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:42:15.490234+0800 | INFO | Step 2240: loss=0.0711, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:16.522067+0800 | INFO | Step 2241: loss=0.0397, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:17.955421+0800 | INFO | Step 2242: loss=0.0172, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:19.400612+0800 | INFO | Step 2243: loss=0.0397, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:20.850010+0800 | INFO | Step 2244: loss=0.1987, acc=0.906 (IF=0.938, MQ=0.875)
2025-12-21T22:42:22.300780+0800 | INFO | Step 2245: loss=0.0127, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:23.754422+0800 | INFO | Step 2246: loss=0.0960, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:42:24.790042+0800 | INFO | Step 2247: loss=0.0315, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:26.235754+0800 | INFO | Step 2248: loss=0.0628, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:42:27.750804+0800 | INFO | Step 2249: loss=0.0624, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:42:29.204037+0800 | INFO | Step 2250: loss=0.0436, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:30.668617+0800 | INFO | Step 2251: loss=0.0518, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:42:32.138066+0800 | INFO | Step 2252: loss=0.1076, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T22:42:33.586893+0800 | INFO | Step 2253: loss=0.1646, acc=0.902 (IF=0.929, MQ=0.875)
2025-12-21T22:42:35.029100+0800 | INFO | Step 2254: loss=0.1334, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:42:36.470833+0800 | INFO | Step 2255: loss=0.1160, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:42:37.911448+0800 | INFO | Step 2256: loss=0.4090, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T22:42:39.362071+0800 | INFO | Step 2257: loss=0.1659, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T22:42:40.807990+0800 | INFO | Step 2258: loss=0.3032, acc=0.855 (IF=0.909, MQ=0.800)
2025-12-21T22:42:42.244333+0800 | INFO | Step 2259: loss=0.2061, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:42:43.693836+0800 | INFO | Step 2260: loss=0.0195, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:45.149263+0800 | INFO | Step 2261: loss=0.0172, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:46.644069+0800 | INFO | Step 2262: loss=0.1786, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:42:48.133889+0800 | INFO | Step 2263: loss=0.0339, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:49.589943+0800 | INFO | Step 2264: loss=0.3900, acc=0.833 (IF=0.917, MQ=0.750)
2025-12-21T22:42:51.028795+0800 | INFO | Step 2265: loss=0.0918, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:42:52.483968+0800 | INFO | Step 2266: loss=0.0965, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:42:53.558951+0800 | INFO | Step 2267: loss=0.0326, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:55.000708+0800 | INFO | Step 2268: loss=0.0754, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:42:56.436702+0800 | INFO | Step 2269: loss=0.0425, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:42:57.876480+0800 | INFO | Step 2270: loss=0.2058, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:42:59.322273+0800 | INFO | Step 2271: loss=0.2513, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:43:00.372551+0800 | INFO | Step 2272: loss=0.1582, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T22:43:01.815857+0800 | INFO | Step 2273: loss=0.0704, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:43:03.285654+0800 | INFO | Step 2274: loss=0.0470, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:43:06.168314+0800 | INFO | Step 2275: loss=0.0558, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:43:07.598894+0800 | INFO | Step 2276: loss=0.0765, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:43:09.106141+0800 | INFO | Step 2277: loss=0.0891, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:43:10.556528+0800 | INFO | Step 2278: loss=0.1341, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:43:12.094258+0800 | INFO | Step 2279: loss=0.0416, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:43:13.522999+0800 | INFO | Step 2280: loss=0.0641, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:43:14.976069+0800 | INFO | Step 2281: loss=0.1065, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:43:16.433640+0800 | INFO | Step 2282: loss=0.0502, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:43:17.861726+0800 | INFO | Step 2283: loss=0.0162, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:43:19.310544+0800 | INFO | Step 2284: loss=0.0711, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:43:20.757008+0800 | INFO | Step 2285: loss=0.0531, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:43:22.221000+0800 | INFO | Step 2286: loss=0.2908, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:43:23.681412+0800 | INFO | Step 2287: loss=0.0935, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:43:25.140749+0800 | INFO | Step 2288: loss=0.0712, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:43:26.658071+0800 | INFO | Step 2289: loss=0.0464, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:43:28.108811+0800 | INFO | Step 2290: loss=0.0196, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:43:29.545912+0800 | INFO | Step 2291: loss=0.3077, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:43:30.975776+0800 | INFO | Step 2292: loss=0.0437, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:43:32.404410+0800 | INFO | Step 2293: loss=0.0586, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:43:33.845461+0800 | INFO | Step 2294: loss=0.0094, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:43:35.313914+0800 | INFO | Step 2295: loss=0.0384, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:43:36.773634+0800 | INFO | Step 2296: loss=0.4384, acc=0.861 (IF=0.909, MQ=0.812)
2025-12-21T22:43:38.232555+0800 | INFO | Step 2297: loss=0.0230, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:43:39.683107+0800 | INFO | Step 2298: loss=0.1602, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:43:41.132796+0800 | INFO | Step 2299: loss=0.1263, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:43:42.586496+0800 | INFO | Step 2300: loss=0.0413, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:43:50.173764+0800 | INFO |
============================================================
Validation Results (took 7.56s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6034
Quality Acc: 0.6750
Average Acc: 0.6392
Total Loss: 0.6548
Instruction Loss: 0.6689
Quality Loss: 0.6408
============================================================
2025-12-21T22:43:52.887040+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_1600.pt
2025-12-21T22:43:52.887636+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:43:52.887736+0800 | INFO | 1. Step 2300: acc=0.6392 (reward_model.best_2300.pt)
2025-12-21T22:43:52.887796+0800 | INFO | 2. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:43:52.887848+0800 | INFO | 3. Step 2000: acc=0.6330 (reward_model.best_2000.pt)
2025-12-21T22:43:54.392780+0800 | INFO | Step 2301: loss=0.0501, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:43:55.871237+0800 | INFO | Step 2302: loss=0.0075, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:43:57.334217+0800 | INFO | Step 2303: loss=0.0697, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:43:58.790486+0800 | INFO | Step 2304: loss=0.1297, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:44:00.232225+0800 | INFO | Step 2305: loss=0.0126, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:44:01.691906+0800 | INFO | Step 2306: loss=0.4171, acc=0.853 (IF=0.769, MQ=0.938)
2025-12-21T22:44:03.172453+0800 | INFO | Step 2307: loss=0.1121, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:44:04.612898+0800 | INFO | Step 2308: loss=0.0068, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:44:06.059759+0800 | INFO | Step 2309: loss=0.0520, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:44:07.516051+0800 | INFO | Step 2310: loss=0.1388, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:44:08.556190+0800 | INFO | Step 2311: loss=0.1524, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:44:10.003866+0800 | INFO | Step 2312: loss=0.1561, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:44:11.452374+0800 | INFO | Step 2313: loss=0.3502, acc=0.822 (IF=0.769, MQ=0.875)
2025-12-21T22:44:12.904130+0800 | INFO | Step 2314: loss=0.0794, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:44:14.375064+0800 | INFO | Step 2315: loss=0.1638, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:44:15.821349+0800 | INFO | Step 2316: loss=0.1030, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:44:17.277098+0800 | INFO | Step 2317: loss=0.0773, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:44:18.706940+0800 | INFO | Step 2318: loss=0.1662, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:44:20.158717+0800 | INFO | Step 2319: loss=0.0619, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:44:21.597499+0800 | INFO | Step 2320: loss=0.1509, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:44:23.052550+0800 | INFO | Step 2321: loss=0.0931, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:44:24.496402+0800 | INFO | Step 2322: loss=0.1048, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:44:25.939723+0800 | INFO | Step 2323: loss=0.5784, acc=0.829 (IF=0.846, MQ=0.812)
2025-12-21T22:44:27.380486+0800 | INFO | Step 2324: loss=0.0127, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:44:28.818620+0800 | INFO | Step 2325: loss=0.0865, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:44:29.852390+0800 | INFO | Step 2326: loss=0.1184, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:44:31.335540+0800 | INFO | Step 2327: loss=0.1561, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:44:32.780367+0800 | INFO | Step 2328: loss=0.2918, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T22:44:34.227955+0800 | INFO | Step 2329: loss=0.1000, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:44:35.653209+0800 | INFO | Step 2330: loss=0.0144, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:44:38.204348+0800 | INFO | Step 2331: loss=0.1275, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:44:39.671316+0800 | INFO | Step 2332: loss=0.1642, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:44:41.138884+0800 | INFO | Step 2333: loss=0.1494, acc=0.890 (IF=0.923, MQ=0.857)
2025-12-21T22:44:42.696802+0800 | INFO | Step 2334: loss=0.2773, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:44:44.158386+0800 | INFO | Step 2335: loss=0.0734, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:44:45.599275+0800 | INFO | Step 2336: loss=0.0430, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:44:47.065551+0800 | INFO | Step 2337: loss=0.0649, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:44:48.524367+0800 | INFO | Step 2338: loss=0.1135, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:44:50.011496+0800 | INFO | Step 2339: loss=0.1326, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:44:51.495489+0800 | INFO | Step 2340: loss=0.0351, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:44:52.933187+0800 | INFO | Step 2341: loss=0.0570, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:44:54.365919+0800 | INFO | Step 2342: loss=0.0452, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:44:55.810173+0800 | INFO | Step 2343: loss=0.1338, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:44:57.251150+0800 | INFO | Step 2344: loss=0.2590, acc=0.844 (IF=0.750, MQ=0.938)
2025-12-21T22:44:58.697843+0800 | INFO | Step 2345: loss=0.0727, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:45:00.127422+0800 | INFO | Step 2346: loss=0.0703, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:01.564785+0800 | INFO | Step 2347: loss=0.1897, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:45:03.028882+0800 | INFO | Step 2348: loss=0.0857, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:45:04.494435+0800 | INFO | Step 2349: loss=0.1909, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:45:05.979903+0800 | INFO | Step 2350: loss=0.1193, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:45:07.415751+0800 | INFO | Step 2351: loss=0.1106, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:45:08.853286+0800 | INFO | Step 2352: loss=0.0377, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:10.293433+0800 | INFO | Step 2353: loss=0.0466, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:11.735791+0800 | INFO | Step 2354: loss=0.0302, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:13.183928+0800 | INFO | Step 2355: loss=0.0610, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:14.635532+0800 | INFO | Step 2356: loss=0.1499, acc=0.925 (IF=0.917, MQ=0.933)
2025-12-21T22:45:16.074400+0800 | INFO | Step 2357: loss=0.2546, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T22:45:17.503520+0800 | INFO | Step 2358: loss=0.0748, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:45:18.937867+0800 | INFO | Step 2359: loss=0.0717, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:20.378230+0800 | INFO | Step 2360: loss=0.0309, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:21.833743+0800 | INFO | Step 2361: loss=0.0778, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:23.265226+0800 | INFO | Step 2362: loss=0.2923, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:45:24.704262+0800 | INFO | Step 2363: loss=0.0770, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:45:26.137355+0800 | INFO | Step 2364: loss=0.0937, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:45:27.175068+0800 | INFO | Step 2365: loss=0.0794, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:45:28.618766+0800 | INFO | Step 2366: loss=0.0331, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:45:30.089180+0800 | INFO | Step 2367: loss=0.1030, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:45:31.119349+0800 | INFO | Step 2368: loss=0.0472, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:32.164778+0800 | INFO | Step 2369: loss=0.1803, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:45:33.639682+0800 | INFO | Step 2370: loss=0.1502, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:45:35.083948+0800 | INFO | Step 2371: loss=0.0118, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:36.530054+0800 | INFO | Step 2372: loss=0.2992, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:45:37.970996+0800 | INFO | Step 2373: loss=0.0951, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:45:39.444967+0800 | INFO | Step 2374: loss=0.1520, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:45:40.867880+0800 | INFO | Step 2375: loss=0.0350, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:42.311210+0800 | INFO | Step 2376: loss=0.1435, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:45:43.749644+0800 | INFO | Step 2377: loss=0.2767, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:45:45.193919+0800 | INFO | Step 2378: loss=0.0897, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:45:46.635999+0800 | INFO | Step 2379: loss=0.0540, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:47.699505+0800 | INFO | Step 2380: loss=0.0670, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:45:49.170784+0800 | INFO | Step 2381: loss=0.0776, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:45:50.242173+0800 | INFO | Step 2382: loss=0.0150, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:51.697487+0800 | INFO | Step 2383: loss=0.0125, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:53.140766+0800 | INFO | Step 2384: loss=0.1419, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:45:54.581425+0800 | INFO | Step 2385: loss=0.0220, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:45:57.197167+0800 | INFO | Step 2386: loss=0.0631, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:45:58.675922+0800 | INFO | Step 2387: loss=0.0553, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:46:00.142131+0800 | INFO | Step 2388: loss=0.0390, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:46:01.581907+0800 | INFO | Step 2389: loss=0.1571, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:46:03.026865+0800 | INFO | Step 2390: loss=0.3359, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T22:46:04.479960+0800 | INFO | Step 2391: loss=0.0308, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:46:05.957316+0800 | INFO | Step 2392: loss=0.1692, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:46:07.011580+0800 | INFO | Step 2393: loss=0.0231, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:46:08.457951+0800 | INFO | Step 2394: loss=0.0631, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:46:09.880333+0800 | INFO | Step 2395: loss=0.0152, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:46:11.324191+0800 | INFO | Step 2396: loss=0.0270, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:46:12.751392+0800 | INFO | Step 2397: loss=0.0846, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:46:14.196789+0800 | INFO | Step 2398: loss=0.2427, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:46:15.640248+0800 | INFO | Step 2399: loss=0.1337, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:46:17.074201+0800 | INFO | Step 2400: loss=0.0231, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:46:24.625037+0800 | INFO |
============================================================
Validation Results (took 7.53s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6034
Quality Acc: 0.6500
Average Acc: 0.6267
Total Loss: 0.6548
Instruction Loss: 0.6683
Quality Loss: 0.6414
============================================================
2025-12-21T22:46:27.922918+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_2400.pt
2025-12-21T22:46:27.923549+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:46:27.923692+0800 | INFO | 1. Step 2300: acc=0.6392 (reward_model.best_2300.pt)
2025-12-21T22:46:27.923776+0800 | INFO | 2. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:46:27.923838+0800 | INFO | 3. Step 2000: acc=0.6330 (reward_model.best_2000.pt)
2025-12-21T22:46:29.516250+0800 | INFO | Step 2401: loss=0.1385, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:46:30.627305+0800 | INFO | Step 2402: loss=0.0775, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:46:32.101891+0800 | INFO | Step 2403: loss=0.2960, acc=0.935 (IF=0.933, MQ=0.938)
2025-12-21T22:46:33.556726+0800 | INFO | Step 2404: loss=0.0070, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:46:34.997106+0800 | INFO | Step 2405: loss=0.4495, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T22:46:36.431353+0800 | INFO | Step 2406: loss=0.0164, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:46:37.881502+0800 | INFO | Step 2407: loss=0.1259, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:46:39.369684+0800 | INFO | Step 2408: loss=0.2245, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:46:40.811026+0800 | INFO | Step 2409: loss=0.0660, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:46:42.258369+0800 | INFO | Step 2410: loss=0.1528, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:46:43.726554+0800 | INFO | Step 2411: loss=0.0903, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:46:45.187469+0800 | INFO | Step 2412: loss=0.0760, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:46:46.292685+0800 | INFO | Step 2413: loss=0.0883, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:46:47.724229+0800 | INFO | Step 2414: loss=0.0705, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:46:48.782277+0800 | INFO | Step 2415: loss=0.0441, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:46:50.225491+0800 | INFO | Step 2416: loss=0.1132, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:46:51.683337+0800 | INFO | Step 2417: loss=0.0167, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:46:53.131403+0800 | INFO | Step 2418: loss=0.1475, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:46:54.579643+0800 | INFO | Step 2419: loss=0.1329, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:46:56.034270+0800 | INFO | Step 2420: loss=0.0876, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:46:57.480353+0800 | INFO | Step 2421: loss=0.0572, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:46:58.933986+0800 | INFO | Step 2422: loss=0.1931, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:47:00.387210+0800 | INFO | Step 2423: loss=0.1290, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:47:01.839353+0800 | INFO | Step 2424: loss=0.1493, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:47:03.306759+0800 | INFO | Step 2425: loss=0.0243, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:47:04.750322+0800 | INFO | Step 2426: loss=0.0619, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:47:06.202599+0800 | INFO | Step 2427: loss=0.1349, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:47:07.649047+0800 | INFO | Step 2428: loss=0.6045, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T22:47:09.088081+0800 | INFO | Step 2429: loss=0.0492, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:47:10.531902+0800 | INFO | Step 2430: loss=0.0294, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:47:11.973399+0800 | INFO | Step 2431: loss=0.1954, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:47:13.414343+0800 | INFO | Step 2432: loss=0.1159, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:47:14.894393+0800 | INFO | Step 2433: loss=0.0489, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:47:16.360772+0800 | INFO | Step 2434: loss=0.0261, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:47:17.828526+0800 | INFO | Step 2435: loss=0.2498, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T22:47:19.271076+0800 | INFO | Step 2436: loss=0.0502, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:47:20.734505+0800 | INFO | Step 2437: loss=0.3175, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:47:22.211298+0800 | INFO | Step 2438: loss=0.0071, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:47:23.658788+0800 | INFO | Step 2439: loss=0.0963, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:47:25.105905+0800 | INFO | Step 2440: loss=0.0317, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:47:26.539075+0800 | INFO | Step 2441: loss=0.0334, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:47:29.290032+0800 | INFO | Step 2442: loss=0.2270, acc=0.897 (IF=0.857, MQ=0.938)
2025-12-21T22:47:30.769579+0800 | INFO | Step 2443: loss=0.0862, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:47:31.842564+0800 | INFO | Step 2444: loss=0.1672, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:47:33.301304+0800 | INFO | Step 2445: loss=0.3271, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:47:34.755470+0800 | INFO | Step 2446: loss=0.2952, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T22:47:36.246201+0800 | INFO | Step 2447: loss=0.0770, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:47:37.284097+0800 | INFO | Step 2448: loss=0.2359, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:47:38.723855+0800 | INFO | Step 2449: loss=0.1043, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:47:40.160671+0800 | INFO | Step 2450: loss=0.0468, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:47:41.598056+0800 | INFO | Step 2451: loss=0.1055, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:47:43.094081+0800 | INFO | Step 2452: loss=0.1225, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:47:44.584755+0800 | INFO | Step 2453: loss=0.0578, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:47:46.040239+0800 | INFO | Step 2454: loss=0.1508, acc=0.889 (IF=0.778, MQ=1.000)
2025-12-21T22:47:47.513409+0800 | INFO | Step 2455: loss=0.0402, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:47:48.975223+0800 | INFO | Step 2456: loss=0.0110, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:47:50.421023+0800 | INFO | Step 2457: loss=0.4113, acc=0.851 (IF=0.889, MQ=0.812)
2025-12-21T22:47:51.849371+0800 | INFO | Step 2458: loss=0.0174, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:47:52.878872+0800 | INFO | Step 2459: loss=0.0541, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:47:54.314533+0800 | INFO | Step 2460: loss=0.0690, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:47:55.754700+0800 | INFO | Step 2461: loss=0.5676, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:47:57.214421+0800 | INFO | Step 2462: loss=0.1933, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:47:58.669732+0800 | INFO | Step 2463: loss=0.3595, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:48:00.116992+0800 | INFO | Step 2464: loss=0.0868, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:48:01.573479+0800 | INFO | Step 2465: loss=0.1144, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:48:03.035681+0800 | INFO | Step 2466: loss=0.0180, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:48:04.483451+0800 | INFO | Step 2467: loss=0.1544, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:48:05.944465+0800 | INFO | Step 2468: loss=0.0429, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:48:07.402096+0800 | INFO | Step 2469: loss=0.1647, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:48:08.833356+0800 | INFO | Step 2470: loss=0.0619, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:48:10.276560+0800 | INFO | Step 2471: loss=0.2118, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T22:48:11.705719+0800 | INFO | Step 2472: loss=0.0347, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:48:13.157614+0800 | INFO | Step 2473: loss=0.1018, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:48:14.595935+0800 | INFO | Step 2474: loss=0.0734, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:48:16.051905+0800 | INFO | Step 2475: loss=0.1376, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:48:17.498764+0800 | INFO | Step 2476: loss=0.0357, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:48:18.952817+0800 | INFO | Step 2477: loss=0.1110, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:48:20.425726+0800 | INFO | Step 2478: loss=0.0740, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:48:21.876755+0800 | INFO | Step 2479: loss=0.0749, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:48:23.326409+0800 | INFO | Step 2480: loss=0.1494, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:48:24.790346+0800 | INFO | Step 2481: loss=0.1816, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:48:26.230917+0800 | INFO | Step 2482: loss=0.0477, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:48:27.688920+0800 | INFO | Step 2483: loss=0.1754, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:48:29.130675+0800 | INFO | Step 2484: loss=0.0236, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:48:30.584789+0800 | INFO | Step 2485: loss=0.0643, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:48:32.048579+0800 | INFO | Step 2486: loss=0.2978, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:48:33.532793+0800 | INFO | Step 2487: loss=0.0645, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:48:34.963223+0800 | INFO | Step 2488: loss=0.0328, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:48:36.403009+0800 | INFO | Step 2489: loss=0.1367, acc=0.889 (IF=0.778, MQ=1.000)
2025-12-21T22:48:37.835036+0800 | INFO | Step 2490: loss=0.1486, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:48:39.266986+0800 | INFO | Step 2491: loss=0.0096, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:48:40.706007+0800 | INFO | Step 2492: loss=0.1378, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:48:42.149449+0800 | INFO | Step 2493: loss=0.1555, acc=0.933 (IF=1.000, MQ=0.867)
2025-12-21T22:48:43.586825+0800 | INFO | Step 2494: loss=0.1377, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:48:45.036649+0800 | INFO | Step 2495: loss=0.2107, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:48:46.477009+0800 | INFO | Step 2496: loss=0.0274, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:48:49.220562+0800 | INFO | Step 2497: loss=0.1658, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:48:50.660628+0800 | INFO | Step 2498: loss=0.0987, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:48:52.120874+0800 | INFO | Step 2499: loss=0.1292, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:48:53.591180+0800 | INFO | Step 2500: loss=0.1714, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:49:01.024286+0800 | INFO |
============================================================
Validation Results (took 7.41s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6207
Quality Acc: 0.6375
Average Acc: 0.6291
Total Loss: 0.6525
Instruction Loss: 0.6631
Quality Loss: 0.6419
============================================================
2025-12-21T22:49:03.702174+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_2500.pt
2025-12-21T22:49:03.702780+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:49:03.702881+0800 | INFO | 1. Step 2300: acc=0.6392 (reward_model.best_2300.pt)
2025-12-21T22:49:03.702938+0800 | INFO | 2. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:49:03.702983+0800 | INFO | 3. Step 2000: acc=0.6330 (reward_model.best_2000.pt)
2025-12-21T22:49:05.188265+0800 | INFO | Step 2501: loss=0.0161, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:06.625596+0800 | INFO | Step 2502: loss=0.0776, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:49:08.051229+0800 | INFO | Step 2503: loss=0.1798, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:49:09.484292+0800 | INFO | Step 2504: loss=0.0166, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:10.918880+0800 | INFO | Step 2505: loss=0.0136, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:12.341122+0800 | INFO | Step 2506: loss=0.0586, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:49:13.776977+0800 | INFO | Step 2507: loss=0.0274, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:15.204828+0800 | INFO | Step 2508: loss=0.0717, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:49:16.640647+0800 | INFO | Step 2509: loss=0.0083, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:18.099710+0800 | INFO | Step 2510: loss=0.1967, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:49:19.556888+0800 | INFO | Step 2511: loss=0.0067, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:21.006938+0800 | INFO | Step 2512: loss=0.0285, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:22.453258+0800 | INFO | Step 2513: loss=0.2707, acc=0.873 (IF=0.933, MQ=0.812)
2025-12-21T22:49:23.891788+0800 | INFO | Step 2514: loss=0.0076, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:25.363715+0800 | INFO | Step 2515: loss=0.0568, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:26.840329+0800 | INFO | Step 2516: loss=0.3220, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T22:49:28.300789+0800 | INFO | Step 2517: loss=0.0909, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:49:29.746471+0800 | INFO | Step 2518: loss=0.0585, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:49:31.194649+0800 | INFO | Step 2519: loss=0.3106, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:49:32.636919+0800 | INFO | Step 2520: loss=0.6601, acc=0.784 (IF=0.818, MQ=0.750)
2025-12-21T22:49:34.071326+0800 | INFO | Step 2521: loss=0.0916, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:49:35.512025+0800 | INFO | Step 2522: loss=0.0963, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:49:36.955705+0800 | INFO | Step 2523: loss=0.1463, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:49:38.429918+0800 | INFO | Step 2524: loss=0.2307, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T22:49:39.958215+0800 | INFO | Step 2525: loss=0.0415, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:49:41.406378+0800 | INFO | Step 2526: loss=0.2626, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:49:42.854893+0800 | INFO | Step 2527: loss=0.0849, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:44.338052+0800 | INFO | Step 2528: loss=0.0125, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:45.791793+0800 | INFO | Step 2529: loss=0.0200, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:46.833559+0800 | INFO | Step 2530: loss=0.0257, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:48.289529+0800 | INFO | Step 2531: loss=0.0249, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:49:49.330910+0800 | INFO | Step 2532: loss=0.3324, acc=0.835 (IF=0.857, MQ=0.812)
2025-12-21T22:49:50.363302+0800 | INFO | Step 2533: loss=0.0894, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:49:51.807683+0800 | INFO | Step 2534: loss=0.0720, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:49:53.257527+0800 | INFO | Step 2535: loss=0.0603, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:49:54.703676+0800 | INFO | Step 2536: loss=0.1144, acc=0.933 (IF=0.867, MQ=1.000)
2025-12-21T22:49:56.150030+0800 | INFO | Step 2537: loss=0.2468, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:49:57.588035+0800 | INFO | Step 2538: loss=0.0920, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:49:59.029741+0800 | INFO | Step 2539: loss=0.0299, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:00.476192+0800 | INFO | Step 2540: loss=0.0549, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:50:01.924043+0800 | INFO | Step 2541: loss=0.0492, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:50:03.371803+0800 | INFO | Step 2542: loss=0.0826, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:50:04.831015+0800 | INFO | Step 2543: loss=0.0445, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:06.273735+0800 | INFO | Step 2544: loss=0.0837, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:50:07.714515+0800 | INFO | Step 2545: loss=0.2557, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:50:09.153566+0800 | INFO | Step 2546: loss=0.0693, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:50:10.621742+0800 | INFO | Step 2547: loss=0.0224, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:12.075464+0800 | INFO | Step 2548: loss=0.0343, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:13.513604+0800 | INFO | Step 2549: loss=0.0534, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:14.936502+0800 | INFO | Step 2550: loss=0.0561, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:50:16.366089+0800 | INFO | Step 2551: loss=0.0742, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:50:17.809114+0800 | INFO | Step 2552: loss=0.0459, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:20.564020+0800 | INFO | Step 2553: loss=0.0196, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:22.047102+0800 | INFO | Step 2554: loss=0.0501, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:50:23.491491+0800 | INFO | Step 2555: loss=0.2383, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T22:50:24.938911+0800 | INFO | Step 2556: loss=0.1019, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:50:26.377253+0800 | INFO | Step 2557: loss=0.0513, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:50:27.823919+0800 | INFO | Step 2558: loss=0.0316, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:50:29.304960+0800 | INFO | Step 2559: loss=0.1228, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:50:30.735514+0800 | INFO | Step 2560: loss=0.0205, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:32.156306+0800 | INFO | Step 2561: loss=0.0845, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:50:33.607114+0800 | INFO | Step 2562: loss=0.2486, acc=0.828 (IF=0.923, MQ=0.733)
2025-12-21T22:50:35.050091+0800 | INFO | Step 2563: loss=0.1042, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:50:36.515548+0800 | INFO | Step 2564: loss=0.0090, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:38.036916+0800 | INFO | Step 2565: loss=0.0749, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:50:39.488693+0800 | INFO | Step 2566: loss=0.1094, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:50:40.935787+0800 | INFO | Step 2567: loss=0.0345, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:41.977868+0800 | INFO | Step 2568: loss=0.0408, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:43.427242+0800 | INFO | Step 2569: loss=0.1557, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T22:50:44.877280+0800 | INFO | Step 2570: loss=0.0377, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:50:46.323280+0800 | INFO | Step 2571: loss=0.0851, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:50:47.768109+0800 | INFO | Step 2572: loss=0.0462, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:50:49.232490+0800 | INFO | Step 2573: loss=0.1761, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T22:50:50.717832+0800 | INFO | Step 2574: loss=0.0578, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:50:52.157892+0800 | INFO | Step 2575: loss=0.0530, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:53.597523+0800 | INFO | Step 2576: loss=0.0079, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:55.037282+0800 | INFO | Step 2577: loss=0.1586, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:50:56.494175+0800 | INFO | Step 2578: loss=0.1693, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:50:57.935743+0800 | INFO | Step 2579: loss=0.0089, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:50:59.377149+0800 | INFO | Step 2580: loss=0.4489, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:51:00.817980+0800 | INFO | Step 2581: loss=0.0635, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:51:02.267169+0800 | INFO | Step 2582: loss=0.0799, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:51:03.714563+0800 | INFO | Step 2583: loss=0.0693, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:51:05.166732+0800 | INFO | Step 2584: loss=0.0455, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:51:06.611504+0800 | INFO | Step 2585: loss=0.0174, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:51:08.055680+0800 | INFO | Step 2586: loss=0.0180, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:51:09.505113+0800 | INFO | Step 2587: loss=0.4544, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T22:51:10.949414+0800 | INFO | Step 2588: loss=0.1805, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:51:12.417657+0800 | INFO | Step 2589: loss=0.2020, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T22:51:13.859703+0800 | INFO | Step 2590: loss=0.0338, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:51:15.336494+0800 | INFO | Step 2591: loss=0.0448, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:51:16.775356+0800 | INFO | Step 2592: loss=0.2977, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T22:51:18.234346+0800 | INFO | Step 2593: loss=0.0376, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:51:19.669429+0800 | INFO | Step 2594: loss=0.1397, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:51:21.097634+0800 | INFO | Step 2595: loss=0.0761, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:51:22.532997+0800 | INFO | Step 2596: loss=0.3080, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T22:51:23.974014+0800 | INFO | Step 2597: loss=0.0141, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:51:25.418017+0800 | INFO | Step 2598: loss=0.3793, acc=0.844 (IF=0.750, MQ=0.938)
2025-12-21T22:51:26.881060+0800 | INFO | Step 2599: loss=0.0610, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:51:28.307564+0800 | INFO | Step 2600: loss=0.0504, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:51:35.639637+0800 | INFO |
============================================================
Validation Results (took 7.31s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6207
Quality Acc: 0.6750
Average Acc: 0.6478
Total Loss: 0.6499
Instruction Loss: 0.6633
Quality Loss: 0.6366
============================================================
2025-12-21T22:51:38.272537+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_2000.pt
2025-12-21T22:51:38.272908+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:51:38.273135+0800 | INFO | 1. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T22:51:38.273198+0800 | INFO | 2. Step 2300: acc=0.6392 (reward_model.best_2300.pt)
2025-12-21T22:51:38.273249+0800 | INFO | 3. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:51:39.748206+0800 | INFO | Step 2601: loss=0.0477, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:51:41.254954+0800 | INFO | Step 2602: loss=0.0816, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:51:42.692936+0800 | INFO | Step 2603: loss=0.0792, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:51:44.160907+0800 | INFO | Step 2604: loss=0.2764, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:51:45.618216+0800 | INFO | Step 2605: loss=0.0372, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:51:47.053107+0800 | INFO | Step 2606: loss=0.0248, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:51:48.489098+0800 | INFO | Step 2607: loss=0.0490, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:51:50.963283+0800 | INFO | Step 2608: loss=0.0945, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:51:52.397862+0800 | INFO | Step 2609: loss=0.0447, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:51:53.860593+0800 | INFO | Step 2610: loss=0.0190, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:51:55.316483+0800 | INFO | Step 2611: loss=0.0117, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:51:56.790070+0800 | INFO | Step 2612: loss=0.0355, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:51:58.222561+0800 | INFO | Step 2613: loss=0.1014, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:51:59.672507+0800 | INFO | Step 2614: loss=0.0875, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:52:01.108322+0800 | INFO | Step 2615: loss=0.0330, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:02.544574+0800 | INFO | Step 2616: loss=0.1667, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:52:04.000757+0800 | INFO | Step 2617: loss=0.5061, acc=0.895 (IF=0.857, MQ=0.933)
2025-12-21T22:52:05.439988+0800 | INFO | Step 2618: loss=0.0387, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:52:06.880038+0800 | INFO | Step 2619: loss=0.0280, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:08.313606+0800 | INFO | Step 2620: loss=0.0379, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:09.754769+0800 | INFO | Step 2621: loss=0.0050, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:11.215238+0800 | INFO | Step 2622: loss=0.0498, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:52:12.666606+0800 | INFO | Step 2623: loss=0.0763, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:52:13.724392+0800 | INFO | Step 2624: loss=0.0348, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:15.228174+0800 | INFO | Step 2625: loss=0.0121, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:16.675682+0800 | INFO | Step 2626: loss=0.3447, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T22:52:18.097513+0800 | INFO | Step 2627: loss=0.1679, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:52:19.550891+0800 | INFO | Step 2628: loss=0.0753, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:52:20.993527+0800 | INFO | Step 2629: loss=0.0323, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:22.434750+0800 | INFO | Step 2630: loss=0.0358, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:23.911630+0800 | INFO | Step 2631: loss=0.3328, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T22:52:25.345297+0800 | INFO | Step 2632: loss=0.0157, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:26.778772+0800 | INFO | Step 2633: loss=0.2633, acc=0.875 (IF=0.750, MQ=1.000)
2025-12-21T22:52:28.218136+0800 | INFO | Step 2634: loss=0.0717, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:52:29.661355+0800 | INFO | Step 2635: loss=0.0521, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:52:31.101014+0800 | INFO | Step 2636: loss=0.0460, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:32.528947+0800 | INFO | Step 2637: loss=0.5083, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T22:52:33.969394+0800 | INFO | Step 2638: loss=0.0370, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:35.028722+0800 | INFO | Step 2639: loss=0.0117, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:36.472893+0800 | INFO | Step 2640: loss=0.2062, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:52:37.907535+0800 | INFO | Step 2641: loss=0.1631, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:52:39.396641+0800 | INFO | Step 2642: loss=0.0178, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:40.836492+0800 | INFO | Step 2643: loss=0.0841, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:52:42.274148+0800 | INFO | Step 2644: loss=0.0956, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:52:43.710892+0800 | INFO | Step 2645: loss=0.0350, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:45.152346+0800 | INFO | Step 2646: loss=0.0996, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:52:46.592721+0800 | INFO | Step 2647: loss=0.2805, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:52:48.037586+0800 | INFO | Step 2648: loss=0.0972, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:52:49.457967+0800 | INFO | Step 2649: loss=0.0103, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:52:50.891025+0800 | INFO | Step 2650: loss=0.0717, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:52:52.331255+0800 | INFO | Step 2651: loss=0.0748, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:52:53.774574+0800 | INFO | Step 2652: loss=0.1875, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:52:55.220139+0800 | INFO | Step 2653: loss=0.1796, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:52:56.671368+0800 | INFO | Step 2654: loss=0.0909, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:52:58.132382+0800 | INFO | Step 2655: loss=0.3134, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T22:52:59.568498+0800 | INFO | Step 2656: loss=0.0519, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:01.013605+0800 | INFO | Step 2657: loss=0.0190, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:53:02.459450+0800 | INFO | Step 2658: loss=0.1274, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:53:03.929380+0800 | INFO | Step 2659: loss=0.1921, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:05.379838+0800 | INFO | Step 2660: loss=0.2548, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:53:06.815801+0800 | INFO | Step 2661: loss=0.1518, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T22:53:08.274913+0800 | INFO | Step 2662: loss=0.0397, acc=0.969 (IF=0.938, MQ=1.000)
2025-12-21T22:53:09.692790+0800 | INFO | Step 2663: loss=0.1621, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:53:12.461910+0800 | INFO | Step 2664: loss=0.0874, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:13.875524+0800 | INFO | Step 2665: loss=0.0593, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:15.302112+0800 | INFO | Step 2666: loss=0.0528, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:16.723493+0800 | INFO | Step 2667: loss=0.0257, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:53:18.165533+0800 | INFO | Step 2668: loss=0.0243, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:53:19.669354+0800 | INFO | Step 2669: loss=0.0437, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:21.112841+0800 | INFO | Step 2670: loss=0.0288, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:22.550242+0800 | INFO | Step 2671: loss=0.1056, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:53:23.977212+0800 | INFO | Step 2672: loss=0.0505, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:53:25.002473+0800 | INFO | Step 2673: loss=0.1278, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:26.468976+0800 | INFO | Step 2674: loss=0.1583, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:27.920983+0800 | INFO | Step 2675: loss=0.0633, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:53:29.336510+0800 | INFO | Step 2676: loss=0.1782, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:53:30.768579+0800 | INFO | Step 2677: loss=0.2503, acc=0.921 (IF=0.909, MQ=0.933)
2025-12-21T22:53:32.205971+0800 | INFO | Step 2678: loss=0.0103, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:53:33.652661+0800 | INFO | Step 2679: loss=0.1447, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:53:35.123771+0800 | INFO | Step 2680: loss=0.1625, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:53:36.606810+0800 | INFO | Step 2681: loss=0.2010, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T22:53:38.031338+0800 | INFO | Step 2682: loss=0.2539, acc=0.830 (IF=0.909, MQ=0.750)
2025-12-21T22:53:39.475770+0800 | INFO | Step 2683: loss=0.0376, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:53:40.910251+0800 | INFO | Step 2684: loss=0.0987, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:53:42.343350+0800 | INFO | Step 2685: loss=0.0285, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:43.844819+0800 | INFO | Step 2686: loss=0.0472, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:45.292328+0800 | INFO | Step 2687: loss=0.1996, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:53:46.728114+0800 | INFO | Step 2688: loss=0.0120, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:53:48.173363+0800 | INFO | Step 2689: loss=0.0457, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:53:49.605204+0800 | INFO | Step 2690: loss=0.0669, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:51.059538+0800 | INFO | Step 2691: loss=0.1161, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:53:52.498231+0800 | INFO | Step 2692: loss=0.4121, acc=0.911 (IF=0.889, MQ=0.933)
2025-12-21T22:53:53.947273+0800 | INFO | Step 2693: loss=0.0752, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:53:55.396551+0800 | INFO | Step 2694: loss=0.0412, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:56.830226+0800 | INFO | Step 2695: loss=0.0558, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:53:58.273232+0800 | INFO | Step 2696: loss=0.1128, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:53:59.338680+0800 | INFO | Step 2697: loss=0.1096, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:54:00.769249+0800 | INFO | Step 2698: loss=0.0772, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:54:02.207491+0800 | INFO | Step 2699: loss=0.0048, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:54:03.654519+0800 | INFO | Step 2700: loss=0.1150, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:54:10.842478+0800 | INFO |
============================================================
Validation Results (took 7.16s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6207
Quality Acc: 0.6375
Average Acc: 0.6291
Total Loss: 0.6495
Instruction Loss: 0.6592
Quality Loss: 0.6398
============================================================
2025-12-21T22:54:13.402376+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_2700.pt
2025-12-21T22:54:13.402855+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:54:13.402956+0800 | INFO | 1. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T22:54:13.403013+0800 | INFO | 2. Step 2300: acc=0.6392 (reward_model.best_2300.pt)
2025-12-21T22:54:13.403062+0800 | INFO | 3. Step 1000: acc=0.6369 (reward_model.best_1000.pt)
2025-12-21T22:54:14.881266+0800 | INFO | Step 2701: loss=0.1268, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:54:16.339657+0800 | INFO | Step 2702: loss=0.1475, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:54:17.801331+0800 | INFO | Step 2703: loss=0.0871, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:54:19.238810+0800 | INFO | Step 2704: loss=0.0401, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:54:20.668142+0800 | INFO | Step 2705: loss=0.0889, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:54:22.099996+0800 | INFO | Step 2706: loss=0.0836, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:54:23.545059+0800 | INFO | Step 2707: loss=0.0729, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:54:24.987560+0800 | INFO | Step 2708: loss=0.0740, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:54:26.055551+0800 | INFO | Step 2709: loss=0.2439, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:54:27.504990+0800 | INFO | Step 2710: loss=0.1015, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:54:28.973451+0800 | INFO | Step 2711: loss=0.1224, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:54:30.424834+0800 | INFO | Step 2712: loss=0.2074, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T22:54:31.905726+0800 | INFO | Step 2713: loss=0.3451, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T22:54:33.368306+0800 | INFO | Step 2714: loss=0.0075, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:54:34.801041+0800 | INFO | Step 2715: loss=0.1355, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:54:36.231703+0800 | INFO | Step 2716: loss=0.2906, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T22:54:37.666774+0800 | INFO | Step 2717: loss=0.0448, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:54:39.110050+0800 | INFO | Step 2718: loss=0.1950, acc=0.935 (IF=0.933, MQ=0.938)
2025-12-21T22:54:41.937336+0800 | INFO | Step 2719: loss=0.0313, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:54:43.393151+0800 | INFO | Step 2720: loss=0.1240, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:54:44.880828+0800 | INFO | Step 2721: loss=0.0208, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:54:46.312261+0800 | INFO | Step 2722: loss=0.0207, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:54:47.744678+0800 | INFO | Step 2723: loss=0.2178, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T22:54:49.162381+0800 | INFO | Step 2724: loss=0.0056, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:54:50.596261+0800 | INFO | Step 2725: loss=0.0830, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:54:52.013913+0800 | INFO | Step 2726: loss=0.1963, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:54:53.449538+0800 | INFO | Step 2727: loss=0.1505, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:54:54.907200+0800 | INFO | Step 2728: loss=0.0118, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:54:56.351955+0800 | INFO | Step 2729: loss=0.0955, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:54:57.787257+0800 | INFO | Step 2730: loss=0.0965, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:54:59.237890+0800 | INFO | Step 2731: loss=0.0379, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:00.692874+0800 | INFO | Step 2732: loss=0.1482, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:55:02.108611+0800 | INFO | Step 2733: loss=0.0762, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:55:03.552397+0800 | INFO | Step 2734: loss=0.3048, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:55:04.997261+0800 | INFO | Step 2735: loss=0.0153, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:06.434879+0800 | INFO | Step 2736: loss=0.1671, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:55:07.879857+0800 | INFO | Step 2737: loss=0.0043, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:08.910022+0800 | INFO | Step 2738: loss=0.2382, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:55:10.342752+0800 | INFO | Step 2739: loss=0.0150, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:11.785840+0800 | INFO | Step 2740: loss=0.0752, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:55:13.252662+0800 | INFO | Step 2741: loss=0.0921, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:55:14.692099+0800 | INFO | Step 2742: loss=0.2464, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:55:16.169354+0800 | INFO | Step 2743: loss=0.0221, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:17.644633+0800 | INFO | Step 2744: loss=0.0857, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:19.083115+0800 | INFO | Step 2745: loss=0.0922, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:55:20.523020+0800 | INFO | Step 2746: loss=0.1330, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:55:21.961558+0800 | INFO | Step 2747: loss=0.0522, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:55:23.396975+0800 | INFO | Step 2748: loss=0.0451, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:55:24.859884+0800 | INFO | Step 2749: loss=0.1138, acc=0.900 (IF=0.800, MQ=1.000)
2025-12-21T22:55:26.293759+0800 | INFO | Step 2750: loss=0.0546, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:27.730636+0800 | INFO | Step 2751: loss=0.0241, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:29.174620+0800 | INFO | Step 2752: loss=0.0478, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:30.616954+0800 | INFO | Step 2753: loss=0.0424, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:55:32.059954+0800 | INFO | Step 2754: loss=0.0519, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:33.502127+0800 | INFO | Step 2755: loss=0.0708, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:55:34.931245+0800 | INFO | Step 2756: loss=0.1009, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:55:36.406558+0800 | INFO | Step 2757: loss=0.1354, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:55:37.846460+0800 | INFO | Step 2758: loss=0.2012, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:55:39.294916+0800 | INFO | Step 2759: loss=0.0623, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:55:40.789625+0800 | INFO | Step 2760: loss=0.1755, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:55:42.220790+0800 | INFO | Step 2761: loss=0.0321, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:43.664357+0800 | INFO | Step 2762: loss=0.0347, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:45.107393+0800 | INFO | Step 2763: loss=0.0183, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:46.557557+0800 | INFO | Step 2764: loss=0.0981, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:55:48.001378+0800 | INFO | Step 2765: loss=0.0639, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:55:49.458718+0800 | INFO | Step 2766: loss=0.1803, acc=0.875 (IF=0.750, MQ=1.000)
2025-12-21T22:55:50.887885+0800 | INFO | Step 2767: loss=0.0851, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:55:52.327313+0800 | INFO | Step 2768: loss=0.0663, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:55:53.765627+0800 | INFO | Step 2769: loss=0.0339, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:55:55.209660+0800 | INFO | Step 2770: loss=0.0985, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:55:56.656705+0800 | INFO | Step 2771: loss=0.0883, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:55:58.099928+0800 | INFO | Step 2772: loss=0.1816, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:55:59.544324+0800 | INFO | Step 2773: loss=0.1807, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:56:00.975719+0800 | INFO | Step 2774: loss=0.2877, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:56:03.666566+0800 | INFO | Step 2775: loss=0.1060, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:56:05.268495+0800 | INFO | Step 2776: loss=0.0079, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:56:06.823653+0800 | INFO | Step 2777: loss=0.1524, acc=0.897 (IF=0.857, MQ=0.938)
2025-12-21T22:56:08.537628+0800 | INFO | Step 2778: loss=0.0965, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T22:56:10.022251+0800 | INFO | Step 2779: loss=0.0360, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:56:11.481695+0800 | INFO | Step 2780: loss=0.0271, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:56:12.972983+0800 | INFO | Step 2781: loss=0.0546, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:56:14.425022+0800 | INFO | Step 2782: loss=0.0080, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:56:15.852039+0800 | INFO | Step 2783: loss=0.1603, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:56:17.282018+0800 | INFO | Step 2784: loss=0.0290, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:56:18.722421+0800 | INFO | Step 2785: loss=0.0686, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:56:20.165727+0800 | INFO | Step 2786: loss=0.0636, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:56:21.627550+0800 | INFO | Step 2787: loss=0.0709, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:56:23.076469+0800 | INFO | Step 2788: loss=0.0875, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:56:24.516655+0800 | INFO | Step 2789: loss=0.0753, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:56:25.951753+0800 | INFO | Step 2790: loss=0.0617, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:56:27.381871+0800 | INFO | Step 2791: loss=0.0101, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:56:28.819345+0800 | INFO | Step 2792: loss=0.1164, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:56:30.258703+0800 | INFO | Step 2793: loss=0.0413, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:56:31.703796+0800 | INFO | Step 2794: loss=0.0151, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:56:33.136508+0800 | INFO | Step 2795: loss=0.1934, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:56:34.612051+0800 | INFO | Step 2796: loss=0.0180, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:56:36.053364+0800 | INFO | Step 2797: loss=0.2623, acc=0.865 (IF=0.917, MQ=0.812)
2025-12-21T22:56:37.492376+0800 | INFO | Step 2798: loss=0.1547, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:56:38.938103+0800 | INFO | Step 2799: loss=0.3008, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:56:40.453984+0800 | INFO | Step 2800: loss=0.0685, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:56:47.749487+0800 | INFO |
============================================================
Validation Results (took 7.27s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6207
Quality Acc: 0.6875
Average Acc: 0.6541
Total Loss: 0.6445
Instruction Loss: 0.6599
Quality Loss: 0.6292
============================================================
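The summary lines fit unweighted means of the two tasks better than sample-weighted ones. (0.6207 + 0.6875) / 2 = 0.6541 matches Average Acc, while weighting by the 58 instruction and 80 quality samples would give about 0.659. Likewise, (0.6599 + 0.6292) / 2 ≈ 0.6446 is within rounding of the reported Total Loss 0.6445, against about 0.642 if weighted. The per-step training acc follows the same rule (Step 2709: (0.833 + 0.875) / 2 ≈ 0.854). Below is a minimal Python check that assumes this rule. `average_metrics` is a hypothetical name, and the counts 36/58 and 55/80 are inferred from the rounded accuracies; the log does not print them.

```python
def average_metrics(instruction: float, quality: float) -> float:
    """Unweighted mean of two per-task metrics (assumed aggregation rule)."""
    return (instruction + quality) / 2


# Correct-pair counts inferred from the rounded Step 2800 accuracies (assumption).
instr_correct, instr_total = 36, 58      # 36/58 ≈ 0.6207
quality_correct, quality_total = 55, 80  # 55/80 = 0.6875

avg_acc = average_metrics(instr_correct / instr_total, quality_correct / quality_total)
weighted = (instr_correct + quality_correct) / (instr_total + quality_total)

print(f"unweighted mean: {avg_acc:.4f}")   # unweighted mean: 0.6541
print(f"sample-weighted: {weighted:.4f}")  # sample-weighted: 0.6594
```

Only the unweighted mean reproduces the logged Average Acc. Under this rule, the 58 instruction pairs count as much as the 80 quality pairs.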
2025-12-21T22:56:50.445111+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_1000.pt
2025-12-21T22:56:50.445606+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:56:50.445700+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T22:56:50.445762+0800 | INFO | 2. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T22:56:50.445812+0800 | INFO | 3. Step 2300: acc=0.6392 (reward_model.best_2300.pt)
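The checkpoint lines suggest a keep-best-3 policy. Each validation appears to write `reward_model.best_<step>.pt`, rank it against the current top three by Average Acc, and delete whichever file drops out. This would explain why Step 2800 evicts `best_1000.pt`. It would also explain why Steps 2900 and 3000 log "Removed old checkpoint" for the file they had just written: 0.6353 and 0.6330 are both below the third-place 0.6392. Below is a minimal sketch that assumes this policy. `TopKCheckpoints` is a hypothetical helper, not the trainer's code, and it tracks file names only. The caller would delete the returned file.

```python
import heapq
from typing import List, Optional, Tuple


class TopKCheckpoints:
    """Keep the k best checkpoints by validation accuracy (hypothetical helper)."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        # Min-heap of (acc, step, filename): the worst kept entry sits at index 0.
        self._heap: List[Tuple[float, int, str]] = []

    def push(self, step: int, acc: float) -> Optional[str]:
        """Register a freshly saved checkpoint; return the file name to delete, if any."""
        name = f"reward_model.best_{step}.pt"
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (acc, step, name))
            return None
        # heappushpop inserts the new entry and pops the smallest in one step,
        # so a new checkpoint worse than every kept one is evicted immediately.
        _, _, evicted = heapq.heappushpop(self._heap, (acc, step, name))
        return evicted

    def ranking(self) -> List[Tuple[int, float]]:
        """(step, acc) pairs, best first."""
        return [(step, acc) for acc, step, _ in sorted(self._heap, reverse=True)]


keeper = TopKCheckpoints(k=3)
# Seed with the ranking logged after Step 2800.
for step, acc in [(2800, 0.6541), (2600, 0.6478), (2300, 0.6392)]:
    keeper.push(step, acc)

print(keeper.push(2900, 0.6353))  # reward_model.best_2900.pt
print(keeper.push(3000, 0.6330))  # reward_model.best_3000.pt
print(keeper.ranking())           # [(2800, 0.6541), (2600, 0.6478), (2300, 0.6392)]
```

With a min-heap, each validation costs O(log k) time. The "save first, then prune" order means a checkpoint that does not make the cut still briefly exists on disk.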
2025-12-21T22:56:51.911601+0800 | INFO | Step 2801: loss=0.1420, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:56:53.401884+0800 | INFO | Step 2802: loss=0.0669, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:56:54.817734+0800 | INFO | Step 2803: loss=0.0586, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:56:56.255228+0800 | INFO | Step 2804: loss=0.0217, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:56:57.687190+0800 | INFO | Step 2805: loss=0.0140, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:56:59.123762+0800 | INFO | Step 2806: loss=0.0423, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:00.565317+0800 | INFO | Step 2807: loss=0.1059, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:57:02.015513+0800 | INFO | Step 2808: loss=0.0407, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:03.446888+0800 | INFO | Step 2809: loss=0.1234, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:57:04.896878+0800 | INFO | Step 2810: loss=0.0148, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:06.370800+0800 | INFO | Step 2811: loss=0.0230, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:07.805906+0800 | INFO | Step 2812: loss=0.1350, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:57:09.233420+0800 | INFO | Step 2813: loss=0.0903, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:57:10.665377+0800 | INFO | Step 2814: loss=0.1093, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:57:12.108849+0800 | INFO | Step 2815: loss=0.0126, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:13.537534+0800 | INFO | Step 2816: loss=0.0248, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:14.970903+0800 | INFO | Step 2817: loss=0.1110, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:57:16.414868+0800 | INFO | Step 2818: loss=0.2176, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:57:17.860922+0800 | INFO | Step 2819: loss=0.2186, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T22:57:19.327369+0800 | INFO | Step 2820: loss=0.2149, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:57:20.843372+0800 | INFO | Step 2821: loss=0.0324, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:57:22.324560+0800 | INFO | Step 2822: loss=0.1864, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:57:23.820693+0800 | INFO | Step 2823: loss=0.0252, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:25.251582+0800 | INFO | Step 2824: loss=0.0541, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:57:26.681232+0800 | INFO | Step 2825: loss=0.0046, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:28.138602+0800 | INFO | Step 2826: loss=0.0814, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:57:29.593963+0800 | INFO | Step 2827: loss=0.1531, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:57:31.039504+0800 | INFO | Step 2828: loss=0.4572, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:57:32.143166+0800 | INFO | Step 2829: loss=0.1377, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:57:34.795110+0800 | INFO | Step 2830: loss=0.0166, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:36.246859+0800 | INFO | Step 2831: loss=0.0406, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:37.724641+0800 | INFO | Step 2832: loss=0.0877, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:57:39.193399+0800 | INFO | Step 2833: loss=0.0152, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:40.673739+0800 | INFO | Step 2834: loss=0.0336, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:42.153410+0800 | INFO | Step 2835: loss=0.0855, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:57:43.609053+0800 | INFO | Step 2836: loss=0.0783, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:57:45.069028+0800 | INFO | Step 2837: loss=0.1009, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T22:57:46.524013+0800 | INFO | Step 2838: loss=0.1592, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T22:57:47.971576+0800 | INFO | Step 2839: loss=0.0213, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:49.417073+0800 | INFO | Step 2840: loss=0.1764, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T22:57:50.870942+0800 | INFO | Step 2841: loss=0.0195, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:52.311381+0800 | INFO | Step 2842: loss=0.0746, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:57:53.746104+0800 | INFO | Step 2843: loss=0.2012, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:57:55.178525+0800 | INFO | Step 2844: loss=0.0459, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:57:56.619819+0800 | INFO | Step 2845: loss=0.0233, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:57:58.044217+0800 | INFO | Step 2846: loss=0.0400, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:57:59.489781+0800 | INFO | Step 2847: loss=0.2198, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T22:58:00.953602+0800 | INFO | Step 2848: loss=0.0289, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:58:02.395704+0800 | INFO | Step 2849: loss=0.2647, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T22:58:03.843315+0800 | INFO | Step 2850: loss=0.0751, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:58:05.287777+0800 | INFO | Step 2851: loss=0.1190, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T22:58:06.733898+0800 | INFO | Step 2852: loss=0.2043, acc=0.900 (IF=1.000, MQ=0.800)
2025-12-21T22:58:08.179886+0800 | INFO | Step 2853: loss=0.0781, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:58:09.628358+0800 | INFO | Step 2854: loss=0.0819, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:58:11.080308+0800 | INFO | Step 2855: loss=0.1640, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:58:12.528399+0800 | INFO | Step 2856: loss=0.1475, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T22:58:13.976797+0800 | INFO | Step 2857: loss=0.1895, acc=0.900 (IF=0.800, MQ=1.000)
2025-12-21T22:58:15.427684+0800 | INFO | Step 2858: loss=0.3423, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:58:16.876247+0800 | INFO | Step 2859: loss=0.0156, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:58:18.348917+0800 | INFO | Step 2860: loss=0.0565, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:58:19.811290+0800 | INFO | Step 2861: loss=0.0831, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:58:21.291652+0800 | INFO | Step 2862: loss=0.1068, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:58:22.749785+0800 | INFO | Step 2863: loss=0.1576, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:58:24.186493+0800 | INFO | Step 2864: loss=0.0945, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:58:25.613590+0800 | INFO | Step 2865: loss=0.1088, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:58:27.058099+0800 | INFO | Step 2866: loss=0.0968, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:58:28.112988+0800 | INFO | Step 2867: loss=0.0392, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:58:29.571435+0800 | INFO | Step 2868: loss=0.1553, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:58:31.024144+0800 | INFO | Step 2869: loss=0.2955, acc=0.933 (IF=1.000, MQ=0.867)
2025-12-21T22:58:32.469817+0800 | INFO | Step 2870: loss=0.1217, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:58:33.918210+0800 | INFO | Step 2871: loss=0.0711, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:58:35.370753+0800 | INFO | Step 2872: loss=0.0320, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:58:36.821485+0800 | INFO | Step 2873: loss=0.0208, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:58:38.268603+0800 | INFO | Step 2874: loss=0.2317, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:58:39.710959+0800 | INFO | Step 2875: loss=0.0499, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:58:41.152870+0800 | INFO | Step 2876: loss=0.2950, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T22:58:42.653009+0800 | INFO | Step 2877: loss=0.1676, acc=0.868 (IF=0.923, MQ=0.812)
2025-12-21T22:58:44.093484+0800 | INFO | Step 2878: loss=0.0255, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:58:45.576916+0800 | INFO | Step 2879: loss=0.0414, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:58:47.044528+0800 | INFO | Step 2880: loss=0.2647, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T22:58:48.485705+0800 | INFO | Step 2881: loss=0.0673, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T22:58:49.926703+0800 | INFO | Step 2882: loss=0.1075, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T22:58:51.373604+0800 | INFO | Step 2883: loss=0.1274, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T22:58:52.821979+0800 | INFO | Step 2884: loss=0.0239, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:58:54.243809+0800 | INFO | Step 2885: loss=0.0321, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:58:57.154618+0800 | INFO | Step 2886: loss=0.1943, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T22:58:58.597991+0800 | INFO | Step 2887: loss=0.0397, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T22:59:00.060989+0800 | INFO | Step 2888: loss=0.0090, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:59:01.564956+0800 | INFO | Step 2889: loss=0.0513, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:59:03.010620+0800 | INFO | Step 2890: loss=0.1583, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:59:04.451602+0800 | INFO | Step 2891: loss=0.2288, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T22:59:05.940862+0800 | INFO | Step 2892: loss=0.0147, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:59:07.388363+0800 | INFO | Step 2893: loss=0.1163, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T22:59:08.830474+0800 | INFO | Step 2894: loss=0.0205, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:59:10.264261+0800 | INFO | Step 2895: loss=0.0251, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:59:11.692424+0800 | INFO | Step 2896: loss=0.0090, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:59:13.125101+0800 | INFO | Step 2897: loss=0.0255, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:59:14.609217+0800 | INFO | Step 2898: loss=0.2637, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T22:59:16.047491+0800 | INFO | Step 2899: loss=0.1137, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:59:17.474832+0800 | INFO | Step 2900: loss=0.0265, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:59:25.101137+0800 | INFO |
============================================================
Validation Results (took 7.60s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6207
Quality Acc: 0.6500
Average Acc: 0.6353
Total Loss: 0.6456
Instruction Loss: 0.6607
Quality Loss: 0.6304
============================================================
2025-12-21T22:59:27.670150+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_2900.pt
2025-12-21T22:59:27.670726+0800 | INFO | Best 3 checkpoints:
2025-12-21T22:59:27.670826+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T22:59:27.670884+0800 | INFO | 2. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T22:59:27.670932+0800 | INFO | 3. Step 2300: acc=0.6392 (reward_model.best_2300.pt)
2025-12-21T22:59:29.148397+0800 | INFO | Step 2901: loss=0.1098, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:59:30.590306+0800 | INFO | Step 2902: loss=0.0344, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:59:32.029452+0800 | INFO | Step 2903: loss=0.3128, acc=0.795 (IF=0.778, MQ=0.812)
2025-12-21T22:59:33.487895+0800 | INFO | Step 2904: loss=0.0491, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T22:59:34.939472+0800 | INFO | Step 2905: loss=0.0595, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:59:36.381601+0800 | INFO | Step 2906: loss=0.0456, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:59:37.822921+0800 | INFO | Step 2907: loss=0.1462, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T22:59:38.871194+0800 | INFO | Step 2908: loss=0.0131, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:59:40.308936+0800 | INFO | Step 2909: loss=0.0583, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:59:41.746376+0800 | INFO | Step 2910: loss=0.1022, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:59:43.190004+0800 | INFO | Step 2911: loss=0.0237, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:59:44.636428+0800 | INFO | Step 2912: loss=0.0974, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:59:46.080276+0800 | INFO | Step 2913: loss=0.2163, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T22:59:47.506882+0800 | INFO | Step 2914: loss=0.1755, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T22:59:48.935351+0800 | INFO | Step 2915: loss=0.0588, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:59:50.435770+0800 | INFO | Step 2916: loss=0.2516, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T22:59:51.888236+0800 | INFO | Step 2917: loss=0.3062, acc=0.902 (IF=0.929, MQ=0.875)
2025-12-21T22:59:53.321503+0800 | INFO | Step 2918: loss=0.1947, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T22:59:54.754180+0800 | INFO | Step 2919: loss=0.1257, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T22:59:56.189907+0800 | INFO | Step 2920: loss=0.2798, acc=0.885 (IF=0.769, MQ=1.000)
2025-12-21T22:59:57.620931+0800 | INFO | Step 2921: loss=0.0244, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T22:59:59.060420+0800 | INFO | Step 2922: loss=0.0312, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:00.497558+0800 | INFO | Step 2923: loss=0.1191, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:00:01.919833+0800 | INFO | Step 2924: loss=0.0745, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:00:03.355070+0800 | INFO | Step 2925: loss=0.1109, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T23:00:04.792728+0800 | INFO | Step 2926: loss=0.0630, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:00:06.221646+0800 | INFO | Step 2927: loss=0.0349, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:07.643313+0800 | INFO | Step 2928: loss=0.0391, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:09.092048+0800 | INFO | Step 2929: loss=0.0226, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:10.527779+0800 | INFO | Step 2930: loss=0.0271, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:11.988730+0800 | INFO | Step 2931: loss=0.0472, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:13.444807+0800 | INFO | Step 2932: loss=0.0979, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:00:14.886389+0800 | INFO | Step 2933: loss=0.1694, acc=0.911 (IF=0.889, MQ=0.933)
2025-12-21T23:00:16.315894+0800 | INFO | Step 2934: loss=0.0374, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:17.742671+0800 | INFO | Step 2935: loss=0.0772, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:00:19.170419+0800 | INFO | Step 2936: loss=0.0233, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:20.605129+0800 | INFO | Step 2937: loss=0.0618, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:00:21.627149+0800 | INFO | Step 2938: loss=0.1446, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T23:00:23.069296+0800 | INFO | Step 2939: loss=0.2098, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:00:24.533262+0800 | INFO | Step 2940: loss=0.0953, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:00:27.913188+0800 | INFO | Step 2941: loss=0.0555, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:00:29.373573+0800 | INFO | Step 2942: loss=0.0997, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:00:30.851258+0800 | INFO | Step 2943: loss=0.1059, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:00:32.358940+0800 | INFO | Step 2944: loss=0.0646, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:00:33.853675+0800 | INFO | Step 2945: loss=0.0073, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:35.292878+0800 | INFO | Step 2946: loss=0.0466, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:36.753137+0800 | INFO | Step 2947: loss=0.0425, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:00:38.198727+0800 | INFO | Step 2948: loss=0.1136, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:00:39.661931+0800 | INFO | Step 2949: loss=0.0068, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:41.119164+0800 | INFO | Step 2950: loss=0.0285, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:42.561395+0800 | INFO | Step 2951: loss=0.0433, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:00:44.039233+0800 | INFO | Step 2952: loss=0.1231, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:00:45.516686+0800 | INFO | Step 2953: loss=0.0110, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:46.954485+0800 | INFO | Step 2954: loss=0.1108, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:00:48.391800+0800 | INFO | Step 2955: loss=0.0274, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:49.835809+0800 | INFO | Step 2956: loss=0.0246, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:51.333832+0800 | INFO | Step 2957: loss=0.0484, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:52.799457+0800 | INFO | Step 2958: loss=0.2026, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:00:54.228513+0800 | INFO | Step 2959: loss=0.0451, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:00:55.685124+0800 | INFO | Step 2960: loss=0.0647, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:00:57.128548+0800 | INFO | Step 2961: loss=0.0075, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:00:58.568025+0800 | INFO | Step 2962: loss=0.1433, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:01:00.011936+0800 | INFO | Step 2963: loss=0.0049, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:01:01.460878+0800 | INFO | Step 2964: loss=0.0856, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:01:02.927733+0800 | INFO | Step 2965: loss=0.0370, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:01:04.373132+0800 | INFO | Step 2966: loss=0.1600, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:01:05.813004+0800 | INFO | Step 2967: loss=0.1152, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:01:07.262342+0800 | INFO | Step 2968: loss=0.0558, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:01:08.758899+0800 | INFO | Step 2969: loss=0.3385, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T23:01:10.198293+0800 | INFO | Step 2970: loss=0.0305, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:01:11.649934+0800 | INFO | Step 2971: loss=0.0375, acc=0.967 (IF=0.933, MQ=1.000)
2025-12-21T23:01:12.679101+0800 | INFO | Step 2972: loss=0.1982, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:01:14.115676+0800 | INFO | Step 2973: loss=0.0031, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:01:15.581009+0800 | INFO | Step 2974: loss=0.1381, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:01:17.018871+0800 | INFO | Step 2975: loss=0.0284, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:01:18.464739+0800 | INFO | Step 2976: loss=0.1123, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:01:19.915896+0800 | INFO | Step 2977: loss=0.8138, acc=0.725 (IF=0.700, MQ=0.750)
2025-12-21T23:01:21.366858+0800 | INFO | Step 2978: loss=0.1144, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:01:22.811769+0800 | INFO | Step 2979: loss=0.1214, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:01:24.249472+0800 | INFO | Step 2980: loss=0.1141, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T23:01:25.697548+0800 | INFO | Step 2981: loss=0.1687, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:01:27.181477+0800 | INFO | Step 2982: loss=0.2487, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:01:28.639954+0800 | INFO | Step 2983: loss=0.0148, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:01:30.101533+0800 | INFO | Step 2984: loss=0.0854, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:01:31.540205+0800 | INFO | Step 2985: loss=0.0281, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:01:32.964634+0800 | INFO | Step 2986: loss=0.2713, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:01:34.403237+0800 | INFO | Step 2987: loss=0.0677, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:01:35.845288+0800 | INFO | Step 2988: loss=0.1067, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:01:37.280700+0800 | INFO | Step 2989: loss=0.0991, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:01:38.723066+0800 | INFO | Step 2990: loss=0.1318, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:01:40.165258+0800 | INFO | Step 2991: loss=0.2120, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:01:41.606987+0800 | INFO | Step 2992: loss=0.0827, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:01:43.029509+0800 | INFO | Step 2993: loss=0.0257, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:01:44.468051+0800 | INFO | Step 2994: loss=0.1059, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:01:45.885362+0800 | INFO | Step 2995: loss=0.0635, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T23:01:47.301326+0800 | INFO | Step 2996: loss=0.0682, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:01:50.428346+0800 | INFO | Step 2997: loss=0.0243, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:01:51.869191+0800 | INFO | Step 2998: loss=0.0363, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:01:53.292881+0800 | INFO | Step 2999: loss=0.2889, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T23:01:54.727993+0800 | INFO | Step 3000: loss=0.3311, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T23:02:02.016002+0800 | INFO |
============================================================
Validation Results (took 7.26s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6034
Quality Acc: 0.6625
Average Acc: 0.6330
Total Loss: 0.6428
Instruction Loss: 0.6574
Quality Loss: 0.6283
============================================================
2025-12-21T23:02:04.726974+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_3000.pt
2025-12-21T23:02:04.727474+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:02:04.727573+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:02:04.727639+0800 | INFO | 2. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T23:02:04.727690+0800 | INFO | 3. Step 2300: acc=0.6392 (reward_model.best_2300.pt)
2025-12-21T23:02:07.027122+0800 | INFO | Step 3000: Saved to /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.3000.pt
2025-12-21T23:02:08.494206+0800 | INFO | Step 3001: loss=0.0522, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:02:09.930488+0800 | INFO | Step 3002: loss=0.0757, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:02:11.360721+0800 | INFO | Step 3003: loss=0.1043, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:02:12.864693+0800 | INFO | Step 3004: loss=0.1335, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:02:14.315934+0800 | INFO | Step 3005: loss=0.0257, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:15.719662+0800 | INFO | Step 3006: loss=0.1254, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:02:17.147544+0800 | INFO | Step 3007: loss=0.1172, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:02:18.653638+0800 | INFO | Step 3008: loss=0.2475, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T23:02:20.166933+0800 | INFO | Step 3009: loss=0.0635, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:02:21.764963+0800 | INFO | Step 3010: loss=0.0067, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:23.390048+0800 | INFO | Step 3011: loss=0.0126, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:24.888125+0800 | INFO | Step 3012: loss=0.0337, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:26.349930+0800 | INFO | Step 3013: loss=0.0084, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:27.817420+0800 | INFO | Step 3014: loss=0.3423, acc=0.832 (IF=0.727, MQ=0.938)
2025-12-21T23:02:29.258582+0800 | INFO | Step 3015: loss=0.0465, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:30.699963+0800 | INFO | Step 3016: loss=0.0018, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:32.140768+0800 | INFO | Step 3017: loss=0.1571, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:02:33.585081+0800 | INFO | Step 3018: loss=0.1325, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:02:35.025638+0800 | INFO | Step 3019: loss=0.0236, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:36.470691+0800 | INFO | Step 3020: loss=0.1289, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T23:02:37.918583+0800 | INFO | Step 3021: loss=0.0123, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:39.364552+0800 | INFO | Step 3022: loss=0.1829, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T23:02:40.812477+0800 | INFO | Step 3023: loss=0.2398, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T23:02:42.273381+0800 | INFO | Step 3024: loss=0.0388, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:43.725587+0800 | INFO | Step 3025: loss=0.0381, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:45.173885+0800 | INFO | Step 3026: loss=0.0091, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:46.630991+0800 | INFO | Step 3027: loss=0.0435, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:48.079533+0800 | INFO | Step 3028: loss=0.2119, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:02:49.542994+0800 | INFO | Step 3029: loss=0.0267, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:02:51.023467+0800 | INFO | Step 3030: loss=0.0068, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:52.463527+0800 | INFO | Step 3031: loss=0.1678, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:02:53.913929+0800 | INFO | Step 3032: loss=0.0808, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T23:02:55.347988+0800 | INFO | Step 3033: loss=0.0118, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:56.779764+0800 | INFO | Step 3034: loss=0.0110, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:58.269117+0800 | INFO | Step 3035: loss=0.0392, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:02:59.702770+0800 | INFO | Step 3036: loss=0.0532, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:03:01.147426+0800 | INFO | Step 3037: loss=0.1022, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:03:02.589431+0800 | INFO | Step 3038: loss=0.0214, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:03.631257+0800 | INFO | Step 3039: loss=0.0113, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:05.085274+0800 | INFO | Step 3040: loss=0.1706, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:03:06.532327+0800 | INFO | Step 3041: loss=0.1151, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:03:07.968186+0800 | INFO | Step 3042: loss=0.0164, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:09.403518+0800 | INFO | Step 3043: loss=0.0155, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:10.824885+0800 | INFO | Step 3044: loss=0.0425, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:12.258836+0800 | INFO | Step 3045: loss=0.0219, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:13.688401+0800 | INFO | Step 3046: loss=0.0156, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:15.117228+0800 | INFO | Step 3047: loss=0.1125, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:03:16.548241+0800 | INFO | Step 3048: loss=0.2710, acc=0.900 (IF=1.000, MQ=0.800)
2025-12-21T23:03:17.978745+0800 | INFO | Step 3049: loss=0.1870, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:03:19.419874+0800 | INFO | Step 3050: loss=0.2040, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:03:20.874719+0800 | INFO | Step 3051: loss=0.1685, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T23:03:23.658449+0800 | INFO | Step 3052: loss=0.4898, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:03:25.097103+0800 | INFO | Step 3053: loss=0.0084, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:26.526953+0800 | INFO | Step 3054: loss=0.0242, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:27.950540+0800 | INFO | Step 3055: loss=0.0501, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:29.401839+0800 | INFO | Step 3056: loss=0.3387, acc=0.856 (IF=0.778, MQ=0.933)
2025-12-21T23:03:30.832112+0800 | INFO | Step 3057: loss=0.1553, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T23:03:32.247235+0800 | INFO | Step 3058: loss=0.0162, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:33.677255+0800 | INFO | Step 3059: loss=0.0420, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:35.109510+0800 | INFO | Step 3060: loss=0.0422, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:03:36.551085+0800 | INFO | Step 3061: loss=0.0038, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:37.976059+0800 | INFO | Step 3062: loss=0.1481, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T23:03:39.410311+0800 | INFO | Step 3063: loss=0.0436, acc=0.967 (IF=0.933, MQ=1.000)
2025-12-21T23:03:40.849709+0800 | INFO | Step 3064: loss=0.0022, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:42.287956+0800 | INFO | Step 3065: loss=0.0877, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:03:43.724878+0800 | INFO | Step 3066: loss=0.0795, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:03:45.153312+0800 | INFO | Step 3067: loss=0.1023, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:03:46.592283+0800 | INFO | Step 3068: loss=0.0034, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:48.035726+0800 | INFO | Step 3069: loss=0.1136, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:03:49.485884+0800 | INFO | Step 3070: loss=0.1283, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:03:50.934311+0800 | INFO | Step 3071: loss=0.0057, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:52.382358+0800 | INFO | Step 3072: loss=0.0772, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T23:03:53.839974+0800 | INFO | Step 3073: loss=0.0076, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:55.309050+0800 | INFO | Step 3074: loss=0.2127, acc=0.875 (IF=1.000, MQ=0.750)
2025-12-21T23:03:56.768137+0800 | INFO | Step 3075: loss=0.0229, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:58.208536+0800 | INFO | Step 3076: loss=0.0127, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:03:59.643316+0800 | INFO | Step 3077: loss=0.0485, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:04:01.085926+0800 | INFO | Step 3078: loss=0.0219, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:04:02.553346+0800 | INFO | Step 3079: loss=0.1637, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:04:03.989736+0800 | INFO | Step 3080: loss=0.2461, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T23:04:05.443319+0800 | INFO | Step 3081: loss=0.0113, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:04:06.942003+0800 | INFO | Step 3082: loss=0.3918, acc=0.829 (IF=0.846, MQ=0.812)
2025-12-21T23:04:07.972300+0800 | INFO | Step 3083: loss=0.0007, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:04:09.419344+0800 | INFO | Step 3084: loss=0.0767, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:04:10.873156+0800 | INFO | Step 3085: loss=0.2094, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T23:04:12.323686+0800 | INFO | Step 3086: loss=0.3325, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T23:04:13.771973+0800 | INFO | Step 3087: loss=0.1018, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:04:15.219682+0800 | INFO | Step 3088: loss=0.1025, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:04:16.682424+0800 | INFO | Step 3089: loss=0.0081, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:04:18.137408+0800 | INFO | Step 3090: loss=0.0733, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:04:19.577080+0800 | INFO | Step 3091: loss=0.1166, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:04:21.002404+0800 | INFO | Step 3092: loss=0.0144, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:04:22.446411+0800 | INFO | Step 3093: loss=0.0636, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T23:04:23.879035+0800 | INFO | Step 3094: loss=0.1361, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:04:25.315279+0800 | INFO | Step 3095: loss=0.0630, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:04:26.752069+0800 | INFO | Step 3096: loss=0.0379, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:04:28.189319+0800 | INFO | Step 3097: loss=0.0689, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:04:29.637012+0800 | INFO | Step 3098: loss=0.2045, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:04:31.083454+0800 | INFO | Step 3099: loss=0.0287, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:04:32.521791+0800 | INFO | Step 3100: loss=0.6236, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:04:40.200583+0800 | INFO |
============================================================
Validation Results (took 7.65s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6207
Quality Acc: 0.6375
Average Acc: 0.6291
Total Loss: 0.6398
Instruction Loss: 0.6520
Quality Loss: 0.6277
============================================================
2025-12-21T23:04:42.980296+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_3100.pt
2025-12-21T23:04:42.980977+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:04:42.981110+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:04:42.981177+0800 | INFO | 2. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T23:04:42.981229+0800 | INFO | 3. Step 2300: acc=0.6392 (reward_model.best_2300.pt)
2025-12-21T23:04:44.479681+0800 | INFO | Step 3101: loss=0.0196, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:04:45.935101+0800 | INFO | Step 3102: loss=0.1174, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:04:47.413412+0800 | INFO | Step 3103: loss=0.5366, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T23:04:48.866033+0800 | INFO | Step 3104: loss=0.3989, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T23:04:50.314237+0800 | INFO | Step 3105: loss=0.0720, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T23:04:51.779483+0800 | INFO | Step 3106: loss=0.1924, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:04:53.232405+0800 | INFO | Step 3107: loss=0.0647, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:04:56.077744+0800 | INFO | Step 3108: loss=0.2239, acc=0.844 (IF=0.889, MQ=0.800)
2025-12-21T23:04:57.594830+0800 | INFO | Step 3109: loss=0.1299, acc=0.900 (IF=0.800, MQ=1.000)
2025-12-21T23:04:58.686706+0800 | INFO | Step 3110: loss=0.0706, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:05:00.132316+0800 | INFO | Step 3111: loss=0.0122, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:01.661303+0800 | INFO | Step 3112: loss=0.0405, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:02.686963+0800 | INFO | Step 3113: loss=0.3579, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T23:05:04.146709+0800 | INFO | Step 3114: loss=0.0551, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:05.614161+0800 | INFO | Step 3115: loss=0.1299, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:05:07.065800+0800 | INFO | Step 3116: loss=0.0374, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:08.578174+0800 | INFO | Step 3117: loss=0.1358, acc=0.902 (IF=0.929, MQ=0.875)
2025-12-21T23:05:10.036340+0800 | INFO | Step 3118: loss=0.0310, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:11.512412+0800 | INFO | Step 3119: loss=0.0632, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:05:13.002903+0800 | INFO | Step 3120: loss=0.0271, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:14.445381+0800 | INFO | Step 3121: loss=0.3654, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:05:15.909533+0800 | INFO | Step 3122: loss=0.0179, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:17.376031+0800 | INFO | Step 3123: loss=0.0648, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:05:18.811024+0800 | INFO | Step 3124: loss=0.0023, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:20.244268+0800 | INFO | Step 3125: loss=0.1225, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:05:21.680280+0800 | INFO | Step 3126: loss=0.0804, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:05:23.119659+0800 | INFO | Step 3127: loss=0.0207, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:24.568331+0800 | INFO | Step 3128: loss=0.0570, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:05:26.039904+0800 | INFO | Step 3129: loss=0.1053, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:05:27.482558+0800 | INFO | Step 3130: loss=0.1200, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:05:28.949450+0800 | INFO | Step 3131: loss=0.0650, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:30.389910+0800 | INFO | Step 3132: loss=0.0317, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:05:31.837252+0800 | INFO | Step 3133: loss=0.0871, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:05:33.282459+0800 | INFO | Step 3134: loss=0.0301, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:34.315641+0800 | INFO | Step 3135: loss=0.2757, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:05:35.762110+0800 | INFO | Step 3136: loss=0.0786, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T23:05:37.210230+0800 | INFO | Step 3137: loss=0.2501, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:05:38.652520+0800 | INFO | Step 3138: loss=0.1287, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:05:40.124013+0800 | INFO | Step 3139: loss=0.0501, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:41.597625+0800 | INFO | Step 3140: loss=0.1049, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:05:43.076319+0800 | INFO | Step 3141: loss=0.4171, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T23:05:44.526619+0800 | INFO | Step 3142: loss=0.1953, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:05:45.995678+0800 | INFO | Step 3143: loss=0.3097, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T23:05:47.444352+0800 | INFO | Step 3144: loss=0.0384, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:48.950149+0800 | INFO | Step 3145: loss=0.1081, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:05:49.995027+0800 | INFO | Step 3146: loss=0.3934, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T23:05:51.441086+0800 | INFO | Step 3147: loss=0.4105, acc=0.795 (IF=0.714, MQ=0.875)
2025-12-21T23:05:52.883997+0800 | INFO | Step 3148: loss=0.3732, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T23:05:54.326697+0800 | INFO | Step 3149: loss=0.1612, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T23:05:55.774130+0800 | INFO | Step 3150: loss=0.0717, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:05:57.261826+0800 | INFO | Step 3151: loss=0.0285, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:05:58.694872+0800 | INFO | Step 3152: loss=0.1079, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:06:00.178613+0800 | INFO | Step 3153: loss=0.0264, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:01.641270+0800 | INFO | Step 3154: loss=0.0095, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:03.124045+0800 | INFO | Step 3155: loss=0.1141, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:06:04.567506+0800 | INFO | Step 3156: loss=0.1037, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:06:06.009296+0800 | INFO | Step 3157: loss=0.0141, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:07.438300+0800 | INFO | Step 3158: loss=0.0071, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:08.893564+0800 | INFO | Step 3159: loss=0.0723, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:06:10.331943+0800 | INFO | Step 3160: loss=0.0477, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:06:11.362612+0800 | INFO | Step 3161: loss=0.0166, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:12.783612+0800 | INFO | Step 3162: loss=0.1525, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:06:15.412786+0800 | INFO | Step 3163: loss=0.0462, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:16.860352+0800 | INFO | Step 3164: loss=0.0774, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:06:18.316543+0800 | INFO | Step 3165: loss=0.0406, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:19.750545+0800 | INFO | Step 3166: loss=0.0485, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:06:21.196983+0800 | INFO | Step 3167: loss=0.0263, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:22.631102+0800 | INFO | Step 3168: loss=0.0132, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:24.060601+0800 | INFO | Step 3169: loss=0.1674, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:06:25.503391+0800 | INFO | Step 3170: loss=0.3868, acc=0.883 (IF=0.900, MQ=0.867)
2025-12-21T23:06:26.978719+0800 | INFO | Step 3171: loss=0.0036, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:28.413397+0800 | INFO | Step 3172: loss=0.0525, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:06:29.908166+0800 | INFO | Step 3173: loss=0.0258, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:31.350068+0800 | INFO | Step 3174: loss=0.1387, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:06:32.782678+0800 | INFO | Step 3175: loss=0.1437, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:06:34.214649+0800 | INFO | Step 3176: loss=0.0187, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:35.659868+0800 | INFO | Step 3177: loss=0.0898, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:06:37.127019+0800 | INFO | Step 3178: loss=0.0365, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:38.583810+0800 | INFO | Step 3179: loss=0.0243, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:40.034998+0800 | INFO | Step 3180: loss=0.0093, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:41.476506+0800 | INFO | Step 3181: loss=0.0078, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:42.922336+0800 | INFO | Step 3182: loss=0.1569, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T23:06:44.353992+0800 | INFO | Step 3183: loss=0.0531, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:45.789138+0800 | INFO | Step 3184: loss=0.1770, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T23:06:47.228982+0800 | INFO | Step 3185: loss=0.1099, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:06:48.664801+0800 | INFO | Step 3186: loss=0.0387, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:50.127022+0800 | INFO | Step 3187: loss=0.0674, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:06:51.580116+0800 | INFO | Step 3188: loss=0.0132, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:53.029532+0800 | INFO | Step 3189: loss=0.0744, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:06:54.453745+0800 | INFO | Step 3190: loss=0.0673, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:06:55.887942+0800 | INFO | Step 3191: loss=0.0650, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:06:57.405289+0800 | INFO | Step 3192: loss=0.0459, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:06:58.848897+0800 | INFO | Step 3193: loss=0.0313, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:07:00.314440+0800 | INFO | Step 3194: loss=0.0557, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T23:07:01.782697+0800 | INFO | Step 3195: loss=0.0313, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:07:03.228294+0800 | INFO | Step 3196: loss=0.0970, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:07:04.673526+0800 | INFO | Step 3197: loss=0.1676, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T23:07:06.143730+0800 | INFO | Step 3198: loss=0.0711, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:07:07.636122+0800 | INFO | Step 3199: loss=0.1221, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T23:07:09.093626+0800 | INFO | Step 3200: loss=0.0454, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:07:16.684101+0800 | INFO |
============================================================
Validation Results (took 7.56s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6034
Quality Acc: 0.6500
Average Acc: 0.6267
Total Loss: 0.6370
Instruction Loss: 0.6512
Quality Loss: 0.6229
============================================================
2025-12-21T23:07:19.315124+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_3200.pt
2025-12-21T23:07:19.315689+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:07:19.315828+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:07:19.315887+0800 | INFO | 2. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T23:07:19.315934+0800 | INFO | 3. Step 2300: acc=0.6392 (reward_model.best_2300.pt)
2025-12-21T23:07:20.822825+0800 | INFO | Step 3201: loss=0.2736, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:07:22.274594+0800 | INFO | Step 3202: loss=0.2440, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:07:23.711464+0800 | INFO | Step 3203: loss=0.0222, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:07:25.158586+0800 | INFO | Step 3204: loss=0.0036, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:07:26.617687+0800 | INFO | Step 3205: loss=0.0235, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:07:28.058052+0800 | INFO | Step 3206: loss=0.0958, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:07:29.524688+0800 | INFO | Step 3207: loss=0.4310, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T23:07:31.042685+0800 | INFO | Step 3208: loss=0.0683, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:07:32.507059+0800 | INFO | Step 3209: loss=0.0111, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:07:33.998122+0800 | INFO | Step 3210: loss=0.0146, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:07:35.451964+0800 | INFO | Step 3211: loss=0.0043, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:07:36.896030+0800 | INFO | Step 3212: loss=0.0364, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:07:38.358832+0800 | INFO | Step 3213: loss=0.2769, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:07:39.810648+0800 | INFO | Step 3214: loss=0.0249, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:07:41.254534+0800 | INFO | Step 3215: loss=0.1127, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:07:42.698320+0800 | INFO | Step 3216: loss=0.0652, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:07:44.138138+0800 | INFO | Step 3217: loss=0.0525, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:07:45.567746+0800 | INFO | Step 3218: loss=0.0702, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:07:48.081515+0800 | INFO | Step 3219: loss=0.0431, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:07:49.549280+0800 | INFO | Step 3220: loss=0.0192, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:07:50.606973+0800 | INFO | Step 3221: loss=0.0641, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:07:52.121794+0800 | INFO | Step 3222: loss=0.1776, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:07:53.578219+0800 | INFO | Step 3223: loss=0.1087, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:07:55.058113+0800 | INFO | Step 3224: loss=0.0393, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:07:56.514331+0800 | INFO | Step 3225: loss=0.0093, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:07:57.965715+0800 | INFO | Step 3226: loss=0.2084, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:07:59.017251+0800 | INFO | Step 3227: loss=0.0871, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:08:00.459349+0800 | INFO | Step 3228: loss=0.0389, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:08:01.906054+0800 | INFO | Step 3229: loss=0.0096, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:08:03.429134+0800 | INFO | Step 3230: loss=0.1940, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T23:08:04.498869+0800 | INFO | Step 3231: loss=0.0911, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:08:05.949157+0800 | INFO | Step 3232: loss=0.0041, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:08:07.432454+0800 | INFO | Step 3233: loss=0.3111, acc=0.763 (IF=0.714, MQ=0.812)
2025-12-21T23:08:08.482292+0800 | INFO | Step 3234: loss=0.0933, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:08:09.929559+0800 | INFO | Step 3235: loss=0.0687, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:08:11.383898+0800 | INFO | Step 3236: loss=0.0744, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:08:12.866285+0800 | INFO | Step 3237: loss=0.0525, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:08:14.340528+0800 | INFO | Step 3238: loss=0.0326, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:08:15.794186+0800 | INFO | Step 3239: loss=0.0251, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:08:17.247563+0800 | INFO | Step 3240: loss=0.0220, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:08:18.729338+0800 | INFO | Step 3241: loss=0.1233, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:08:20.192509+0800 | INFO | Step 3242: loss=0.2602, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T23:08:21.638505+0800 | INFO | Step 3243: loss=0.2126, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:08:23.107501+0800 | INFO | Step 3244: loss=0.1048, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T23:08:24.560108+0800 | INFO | Step 3245: loss=0.0913, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:08:25.982494+0800 | INFO | Step 3246: loss=0.0078, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:08:27.480608+0800 | INFO | Step 3247: loss=0.0502, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:08:28.953502+0800 | INFO | Step 3248: loss=0.1358, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:08:30.391287+0800 | INFO | Step 3249: loss=0.1592, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T23:08:31.827081+0800 | INFO | Step 3250: loss=0.0205, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:08:33.263129+0800 | INFO | Step 3251: loss=0.0303, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:08:34.696152+0800 | INFO | Step 3252: loss=0.0178, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:08:35.735080+0800 | INFO | Step 3253: loss=0.3317, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:08:36.764169+0800 | INFO | Step 3254: loss=0.5789, acc=0.804 (IF=0.857, MQ=0.750)
2025-12-21T23:08:38.206849+0800 | INFO | Step 3255: loss=0.0840, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:08:39.668559+0800 | INFO | Step 3256: loss=0.1294, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:08:41.139365+0800 | INFO | Step 3257: loss=0.0709, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:08:42.597665+0800 | INFO | Step 3258: loss=0.2282, acc=0.858 (IF=0.778, MQ=0.938)
2025-12-21T23:08:43.689541+0800 | INFO | Step 3259: loss=0.1204, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:08:45.152301+0800 | INFO | Step 3260: loss=0.1730, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:08:46.626647+0800 | INFO | Step 3261: loss=0.0470, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T23:08:48.065219+0800 | INFO | Step 3262: loss=0.0388, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:08:49.503004+0800 | INFO | Step 3263: loss=0.0552, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:08:50.937940+0800 | INFO | Step 3264: loss=0.0364, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:08:52.369799+0800 | INFO | Step 3265: loss=0.1210, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:08:53.810646+0800 | INFO | Step 3266: loss=0.0716, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:08:55.265627+0800 | INFO | Step 3267: loss=0.0426, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:08:56.708171+0800 | INFO | Step 3268: loss=0.0280, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:08:58.145736+0800 | INFO | Step 3269: loss=0.1286, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:08:59.624136+0800 | INFO | Step 3270: loss=0.0394, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:01.087519+0800 | INFO | Step 3271: loss=0.0425, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:09:02.576085+0800 | INFO | Step 3272: loss=0.0259, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:04.033931+0800 | INFO | Step 3273: loss=0.1611, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T23:09:06.575383+0800 | INFO | Step 3274: loss=0.0035, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:08.052180+0800 | INFO | Step 3275: loss=0.0365, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:09:09.495018+0800 | INFO | Step 3276: loss=0.1302, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:09:10.980949+0800 | INFO | Step 3277: loss=0.0056, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:12.425940+0800 | INFO | Step 3278: loss=0.1491, acc=0.933 (IF=1.000, MQ=0.867)
2025-12-21T23:09:13.877613+0800 | INFO | Step 3279: loss=0.0016, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:15.308626+0800 | INFO | Step 3280: loss=0.1072, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:09:16.790724+0800 | INFO | Step 3281: loss=0.0205, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:18.223024+0800 | INFO | Step 3282: loss=0.0166, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:19.659051+0800 | INFO | Step 3283: loss=0.0463, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:09:21.102770+0800 | INFO | Step 3284: loss=0.0191, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:22.581928+0800 | INFO | Step 3285: loss=0.0410, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:24.011711+0800 | INFO | Step 3286: loss=0.0096, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:25.449734+0800 | INFO | Step 3287: loss=0.0443, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:09:26.926468+0800 | INFO | Step 3288: loss=0.1844, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:09:28.398032+0800 | INFO | Step 3289: loss=0.0074, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:29.848813+0800 | INFO | Step 3290: loss=0.0518, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:09:31.291783+0800 | INFO | Step 3291: loss=0.0257, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:32.738171+0800 | INFO | Step 3292: loss=0.0309, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:34.195226+0800 | INFO | Step 3293: loss=0.0323, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:09:35.648985+0800 | INFO | Step 3294: loss=0.5010, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T23:09:37.095486+0800 | INFO | Step 3295: loss=0.0288, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:38.529109+0800 | INFO | Step 3296: loss=0.1323, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:09:39.970528+0800 | INFO | Step 3297: loss=0.1403, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:09:41.459770+0800 | INFO | Step 3298: loss=0.0918, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:09:42.945247+0800 | INFO | Step 3299: loss=0.0712, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:09:44.378295+0800 | INFO | Step 3300: loss=0.1069, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:09:51.789622+0800 | INFO |
============================================================
Validation Results (took 7.39s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6207
Quality Acc: 0.6625
Average Acc: 0.6416
Total Loss: 0.6349
Instruction Loss: 0.6507
Quality Loss: 0.6191
============================================================
2025-12-21T23:09:54.493188+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_2300.pt
2025-12-21T23:09:54.493713+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:09:54.493814+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:09:54.493873+0800 | INFO | 2. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T23:09:54.493918+0800 | INFO | 3. Step 3300: acc=0.6416 (reward_model.best_3300.pt)
2025-12-21T23:09:55.956587+0800 | INFO | Step 3301: loss=0.0640, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:57.007485+0800 | INFO | Step 3302: loss=0.0059, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:58.446123+0800 | INFO | Step 3303: loss=0.0252, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:09:59.906671+0800 | INFO | Step 3304: loss=0.0328, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:01.392866+0800 | INFO | Step 3305: loss=0.0013, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:02.827210+0800 | INFO | Step 3306: loss=0.0169, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:04.264786+0800 | INFO | Step 3307: loss=0.0780, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:10:05.723678+0800 | INFO | Step 3308: loss=0.0219, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:07.221056+0800 | INFO | Step 3309: loss=0.0988, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:10:08.640023+0800 | INFO | Step 3310: loss=0.0253, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:10.073635+0800 | INFO | Step 3311: loss=0.1180, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:10:11.553499+0800 | INFO | Step 3312: loss=0.2806, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:10:13.002459+0800 | INFO | Step 3313: loss=0.4386, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T23:10:14.463475+0800 | INFO | Step 3314: loss=0.0209, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:15.890230+0800 | INFO | Step 3315: loss=0.0628, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:16.913743+0800 | INFO | Step 3316: loss=0.0476, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:10:18.368468+0800 | INFO | Step 3317: loss=0.0184, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:19.804099+0800 | INFO | Step 3318: loss=0.0246, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:21.244475+0800 | INFO | Step 3319: loss=0.1178, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:10:22.689007+0800 | INFO | Step 3320: loss=0.0683, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:10:24.141002+0800 | INFO | Step 3321: loss=0.1065, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:10:25.592242+0800 | INFO | Step 3322: loss=0.0509, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:10:27.044517+0800 | INFO | Step 3323: loss=0.0295, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:28.473167+0800 | INFO | Step 3324: loss=0.0385, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:29.909558+0800 | INFO | Step 3325: loss=0.0104, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:31.344557+0800 | INFO | Step 3326: loss=0.1259, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:10:32.786131+0800 | INFO | Step 3327: loss=0.1458, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:10:34.250497+0800 | INFO | Step 3328: loss=0.2069, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T23:10:35.692721+0800 | INFO | Step 3329: loss=0.0089, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:38.478444+0800 | INFO | Step 3330: loss=0.0077, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:39.919821+0800 | INFO | Step 3331: loss=0.0205, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:41.355040+0800 | INFO | Step 3332: loss=0.0098, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:42.798990+0800 | INFO | Step 3333: loss=0.1512, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:10:44.348878+0800 | INFO | Step 3334: loss=0.0641, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:10:45.771494+0800 | INFO | Step 3335: loss=0.0108, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:47.215390+0800 | INFO | Step 3336: loss=0.0466, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:10:48.279041+0800 | INFO | Step 3337: loss=0.1981, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:10:49.717432+0800 | INFO | Step 3338: loss=0.0592, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:51.165549+0800 | INFO | Step 3339: loss=0.0030, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:52.622815+0800 | INFO | Step 3340: loss=0.0492, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:54.081585+0800 | INFO | Step 3341: loss=0.1976, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T23:10:55.514508+0800 | INFO | Step 3342: loss=0.0037, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:56.952070+0800 | INFO | Step 3343: loss=0.0274, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:10:58.385648+0800 | INFO | Step 3344: loss=0.0261, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:10:59.875619+0800 | INFO | Step 3345: loss=0.1378, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:11:01.305579+0800 | INFO | Step 3346: loss=0.2458, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:11:02.369373+0800 | INFO | Step 3347: loss=0.1469, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:11:03.812859+0800 | INFO | Step 3348: loss=0.0619, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:11:05.289842+0800 | INFO | Step 3349: loss=0.8650, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:11:06.746159+0800 | INFO | Step 3350: loss=0.0649, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:11:08.178421+0800 | INFO | Step 3351: loss=0.1353, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:11:09.623830+0800 | INFO | Step 3352: loss=0.0486, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:11:11.069738+0800 | INFO | Step 3353: loss=0.2273, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T23:11:12.109665+0800 | INFO | Step 3354: loss=0.1119, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T23:11:13.572281+0800 | INFO | Step 3355: loss=0.0844, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:11:15.019928+0800 | INFO | Step 3356: loss=0.0085, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:11:16.067271+0800 | INFO | Step 3357: loss=0.0043, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:11:17.490417+0800 | INFO | Step 3358: loss=0.1302, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:11:18.532776+0800 | INFO | Step 3359: loss=0.0222, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:11:19.967330+0800 | INFO | Step 3360: loss=0.0448, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:11:21.432211+0800 | INFO | Step 3361: loss=0.2130, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:11:22.864841+0800 | INFO | Step 3362: loss=0.1299, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:11:24.320376+0800 | INFO | Step 3363: loss=0.0183, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:11:25.770429+0800 | INFO | Step 3364: loss=0.0106, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:11:27.208116+0800 | INFO | Step 3365: loss=0.1736, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:11:28.652216+0800 | INFO | Step 3366: loss=0.0507, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:11:30.089352+0800 | INFO | Step 3367: loss=0.0203, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:11:31.532871+0800 | INFO | Step 3368: loss=0.3944, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T23:11:32.981625+0800 | INFO | Step 3369: loss=0.0744, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:11:34.430535+0800 | INFO | Step 3370: loss=0.0831, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:11:35.885194+0800 | INFO | Step 3371: loss=0.0257, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:11:37.338544+0800 | INFO | Step 3372: loss=0.0143, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:11:38.798740+0800 | INFO | Step 3373: loss=0.2077, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:11:40.232978+0800 | INFO | Step 3374: loss=0.0504, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:11:41.661880+0800 | INFO | Step 3375: loss=0.0461, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T23:11:43.107149+0800 | INFO | Step 3376: loss=0.0631, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:11:44.547765+0800 | INFO | Step 3377: loss=0.0189, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:11:45.996422+0800 | INFO | Step 3378: loss=0.1214, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:11:47.441605+0800 | INFO | Step 3379: loss=0.0510, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:11:48.890530+0800 | INFO | Step 3380: loss=0.1240, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:11:50.341308+0800 | INFO | Step 3381: loss=0.0774, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:11:51.782794+0800 | INFO | Step 3382: loss=0.0324, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:11:53.224642+0800 | INFO | Step 3383: loss=0.1452, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T23:11:54.661812+0800 | INFO | Step 3384: loss=0.2426, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:11:57.640635+0800 | INFO | Step 3385: loss=0.1175, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:11:59.105345+0800 | INFO | Step 3386: loss=0.0415, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:12:00.553758+0800 | INFO | Step 3387: loss=0.0074, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:12:02.083947+0800 | INFO | Step 3388: loss=0.0159, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:12:03.628028+0800 | INFO | Step 3389: loss=0.1269, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:12:05.061105+0800 | INFO | Step 3390: loss=0.1178, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:12:06.495617+0800 | INFO | Step 3391: loss=0.1232, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:12:07.935952+0800 | INFO | Step 3392: loss=0.1332, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:12:09.387408+0800 | INFO | Step 3393: loss=0.0436, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:12:10.865431+0800 | INFO | Step 3394: loss=0.0603, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:12:12.323106+0800 | INFO | Step 3395: loss=0.0072, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:12:13.754623+0800 | INFO | Step 3396: loss=0.0145, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:12:15.187276+0800 | INFO | Step 3397: loss=0.1669, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:12:16.626317+0800 | INFO | Step 3398: loss=0.0314, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:12:18.068522+0800 | INFO | Step 3399: loss=0.0747, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:12:19.514071+0800 | INFO | Step 3400: loss=0.0695, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:12:26.780696+0800 | INFO |
============================================================
Validation Results (took 7.24s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.5690
Quality Acc: 0.6625
Average Acc: 0.6157
Total Loss: 0.6392
Instruction Loss: 0.6601
Quality Loss: 0.6183
============================================================
2025-12-21T23:12:29.411535+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_3400.pt
2025-12-21T23:12:29.412092+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:12:29.412201+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:12:29.412264+0800 | INFO | 2. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T23:12:29.412316+0800 | INFO | 3. Step 3300: acc=0.6416 (reward_model.best_3300.pt)
2025-12-21T23:12:30.898538+0800 | INFO | Step 3401: loss=0.0660, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:12:32.349074+0800 | INFO | Step 3402: loss=0.0143, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:12:33.783032+0800 | INFO | Step 3403: loss=0.0783, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:12:35.208222+0800 | INFO | Step 3404: loss=0.0357, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:12:36.667741+0800 | INFO | Step 3405: loss=0.1352, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:12:38.108884+0800 | INFO | Step 3406: loss=0.2815, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T23:12:39.550803+0800 | INFO | Step 3407: loss=0.1369, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:12:41.019699+0800 | INFO | Step 3408: loss=0.0078, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:12:42.054908+0800 | INFO | Step 3409: loss=0.3029, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:12:43.491889+0800 | INFO | Step 3410: loss=0.1891, acc=0.900 (IF=0.800, MQ=1.000)
2025-12-21T23:12:44.531809+0800 | INFO | Step 3411: loss=0.0392, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:12:45.995292+0800 | INFO | Step 3412: loss=0.0098, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:12:47.451757+0800 | INFO | Step 3413: loss=0.0160, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:12:48.888985+0800 | INFO | Step 3414: loss=0.0147, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:12:50.355225+0800 | INFO | Step 3415: loss=0.2589, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:12:51.810563+0800 | INFO | Step 3416: loss=0.0224, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:12:53.254380+0800 | INFO | Step 3417: loss=0.2610, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:12:54.692013+0800 | INFO | Step 3418: loss=0.0109, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:12:56.120237+0800 | INFO | Step 3419: loss=0.0837, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:12:57.556145+0800 | INFO | Step 3420: loss=0.5492, acc=0.897 (IF=0.857, MQ=0.938)
2025-12-21T23:12:58.999277+0800 | INFO | Step 3421: loss=0.0161, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:00.446675+0800 | INFO | Step 3422: loss=0.0186, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:01.899383+0800 | INFO | Step 3423: loss=0.0805, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:13:03.352653+0800 | INFO | Step 3424: loss=0.3721, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T23:13:04.802104+0800 | INFO | Step 3425: loss=0.0065, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:06.287909+0800 | INFO | Step 3426: loss=0.6614, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T23:13:07.788831+0800 | INFO | Step 3427: loss=0.0147, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:09.227129+0800 | INFO | Step 3428: loss=0.0102, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:10.683748+0800 | INFO | Step 3429: loss=0.0652, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:13:11.722168+0800 | INFO | Step 3430: loss=0.0116, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:13.191584+0800 | INFO | Step 3431: loss=0.1374, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T23:13:14.633803+0800 | INFO | Step 3432: loss=0.1173, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:13:16.076262+0800 | INFO | Step 3433: loss=0.1322, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T23:13:17.520972+0800 | INFO | Step 3434: loss=0.2629, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:13:18.963601+0800 | INFO | Step 3435: loss=0.3686, acc=0.917 (IF=0.900, MQ=0.933)
2025-12-21T23:13:20.410666+0800 | INFO | Step 3436: loss=1.0697, acc=0.799 (IF=0.786, MQ=0.812)
2025-12-21T23:13:21.861205+0800 | INFO | Step 3437: loss=0.0819, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:13:22.899277+0800 | INFO | Step 3438: loss=0.0513, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:13:24.355718+0800 | INFO | Step 3439: loss=0.1708, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T23:13:25.793376+0800 | INFO | Step 3440: loss=0.0359, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:13:28.472207+0800 | INFO | Step 3441: loss=0.2216, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:13:29.903563+0800 | INFO | Step 3442: loss=0.0260, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:31.322757+0800 | INFO | Step 3443: loss=0.0260, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:32.814775+0800 | INFO | Step 3444: loss=0.0171, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:34.256706+0800 | INFO | Step 3445: loss=0.1578, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:13:35.720312+0800 | INFO | Step 3446: loss=0.0177, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:37.165692+0800 | INFO | Step 3447: loss=0.0277, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:38.636550+0800 | INFO | Step 3448: loss=0.0017, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:40.094057+0800 | INFO | Step 3449: loss=0.1535, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:13:41.538043+0800 | INFO | Step 3450: loss=0.1278, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:13:42.985554+0800 | INFO | Step 3451: loss=0.0612, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:13:44.466195+0800 | INFO | Step 3452: loss=0.0653, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:13:45.910369+0800 | INFO | Step 3453: loss=0.0067, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:46.949749+0800 | INFO | Step 3454: loss=0.0215, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:48.395781+0800 | INFO | Step 3455: loss=0.0029, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:49.821360+0800 | INFO | Step 3456: loss=0.1387, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:13:51.275092+0800 | INFO | Step 3457: loss=0.0112, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:52.715218+0800 | INFO | Step 3458: loss=0.0241, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:54.172661+0800 | INFO | Step 3459: loss=0.0062, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:13:55.597681+0800 | INFO | Step 3460: loss=0.0848, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:13:57.085940+0800 | INFO | Step 3461: loss=0.0342, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:13:58.577255+0800 | INFO | Step 3462: loss=0.2472, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:13:59.635842+0800 | INFO | Step 3463: loss=0.0806, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:14:01.072806+0800 | INFO | Step 3464: loss=0.0606, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:14:02.515580+0800 | INFO | Step 3465: loss=0.2393, acc=0.844 (IF=0.750, MQ=0.938)
2025-12-21T23:14:03.954109+0800 | INFO | Step 3466: loss=0.0045, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:14:05.439287+0800 | INFO | Step 3467: loss=0.4045, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T23:14:06.875498+0800 | INFO | Step 3468: loss=0.0934, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:14:08.311807+0800 | INFO | Step 3469: loss=0.0546, acc=0.967 (IF=0.933, MQ=1.000)
2025-12-21T23:14:09.759487+0800 | INFO | Step 3470: loss=0.0196, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:14:11.207229+0800 | INFO | Step 3471: loss=0.0999, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:14:12.653670+0800 | INFO | Step 3472: loss=0.2333, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:14:14.099933+0800 | INFO | Step 3473: loss=0.0494, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:14:15.552291+0800 | INFO | Step 3474: loss=0.0964, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:14:17.001523+0800 | INFO | Step 3475: loss=0.0901, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:14:18.450796+0800 | INFO | Step 3476: loss=0.0327, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:14:19.897758+0800 | INFO | Step 3477: loss=0.1366, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:14:21.350115+0800 | INFO | Step 3478: loss=0.1738, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:14:22.799067+0800 | INFO | Step 3479: loss=0.1092, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:14:24.224184+0800 | INFO | Step 3480: loss=0.1097, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:14:25.666498+0800 | INFO | Step 3481: loss=0.0589, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:14:27.107798+0800 | INFO | Step 3482: loss=0.0787, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:14:28.530044+0800 | INFO | Step 3483: loss=0.0375, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:14:29.961581+0800 | INFO | Step 3484: loss=0.0263, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:14:31.397005+0800 | INFO | Step 3485: loss=0.0566, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:14:32.490911+0800 | INFO | Step 3486: loss=0.0045, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:14:33.922119+0800 | INFO | Step 3487: loss=0.0586, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:14:35.358498+0800 | INFO | Step 3488: loss=0.0786, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:14:36.381275+0800 | INFO | Step 3489: loss=0.1303, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:14:37.822670+0800 | INFO | Step 3490: loss=0.1232, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T23:14:39.281026+0800 | INFO | Step 3491: loss=0.0069, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:14:40.344800+0800 | INFO | Step 3492: loss=0.1627, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:14:41.794465+0800 | INFO | Step 3493: loss=0.3253, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:14:43.237681+0800 | INFO | Step 3494: loss=0.0579, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:14:44.679094+0800 | INFO | Step 3495: loss=0.0122, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:14:47.342953+0800 | INFO | Step 3496: loss=0.0700, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:14:48.798320+0800 | INFO | Step 3497: loss=0.0690, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:14:50.236589+0800 | INFO | Step 3498: loss=0.1853, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:14:51.678518+0800 | INFO | Step 3499: loss=0.1565, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:14:53.143670+0800 | INFO | Step 3500: loss=0.1251, acc=0.933 (IF=1.000, MQ=0.867)
2025-12-21T23:15:00.584649+0800 | INFO |
============================================================
Validation Results (took 7.41s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6552
Quality Acc: 0.6500
Average Acc: 0.6526
Total Loss: 0.6366
Instruction Loss: 0.6484
Quality Loss: 0.6249
============================================================
2025-12-21T23:15:03.192613+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_3300.pt
2025-12-21T23:15:03.193559+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:15:03.193692+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:15:03.193749+0800 | INFO | 2. Step 3500: acc=0.6526 (reward_model.best_3500.pt)
2025-12-21T23:15:03.193801+0800 | INFO | 3. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T23:15:04.664716+0800 | INFO | Step 3501: loss=0.0501, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:06.130972+0800 | INFO | Step 3502: loss=0.0015, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:07.572094+0800 | INFO | Step 3503: loss=0.0497, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:09.035936+0800 | INFO | Step 3504: loss=0.1864, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:15:10.554598+0800 | INFO | Step 3505: loss=0.0165, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:12.049855+0800 | INFO | Step 3506: loss=0.0128, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:13.518599+0800 | INFO | Step 3507: loss=0.0154, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:14.574801+0800 | INFO | Step 3508: loss=0.0264, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:16.014705+0800 | INFO | Step 3509: loss=0.0052, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:17.480165+0800 | INFO | Step 3510: loss=0.0296, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:18.923103+0800 | INFO | Step 3511: loss=0.0750, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:15:20.385681+0800 | INFO | Step 3512: loss=0.0704, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:15:21.839475+0800 | INFO | Step 3513: loss=0.0204, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:23.297358+0800 | INFO | Step 3514: loss=0.3342, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:15:24.758216+0800 | INFO | Step 3515: loss=0.0011, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:26.212522+0800 | INFO | Step 3516: loss=0.1762, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:15:27.662545+0800 | INFO | Step 3517: loss=0.0781, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T23:15:29.110870+0800 | INFO | Step 3518: loss=0.0953, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:15:30.560420+0800 | INFO | Step 3519: loss=0.3152, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:15:32.017060+0800 | INFO | Step 3520: loss=0.4082, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:15:33.464059+0800 | INFO | Step 3521: loss=0.0021, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:34.924634+0800 | INFO | Step 3522: loss=0.0987, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:15:36.370131+0800 | INFO | Step 3523: loss=0.0401, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:15:37.806665+0800 | INFO | Step 3524: loss=0.2841, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T23:15:39.249774+0800 | INFO | Step 3525: loss=0.0006, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:40.734734+0800 | INFO | Step 3526: loss=0.0993, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:15:42.184592+0800 | INFO | Step 3527: loss=0.1466, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:15:43.657384+0800 | INFO | Step 3528: loss=0.0053, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:45.145767+0800 | INFO | Step 3529: loss=0.0205, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:46.579859+0800 | INFO | Step 3530: loss=0.0643, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:15:48.010314+0800 | INFO | Step 3531: loss=0.2238, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:15:49.451812+0800 | INFO | Step 3532: loss=0.1094, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:15:50.890727+0800 | INFO | Step 3533: loss=0.0186, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:52.332478+0800 | INFO | Step 3534: loss=0.1012, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:15:53.785211+0800 | INFO | Step 3535: loss=0.0320, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:15:55.234825+0800 | INFO | Step 3536: loss=0.0599, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:15:56.686033+0800 | INFO | Step 3537: loss=0.4269, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T23:15:58.132865+0800 | INFO | Step 3538: loss=0.0372, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:15:59.582645+0800 | INFO | Step 3539: loss=0.0108, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:01.024022+0800 | INFO | Step 3540: loss=0.0309, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:02.478749+0800 | INFO | Step 3541: loss=0.1663, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:16:03.929497+0800 | INFO | Step 3542: loss=0.0137, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:05.376078+0800 | INFO | Step 3543: loss=0.0074, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:06.823765+0800 | INFO | Step 3544: loss=0.0460, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:08.331042+0800 | INFO | Step 3545: loss=0.0754, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:16:09.777198+0800 | INFO | Step 3546: loss=0.0115, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:11.220841+0800 | INFO | Step 3547: loss=0.0145, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:12.696097+0800 | INFO | Step 3548: loss=0.0121, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:14.134384+0800 | INFO | Step 3549: loss=0.0848, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:16:15.560971+0800 | INFO | Step 3550: loss=0.0378, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:16:16.981460+0800 | INFO | Step 3551: loss=0.1158, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:16:19.553037+0800 | INFO | Step 3552: loss=0.1345, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:16:21.032426+0800 | INFO | Step 3553: loss=0.0025, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:22.127862+0800 | INFO | Step 3554: loss=0.0611, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:16:23.566991+0800 | INFO | Step 3555: loss=0.0152, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:25.016906+0800 | INFO | Step 3556: loss=0.0670, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:16:26.457290+0800 | INFO | Step 3557: loss=0.2647, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T23:16:27.916594+0800 | INFO | Step 3558: loss=0.1377, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:16:29.365400+0800 | INFO | Step 3559: loss=0.0493, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:16:30.802884+0800 | INFO | Step 3560: loss=0.0252, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:32.248324+0800 | INFO | Step 3561: loss=0.0433, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:16:33.699183+0800 | INFO | Step 3562: loss=0.0141, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:35.151321+0800 | INFO | Step 3563: loss=0.1575, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:16:36.599196+0800 | INFO | Step 3564: loss=0.0142, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:38.060913+0800 | INFO | Step 3565: loss=0.0020, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:39.526970+0800 | INFO | Step 3566: loss=0.0818, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:16:40.977833+0800 | INFO | Step 3567: loss=0.0090, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:42.423572+0800 | INFO | Step 3568: loss=0.0260, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:43.868304+0800 | INFO | Step 3569: loss=0.0159, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:44.904211+0800 | INFO | Step 3570: loss=0.0552, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:46.354518+0800 | INFO | Step 3571: loss=0.0823, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:16:47.421095+0800 | INFO | Step 3572: loss=0.2300, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:16:48.874879+0800 | INFO | Step 3573: loss=0.0559, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T23:16:50.323489+0800 | INFO | Step 3574: loss=0.0049, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:51.771950+0800 | INFO | Step 3575: loss=0.1730, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:16:53.219543+0800 | INFO | Step 3576: loss=0.1311, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:16:54.661727+0800 | INFO | Step 3577: loss=0.3416, acc=0.851 (IF=0.889, MQ=0.812)
2025-12-21T23:16:56.104605+0800 | INFO | Step 3578: loss=0.1760, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:16:57.544177+0800 | INFO | Step 3579: loss=0.0201, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:16:58.972695+0800 | INFO | Step 3580: loss=0.2860, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:17:00.409684+0800 | INFO | Step 3581: loss=0.0400, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:17:01.843715+0800 | INFO | Step 3582: loss=0.1039, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:17:03.275400+0800 | INFO | Step 3583: loss=0.0245, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:17:04.306913+0800 | INFO | Step 3584: loss=0.4712, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T23:17:05.795837+0800 | INFO | Step 3585: loss=0.0368, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:17:07.262080+0800 | INFO | Step 3586: loss=0.0740, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:17:08.706764+0800 | INFO | Step 3587: loss=0.2967, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:17:10.158294+0800 | INFO | Step 3588: loss=0.0386, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:17:11.607095+0800 | INFO | Step 3589: loss=0.0860, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:17:13.058273+0800 | INFO | Step 3590: loss=0.0314, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:17:14.568543+0800 | INFO | Step 3591: loss=0.0052, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:17:16.046190+0800 | INFO | Step 3592: loss=0.0121, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:17:17.535029+0800 | INFO | Step 3593: loss=0.0043, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:17:18.987852+0800 | INFO | Step 3594: loss=0.0018, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:17:20.432858+0800 | INFO | Step 3595: loss=0.2189, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:17:21.865521+0800 | INFO | Step 3596: loss=0.0651, acc=0.967 (IF=0.933, MQ=1.000)
2025-12-21T23:17:23.317512+0800 | INFO | Step 3597: loss=0.1724, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T23:17:24.780952+0800 | INFO | Step 3598: loss=0.0083, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:17:26.239241+0800 | INFO | Step 3599: loss=0.0939, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:17:27.677436+0800 | INFO | Step 3600: loss=0.0634, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:17:34.868870+0800 | INFO |
============================================================
Validation Results (took 7.17s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6034
Quality Acc: 0.6750
Average Acc: 0.6392
Total Loss: 0.6299
Instruction Loss: 0.6461
Quality Loss: 0.6137
============================================================
2025-12-21T23:17:37.556138+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_3600.pt
2025-12-21T23:17:37.556663+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:17:37.556757+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:17:37.556819+0800 | INFO | 2. Step 3500: acc=0.6526 (reward_model.best_3500.pt)
2025-12-21T23:17:37.556867+0800 | INFO | 3. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T23:17:39.007990+0800 | INFO | Step 3601: loss=0.0658, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:17:40.465251+0800 | INFO | Step 3602: loss=0.2716, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T23:17:41.913948+0800 | INFO | Step 3603: loss=0.2460, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T23:17:43.349001+0800 | INFO | Step 3604: loss=0.0138, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:17:44.782975+0800 | INFO | Step 3605: loss=0.1054, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:17:46.214520+0800 | INFO | Step 3606: loss=0.1899, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:17:48.907561+0800 | INFO | Step 3607: loss=0.0157, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:17:50.360853+0800 | INFO | Step 3608: loss=0.0658, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:17:51.822807+0800 | INFO | Step 3609: loss=0.0257, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:17:53.269222+0800 | INFO | Step 3610: loss=0.0228, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:17:54.715843+0800 | INFO | Step 3611: loss=0.0443, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T23:17:56.167063+0800 | INFO | Step 3612: loss=0.0051, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:17:57.717898+0800 | INFO | Step 3613: loss=0.0098, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:17:59.293097+0800 | INFO | Step 3614: loss=0.0732, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:18:00.855684+0800 | INFO | Step 3615: loss=0.0240, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:02.396583+0800 | INFO | Step 3616: loss=0.2482, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:18:03.954314+0800 | INFO | Step 3617: loss=0.0611, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:18:05.415401+0800 | INFO | Step 3618: loss=0.0829, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:18:06.848788+0800 | INFO | Step 3619: loss=0.0238, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:08.285165+0800 | INFO | Step 3620: loss=0.0229, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:09.330132+0800 | INFO | Step 3621: loss=0.0184, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:10.783311+0800 | INFO | Step 3622: loss=0.0004, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:12.289024+0800 | INFO | Step 3623: loss=0.2926, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T23:18:13.740646+0800 | INFO | Step 3624: loss=0.0135, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:15.219879+0800 | INFO | Step 3625: loss=0.1294, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T23:18:16.683283+0800 | INFO | Step 3626: loss=0.0571, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:18:18.114312+0800 | INFO | Step 3627: loss=0.0489, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:19.543643+0800 | INFO | Step 3628: loss=0.0037, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:20.978978+0800 | INFO | Step 3629: loss=0.1319, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:18:22.414980+0800 | INFO | Step 3630: loss=0.0981, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:18:23.853379+0800 | INFO | Step 3631: loss=0.0167, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:25.356218+0800 | INFO | Step 3632: loss=0.2771, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:18:26.826504+0800 | INFO | Step 3633: loss=0.0112, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:28.258264+0800 | INFO | Step 3634: loss=0.0914, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:18:29.708543+0800 | INFO | Step 3635: loss=0.1295, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:18:31.143187+0800 | INFO | Step 3636: loss=0.0623, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:18:32.577416+0800 | INFO | Step 3637: loss=0.1307, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:18:34.002884+0800 | INFO | Step 3638: loss=0.0810, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:18:35.432423+0800 | INFO | Step 3639: loss=0.0517, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:36.867458+0800 | INFO | Step 3640: loss=0.0082, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:37.900934+0800 | INFO | Step 3641: loss=0.4978, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:18:39.339330+0800 | INFO | Step 3642: loss=0.0215, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:40.761475+0800 | INFO | Step 3643: loss=0.3170, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:18:42.202305+0800 | INFO | Step 3644: loss=0.0829, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:18:43.257130+0800 | INFO | Step 3645: loss=0.1877, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:18:44.704163+0800 | INFO | Step 3646: loss=0.1892, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:18:46.193055+0800 | INFO | Step 3647: loss=0.0190, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:47.217435+0800 | INFO | Step 3648: loss=0.2616, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T23:18:48.240747+0800 | INFO | Step 3649: loss=0.0439, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:49.686615+0800 | INFO | Step 3650: loss=0.0090, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:51.119654+0800 | INFO | Step 3651: loss=0.0389, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:52.563388+0800 | INFO | Step 3652: loss=0.0367, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:53.993373+0800 | INFO | Step 3653: loss=0.0236, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:55.423803+0800 | INFO | Step 3654: loss=0.3159, acc=0.869 (IF=0.800, MQ=0.938)
2025-12-21T23:18:56.872744+0800 | INFO | Step 3655: loss=0.1010, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:18:58.301150+0800 | INFO | Step 3656: loss=0.0282, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:18:59.730920+0800 | INFO | Step 3657: loss=0.3484, acc=0.892 (IF=0.846, MQ=0.938)
2025-12-21T23:19:01.170058+0800 | INFO | Step 3658: loss=0.0319, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:19:02.606443+0800 | INFO | Step 3659: loss=0.2057, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:19:03.653500+0800 | INFO | Step 3660: loss=0.1860, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T23:19:05.137114+0800 | INFO | Step 3661: loss=0.2076, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:19:06.573121+0800 | INFO | Step 3662: loss=0.0021, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:19:09.556791+0800 | INFO | Step 3663: loss=0.1590, acc=0.929 (IF=0.857, MQ=1.000)
2025-12-21T23:19:10.990989+0800 | INFO | Step 3664: loss=0.1161, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:19:12.423599+0800 | INFO | Step 3665: loss=0.0060, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:19:13.869563+0800 | INFO | Step 3666: loss=0.0970, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:19:15.324737+0800 | INFO | Step 3667: loss=0.0071, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:19:16.767926+0800 | INFO | Step 3668: loss=0.0526, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:19:18.230558+0800 | INFO | Step 3669: loss=0.4187, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T23:19:19.662468+0800 | INFO | Step 3670: loss=0.0248, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:19:21.095440+0800 | INFO | Step 3671: loss=0.0049, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:19:22.535240+0800 | INFO | Step 3672: loss=0.1033, acc=0.917 (IF=0.900, MQ=0.933)
2025-12-21T23:19:23.971080+0800 | INFO | Step 3673: loss=0.0404, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:19:25.451370+0800 | INFO | Step 3674: loss=0.0324, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:19:26.881277+0800 | INFO | Step 3675: loss=0.0159, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:19:28.333771+0800 | INFO | Step 3676: loss=0.0517, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:19:29.778517+0800 | INFO | Step 3677: loss=0.0214, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:19:31.233267+0800 | INFO | Step 3678: loss=0.1128, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:19:32.679450+0800 | INFO | Step 3679: loss=0.1334, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:19:34.122026+0800 | INFO | Step 3680: loss=0.1856, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:19:35.570024+0800 | INFO | Step 3681: loss=0.0599, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:19:37.015647+0800 | INFO | Step 3682: loss=0.0823, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:19:38.458435+0800 | INFO | Step 3683: loss=0.0337, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T23:19:39.901271+0800 | INFO | Step 3684: loss=0.2191, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:19:41.346795+0800 | INFO | Step 3685: loss=0.1505, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:19:42.787442+0800 | INFO | Step 3686: loss=0.1741, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T23:19:44.220278+0800 | INFO | Step 3687: loss=0.0690, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:19:45.663868+0800 | INFO | Step 3688: loss=0.0419, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:19:47.112268+0800 | INFO | Step 3689: loss=0.0313, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:19:48.561026+0800 | INFO | Step 3690: loss=0.0090, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:19:50.008155+0800 | INFO | Step 3691: loss=0.3046, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:19:51.462921+0800 | INFO | Step 3692: loss=0.0543, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:19:52.910689+0800 | INFO | Step 3693: loss=0.1046, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:19:53.947292+0800 | INFO | Step 3694: loss=0.0056, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:19:55.397866+0800 | INFO | Step 3695: loss=0.0886, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:19:56.840407+0800 | INFO | Step 3696: loss=0.1665, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:19:58.266229+0800 | INFO | Step 3697: loss=0.0552, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:19:59.715944+0800 | INFO | Step 3698: loss=0.1304, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T23:20:01.168605+0800 | INFO | Step 3699: loss=0.0845, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:20:02.644568+0800 | INFO | Step 3700: loss=0.0052, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:10.045729+0800 | INFO |
============================================================
Validation Results (took 7.38s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6034
Quality Acc: 0.6750
Average Acc: 0.6392
Total Loss: 0.6271
Instruction Loss: 0.6443
Quality Loss: 0.6098
============================================================
2025-12-21T23:20:12.635534+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_3700.pt
2025-12-21T23:20:12.636061+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:20:12.636168+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:20:12.636229+0800 | INFO | 2. Step 3500: acc=0.6526 (reward_model.best_3500.pt)
2025-12-21T23:20:12.636279+0800 | INFO | 3. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T23:20:14.123982+0800 | INFO | Step 3701: loss=0.0258, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:15.604194+0800 | INFO | Step 3702: loss=0.0722, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:20:17.050761+0800 | INFO | Step 3703: loss=0.0036, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:18.504169+0800 | INFO | Step 3704: loss=0.0165, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:19.993626+0800 | INFO | Step 3705: loss=0.2928, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T23:20:21.028829+0800 | INFO | Step 3706: loss=0.0355, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:22.468327+0800 | INFO | Step 3707: loss=0.1912, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:20:23.907152+0800 | INFO | Step 3708: loss=0.0176, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:25.345409+0800 | INFO | Step 3709: loss=0.1110, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:20:26.772402+0800 | INFO | Step 3710: loss=0.2314, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:20:28.227030+0800 | INFO | Step 3711: loss=0.1819, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:20:29.683816+0800 | INFO | Step 3712: loss=0.8112, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:20:31.126953+0800 | INFO | Step 3713: loss=0.0123, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:32.161767+0800 | INFO | Step 3714: loss=0.0714, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:20:33.602332+0800 | INFO | Step 3715: loss=0.0460, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:20:35.049766+0800 | INFO | Step 3716: loss=0.1392, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:20:36.493340+0800 | INFO | Step 3717: loss=0.0912, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:20:38.705712+0800 | INFO | Step 3718: loss=0.0135, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:40.421398+0800 | INFO | Step 3719: loss=0.0086, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:41.916420+0800 | INFO | Step 3720: loss=0.0176, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:43.380424+0800 | INFO | Step 3721: loss=0.0304, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:44.853244+0800 | INFO | Step 3722: loss=0.0042, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:46.277208+0800 | INFO | Step 3723: loss=0.0116, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:47.710121+0800 | INFO | Step 3724: loss=0.1242, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:20:48.750418+0800 | INFO | Step 3725: loss=0.0027, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:50.205644+0800 | INFO | Step 3726: loss=0.1712, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:20:51.638126+0800 | INFO | Step 3727: loss=0.0334, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:53.079066+0800 | INFO | Step 3728: loss=0.0262, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:20:54.525805+0800 | INFO | Step 3729: loss=0.2819, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:20:56.007983+0800 | INFO | Step 3730: loss=0.0676, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:20:57.513728+0800 | INFO | Step 3731: loss=0.1633, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:20:58.958611+0800 | INFO | Step 3732: loss=0.1045, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:00.405241+0800 | INFO | Step 3733: loss=0.0924, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:01.833171+0800 | INFO | Step 3734: loss=0.1369, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:21:03.257664+0800 | INFO | Step 3735: loss=0.1940, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:21:04.699167+0800 | INFO | Step 3736: loss=0.0828, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:21:06.172299+0800 | INFO | Step 3737: loss=0.0407, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:21:07.606774+0800 | INFO | Step 3738: loss=0.1783, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:09.043826+0800 | INFO | Step 3739: loss=0.1170, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:21:10.482414+0800 | INFO | Step 3740: loss=0.0263, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:21:11.955175+0800 | INFO | Step 3741: loss=0.0199, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:21:13.397701+0800 | INFO | Step 3742: loss=0.0964, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:14.848100+0800 | INFO | Step 3743: loss=0.0620, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:16.359904+0800 | INFO | Step 3744: loss=0.1102, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:21:17.826656+0800 | INFO | Step 3745: loss=0.0653, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:19.263542+0800 | INFO | Step 3746: loss=0.0290, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:21:20.690486+0800 | INFO | Step 3747: loss=0.0217, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:21:22.130693+0800 | INFO | Step 3748: loss=0.0895, acc=0.967 (IF=0.933, MQ=1.000)
2025-12-21T23:21:23.588096+0800 | INFO | Step 3749: loss=0.0298, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:21:25.031543+0800 | INFO | Step 3750: loss=0.0688, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:26.479734+0800 | INFO | Step 3751: loss=0.0827, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:21:27.956434+0800 | INFO | Step 3752: loss=0.0602, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:29.432520+0800 | INFO | Step 3753: loss=0.0573, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:30.876559+0800 | INFO | Step 3754: loss=0.1483, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:32.303989+0800 | INFO | Step 3755: loss=0.0522, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:21:33.753671+0800 | INFO | Step 3756: loss=0.0400, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:35.194924+0800 | INFO | Step 3757: loss=0.0493, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:36.640454+0800 | INFO | Step 3758: loss=0.4360, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:21:38.080117+0800 | INFO | Step 3759: loss=0.0116, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:21:39.516683+0800 | INFO | Step 3760: loss=0.0079, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:21:40.988115+0800 | INFO | Step 3761: loss=0.0450, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:21:42.442573+0800 | INFO | Step 3762: loss=0.0236, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:21:43.884755+0800 | INFO | Step 3763: loss=0.0151, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:21:45.313625+0800 | INFO | Step 3764: loss=0.3800, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T23:21:46.753933+0800 | INFO | Step 3765: loss=0.0545, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:48.190406+0800 | INFO | Step 3766: loss=0.0028, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:21:49.629371+0800 | INFO | Step 3767: loss=0.0262, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:21:51.079171+0800 | INFO | Step 3768: loss=0.3749, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:21:52.521748+0800 | INFO | Step 3769: loss=0.0340, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:21:53.966386+0800 | INFO | Step 3770: loss=0.0955, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:21:55.405746+0800 | INFO | Step 3771: loss=0.2154, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:21:56.836336+0800 | INFO | Step 3772: loss=0.2944, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:21:58.259029+0800 | INFO | Step 3773: loss=0.1695, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:22:01.125793+0800 | INFO | Step 3774: loss=0.0604, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:22:02.591097+0800 | INFO | Step 3775: loss=0.0792, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:22:04.046275+0800 | INFO | Step 3776: loss=0.0112, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:05.497240+0800 | INFO | Step 3777: loss=0.0468, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:22:06.938311+0800 | INFO | Step 3778: loss=0.1076, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T23:22:08.399626+0800 | INFO | Step 3779: loss=0.0154, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:09.854216+0800 | INFO | Step 3780: loss=0.0084, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:11.357291+0800 | INFO | Step 3781: loss=0.1458, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:22:12.422010+0800 | INFO | Step 3782: loss=0.0104, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:13.866707+0800 | INFO | Step 3783: loss=0.0725, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:22:15.317900+0800 | INFO | Step 3784: loss=0.1565, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:22:16.807780+0800 | INFO | Step 3785: loss=0.0325, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:18.265672+0800 | INFO | Step 3786: loss=0.2256, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:22:19.724276+0800 | INFO | Step 3787: loss=0.0178, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:20.809251+0800 | INFO | Step 3788: loss=0.1072, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:22:22.284183+0800 | INFO | Step 3789: loss=0.0239, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:23.714846+0800 | INFO | Step 3790: loss=0.0405, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:25.159062+0800 | INFO | Step 3791: loss=0.0263, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:26.605951+0800 | INFO | Step 3792: loss=0.0259, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:28.063713+0800 | INFO | Step 3793: loss=0.0481, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:22:29.505948+0800 | INFO | Step 3794: loss=0.0612, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:30.963928+0800 | INFO | Step 3795: loss=0.0063, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:32.411955+0800 | INFO | Step 3796: loss=0.0377, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:22:33.867428+0800 | INFO | Step 3797: loss=0.0108, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:35.318738+0800 | INFO | Step 3798: loss=0.1340, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:22:36.750212+0800 | INFO | Step 3799: loss=0.0301, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:38.198675+0800 | INFO | Step 3800: loss=0.0684, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:22:45.612259+0800 | INFO |
============================================================
Validation Results (took 7.39s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6034
Quality Acc: 0.6875
Average Acc: 0.6455
Total Loss: 0.6237
Instruction Loss: 0.6427
Quality Loss: 0.6048
============================================================
2025-12-21T23:22:48.169072+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_3800.pt
2025-12-21T23:22:48.169595+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:22:48.169695+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:22:48.169761+0800 | INFO | 2. Step 3500: acc=0.6526 (reward_model.best_3500.pt)
2025-12-21T23:22:48.169816+0800 | INFO | 3. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T23:22:49.661960+0800 | INFO | Step 3801: loss=0.0537, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:22:51.099494+0800 | INFO | Step 3802: loss=0.1756, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:22:52.544335+0800 | INFO | Step 3803: loss=0.1545, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:22:53.980268+0800 | INFO | Step 3804: loss=0.1418, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:22:55.441868+0800 | INFO | Step 3805: loss=0.0619, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:22:56.885877+0800 | INFO | Step 3806: loss=0.0081, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:22:58.324140+0800 | INFO | Step 3807: loss=0.1161, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:22:59.797360+0800 | INFO | Step 3808: loss=0.3201, acc=0.911 (IF=0.889, MQ=0.933)
2025-12-21T23:23:01.228624+0800 | INFO | Step 3809: loss=0.1147, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:23:02.660558+0800 | INFO | Step 3810: loss=0.0574, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:23:04.101975+0800 | INFO | Step 3811: loss=0.0511, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:23:05.588230+0800 | INFO | Step 3812: loss=0.1360, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:23:07.039104+0800 | INFO | Step 3813: loss=0.1403, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:23:08.092429+0800 | INFO | Step 3814: loss=0.0452, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:23:09.540792+0800 | INFO | Step 3815: loss=0.0331, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:23:11.026324+0800 | INFO | Step 3816: loss=0.0944, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:23:12.497554+0800 | INFO | Step 3817: loss=0.0519, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:23:13.940802+0800 | INFO | Step 3818: loss=0.0155, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:23:15.440797+0800 | INFO | Step 3819: loss=0.1649, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:23:16.915548+0800 | INFO | Step 3820: loss=0.0223, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:23:18.375111+0800 | INFO | Step 3821: loss=0.0747, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:23:19.808365+0800 | INFO | Step 3822: loss=0.1617, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T23:23:21.265046+0800 | INFO | Step 3823: loss=0.1728, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T23:23:22.731860+0800 | INFO | Step 3824: loss=0.1630, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:23:24.174895+0800 | INFO | Step 3825: loss=0.0125, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:23:25.615835+0800 | INFO | Step 3826: loss=0.0435, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:23:27.066823+0800 | INFO | Step 3827: loss=0.1095, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:23:28.534944+0800 | INFO | Step 3828: loss=0.3304, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T23:23:31.103136+0800 | INFO | Step 3829: loss=0.0524, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:23:32.585320+0800 | INFO | Step 3830: loss=0.1138, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:23:34.117693+0800 | INFO | Step 3831: loss=0.2755, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:23:35.597687+0800 | INFO | Step 3832: loss=0.0590, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:23:37.037719+0800 | INFO | Step 3833: loss=0.0864, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:23:38.474355+0800 | INFO | Step 3834: loss=0.5957, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T23:23:39.951486+0800 | INFO | Step 3835: loss=0.0135, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:23:41.385413+0800 | INFO | Step 3836: loss=0.0821, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:23:42.801076+0800 | INFO | Step 3837: loss=0.0206, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:23:43.848554+0800 | INFO | Step 3838: loss=0.0591, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:23:45.283634+0800 | INFO | Step 3839: loss=0.0332, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:23:46.752368+0800 | INFO | Step 3840: loss=0.0210, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:23:48.211498+0800 | INFO | Step 3841: loss=0.0157, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:23:49.661281+0800 | INFO | Step 3842: loss=0.0302, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:23:51.108272+0800 | INFO | Step 3843: loss=0.0436, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:23:52.557445+0800 | INFO | Step 3844: loss=0.1027, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:23:53.984528+0800 | INFO | Step 3845: loss=0.0459, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:23:55.423595+0800 | INFO | Step 3846: loss=0.0850, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:23:56.863707+0800 | INFO | Step 3847: loss=0.1764, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:23:58.321825+0800 | INFO | Step 3848: loss=0.0038, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:23:59.758396+0800 | INFO | Step 3849: loss=0.0230, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:01.199615+0800 | INFO | Step 3850: loss=0.0047, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:02.668702+0800 | INFO | Step 3851: loss=0.1655, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:24:04.115201+0800 | INFO | Step 3852: loss=0.1433, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:24:05.575613+0800 | INFO | Step 3853: loss=0.0302, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:07.018954+0800 | INFO | Step 3854: loss=0.0802, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:24:08.469281+0800 | INFO | Step 3855: loss=0.6752, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T23:24:09.910396+0800 | INFO | Step 3856: loss=0.1514, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:24:11.347689+0800 | INFO | Step 3857: loss=0.0619, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:24:12.791706+0800 | INFO | Step 3858: loss=0.1098, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T23:24:14.235413+0800 | INFO | Step 3859: loss=0.0623, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:24:15.684160+0800 | INFO | Step 3860: loss=0.1113, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:24:17.169699+0800 | INFO | Step 3861: loss=0.0179, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:18.635025+0800 | INFO | Step 3862: loss=0.0093, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:19.655481+0800 | INFO | Step 3863: loss=0.1014, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:24:21.091732+0800 | INFO | Step 3864: loss=0.3993, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:24:22.535095+0800 | INFO | Step 3865: loss=0.0466, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:23.976798+0800 | INFO | Step 3866: loss=0.0479, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:24:25.450555+0800 | INFO | Step 3867: loss=0.3065, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T23:24:26.891310+0800 | INFO | Step 3868: loss=0.0079, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:28.339435+0800 | INFO | Step 3869: loss=0.1653, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:24:29.798969+0800 | INFO | Step 3870: loss=0.1018, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:24:31.245538+0800 | INFO | Step 3871: loss=0.0308, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:32.695943+0800 | INFO | Step 3872: loss=0.1013, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:24:34.145186+0800 | INFO | Step 3873: loss=0.0511, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:35.587808+0800 | INFO | Step 3874: loss=0.0401, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:37.028856+0800 | INFO | Step 3875: loss=0.2135, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:24:38.490342+0800 | INFO | Step 3876: loss=0.1128, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:24:39.923224+0800 | INFO | Step 3877: loss=0.0578, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:24:41.353392+0800 | INFO | Step 3878: loss=0.0312, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:42.795192+0800 | INFO | Step 3879: loss=0.3481, acc=0.844 (IF=0.750, MQ=0.938)
2025-12-21T23:24:44.278151+0800 | INFO | Step 3880: loss=0.0051, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:45.730714+0800 | INFO | Step 3881: loss=0.0093, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:47.179065+0800 | INFO | Step 3882: loss=0.0910, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:24:48.624414+0800 | INFO | Step 3883: loss=0.0403, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:50.048280+0800 | INFO | Step 3884: loss=0.0577, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:24:53.180091+0800 | INFO | Step 3885: loss=0.0454, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:24:54.626087+0800 | INFO | Step 3886: loss=0.1132, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:24:56.055576+0800 | INFO | Step 3887: loss=0.0215, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:24:57.532333+0800 | INFO | Step 3888: loss=0.1250, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:24:58.951941+0800 | INFO | Step 3889: loss=0.0422, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:25:00.378725+0800 | INFO | Step 3890: loss=0.0061, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:01.806830+0800 | INFO | Step 3891: loss=0.0242, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:03.236715+0800 | INFO | Step 3892: loss=0.1254, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:25:04.668528+0800 | INFO | Step 3893: loss=0.1509, acc=0.933 (IF=1.000, MQ=0.867)
2025-12-21T23:25:06.091919+0800 | INFO | Step 3894: loss=0.0061, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:07.547823+0800 | INFO | Step 3895: loss=0.0548, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:09.018812+0800 | INFO | Step 3896: loss=0.0216, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:10.471444+0800 | INFO | Step 3897: loss=0.0271, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:11.931410+0800 | INFO | Step 3898: loss=0.0045, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:13.355263+0800 | INFO | Step 3899: loss=0.0435, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:25:14.851352+0800 | INFO | Step 3900: loss=0.0212, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:22.482932+0800 | INFO |
============================================================
Validation Results (took 7.61s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6207
Quality Acc: 0.6750
Average Acc: 0.6478
Total Loss: 0.6235
Instruction Loss: 0.6394
Quality Loss: 0.6076
============================================================
2025-12-21T23:25:25.285323+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_3900.pt
2025-12-21T23:25:25.285805+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:25:25.285894+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:25:25.285945+0800 | INFO | 2. Step 3500: acc=0.6526 (reward_model.best_3500.pt)
2025-12-21T23:25:25.285990+0800 | INFO | 3. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T23:25:26.781147+0800 | INFO | Step 3901: loss=0.0062, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:28.275000+0800 | INFO | Step 3902: loss=0.0904, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:25:29.699323+0800 | INFO | Step 3903: loss=0.0011, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:30.751042+0800 | INFO | Step 3904: loss=0.0389, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:25:32.186182+0800 | INFO | Step 3905: loss=0.0395, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:25:33.621800+0800 | INFO | Step 3906: loss=0.0624, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:25:35.075913+0800 | INFO | Step 3907: loss=0.3970, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:25:36.513649+0800 | INFO | Step 3908: loss=0.0682, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:25:37.955390+0800 | INFO | Step 3909: loss=0.0692, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:39.426375+0800 | INFO | Step 3910: loss=0.2383, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:25:40.450990+0800 | INFO | Step 3911: loss=0.1578, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T23:25:41.516831+0800 | INFO | Step 3912: loss=0.1454, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:25:42.943643+0800 | INFO | Step 3913: loss=0.0217, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:44.379537+0800 | INFO | Step 3914: loss=0.1066, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:25:45.427427+0800 | INFO | Step 3915: loss=0.0886, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:25:46.883652+0800 | INFO | Step 3916: loss=0.1154, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:25:48.304170+0800 | INFO | Step 3917: loss=0.0030, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:49.740471+0800 | INFO | Step 3918: loss=0.2093, acc=0.866 (IF=0.857, MQ=0.875)
2025-12-21T23:25:51.177322+0800 | INFO | Step 3919: loss=0.3547, acc=0.902 (IF=0.929, MQ=0.875)
2025-12-21T23:25:52.615730+0800 | INFO | Step 3920: loss=0.0427, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:25:54.054645+0800 | INFO | Step 3921: loss=0.0332, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:25:55.486447+0800 | INFO | Step 3922: loss=0.0385, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:56.927089+0800 | INFO | Step 3923: loss=0.2269, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:25:58.386712+0800 | INFO | Step 3924: loss=0.0079, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:25:59.838912+0800 | INFO | Step 3925: loss=0.0491, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:01.292029+0800 | INFO | Step 3926: loss=0.0229, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:02.756925+0800 | INFO | Step 3927: loss=0.0071, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:04.204050+0800 | INFO | Step 3928: loss=0.0517, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:05.653658+0800 | INFO | Step 3929: loss=0.0814, acc=0.917 (IF=0.833, MQ=1.000)
2025-12-21T23:26:07.101219+0800 | INFO | Step 3930: loss=0.0986, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:26:08.558932+0800 | INFO | Step 3931: loss=0.3689, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T23:26:10.006603+0800 | INFO | Step 3932: loss=0.0398, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:11.449577+0800 | INFO | Step 3933: loss=0.1155, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:26:12.901560+0800 | INFO | Step 3934: loss=0.0565, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T23:26:14.344236+0800 | INFO | Step 3935: loss=0.0315, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:26:15.785346+0800 | INFO | Step 3936: loss=0.0199, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:17.234216+0800 | INFO | Step 3937: loss=0.1324, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:26:18.682746+0800 | INFO | Step 3938: loss=0.1871, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T23:26:20.182329+0800 | INFO | Step 3939: loss=0.4976, acc=0.875 (IF=0.750, MQ=1.000)
2025-12-21T23:26:23.092067+0800 | INFO | Step 3940: loss=0.2458, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:26:24.596936+0800 | INFO | Step 3941: loss=0.0107, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:26.200442+0800 | INFO | Step 3942: loss=0.0430, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:27.690581+0800 | INFO | Step 3943: loss=0.1105, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:26:29.149560+0800 | INFO | Step 3944: loss=0.0341, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:30.587572+0800 | INFO | Step 3945: loss=0.0361, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:26:32.061695+0800 | INFO | Step 3946: loss=0.1926, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:26:33.517226+0800 | INFO | Step 3947: loss=0.0638, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:34.982366+0800 | INFO | Step 3948: loss=0.2036, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T23:26:36.449826+0800 | INFO | Step 3949: loss=0.0369, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:37.897273+0800 | INFO | Step 3950: loss=0.0081, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:39.349885+0800 | INFO | Step 3951: loss=0.1274, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:26:40.797010+0800 | INFO | Step 3952: loss=0.1369, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:26:42.229028+0800 | INFO | Step 3953: loss=0.3071, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:26:43.684112+0800 | INFO | Step 3954: loss=0.0791, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:26:45.124044+0800 | INFO | Step 3955: loss=0.0029, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:46.567094+0800 | INFO | Step 3956: loss=0.0203, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:48.013768+0800 | INFO | Step 3957: loss=0.0019, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:49.466581+0800 | INFO | Step 3958: loss=0.0149, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:50.919105+0800 | INFO | Step 3959: loss=0.0913, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:26:52.390304+0800 | INFO | Step 3960: loss=0.0059, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:53.847212+0800 | INFO | Step 3961: loss=0.1449, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:26:55.294127+0800 | INFO | Step 3962: loss=0.0164, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:56.751702+0800 | INFO | Step 3963: loss=0.1261, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:26:58.234852+0800 | INFO | Step 3964: loss=0.0168, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:26:59.664837+0800 | INFO | Step 3965: loss=0.0805, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:27:01.103790+0800 | INFO | Step 3966: loss=0.0182, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:02.148823+0800 | INFO | Step 3967: loss=0.0065, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:03.589035+0800 | INFO | Step 3968: loss=0.0751, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:27:05.046180+0800 | INFO | Step 3969: loss=0.1033, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T23:27:06.495561+0800 | INFO | Step 3970: loss=0.0543, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T23:27:07.562387+0800 | INFO | Step 3971: loss=0.0450, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:27:08.984073+0800 | INFO | Step 3972: loss=0.0654, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:10.410217+0800 | INFO | Step 3973: loss=0.0125, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:11.863486+0800 | INFO | Step 3974: loss=0.0916, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:27:13.305453+0800 | INFO | Step 3975: loss=0.0072, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:14.757920+0800 | INFO | Step 3976: loss=0.0371, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:16.188882+0800 | INFO | Step 3977: loss=0.1861, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:27:17.622865+0800 | INFO | Step 3978: loss=0.0433, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:27:19.066681+0800 | INFO | Step 3979: loss=0.0668, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:27:20.579183+0800 | INFO | Step 3980: loss=0.0201, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:22.023658+0800 | INFO | Step 3981: loss=0.0769, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:27:23.458065+0800 | INFO | Step 3982: loss=0.1142, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:27:24.890093+0800 | INFO | Step 3983: loss=0.0080, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:26.319168+0800 | INFO | Step 3984: loss=0.0474, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:27:27.758629+0800 | INFO | Step 3985: loss=0.0223, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:29.204721+0800 | INFO | Step 3986: loss=0.0917, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:27:30.612145+0800 | INFO | Step 3987: loss=0.0139, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:32.043223+0800 | INFO | Step 3988: loss=0.0065, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:33.455078+0800 | INFO | Step 3989: loss=0.0755, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:27:34.878230+0800 | INFO | Step 3990: loss=0.0147, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:36.314552+0800 | INFO | Step 3991: loss=0.1095, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T23:27:37.760269+0800 | INFO | Step 3992: loss=0.0006, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:39.200003+0800 | INFO | Step 3993: loss=0.2133, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T23:27:40.647201+0800 | INFO | Step 3994: loss=0.0054, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:42.086029+0800 | INFO | Step 3995: loss=0.0316, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:44.605656+0800 | INFO | Step 3996: loss=0.0642, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:27:46.112377+0800 | INFO | Step 3997: loss=0.0075, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:47.608133+0800 | INFO | Step 3998: loss=0.1452, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:27:49.060017+0800 | INFO | Step 3999: loss=0.1517, acc=0.906 (IF=0.875, MQ=0.938)
2025-12-21T23:27:50.512987+0800 | INFO | Step 4000: loss=0.0336, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:27:58.130805+0800 | INFO |
============================================================
Validation Results (took 7.59s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6207
Quality Acc: 0.6500
Average Acc: 0.6353
Total Loss: 0.6207
Instruction Loss: 0.6356
Quality Loss: 0.6057
============================================================
2025-12-21T23:28:00.749734+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_4000.pt
2025-12-21T23:28:00.750229+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:28:00.750328+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:28:00.750385+0800 | INFO | 2. Step 3500: acc=0.6526 (reward_model.best_3500.pt)
2025-12-21T23:28:00.750432+0800 | INFO | 3. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T23:28:03.057137+0800 | INFO | Step 4000: Saved to /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.4000.pt
2025-12-21T23:28:04.133245+0800 | INFO | Step 4001: loss=0.0232, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:05.567673+0800 | INFO | Step 4002: loss=0.1789, acc=0.899 (IF=0.923, MQ=0.875)
2025-12-21T23:28:07.014498+0800 | INFO | Step 4003: loss=0.1101, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:28:08.456998+0800 | INFO | Step 4004: loss=0.1239, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:28:09.897989+0800 | INFO | Step 4005: loss=0.0643, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:28:11.371421+0800 | INFO | Step 4006: loss=0.1140, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:28:12.392768+0800 | INFO | Step 4007: loss=0.0048, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:13.829887+0800 | INFO | Step 4008: loss=0.0309, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:15.259694+0800 | INFO | Step 4009: loss=0.1946, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:28:16.730699+0800 | INFO | Step 4010: loss=0.0036, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:18.156423+0800 | INFO | Step 4011: loss=0.0354, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:19.599330+0800 | INFO | Step 4012: loss=0.2063, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:28:21.034360+0800 | INFO | Step 4013: loss=0.0872, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:28:22.464860+0800 | INFO | Step 4014: loss=0.0272, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:23.896130+0800 | INFO | Step 4015: loss=0.1193, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:28:25.335150+0800 | INFO | Step 4016: loss=0.1159, acc=0.897 (IF=0.857, MQ=0.938)
2025-12-21T23:28:26.812020+0800 | INFO | Step 4017: loss=0.0443, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:28:28.236793+0800 | INFO | Step 4018: loss=0.0369, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:29.668061+0800 | INFO | Step 4019: loss=0.1520, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:28:31.112448+0800 | INFO | Step 4020: loss=0.0098, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:32.545775+0800 | INFO | Step 4021: loss=0.3445, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:28:33.990759+0800 | INFO | Step 4022: loss=0.0764, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:28:35.437861+0800 | INFO | Step 4023: loss=0.0378, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:36.894379+0800 | INFO | Step 4024: loss=0.2003, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:28:38.335077+0800 | INFO | Step 4025: loss=0.1219, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:28:39.796295+0800 | INFO | Step 4026: loss=0.0118, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:41.249729+0800 | INFO | Step 4027: loss=0.1360, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:28:42.705755+0800 | INFO | Step 4028: loss=0.0402, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:28:44.167680+0800 | INFO | Step 4029: loss=0.0122, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:45.616337+0800 | INFO | Step 4030: loss=0.0855, acc=0.933 (IF=0.929, MQ=0.938)
2025-12-21T23:28:47.069532+0800 | INFO | Step 4031: loss=0.3019, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:28:48.522009+0800 | INFO | Step 4032: loss=0.0569, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:49.977453+0800 | INFO | Step 4033: loss=0.0723, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:28:51.452436+0800 | INFO | Step 4034: loss=0.0177, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:52.912200+0800 | INFO | Step 4035: loss=0.0181, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:54.366517+0800 | INFO | Step 4036: loss=0.0479, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:28:55.813885+0800 | INFO | Step 4037: loss=0.0034, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:28:57.302241+0800 | INFO | Step 4038: loss=0.0445, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:28:58.757053+0800 | INFO | Step 4039: loss=0.0249, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:00.202828+0800 | INFO | Step 4040: loss=0.0187, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:01.243184+0800 | INFO | Step 4041: loss=0.0284, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:02.713838+0800 | INFO | Step 4042: loss=0.0137, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:04.171882+0800 | INFO | Step 4043: loss=0.0126, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:05.626395+0800 | INFO | Step 4044: loss=0.1068, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:29:06.660400+0800 | INFO | Step 4045: loss=0.0318, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:29:08.129792+0800 | INFO | Step 4046: loss=0.0856, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:29:09.591676+0800 | INFO | Step 4047: loss=0.2630, acc=0.909 (IF=0.818, MQ=1.000)
2025-12-21T23:29:11.044386+0800 | INFO | Step 4048: loss=0.0543, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:29:12.486688+0800 | INFO | Step 4049: loss=0.0938, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:29:13.930909+0800 | INFO | Step 4050: loss=0.1906, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:29:16.825787+0800 | INFO | Step 4051: loss=0.0581, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:18.314024+0800 | INFO | Step 4052: loss=0.0818, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:29:19.792808+0800 | INFO | Step 4053: loss=0.0674, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:29:21.243468+0800 | INFO | Step 4054: loss=0.0086, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:22.697656+0800 | INFO | Step 4055: loss=0.2819, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:29:24.152676+0800 | INFO | Step 4056: loss=0.0801, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:29:25.595689+0800 | INFO | Step 4057: loss=0.0262, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:26.646535+0800 | INFO | Step 4058: loss=0.0508, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:27.707899+0800 | INFO | Step 4059: loss=0.0051, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:29.135598+0800 | INFO | Step 4060: loss=0.0159, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:30.576892+0800 | INFO | Step 4061: loss=0.0837, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:32.018309+0800 | INFO | Step 4062: loss=0.1633, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:29:33.079779+0800 | INFO | Step 4063: loss=0.0421, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:29:34.530894+0800 | INFO | Step 4064: loss=0.0360, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:29:35.983887+0800 | INFO | Step 4065: loss=0.0543, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T23:29:37.430779+0800 | INFO | Step 4066: loss=0.1804, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:29:38.857605+0800 | INFO | Step 4067: loss=0.0791, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:29:40.290746+0800 | INFO | Step 4068: loss=0.0697, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:41.730678+0800 | INFO | Step 4069: loss=0.0015, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:43.172738+0800 | INFO | Step 4070: loss=0.0402, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:29:44.617237+0800 | INFO | Step 4071: loss=0.1274, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:29:46.045549+0800 | INFO | Step 4072: loss=0.0112, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:47.496997+0800 | INFO | Step 4073: loss=0.1693, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:29:48.938611+0800 | INFO | Step 4074: loss=0.0287, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:50.366865+0800 | INFO | Step 4075: loss=0.0778, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:29:51.805084+0800 | INFO | Step 4076: loss=0.0021, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:29:53.250736+0800 | INFO | Step 4077: loss=0.0788, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:29:54.681897+0800 | INFO | Step 4078: loss=0.2267, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:29:56.119358+0800 | INFO | Step 4079: loss=0.1597, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:29:57.148551+0800 | INFO | Step 4080: loss=0.0569, acc=0.938 (IF=0.875, MQ=1.000)
2025-12-21T23:29:58.591928+0800 | INFO | Step 4081: loss=0.0025, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:00.040260+0800 | INFO | Step 4082: loss=0.0477, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:30:01.481963+0800 | INFO | Step 4083: loss=0.0145, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:02.929531+0800 | INFO | Step 4084: loss=0.1634, acc=0.887 (IF=0.900, MQ=0.875)
2025-12-21T23:30:04.391474+0800 | INFO | Step 4085: loss=0.0603, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:30:05.878929+0800 | INFO | Step 4086: loss=0.0164, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:07.325108+0800 | INFO | Step 4087: loss=0.0254, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:08.754013+0800 | INFO | Step 4088: loss=0.0668, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:30:10.190183+0800 | INFO | Step 4089: loss=0.0081, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:11.666342+0800 | INFO | Step 4090: loss=0.0662, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:30:13.107457+0800 | INFO | Step 4091: loss=0.0234, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:14.547349+0800 | INFO | Step 4092: loss=0.2445, acc=0.882 (IF=0.889, MQ=0.875)
2025-12-21T23:30:15.993949+0800 | INFO | Step 4093: loss=0.3330, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:30:17.431215+0800 | INFO | Step 4094: loss=0.0173, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:18.905606+0800 | INFO | Step 4095: loss=0.1311, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:30:20.375601+0800 | INFO | Step 4096: loss=0.1586, acc=0.885 (IF=0.833, MQ=0.938)
2025-12-21T23:30:21.825446+0800 | INFO | Step 4097: loss=0.0316, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:23.326214+0800 | INFO | Step 4098: loss=0.5865, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:30:24.761805+0800 | INFO | Step 4099: loss=0.2468, acc=0.930 (IF=0.923, MQ=0.938)
2025-12-21T23:30:26.201945+0800 | INFO | Step 4100: loss=0.0766, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:30:33.768869+0800 | INFO |
============================================================
Validation Results (took 7.54s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6034
Quality Acc: 0.6750
Average Acc: 0.6392
Total Loss: 0.6211
Instruction Loss: 0.6401
Quality Loss: 0.6021
============================================================
2025-12-21T23:30:36.386031+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_4100.pt
2025-12-21T23:30:36.386607+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:30:36.386710+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:30:36.386767+0800 | INFO | 2. Step 3500: acc=0.6526 (reward_model.best_3500.pt)
2025-12-21T23:30:36.386819+0800 | INFO | 3. Step 2600: acc=0.6478 (reward_model.best_2600.pt)
2025-12-21T23:30:37.875644+0800 | INFO | Step 4101: loss=0.1093, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:30:39.318513+0800 | INFO | Step 4102: loss=0.0063, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:40.766572+0800 | INFO | Step 4103: loss=0.1622, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:30:41.802586+0800 | INFO | Step 4104: loss=0.0324, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:43.251603+0800 | INFO | Step 4105: loss=0.0044, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:44.695863+0800 | INFO | Step 4106: loss=0.0664, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:30:47.378014+0800 | INFO | Step 4107: loss=0.0064, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:48.857696+0800 | INFO | Step 4108: loss=0.0017, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:50.321512+0800 | INFO | Step 4109: loss=0.0140, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:51.779042+0800 | INFO | Step 4110: loss=0.0010, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:53.237283+0800 | INFO | Step 4111: loss=0.1211, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:30:54.295351+0800 | INFO | Step 4112: loss=0.0018, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:55.728149+0800 | INFO | Step 4113: loss=0.0027, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:30:57.194295+0800 | INFO | Step 4114: loss=0.3678, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:30:58.626433+0800 | INFO | Step 4115: loss=0.1305, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:31:00.084905+0800 | INFO | Step 4116: loss=0.0373, acc=0.964 (IF=0.929, MQ=1.000)
2025-12-21T23:31:01.536031+0800 | INFO | Step 4117: loss=0.0509, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:03.010724+0800 | INFO | Step 4118: loss=0.0299, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:31:04.470837+0800 | INFO | Step 4119: loss=0.0480, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:31:05.917719+0800 | INFO | Step 4120: loss=0.0308, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:07.368026+0800 | INFO | Step 4121: loss=0.0973, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:08.820905+0800 | INFO | Step 4122: loss=0.0290, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:10.275424+0800 | INFO | Step 4123: loss=0.0441, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:11.755750+0800 | INFO | Step 4124: loss=0.0409, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:13.235608+0800 | INFO | Step 4125: loss=0.0162, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:14.694092+0800 | INFO | Step 4126: loss=0.2416, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:31:16.134057+0800 | INFO | Step 4127: loss=0.0485, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:31:17.599785+0800 | INFO | Step 4128: loss=0.0167, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:19.082091+0800 | INFO | Step 4129: loss=0.0166, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:20.529761+0800 | INFO | Step 4130: loss=0.0117, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:21.971050+0800 | INFO | Step 4131: loss=0.0378, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:23.420613+0800 | INFO | Step 4132: loss=0.0073, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:24.868964+0800 | INFO | Step 4133: loss=0.1381, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:31:26.336601+0800 | INFO | Step 4134: loss=0.0878, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:31:27.821616+0800 | INFO | Step 4135: loss=0.1496, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:31:29.325354+0800 | INFO | Step 4136: loss=0.0572, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:31:30.766471+0800 | INFO | Step 4137: loss=0.3470, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T23:31:32.215992+0800 | INFO | Step 4138: loss=0.0573, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:33.295689+0800 | INFO | Step 4139: loss=0.2106, acc=0.826 (IF=0.778, MQ=0.875)
2025-12-21T23:31:34.794321+0800 | INFO | Step 4140: loss=0.0605, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:36.254898+0800 | INFO | Step 4141: loss=0.0228, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:37.686848+0800 | INFO | Step 4142: loss=0.0160, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:39.131088+0800 | INFO | Step 4143: loss=0.0055, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:40.583197+0800 | INFO | Step 4144: loss=0.0454, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:31:42.024669+0800 | INFO | Step 4145: loss=0.1692, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:31:43.486168+0800 | INFO | Step 4146: loss=0.1206, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:31:44.956167+0800 | INFO | Step 4147: loss=0.0318, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:46.405702+0800 | INFO | Step 4148: loss=0.0105, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:47.853138+0800 | INFO | Step 4149: loss=0.0292, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:49.298501+0800 | INFO | Step 4150: loss=0.1884, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:31:50.756245+0800 | INFO | Step 4151: loss=0.0072, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:52.203908+0800 | INFO | Step 4152: loss=0.1644, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:31:53.684692+0800 | INFO | Step 4153: loss=0.0627, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:31:55.114899+0800 | INFO | Step 4154: loss=0.0027, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:56.542784+0800 | INFO | Step 4155: loss=0.0047, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:31:57.991789+0800 | INFO | Step 4156: loss=0.1249, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:31:59.432707+0800 | INFO | Step 4157: loss=0.0811, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:32:00.906668+0800 | INFO | Step 4158: loss=0.1012, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:32:02.372993+0800 | INFO | Step 4159: loss=0.0459, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:32:03.836272+0800 | INFO | Step 4160: loss=0.0139, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:05.292405+0800 | INFO | Step 4161: loss=0.0593, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:32:08.519585+0800 | INFO | Step 4162: loss=0.0006, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:09.947913+0800 | INFO | Step 4163: loss=0.1555, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:32:11.386908+0800 | INFO | Step 4164: loss=0.0804, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:32:12.839784+0800 | INFO | Step 4165: loss=0.0636, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:32:14.283332+0800 | INFO | Step 4166: loss=0.0149, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:15.720791+0800 | INFO | Step 4167: loss=0.4840, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T23:32:17.185635+0800 | INFO | Step 4168: loss=0.2350, acc=0.847 (IF=0.818, MQ=0.875)
2025-12-21T23:32:18.622486+0800 | INFO | Step 4169: loss=0.0065, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:20.059080+0800 | INFO | Step 4170: loss=0.0010, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:21.498136+0800 | INFO | Step 4171: loss=0.0075, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:22.940689+0800 | INFO | Step 4172: loss=0.0411, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:32:24.400228+0800 | INFO | Step 4173: loss=0.1003, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:32:25.880380+0800 | INFO | Step 4174: loss=0.2097, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:32:27.313704+0800 | INFO | Step 4175: loss=0.0414, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:28.796316+0800 | INFO | Step 4176: loss=0.4759, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:32:30.238300+0800 | INFO | Step 4177: loss=0.0265, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:31.685225+0800 | INFO | Step 4178: loss=0.0707, acc=0.967 (IF=0.933, MQ=1.000)
2025-12-21T23:32:33.130504+0800 | INFO | Step 4179: loss=0.0130, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:34.579434+0800 | INFO | Step 4180: loss=0.0240, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:32:36.029202+0800 | INFO | Step 4181: loss=0.1039, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:32:37.069648+0800 | INFO | Step 4182: loss=0.0025, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:38.521731+0800 | INFO | Step 4183: loss=0.0487, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:32:39.973558+0800 | INFO | Step 4184: loss=0.0049, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:40.639795+0800 | INFO | Step 4185: loss=0.1359, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:32:42.076150+0800 | INFO | Step 4186: loss=0.0017, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:43.521865+0800 | INFO | Step 4187: loss=0.1266, acc=0.950 (IF=0.900, MQ=1.000)
2025-12-21T23:32:44.967057+0800 | INFO | Step 4188: loss=0.0100, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:46.404361+0800 | INFO | Step 4189: loss=0.1126, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:32:47.886983+0800 | INFO | Step 4190: loss=0.0209, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:49.319759+0800 | INFO | Step 4191: loss=0.2530, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:32:50.781338+0800 | INFO | Step 4192: loss=0.0440, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:32:52.242963+0800 | INFO | Step 4193: loss=0.0079, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:53.674054+0800 | INFO | Step 4194: loss=0.0365, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:32:55.116426+0800 | INFO | Step 4195: loss=0.0235, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:56.558857+0800 | INFO | Step 4196: loss=0.0057, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:32:58.009820+0800 | INFO | Step 4197: loss=0.1538, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:32:59.453372+0800 | INFO | Step 4198: loss=0.1109, acc=0.955 (IF=0.909, MQ=1.000)
2025-12-21T23:33:00.899623+0800 | INFO | Step 4199: loss=0.1342, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:33:02.360407+0800 | INFO | Step 4200: loss=0.6336, acc=0.856 (IF=0.900, MQ=0.812)
2025-12-21T23:33:09.611395+0800 | INFO |
============================================================
Validation Results (took 7.22s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6379
Quality Acc: 0.6625
Average Acc: 0.6502
Total Loss: 0.6170
Instruction Loss: 0.6339
Quality Loss: 0.6002
============================================================
2025-12-21T23:33:12.106621+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_2600.pt
2025-12-21T23:33:12.107122+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:33:12.107209+0800 | INFO | 1. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:33:12.107259+0800 | INFO | 2. Step 3500: acc=0.6526 (reward_model.best_3500.pt)
2025-12-21T23:33:12.107304+0800 | INFO | 3. Step 4200: acc=0.6502 (reward_model.best_4200.pt)
2025-12-21T23:33:13.196195+0800 | INFO | Step 4201: loss=0.2433, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:33:14.696716+0800 | INFO | Step 4202: loss=0.1096, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:33:16.136016+0800 | INFO | Step 4203: loss=0.0158, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:17.594602+0800 | INFO | Step 4204: loss=0.1254, acc=0.944 (IF=0.889, MQ=1.000)
2025-12-21T23:33:19.040681+0800 | INFO | Step 4205: loss=0.1207, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:33:20.506373+0800 | INFO | Step 4206: loss=0.0024, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:21.937623+0800 | INFO | Step 4207: loss=0.0325, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:33:23.376309+0800 | INFO | Step 4208: loss=0.0173, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:24.820178+0800 | INFO | Step 4209: loss=0.0258, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:33:26.252321+0800 | INFO | Step 4210: loss=0.0554, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:33:27.681208+0800 | INFO | Step 4211: loss=0.0808, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:33:29.122543+0800 | INFO | Step 4212: loss=0.0535, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:33:30.577908+0800 | INFO | Step 4213: loss=0.5661, acc=0.861 (IF=0.846, MQ=0.875)
2025-12-21T23:33:32.024892+0800 | INFO | Step 4214: loss=0.1393, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:33:33.482892+0800 | INFO | Step 4215: loss=0.0095, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:34.922491+0800 | INFO | Step 4216: loss=0.0110, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:36.357514+0800 | INFO | Step 4217: loss=0.0386, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:39.232026+0800 | INFO | Step 4218: loss=0.0361, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:40.831126+0800 | INFO | Step 4219: loss=0.0664, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:33:42.326987+0800 | INFO | Step 4220: loss=0.0062, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:43.777991+0800 | INFO | Step 4221: loss=0.0289, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:45.247093+0800 | INFO | Step 4222: loss=0.0241, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:33:46.849926+0800 | INFO | Step 4223: loss=0.1184, acc=0.927 (IF=0.917, MQ=0.938)
2025-12-21T23:33:48.333596+0800 | INFO | Step 4224: loss=0.1437, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T23:33:49.811739+0800 | INFO | Step 4225: loss=0.0801, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:33:51.244139+0800 | INFO | Step 4226: loss=0.0213, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:52.683750+0800 | INFO | Step 4227: loss=0.0121, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:54.187459+0800 | INFO | Step 4228: loss=0.0199, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:55.641572+0800 | INFO | Step 4229: loss=0.0033, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:57.084026+0800 | INFO | Step 4230: loss=0.0096, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:33:58.564676+0800 | INFO | Step 4231: loss=0.0057, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:00.211642+0800 | INFO | Step 4232: loss=0.0184, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:01.815130+0800 | INFO | Step 4233: loss=0.0243, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:03.396744+0800 | INFO | Step 4234: loss=0.2618, acc=0.913 (IF=0.889, MQ=0.938)
2025-12-21T23:34:04.920308+0800 | INFO | Step 4235: loss=0.0224, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:06.430165+0800 | INFO | Step 4236: loss=0.0239, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:07.885446+0800 | INFO | Step 4237: loss=0.0217, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:09.336547+0800 | INFO | Step 4238: loss=0.1134, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:34:10.765467+0800 | INFO | Step 4239: loss=0.0477, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:34:12.235043+0800 | INFO | Step 4240: loss=0.1360, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:34:13.724155+0800 | INFO | Step 4241: loss=0.0126, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:15.210142+0800 | INFO | Step 4242: loss=0.0653, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:34:16.647900+0800 | INFO | Step 4243: loss=0.0194, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:18.115178+0800 | INFO | Step 4244: loss=1.4922, acc=0.838 (IF=0.800, MQ=0.875)
2025-12-21T23:34:19.559393+0800 | INFO | Step 4245: loss=0.0162, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:21.001152+0800 | INFO | Step 4246: loss=0.0126, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:22.445240+0800 | INFO | Step 4247: loss=0.4968, acc=0.875 (IF=0.875, MQ=0.875)
2025-12-21T23:34:23.924958+0800 | INFO | Step 4248: loss=0.0358, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:34:25.418949+0800 | INFO | Step 4249: loss=0.0058, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:26.847399+0800 | INFO | Step 4250: loss=0.0342, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:28.291481+0800 | INFO | Step 4251: loss=0.0382, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:29.774955+0800 | INFO | Step 4252: loss=0.0262, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:31.226273+0800 | INFO | Step 4253: loss=0.5157, acc=0.923 (IF=0.846, MQ=1.000)
2025-12-21T23:34:32.688458+0800 | INFO | Step 4254: loss=0.5202, acc=0.878 (IF=0.818, MQ=0.938)
2025-12-21T23:34:34.200763+0800 | INFO | Step 4255: loss=0.1117, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:34:35.674715+0800 | INFO | Step 4256: loss=0.1781, acc=0.896 (IF=0.917, MQ=0.875)
2025-12-21T23:34:37.191078+0800 | INFO | Step 4257: loss=0.0243, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:34:38.632261+0800 | INFO | Step 4258: loss=0.0037, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:40.077221+0800 | INFO | Step 4259: loss=0.1325, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:34:41.524424+0800 | INFO | Step 4260: loss=0.0058, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:42.975660+0800 | INFO | Step 4261: loss=0.0889, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:34:44.471412+0800 | INFO | Step 4262: loss=0.0276, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:45.914536+0800 | INFO | Step 4263: loss=0.0115, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:47.367567+0800 | INFO | Step 4264: loss=0.0868, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:34:48.814429+0800 | INFO | Step 4265: loss=0.1719, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:34:50.304771+0800 | INFO | Step 4266: loss=0.0055, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:51.750449+0800 | INFO | Step 4267: loss=0.0008, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:53.186779+0800 | INFO | Step 4268: loss=0.0058, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:54.641154+0800 | INFO | Step 4269: loss=0.1788, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:34:56.061156+0800 | INFO | Step 4270: loss=0.0064, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:34:57.487817+0800 | INFO | Step 4271: loss=0.0620, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:34:58.928155+0800 | INFO | Step 4272: loss=0.1205, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:35:01.396387+0800 | INFO | Step 4273: loss=0.0368, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:35:02.851324+0800 | INFO | Step 4274: loss=0.0379, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:35:04.299849+0800 | INFO | Step 4275: loss=0.0171, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:35:05.751751+0800 | INFO | Step 4276: loss=0.0595, acc=0.962 (IF=0.923, MQ=1.000)
2025-12-21T23:35:07.204585+0800 | INFO | Step 4277: loss=0.0032, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:35:08.665824+0800 | INFO | Step 4278: loss=0.0042, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:35:10.110338+0800 | INFO | Step 4279: loss=0.1386, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:35:11.579465+0800 | INFO | Step 4280: loss=0.3669, acc=0.892 (IF=0.909, MQ=0.875)
2025-12-21T23:35:13.010858+0800 | INFO | Step 4281: loss=0.0025, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:35:14.464078+0800 | INFO | Step 4282: loss=0.1598, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:35:15.906661+0800 | INFO | Step 4283: loss=0.0667, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:35:17.369797+0800 | INFO | Step 4284: loss=0.0347, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:35:18.795971+0800 | INFO | Step 4285: loss=0.0472, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:35:20.230736+0800 | INFO | Step 4286: loss=0.0696, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:35:21.667768+0800 | INFO | Step 4287: loss=0.0776, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:35:23.104895+0800 | INFO | Step 4288: loss=0.0043, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:35:24.542305+0800 | INFO | Step 4289: loss=0.0261, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:35:25.979323+0800 | INFO | Step 4290: loss=0.2018, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:35:27.037034+0800 | INFO | Step 4291: loss=0.0028, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:35:28.473386+0800 | INFO | Step 4292: loss=0.0340, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:35:29.912510+0800 | INFO | Step 4293: loss=0.0514, acc=0.967 (IF=1.000, MQ=0.933)
2025-12-21T23:35:31.388959+0800 | INFO | Step 4294: loss=0.0123, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:35:32.449762+0800 | INFO | Step 4295: loss=0.0090, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:35:33.901079+0800 | INFO | Step 4296: loss=0.1421, acc=0.938 (IF=1.000, MQ=0.875)
2025-12-21T23:35:35.346149+0800 | INFO | Step 4297: loss=0.1022, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:35:36.796353+0800 | INFO | Step 4298: loss=0.1431, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:35:38.239727+0800 | INFO | Step 4299: loss=0.4933, acc=0.919 (IF=0.900, MQ=0.938)
2025-12-21T23:35:39.712163+0800 | INFO | Step 4300: loss=0.0086, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:35:46.979302+0800 | INFO |
============================================================
Validation Results (took 7.24s):
Samples: 58 instruction, 80 quality
Instruction Acc: 0.6379
Quality Acc: 0.6750
Average Acc: 0.6565
Total Loss: 0.6163
Instruction Loss: 0.6341
Quality Loss: 0.5985
============================================================
2025-12-21T23:35:49.614476+0800 | INFO | Removed old checkpoint: /data/yrb/musicarena/Haiwen/offline_data/cmi-arena/experiments/reward_model/debug_downsample_20251221_2142/ckpt/reward_model.best_4200.pt
2025-12-21T23:35:49.614995+0800 | INFO | Best 3 checkpoints:
2025-12-21T23:35:49.615091+0800 | INFO | 1. Step 4300: acc=0.6565 (reward_model.best_4300.pt)
2025-12-21T23:35:49.615148+0800 | INFO | 2. Step 2800: acc=0.6541 (reward_model.best_2800.pt)
2025-12-21T23:35:49.615194+0800 | INFO | 3. Step 3500: acc=0.6526 (reward_model.best_3500.pt)
2025-12-21T23:35:51.068733+0800 | INFO | Step 4301: loss=0.0222, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:35:52.503070+0800 | INFO | Step 4302: loss=0.3617, acc=0.923 (IF=0.909, MQ=0.938)
2025-12-21T23:35:53.965245+0800 | INFO | Step 4303: loss=0.0759, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:35:55.419640+0800 | INFO | Step 4304: loss=0.3492, acc=0.854 (IF=0.833, MQ=0.875)
2025-12-21T23:35:56.849731+0800 | INFO | Step 4305: loss=0.2971, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:35:57.924006+0800 | INFO | Step 4306: loss=0.0527, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:35:59.366242+0800 | INFO | Step 4307: loss=0.0045, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:36:00.795735+0800 | INFO | Step 4308: loss=0.0983, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:36:02.234663+0800 | INFO | Step 4309: loss=0.0041, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:36:03.678616+0800 | INFO | Step 4310: loss=0.0073, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:36:05.110306+0800 | INFO | Step 4311: loss=0.1343, acc=0.906 (IF=1.000, MQ=0.812)
2025-12-21T23:36:06.556627+0800 | INFO | Step 4312: loss=0.0045, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:36:08.032987+0800 | INFO | Step 4313: loss=0.1749, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:36:09.480500+0800 | INFO | Step 4314: loss=0.0272, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:36:10.924575+0800 | INFO | Step 4315: loss=0.0824, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:36:12.356448+0800 | INFO | Step 4316: loss=0.0307, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:36:13.818143+0800 | INFO | Step 4317: loss=0.0156, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:36:15.242093+0800 | INFO | Step 4318: loss=0.0181, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:36:16.680797+0800 | INFO | Step 4319: loss=0.0233, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:36:18.127225+0800 | INFO | Step 4320: loss=0.0414, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:36:19.572282+0800 | INFO | Step 4321: loss=0.0500, acc=1.000 (IF=1.000, MQ=1.000)
2025-12-21T23:36:21.016374+0800 | INFO | Step 4322: loss=0.0981, acc=0.958 (IF=0.917, MQ=1.000)
2025-12-21T23:36:22.464906+0800 | INFO | Step 4323: loss=0.0486, acc=0.969 (IF=1.000, MQ=0.938)
2025-12-21T23:36:23.915695+0800 | INFO | Step 4324: loss=0.1068, acc=0.919 (IF=0.900, MQ=0.938)