Junyi42 committed · Commit 8eb1956 · verified · 1 Parent(s): 1438fc6

Upload checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins

checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/wandb/offline-run-20260126_213949-checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins-run0/files/output.log CHANGED
@@ -1065,53 +1065,6 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
1065
  [2026-01-26 22:14:01] (step=0001054) Train Loss mse: 0.0175, Train Loss ce: 0.0570, Train Steps/Sec: 0.68,
1066
  [2026-01-26 22:14:03] (step=0001055) Train Loss mse: 0.0202, Train Loss ce: 0.0187, Train Steps/Sec: 0.68,
1067
  [2026-01-26 22:14:04] (step=0001056) Train Loss mse: 0.0158, Train Loss ce: 0.0599, Train Steps/Sec: 0.68,
1068
- [2026-01-26 22:14:06] (step=0001057) Train Loss mse: 0.0142, Train Loss ce: 0.0901, Train Steps/Sec: 0.68,
1069
- [2026-01-26 22:14:07] (step=0001058) Train Loss mse: 0.0084, Train Loss ce: 0.0419, Train Steps/Sec: 0.67,
1070
- [2026-01-26 22:14:09] (step=0001059) Train Loss mse: 0.0127, Train Loss ce: 0.0692, Train Steps/Sec: 0.56,
1071
- [2026-01-26 22:14:11] (step=0001060) Train Loss mse: 0.0078, Train Loss ce: 0.0673, Train Steps/Sec: 0.56,
1072
- [2026-01-26 22:14:12] (step=0001061) Train Loss mse: 0.0140, Train Loss ce: 0.0492, Train Steps/Sec: 0.58,
1073
- [2026-01-26 22:14:14] (step=0001062) Train Loss mse: 0.0124, Train Loss ce: 0.0609, Train Steps/Sec: 0.68,
1074
- [2026-01-26 22:14:15] (step=0001063) Train Loss mse: 0.0176, Train Loss ce: 0.0481, Train Steps/Sec: 0.69,
1075
- [2026-01-26 22:14:17] (step=0001064) Train Loss mse: 0.0090, Train Loss ce: 0.0420, Train Steps/Sec: 0.68,
1076
- [2026-01-26 22:14:18] (step=0001065) Train Loss mse: 0.0176, Train Loss ce: 0.0697, Train Steps/Sec: 0.68,
1077
- [2026-01-26 22:14:20] (step=0001066) Train Loss mse: 0.0142, Train Loss ce: 0.0573, Train Steps/Sec: 0.68,
1078
- [2026-01-26 22:14:21] (step=0001067) Train Loss mse: 0.0099, Train Loss ce: 0.0944, Train Steps/Sec: 0.68,
1079
- [2026-01-26 22:14:23] (step=0001068) Train Loss mse: 0.0228, Train Loss ce: 0.0403, Train Steps/Sec: 0.56,
1080
- [2026-01-26 22:14:25] (step=0001069) Train Loss mse: 0.0124, Train Loss ce: 0.0557, Train Steps/Sec: 0.56,
1081
- [2026-01-26 22:14:26] (step=0001070) Train Loss mse: 0.0111, Train Loss ce: 0.1486, Train Steps/Sec: 0.58,
1082
- [2026-01-26 22:14:28] (step=0001071) Train Loss mse: 0.0107, Train Loss ce: 0.0601, Train Steps/Sec: 0.68,
1083
- [2026-01-26 22:14:29] (step=0001072) Train Loss mse: 0.0119, Train Loss ce: 0.0674, Train Steps/Sec: 0.68,
1084
- [2026-01-26 22:14:31] (step=0001073) Train Loss mse: 0.0131, Train Loss ce: 0.0523, Train Steps/Sec: 0.68,
1085
- [2026-01-26 22:14:32] (step=0001074) Train Loss mse: 0.0094, Train Loss ce: 0.0564, Train Steps/Sec: 0.68,
1086
- [2026-01-26 22:14:34] (step=0001075) Train Loss mse: 0.0093, Train Loss ce: 0.0724, Train Steps/Sec: 0.68,
1087
- [2026-01-26 22:14:36] (step=0001076) Train Loss mse: 0.0124, Train Loss ce: 0.0541, Train Steps/Sec: 0.56,
1088
- [2026-01-26 22:14:37] (step=0001077) Train Loss mse: 0.0127, Train Loss ce: 0.0760, Train Steps/Sec: 0.56,
1089
- [2026-01-26 22:14:39] (step=0001078) Train Loss mse: 0.0197, Train Loss ce: 0.0570, Train Steps/Sec: 0.59,
1090
- [2026-01-26 22:14:41] (step=0001079) Train Loss mse: 0.0101, Train Loss ce: 0.0584, Train Steps/Sec: 0.67,
1091
- [2026-01-26 22:14:42] (step=0001080) Train Loss mse: 0.0209, Train Loss ce: 0.0491, Train Steps/Sec: 0.69,
1092
- [2026-01-26 22:14:43] (step=0001081) Train Loss mse: 0.0376, Train Loss ce: 0.0515, Train Steps/Sec: 0.68,
1093
- [2026-01-26 22:14:45] (step=0001082) Train Loss mse: 0.0092, Train Loss ce: 0.0488, Train Steps/Sec: 0.68,
1094
- [2026-01-26 22:14:47] (step=0001083) Train Loss mse: 0.0144, Train Loss ce: 0.0308, Train Steps/Sec: 0.57,
1095
- [2026-01-26 22:14:48] (step=0001084) Train Loss mse: 0.0096, Train Loss ce: 0.0897, Train Steps/Sec: 0.68,
1096
- [2026-01-26 22:14:50] (step=0001085) Train Loss mse: 0.0127, Train Loss ce: 0.0523, Train Steps/Sec: 0.57,
1097
- [2026-01-26 22:14:52] (step=0001086) Train Loss mse: 0.0295, Train Loss ce: 0.0351, Train Steps/Sec: 0.59,
1098
- [2026-01-26 22:14:53] (step=0001087) Train Loss mse: 0.0099, Train Loss ce: 0.0597, Train Steps/Sec: 0.68,
1099
- [2026-01-26 22:14:55] (step=0001088) Train Loss mse: 0.0112, Train Loss ce: 0.0307, Train Steps/Sec: 0.68,
1100
- [2026-01-26 22:14:56] (step=0001089) Train Loss mse: 0.0141, Train Loss ce: 0.0282, Train Steps/Sec: 0.68,
1101
- [2026-01-26 22:14:57] (step=0001090) Train Loss mse: 0.0187, Train Loss ce: 0.0744, Train Steps/Sec: 0.68,
1102
- [2026-01-26 22:14:59] (step=0001091) Train Loss mse: 0.0144, Train Loss ce: 0.0371, Train Steps/Sec: 0.56,
1103
- [2026-01-26 22:15:01] (step=0001092) Train Loss mse: 0.0181, Train Loss ce: 0.0742, Train Steps/Sec: 0.69,
1104
- [2026-01-26 22:15:02] (step=0001093) Train Loss mse: 0.0119, Train Loss ce: 0.0614, Train Steps/Sec: 0.56,
1105
- [2026-01-26 22:15:04] (step=0001094) Train Loss mse: 0.0140, Train Loss ce: 0.0572, Train Steps/Sec: 0.59,
1106
- [2026-01-26 22:15:06] (step=0001095) Train Loss mse: 0.0155, Train Loss ce: 0.0332, Train Steps/Sec: 0.69,
1107
- [2026-01-26 22:15:07] (step=0001096) Train Loss mse: 0.0166, Train Loss ce: 0.0692, Train Steps/Sec: 0.68,
1108
- [2026-01-26 22:15:09] (step=0001097) Train Loss mse: 0.0175, Train Loss ce: 0.0863, Train Steps/Sec: 0.68,
1109
- [2026-01-26 22:15:10] (step=0001098) Train Loss mse: 0.0082, Train Loss ce: 0.0662, Train Steps/Sec: 0.57,
1110
- [2026-01-26 22:15:12] (step=0001099) Train Loss mse: 0.0139, Train Loss ce: 0.0561, Train Steps/Sec: 0.68,
1111
- [2026-01-26 22:15:13] (step=0001100) Train Loss mse: 0.0106, Train Loss ce: 0.0682, Train Steps/Sec: 0.68,
1112
- [2026-01-26 22:15:15] (step=0001101) Train Loss mse: 0.0105, Train Loss ce: 0.0531, Train Steps/Sec: 0.56,
1113
- [2026-01-26 22:15:17] (step=0001102) Train Loss mse: 0.0123, Train Loss ce: 0.0902, Train Steps/Sec: 0.59,
1114
- [2026-01-26 22:15:18] (step=0001103) Train Loss mse: 0.0096, Train Loss ce: 0.0452, Train Steps/Sec: 0.68,
1115
  FullyShardedDataParallel(
1116
  (_fsdp_wrapped_module): Bagel(
1117
  (language_model): Qwen2ForCausalLM(
@@ -1319,6 +1272,60 @@ Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equat
1319
  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
1320
  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
1321
  ce_avg: 0.24203190207481384, mse_avg: 0.011826912872493267
1322
  [2026-01-26 22:15:20] (step=0001104) Train Loss mse: 0.0135, Train Loss ce: 0.0637, Train Steps/Sec: 0.68,
1323
  [2026-01-26 22:15:21] (step=0001105) Train Loss mse: 0.0077, Train Loss ce: 0.0449, Train Steps/Sec: 0.68,
1324
  [2026-01-26 22:15:23] (step=0001106) Train Loss mse: 0.0119, Train Loss ce: 0.0483, Train Steps/Sec: 0.56,
@@ -2711,27 +2718,6 @@ ce_avg: 0.24203190207481384, mse_avg: 0.011826912872493267
2711
  [2026-01-26 22:52:50] (step=0002493) Train Loss mse: 0.0070, Train Loss ce: 0.0249, Train Steps/Sec: 0.68,
2712
  [2026-01-26 22:52:52] (step=0002494) Train Loss mse: 0.0089, Train Loss ce: 0.0502, Train Steps/Sec: 0.59,
2713
  [2026-01-26 22:52:54] (step=0002495) Train Loss mse: 0.0080, Train Loss ce: 0.0490, Train Steps/Sec: 0.43,
2714
- base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step2500
2715
- Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
2716
- [eval debug] first 3 batch fingerprints:
2717
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2718
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2719
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2720
- ce_avg: 0.20800399780273438, mse_avg: 0.011797062121331692
2721
- base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step3000
2722
- Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
2723
- [eval debug] first 3 batch fingerprints:
2724
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2725
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2726
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2727
- ce_avg: 0.04329414293169975, mse_avg: 0.006353132426738739
2728
- base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step3500
2729
- Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
2730
- [eval debug] first 3 batch fingerprints:
2731
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2732
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2733
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2734
- ce_avg: 0.04275057464838028, mse_avg: 0.0056571937166154385
2735
  [2026-01-26 22:52:55] (step=0002496) Train Loss mse: 0.0109, Train Loss ce: 0.0463, Train Steps/Sec: 0.68,
2736
  [2026-01-26 22:52:57] (step=0002497) Train Loss mse: 0.0087, Train Loss ce: 0.0305, Train Steps/Sec: 0.59,
2737
  [2026-01-26 22:52:59] (step=0002498) Train Loss mse: 0.0078, Train Loss ce: 0.0502, Train Steps/Sec: 0.68,
@@ -2746,6 +2732,20 @@ ce_avg: 0.04275057464838028, mse_avg: 0.0056571937166154385
2746
  [2026-01-26 22:55:50] (step=0002504) Train Loss mse: 0.0089, Train Loss ce: 0.0733, Train Steps/Sec: 0.68,
2747
  [2026-01-26 22:55:52] (step=0002505) Train Loss mse: 0.0061, Train Loss ce: 0.0662, Train Steps/Sec: 0.68,
2748
  [2026-01-26 22:55:53] (step=0002506) Train Loss mse: 0.0068, Train Loss ce: 0.0412, Train Steps/Sec: 0.68,
2749
  [2026-01-26 22:55:55] (step=0002507) Train Loss mse: 0.0085, Train Loss ce: 0.0475, Train Steps/Sec: 0.49,
2750
  [2026-01-26 22:55:57] (step=0002508) Train Loss mse: 0.0056, Train Loss ce: 0.0384, Train Steps/Sec: 0.69,
2751
  [2026-01-26 22:55:58] (step=0002509) Train Loss mse: 0.0106, Train Loss ce: 0.0332, Train Steps/Sec: 0.68,
@@ -3762,20 +3762,6 @@ ce_avg: 0.04275057464838028, mse_avg: 0.0056571937166154385
3762
  [2026-01-26 23:22:58] (step=0003520) Train Loss mse: 0.0051, Train Loss ce: 0.0479, Train Steps/Sec: 0.68,
3763
  [2026-01-26 23:22:59] (step=0003521) Train Loss mse: 0.0087, Train Loss ce: 0.0276, Train Steps/Sec: 0.68,
3764
  [2026-01-26 23:23:01] (step=0003522) Train Loss mse: 0.0087, Train Loss ce: 0.0529, Train Steps/Sec: 0.68,
3765
- base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step4000
3766
- Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
3767
- [eval debug] first 3 batch fingerprints:
3768
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3769
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3770
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3771
- ce_avg: 0.03896063566207886, mse_avg: 0.005920059513300657
3772
- base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step4500
3773
- Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
3774
- [eval debug] first 3 batch fingerprints:
3775
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3776
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3777
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3778
- ce_avg: 0.03593315929174423, mse_avg: 0.005641990341246128
3779
  [2026-01-26 23:23:02] (step=0003523) Train Loss mse: 0.0064, Train Loss ce: 0.0316, Train Steps/Sec: 0.68,
3780
  [2026-01-26 23:23:04] (step=0003524) Train Loss mse: 0.0067, Train Loss ce: 0.0446, Train Steps/Sec: 0.49,
3781
  [2026-01-26 23:23:06] (step=0003525) Train Loss mse: 0.0094, Train Loss ce: 0.0456, Train Steps/Sec: 0.69,
@@ -3806,6 +3792,27 @@ ce_avg: 0.03593315929174423, mse_avg: 0.005641990341246128
3806
  [2026-01-26 23:23:44] (step=0003550) Train Loss mse: 0.0066, Train Loss ce: 0.0326, Train Steps/Sec: 0.68,
3807
  [2026-01-26 23:23:46] (step=0003551) Train Loss mse: 0.0086, Train Loss ce: 0.0259, Train Steps/Sec: 0.48,
3808
  [2026-01-26 23:23:48] (step=0003552) Train Loss mse: 0.0122, Train Loss ce: 0.0249, Train Steps/Sec: 0.59,
3809
  [2026-01-26 23:23:49] (step=0003553) Train Loss mse: 0.0082, Train Loss ce: 0.0280, Train Steps/Sec: 0.68,
3810
  [2026-01-26 23:23:51] (step=0003554) Train Loss mse: 0.0045, Train Loss ce: 0.0642, Train Steps/Sec: 0.68,
3811
  [2026-01-26 23:23:52] (step=0003555) Train Loss mse: 0.0075, Train Loss ce: 0.0468, Train Steps/Sec: 0.68,
@@ -5204,13 +5211,6 @@ ce_avg: 0.03593315929174423, mse_avg: 0.005641990341246128
5204
  [2026-01-27 00:01:03] (step=0004948) Train Loss mse: 0.0052, Train Loss ce: 0.0313, Train Steps/Sec: 0.57,
5205
  [2026-01-27 00:01:05] (step=0004949) Train Loss mse: 0.0078, Train Loss ce: 0.0787, Train Steps/Sec: 0.57,
5206
  [2026-01-27 00:01:07] (step=0004950) Train Loss mse: 0.0046, Train Loss ce: 0.0306, Train Steps/Sec: 0.59,
5207
- base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step5000
5208
- Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
5209
- [eval debug] first 3 batch fingerprints:
5210
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
5211
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
5212
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
5213
- ce_avg: 0.03604895621538162, mse_avg: 0.005358231253921986
5214
  [2026-01-27 00:01:08] (step=0004951) Train Loss mse: 0.0074, Train Loss ce: 0.0149, Train Steps/Sec: 0.68,
5215
  [2026-01-27 00:01:10] (step=0004952) Train Loss mse: 0.0051, Train Loss ce: 0.0171, Train Steps/Sec: 0.58,
5216
  [2026-01-27 00:01:12] (step=0004953) Train Loss mse: 0.0094, Train Loss ce: 0.0230, Train Steps/Sec: 0.68,
 
1065
  [2026-01-26 22:14:01] (step=0001054) Train Loss mse: 0.0175, Train Loss ce: 0.0570, Train Steps/Sec: 0.68,
1066
  [2026-01-26 22:14:03] (step=0001055) Train Loss mse: 0.0202, Train Loss ce: 0.0187, Train Steps/Sec: 0.68,
1067
  [2026-01-26 22:14:04] (step=0001056) Train Loss mse: 0.0158, Train Loss ce: 0.0599, Train Steps/Sec: 0.68,
1068
  FullyShardedDataParallel(
1069
  (_fsdp_wrapped_module): Bagel(
1070
  (language_model): Qwen2ForCausalLM(
 
1272
  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
1273
  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
1274
  ce_avg: 0.24203190207481384, mse_avg: 0.011826912872493267
1275
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step2500
1276
+ Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
1277
+ [eval debug] first 3 batch fingerprints:
1278
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
1279
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
1280
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
1281
+ ce_avg: 0.20800399780273438, mse_avg: 0.011797062121331692
1282
+ [2026-01-26 22:14:06] (step=0001057) Train Loss mse: 0.0142, Train Loss ce: 0.0901, Train Steps/Sec: 0.68,
1283
+ [2026-01-26 22:14:07] (step=0001058) Train Loss mse: 0.0084, Train Loss ce: 0.0419, Train Steps/Sec: 0.67,
1284
+ [2026-01-26 22:14:09] (step=0001059) Train Loss mse: 0.0127, Train Loss ce: 0.0692, Train Steps/Sec: 0.56,
1285
+ [2026-01-26 22:14:11] (step=0001060) Train Loss mse: 0.0078, Train Loss ce: 0.0673, Train Steps/Sec: 0.56,
1286
+ [2026-01-26 22:14:12] (step=0001061) Train Loss mse: 0.0140, Train Loss ce: 0.0492, Train Steps/Sec: 0.58,
1287
+ [2026-01-26 22:14:14] (step=0001062) Train Loss mse: 0.0124, Train Loss ce: 0.0609, Train Steps/Sec: 0.68,
1288
+ [2026-01-26 22:14:15] (step=0001063) Train Loss mse: 0.0176, Train Loss ce: 0.0481, Train Steps/Sec: 0.69,
1289
+ [2026-01-26 22:14:17] (step=0001064) Train Loss mse: 0.0090, Train Loss ce: 0.0420, Train Steps/Sec: 0.68,
1290
+ [2026-01-26 22:14:18] (step=0001065) Train Loss mse: 0.0176, Train Loss ce: 0.0697, Train Steps/Sec: 0.68,
1291
+ [2026-01-26 22:14:20] (step=0001066) Train Loss mse: 0.0142, Train Loss ce: 0.0573, Train Steps/Sec: 0.68,
1292
+ [2026-01-26 22:14:21] (step=0001067) Train Loss mse: 0.0099, Train Loss ce: 0.0944, Train Steps/Sec: 0.68,
1293
+ [2026-01-26 22:14:23] (step=0001068) Train Loss mse: 0.0228, Train Loss ce: 0.0403, Train Steps/Sec: 0.56,
1294
+ [2026-01-26 22:14:25] (step=0001069) Train Loss mse: 0.0124, Train Loss ce: 0.0557, Train Steps/Sec: 0.56,
1295
+ [2026-01-26 22:14:26] (step=0001070) Train Loss mse: 0.0111, Train Loss ce: 0.1486, Train Steps/Sec: 0.58,
1296
+ [2026-01-26 22:14:28] (step=0001071) Train Loss mse: 0.0107, Train Loss ce: 0.0601, Train Steps/Sec: 0.68,
1297
+ [2026-01-26 22:14:29] (step=0001072) Train Loss mse: 0.0119, Train Loss ce: 0.0674, Train Steps/Sec: 0.68,
1298
+ [2026-01-26 22:14:31] (step=0001073) Train Loss mse: 0.0131, Train Loss ce: 0.0523, Train Steps/Sec: 0.68,
1299
+ [2026-01-26 22:14:32] (step=0001074) Train Loss mse: 0.0094, Train Loss ce: 0.0564, Train Steps/Sec: 0.68,
1300
+ [2026-01-26 22:14:34] (step=0001075) Train Loss mse: 0.0093, Train Loss ce: 0.0724, Train Steps/Sec: 0.68,
1301
+ [2026-01-26 22:14:36] (step=0001076) Train Loss mse: 0.0124, Train Loss ce: 0.0541, Train Steps/Sec: 0.56,
1302
+ [2026-01-26 22:14:37] (step=0001077) Train Loss mse: 0.0127, Train Loss ce: 0.0760, Train Steps/Sec: 0.56,
1303
+ [2026-01-26 22:14:39] (step=0001078) Train Loss mse: 0.0197, Train Loss ce: 0.0570, Train Steps/Sec: 0.59,
1304
+ [2026-01-26 22:14:41] (step=0001079) Train Loss mse: 0.0101, Train Loss ce: 0.0584, Train Steps/Sec: 0.67,
1305
+ [2026-01-26 22:14:42] (step=0001080) Train Loss mse: 0.0209, Train Loss ce: 0.0491, Train Steps/Sec: 0.69,
1306
+ [2026-01-26 22:14:43] (step=0001081) Train Loss mse: 0.0376, Train Loss ce: 0.0515, Train Steps/Sec: 0.68,
1307
+ [2026-01-26 22:14:45] (step=0001082) Train Loss mse: 0.0092, Train Loss ce: 0.0488, Train Steps/Sec: 0.68,
1308
+ [2026-01-26 22:14:47] (step=0001083) Train Loss mse: 0.0144, Train Loss ce: 0.0308, Train Steps/Sec: 0.57,
1309
+ [2026-01-26 22:14:48] (step=0001084) Train Loss mse: 0.0096, Train Loss ce: 0.0897, Train Steps/Sec: 0.68,
1310
+ [2026-01-26 22:14:50] (step=0001085) Train Loss mse: 0.0127, Train Loss ce: 0.0523, Train Steps/Sec: 0.57,
1311
+ [2026-01-26 22:14:52] (step=0001086) Train Loss mse: 0.0295, Train Loss ce: 0.0351, Train Steps/Sec: 0.59,
1312
+ [2026-01-26 22:14:53] (step=0001087) Train Loss mse: 0.0099, Train Loss ce: 0.0597, Train Steps/Sec: 0.68,
1313
+ [2026-01-26 22:14:55] (step=0001088) Train Loss mse: 0.0112, Train Loss ce: 0.0307, Train Steps/Sec: 0.68,
1314
+ [2026-01-26 22:14:56] (step=0001089) Train Loss mse: 0.0141, Train Loss ce: 0.0282, Train Steps/Sec: 0.68,
1315
+ [2026-01-26 22:14:57] (step=0001090) Train Loss mse: 0.0187, Train Loss ce: 0.0744, Train Steps/Sec: 0.68,
1316
+ [2026-01-26 22:14:59] (step=0001091) Train Loss mse: 0.0144, Train Loss ce: 0.0371, Train Steps/Sec: 0.56,
1317
+ [2026-01-26 22:15:01] (step=0001092) Train Loss mse: 0.0181, Train Loss ce: 0.0742, Train Steps/Sec: 0.69,
1318
+ [2026-01-26 22:15:02] (step=0001093) Train Loss mse: 0.0119, Train Loss ce: 0.0614, Train Steps/Sec: 0.56,
1319
+ [2026-01-26 22:15:04] (step=0001094) Train Loss mse: 0.0140, Train Loss ce: 0.0572, Train Steps/Sec: 0.59,
1320
+ [2026-01-26 22:15:06] (step=0001095) Train Loss mse: 0.0155, Train Loss ce: 0.0332, Train Steps/Sec: 0.69,
1321
+ [2026-01-26 22:15:07] (step=0001096) Train Loss mse: 0.0166, Train Loss ce: 0.0692, Train Steps/Sec: 0.68,
1322
+ [2026-01-26 22:15:09] (step=0001097) Train Loss mse: 0.0175, Train Loss ce: 0.0863, Train Steps/Sec: 0.68,
1323
+ [2026-01-26 22:15:10] (step=0001098) Train Loss mse: 0.0082, Train Loss ce: 0.0662, Train Steps/Sec: 0.57,
1324
+ [2026-01-26 22:15:12] (step=0001099) Train Loss mse: 0.0139, Train Loss ce: 0.0561, Train Steps/Sec: 0.68,
1325
+ [2026-01-26 22:15:13] (step=0001100) Train Loss mse: 0.0106, Train Loss ce: 0.0682, Train Steps/Sec: 0.68,
1326
+ [2026-01-26 22:15:15] (step=0001101) Train Loss mse: 0.0105, Train Loss ce: 0.0531, Train Steps/Sec: 0.56,
1327
+ [2026-01-26 22:15:17] (step=0001102) Train Loss mse: 0.0123, Train Loss ce: 0.0902, Train Steps/Sec: 0.59,
1328
+ [2026-01-26 22:15:18] (step=0001103) Train Loss mse: 0.0096, Train Loss ce: 0.0452, Train Steps/Sec: 0.68,
1329
  [2026-01-26 22:15:20] (step=0001104) Train Loss mse: 0.0135, Train Loss ce: 0.0637, Train Steps/Sec: 0.68,
1330
  [2026-01-26 22:15:21] (step=0001105) Train Loss mse: 0.0077, Train Loss ce: 0.0449, Train Steps/Sec: 0.68,
1331
  [2026-01-26 22:15:23] (step=0001106) Train Loss mse: 0.0119, Train Loss ce: 0.0483, Train Steps/Sec: 0.56,
 
2718
  [2026-01-26 22:52:50] (step=0002493) Train Loss mse: 0.0070, Train Loss ce: 0.0249, Train Steps/Sec: 0.68,
2719
  [2026-01-26 22:52:52] (step=0002494) Train Loss mse: 0.0089, Train Loss ce: 0.0502, Train Steps/Sec: 0.59,
2720
  [2026-01-26 22:52:54] (step=0002495) Train Loss mse: 0.0080, Train Loss ce: 0.0490, Train Steps/Sec: 0.43,
2721
  [2026-01-26 22:52:55] (step=0002496) Train Loss mse: 0.0109, Train Loss ce: 0.0463, Train Steps/Sec: 0.68,
2722
  [2026-01-26 22:52:57] (step=0002497) Train Loss mse: 0.0087, Train Loss ce: 0.0305, Train Steps/Sec: 0.59,
2723
  [2026-01-26 22:52:59] (step=0002498) Train Loss mse: 0.0078, Train Loss ce: 0.0502, Train Steps/Sec: 0.68,
 
2732
  [2026-01-26 22:55:50] (step=0002504) Train Loss mse: 0.0089, Train Loss ce: 0.0733, Train Steps/Sec: 0.68,
2733
  [2026-01-26 22:55:52] (step=0002505) Train Loss mse: 0.0061, Train Loss ce: 0.0662, Train Steps/Sec: 0.68,
2734
  [2026-01-26 22:55:53] (step=0002506) Train Loss mse: 0.0068, Train Loss ce: 0.0412, Train Steps/Sec: 0.68,
2735
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step3000
2736
+ Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
2737
+ [eval debug] first 3 batch fingerprints:
2738
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2739
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2740
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2741
+ ce_avg: 0.04329414293169975, mse_avg: 0.006353132426738739
2742
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step3500
2743
+ Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
2744
+ [eval debug] first 3 batch fingerprints:
2745
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2746
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2747
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
2748
+ ce_avg: 0.04275057464838028, mse_avg: 0.0056571937166154385
2749
  [2026-01-26 22:55:55] (step=0002507) Train Loss mse: 0.0085, Train Loss ce: 0.0475, Train Steps/Sec: 0.49,
2750
  [2026-01-26 22:55:57] (step=0002508) Train Loss mse: 0.0056, Train Loss ce: 0.0384, Train Steps/Sec: 0.69,
2751
  [2026-01-26 22:55:58] (step=0002509) Train Loss mse: 0.0106, Train Loss ce: 0.0332, Train Steps/Sec: 0.68,
 
3762
  [2026-01-26 23:22:58] (step=0003520) Train Loss mse: 0.0051, Train Loss ce: 0.0479, Train Steps/Sec: 0.68,
3763
  [2026-01-26 23:22:59] (step=0003521) Train Loss mse: 0.0087, Train Loss ce: 0.0276, Train Steps/Sec: 0.68,
3764
  [2026-01-26 23:23:01] (step=0003522) Train Loss mse: 0.0087, Train Loss ce: 0.0529, Train Steps/Sec: 0.68,
3765
  [2026-01-26 23:23:02] (step=0003523) Train Loss mse: 0.0064, Train Loss ce: 0.0316, Train Steps/Sec: 0.68,
3766
  [2026-01-26 23:23:04] (step=0003524) Train Loss mse: 0.0067, Train Loss ce: 0.0446, Train Steps/Sec: 0.49,
3767
  [2026-01-26 23:23:06] (step=0003525) Train Loss mse: 0.0094, Train Loss ce: 0.0456, Train Steps/Sec: 0.69,
 
3792
  [2026-01-26 23:23:44] (step=0003550) Train Loss mse: 0.0066, Train Loss ce: 0.0326, Train Steps/Sec: 0.68,
3793
  [2026-01-26 23:23:46] (step=0003551) Train Loss mse: 0.0086, Train Loss ce: 0.0259, Train Steps/Sec: 0.48,
3794
  [2026-01-26 23:23:48] (step=0003552) Train Loss mse: 0.0122, Train Loss ce: 0.0249, Train Steps/Sec: 0.59,
3795
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step4000
3796
+ Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
3797
+ [eval debug] first 3 batch fingerprints:
3798
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3799
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3800
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3801
+ ce_avg: 0.03896063566207886, mse_avg: 0.005920059513300657
3802
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step4500
3803
+ Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
3804
+ [eval debug] first 3 batch fingerprints:
3805
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3806
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3807
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3808
+ ce_avg: 0.03593315929174423, mse_avg: 0.005641990341246128
3809
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step5000
3810
+ Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
3811
+ [eval debug] first 3 batch fingerprints:
3812
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3813
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3814
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
3815
+ ce_avg: 0.03604895621538162, mse_avg: 0.005358231253921986
3816
  [2026-01-26 23:23:49] (step=0003553) Train Loss mse: 0.0082, Train Loss ce: 0.0280, Train Steps/Sec: 0.68,
3817
  [2026-01-26 23:23:51] (step=0003554) Train Loss mse: 0.0045, Train Loss ce: 0.0642, Train Steps/Sec: 0.68,
3818
  [2026-01-26 23:23:52] (step=0003555) Train Loss mse: 0.0075, Train Loss ce: 0.0468, Train Steps/Sec: 0.68,
 
5211
  [2026-01-27 00:01:03] (step=0004948) Train Loss mse: 0.0052, Train Loss ce: 0.0313, Train Steps/Sec: 0.57,
5212
  [2026-01-27 00:01:05] (step=0004949) Train Loss mse: 0.0078, Train Loss ce: 0.0787, Train Steps/Sec: 0.57,
5213
  [2026-01-27 00:01:07] (step=0004950) Train Loss mse: 0.0046, Train Loss ce: 0.0306, Train Steps/Sec: 0.59,
5214
  [2026-01-27 00:01:08] (step=0004951) Train Loss mse: 0.0074, Train Loss ce: 0.0149, Train Steps/Sec: 0.68,
5215
  [2026-01-27 00:01:10] (step=0004952) Train Loss mse: 0.0051, Train Loss ce: 0.0171, Train Steps/Sec: 0.58,
5216
  [2026-01-27 00:01:12] (step=0004953) Train Loss mse: 0.0094, Train Loss ce: 0.0230, Train Steps/Sec: 0.68,