Upload checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins
checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/wandb/offline-run-20260126_213949-checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins-run0/files/output.log
CHANGED
|
@@ -1065,53 +1065,6 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
|
|
| 1065 |
[[34m2026-01-26 22:14:01[39m] (step=0001054) Train Loss mse: 0.0175, Train Loss ce: 0.0570, Train Steps/Sec: 0.68,
|
| 1066 |
[[34m2026-01-26 22:14:03[39m] (step=0001055) Train Loss mse: 0.0202, Train Loss ce: 0.0187, Train Steps/Sec: 0.68,
|
| 1067 |
[[34m2026-01-26 22:14:04[39m] (step=0001056) Train Loss mse: 0.0158, Train Loss ce: 0.0599, Train Steps/Sec: 0.68,
|
| 1068 |
-
[[34m2026-01-26 22:14:06[39m] (step=0001057) Train Loss mse: 0.0142, Train Loss ce: 0.0901, Train Steps/Sec: 0.68,
|
| 1069 |
-
[[34m2026-01-26 22:14:07[39m] (step=0001058) Train Loss mse: 0.0084, Train Loss ce: 0.0419, Train Steps/Sec: 0.67,
|
| 1070 |
-
[[34m2026-01-26 22:14:09[39m] (step=0001059) Train Loss mse: 0.0127, Train Loss ce: 0.0692, Train Steps/Sec: 0.56,
|
| 1071 |
-
[[34m2026-01-26 22:14:11[39m] (step=0001060) Train Loss mse: 0.0078, Train Loss ce: 0.0673, Train Steps/Sec: 0.56,
|
| 1072 |
-
[[34m2026-01-26 22:14:12[39m] (step=0001061) Train Loss mse: 0.0140, Train Loss ce: 0.0492, Train Steps/Sec: 0.58,
|
| 1073 |
-
[[34m2026-01-26 22:14:14[39m] (step=0001062) Train Loss mse: 0.0124, Train Loss ce: 0.0609, Train Steps/Sec: 0.68,
|
| 1074 |
-
[[34m2026-01-26 22:14:15[39m] (step=0001063) Train Loss mse: 0.0176, Train Loss ce: 0.0481, Train Steps/Sec: 0.69,
|
| 1075 |
-
[[34m2026-01-26 22:14:17[39m] (step=0001064) Train Loss mse: 0.0090, Train Loss ce: 0.0420, Train Steps/Sec: 0.68,
|
| 1076 |
-
[[34m2026-01-26 22:14:18[39m] (step=0001065) Train Loss mse: 0.0176, Train Loss ce: 0.0697, Train Steps/Sec: 0.68,
|
| 1077 |
-
[[34m2026-01-26 22:14:20[39m] (step=0001066) Train Loss mse: 0.0142, Train Loss ce: 0.0573, Train Steps/Sec: 0.68,
|
| 1078 |
-
[[34m2026-01-26 22:14:21[39m] (step=0001067) Train Loss mse: 0.0099, Train Loss ce: 0.0944, Train Steps/Sec: 0.68,
|
| 1079 |
-
[[34m2026-01-26 22:14:23[39m] (step=0001068) Train Loss mse: 0.0228, Train Loss ce: 0.0403, Train Steps/Sec: 0.56,
|
| 1080 |
-
[[34m2026-01-26 22:14:25[39m] (step=0001069) Train Loss mse: 0.0124, Train Loss ce: 0.0557, Train Steps/Sec: 0.56,
|
| 1081 |
-
[[34m2026-01-26 22:14:26[39m] (step=0001070) Train Loss mse: 0.0111, Train Loss ce: 0.1486, Train Steps/Sec: 0.58,
|
| 1082 |
-
[[34m2026-01-26 22:14:28[39m] (step=0001071) Train Loss mse: 0.0107, Train Loss ce: 0.0601, Train Steps/Sec: 0.68,
|
| 1083 |
-
[[34m2026-01-26 22:14:29[39m] (step=0001072) Train Loss mse: 0.0119, Train Loss ce: 0.0674, Train Steps/Sec: 0.68,
|
| 1084 |
-
[[34m2026-01-26 22:14:31[39m] (step=0001073) Train Loss mse: 0.0131, Train Loss ce: 0.0523, Train Steps/Sec: 0.68,
|
| 1085 |
-
[[34m2026-01-26 22:14:32[39m] (step=0001074) Train Loss mse: 0.0094, Train Loss ce: 0.0564, Train Steps/Sec: 0.68,
|
| 1086 |
-
[[34m2026-01-26 22:14:34[39m] (step=0001075) Train Loss mse: 0.0093, Train Loss ce: 0.0724, Train Steps/Sec: 0.68,
|
| 1087 |
-
[[34m2026-01-26 22:14:36[39m] (step=0001076) Train Loss mse: 0.0124, Train Loss ce: 0.0541, Train Steps/Sec: 0.56,
|
| 1088 |
-
[[34m2026-01-26 22:14:37[39m] (step=0001077) Train Loss mse: 0.0127, Train Loss ce: 0.0760, Train Steps/Sec: 0.56,
|
| 1089 |
-
[[34m2026-01-26 22:14:39[39m] (step=0001078) Train Loss mse: 0.0197, Train Loss ce: 0.0570, Train Steps/Sec: 0.59,
|
| 1090 |
-
[[34m2026-01-26 22:14:41[39m] (step=0001079) Train Loss mse: 0.0101, Train Loss ce: 0.0584, Train Steps/Sec: 0.67,
|
| 1091 |
-
[[34m2026-01-26 22:14:42[39m] (step=0001080) Train Loss mse: 0.0209, Train Loss ce: 0.0491, Train Steps/Sec: 0.69,
|
| 1092 |
-
[[34m2026-01-26 22:14:43[39m] (step=0001081) Train Loss mse: 0.0376, Train Loss ce: 0.0515, Train Steps/Sec: 0.68,
|
| 1093 |
-
[[34m2026-01-26 22:14:45[39m] (step=0001082) Train Loss mse: 0.0092, Train Loss ce: 0.0488, Train Steps/Sec: 0.68,
|
| 1094 |
-
[[34m2026-01-26 22:14:47[39m] (step=0001083) Train Loss mse: 0.0144, Train Loss ce: 0.0308, Train Steps/Sec: 0.57,
|
| 1095 |
-
[[34m2026-01-26 22:14:48[39m] (step=0001084) Train Loss mse: 0.0096, Train Loss ce: 0.0897, Train Steps/Sec: 0.68,
|
| 1096 |
-
[[34m2026-01-26 22:14:50[39m] (step=0001085) Train Loss mse: 0.0127, Train Loss ce: 0.0523, Train Steps/Sec: 0.57,
|
| 1097 |
-
[[34m2026-01-26 22:14:52[39m] (step=0001086) Train Loss mse: 0.0295, Train Loss ce: 0.0351, Train Steps/Sec: 0.59,
|
| 1098 |
-
[[34m2026-01-26 22:14:53[39m] (step=0001087) Train Loss mse: 0.0099, Train Loss ce: 0.0597, Train Steps/Sec: 0.68,
|
| 1099 |
-
[[34m2026-01-26 22:14:55[39m] (step=0001088) Train Loss mse: 0.0112, Train Loss ce: 0.0307, Train Steps/Sec: 0.68,
|
| 1100 |
-
[[34m2026-01-26 22:14:56[39m] (step=0001089) Train Loss mse: 0.0141, Train Loss ce: 0.0282, Train Steps/Sec: 0.68,
|
| 1101 |
-
[[34m2026-01-26 22:14:57[39m] (step=0001090) Train Loss mse: 0.0187, Train Loss ce: 0.0744, Train Steps/Sec: 0.68,
|
| 1102 |
-
[[34m2026-01-26 22:14:59[39m] (step=0001091) Train Loss mse: 0.0144, Train Loss ce: 0.0371, Train Steps/Sec: 0.56,
|
| 1103 |
-
[[34m2026-01-26 22:15:01[39m] (step=0001092) Train Loss mse: 0.0181, Train Loss ce: 0.0742, Train Steps/Sec: 0.69,
|
| 1104 |
-
[[34m2026-01-26 22:15:02[39m] (step=0001093) Train Loss mse: 0.0119, Train Loss ce: 0.0614, Train Steps/Sec: 0.56,
|
| 1105 |
-
[[34m2026-01-26 22:15:04[39m] (step=0001094) Train Loss mse: 0.0140, Train Loss ce: 0.0572, Train Steps/Sec: 0.59,
|
| 1106 |
-
[[34m2026-01-26 22:15:06[39m] (step=0001095) Train Loss mse: 0.0155, Train Loss ce: 0.0332, Train Steps/Sec: 0.69,
|
| 1107 |
-
[[34m2026-01-26 22:15:07[39m] (step=0001096) Train Loss mse: 0.0166, Train Loss ce: 0.0692, Train Steps/Sec: 0.68,
|
| 1108 |
-
[[34m2026-01-26 22:15:09[39m] (step=0001097) Train Loss mse: 0.0175, Train Loss ce: 0.0863, Train Steps/Sec: 0.68,
|
| 1109 |
-
[[34m2026-01-26 22:15:10[39m] (step=0001098) Train Loss mse: 0.0082, Train Loss ce: 0.0662, Train Steps/Sec: 0.57,
|
| 1110 |
-
[[34m2026-01-26 22:15:12[39m] (step=0001099) Train Loss mse: 0.0139, Train Loss ce: 0.0561, Train Steps/Sec: 0.68,
|
| 1111 |
-
[[34m2026-01-26 22:15:13[39m] (step=0001100) Train Loss mse: 0.0106, Train Loss ce: 0.0682, Train Steps/Sec: 0.68,
|
| 1112 |
-
[[34m2026-01-26 22:15:15[39m] (step=0001101) Train Loss mse: 0.0105, Train Loss ce: 0.0531, Train Steps/Sec: 0.56,
|
| 1113 |
-
[[34m2026-01-26 22:15:17[39m] (step=0001102) Train Loss mse: 0.0123, Train Loss ce: 0.0902, Train Steps/Sec: 0.59,
|
| 1114 |
-
[[34m2026-01-26 22:15:18[39m] (step=0001103) Train Loss mse: 0.0096, Train Loss ce: 0.0452, Train Steps/Sec: 0.68,
|
| 1115 |
FullyShardedDataParallel(
|
| 1116 |
(_fsdp_wrapped_module): Bagel(
|
| 1117 |
(language_model): Qwen2ForCausalLM(
|
|
@@ -1319,6 +1272,60 @@ Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equat
|
|
| 1319 |
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 1320 |
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 1321 |
ce_avg: 0.24203190207481384, mse_avg: 0.011826912872493267
|
| 1322 |
[[34m2026-01-26 22:15:20[39m] (step=0001104) Train Loss mse: 0.0135, Train Loss ce: 0.0637, Train Steps/Sec: 0.68,
|
| 1323 |
[[34m2026-01-26 22:15:21[39m] (step=0001105) Train Loss mse: 0.0077, Train Loss ce: 0.0449, Train Steps/Sec: 0.68,
|
| 1324 |
[[34m2026-01-26 22:15:23[39m] (step=0001106) Train Loss mse: 0.0119, Train Loss ce: 0.0483, Train Steps/Sec: 0.56,
|
|
@@ -2711,27 +2718,6 @@ ce_avg: 0.24203190207481384, mse_avg: 0.011826912872493267
|
|
| 2711 |
[[34m2026-01-26 22:52:50[39m] (step=0002493) Train Loss mse: 0.0070, Train Loss ce: 0.0249, Train Steps/Sec: 0.68,
|
| 2712 |
[[34m2026-01-26 22:52:52[39m] (step=0002494) Train Loss mse: 0.0089, Train Loss ce: 0.0502, Train Steps/Sec: 0.59,
|
| 2713 |
[[34m2026-01-26 22:52:54[39m] (step=0002495) Train Loss mse: 0.0080, Train Loss ce: 0.0490, Train Steps/Sec: 0.43,
|
| 2714 |
-
base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step2500
|
| 2715 |
-
Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
|
| 2716 |
-
[eval debug] first 3 batch fingerprints:
|
| 2717 |
-
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2718 |
-
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2719 |
-
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2720 |
-
ce_avg: 0.20800399780273438, mse_avg: 0.011797062121331692
|
| 2721 |
-
base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step3000
|
| 2722 |
-
Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
|
| 2723 |
-
[eval debug] first 3 batch fingerprints:
|
| 2724 |
-
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2725 |
-
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2726 |
-
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2727 |
-
ce_avg: 0.04329414293169975, mse_avg: 0.006353132426738739
|
| 2728 |
-
base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step3500
|
| 2729 |
-
Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
|
| 2730 |
-
[eval debug] first 3 batch fingerprints:
|
| 2731 |
-
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2732 |
-
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2733 |
-
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2734 |
-
ce_avg: 0.04275057464838028, mse_avg: 0.0056571937166154385
|
| 2735 |
[[34m2026-01-26 22:52:55[39m] (step=0002496) Train Loss mse: 0.0109, Train Loss ce: 0.0463, Train Steps/Sec: 0.68,
|
| 2736 |
[[34m2026-01-26 22:52:57[39m] (step=0002497) Train Loss mse: 0.0087, Train Loss ce: 0.0305, Train Steps/Sec: 0.59,
|
| 2737 |
[[34m2026-01-26 22:52:59[39m] (step=0002498) Train Loss mse: 0.0078, Train Loss ce: 0.0502, Train Steps/Sec: 0.68,
|
|
@@ -2746,6 +2732,20 @@ ce_avg: 0.04275057464838028, mse_avg: 0.0056571937166154385
|
|
| 2746 |
[[34m2026-01-26 22:55:50[39m] (step=0002504) Train Loss mse: 0.0089, Train Loss ce: 0.0733, Train Steps/Sec: 0.68,
|
| 2747 |
[[34m2026-01-26 22:55:52[39m] (step=0002505) Train Loss mse: 0.0061, Train Loss ce: 0.0662, Train Steps/Sec: 0.68,
|
| 2748 |
[[34m2026-01-26 22:55:53[39m] (step=0002506) Train Loss mse: 0.0068, Train Loss ce: 0.0412, Train Steps/Sec: 0.68,
|
| 2749 |
[[34m2026-01-26 22:55:55[39m] (step=0002507) Train Loss mse: 0.0085, Train Loss ce: 0.0475, Train Steps/Sec: 0.49,
|
| 2750 |
[[34m2026-01-26 22:55:57[39m] (step=0002508) Train Loss mse: 0.0056, Train Loss ce: 0.0384, Train Steps/Sec: 0.69,
|
| 2751 |
[[34m2026-01-26 22:55:58[39m] (step=0002509) Train Loss mse: 0.0106, Train Loss ce: 0.0332, Train Steps/Sec: 0.68,
|
|
@@ -3762,20 +3762,6 @@ ce_avg: 0.04275057464838028, mse_avg: 0.0056571937166154385
|
|
| 3762 |
[[34m2026-01-26 23:22:58[39m] (step=0003520) Train Loss mse: 0.0051, Train Loss ce: 0.0479, Train Steps/Sec: 0.68,
|
| 3763 |
[[34m2026-01-26 23:22:59[39m] (step=0003521) Train Loss mse: 0.0087, Train Loss ce: 0.0276, Train Steps/Sec: 0.68,
|
| 3764 |
[[34m2026-01-26 23:23:01[39m] (step=0003522) Train Loss mse: 0.0087, Train Loss ce: 0.0529, Train Steps/Sec: 0.68,
|
| 3765 |
-
base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step4000
|
| 3766 |
-
Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
|
| 3767 |
-
[eval debug] first 3 batch fingerprints:
|
| 3768 |
-
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3769 |
-
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3770 |
-
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3771 |
-
ce_avg: 0.03896063566207886, mse_avg: 0.005920059513300657
|
| 3772 |
-
base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step4500
|
| 3773 |
-
Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
|
| 3774 |
-
[eval debug] first 3 batch fingerprints:
|
| 3775 |
-
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3776 |
-
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3777 |
-
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3778 |
-
ce_avg: 0.03593315929174423, mse_avg: 0.005641990341246128
|
| 3779 |
[[34m2026-01-26 23:23:02[39m] (step=0003523) Train Loss mse: 0.0064, Train Loss ce: 0.0316, Train Steps/Sec: 0.68,
|
| 3780 |
[[34m2026-01-26 23:23:04[39m] (step=0003524) Train Loss mse: 0.0067, Train Loss ce: 0.0446, Train Steps/Sec: 0.49,
|
| 3781 |
[[34m2026-01-26 23:23:06[39m] (step=0003525) Train Loss mse: 0.0094, Train Loss ce: 0.0456, Train Steps/Sec: 0.69,
|
|
@@ -3806,6 +3792,27 @@ ce_avg: 0.03593315929174423, mse_avg: 0.005641990341246128
|
|
| 3806 |
[[34m2026-01-26 23:23:44[39m] (step=0003550) Train Loss mse: 0.0066, Train Loss ce: 0.0326, Train Steps/Sec: 0.68,
|
| 3807 |
[[34m2026-01-26 23:23:46[39m] (step=0003551) Train Loss mse: 0.0086, Train Loss ce: 0.0259, Train Steps/Sec: 0.48,
|
| 3808 |
[[34m2026-01-26 23:23:48[39m] (step=0003552) Train Loss mse: 0.0122, Train Loss ce: 0.0249, Train Steps/Sec: 0.59,
|
| 3809 |
[[34m2026-01-26 23:23:49[39m] (step=0003553) Train Loss mse: 0.0082, Train Loss ce: 0.0280, Train Steps/Sec: 0.68,
|
| 3810 |
[[34m2026-01-26 23:23:51[39m] (step=0003554) Train Loss mse: 0.0045, Train Loss ce: 0.0642, Train Steps/Sec: 0.68,
|
| 3811 |
[[34m2026-01-26 23:23:52[39m] (step=0003555) Train Loss mse: 0.0075, Train Loss ce: 0.0468, Train Steps/Sec: 0.68,
|
|
@@ -5204,13 +5211,6 @@ ce_avg: 0.03593315929174423, mse_avg: 0.005641990341246128
|
|
| 5204 |
[[34m2026-01-27 00:01:03[39m] (step=0004948) Train Loss mse: 0.0052, Train Loss ce: 0.0313, Train Steps/Sec: 0.57,
|
| 5205 |
[[34m2026-01-27 00:01:05[39m] (step=0004949) Train Loss mse: 0.0078, Train Loss ce: 0.0787, Train Steps/Sec: 0.57,
|
| 5206 |
[[34m2026-01-27 00:01:07[39m] (step=0004950) Train Loss mse: 0.0046, Train Loss ce: 0.0306, Train Steps/Sec: 0.59,
|
| 5207 |
-
base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step5000
|
| 5208 |
-
Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
|
| 5209 |
-
[eval debug] first 3 batch fingerprints:
|
| 5210 |
-
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 5211 |
-
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 5212 |
-
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 5213 |
-
ce_avg: 0.03604895621538162, mse_avg: 0.005358231253921986
|
| 5214 |
[[34m2026-01-27 00:01:08[39m] (step=0004951) Train Loss mse: 0.0074, Train Loss ce: 0.0149, Train Steps/Sec: 0.68,
|
| 5215 |
[[34m2026-01-27 00:01:10[39m] (step=0004952) Train Loss mse: 0.0051, Train Loss ce: 0.0171, Train Steps/Sec: 0.58,
|
| 5216 |
[[34m2026-01-27 00:01:12[39m] (step=0004953) Train Loss mse: 0.0094, Train Loss ce: 0.0230, Train Steps/Sec: 0.68,
|
|
|
|
| 1065 |
[[34m2026-01-26 22:14:01[39m] (step=0001054) Train Loss mse: 0.0175, Train Loss ce: 0.0570, Train Steps/Sec: 0.68,
|
| 1066 |
[[34m2026-01-26 22:14:03[39m] (step=0001055) Train Loss mse: 0.0202, Train Loss ce: 0.0187, Train Steps/Sec: 0.68,
|
| 1067 |
[[34m2026-01-26 22:14:04[39m] (step=0001056) Train Loss mse: 0.0158, Train Loss ce: 0.0599, Train Steps/Sec: 0.68,
|
| 1068 |
FullyShardedDataParallel(
|
| 1069 |
(_fsdp_wrapped_module): Bagel(
|
| 1070 |
(language_model): Qwen2ForCausalLM(
|
|
|
|
| 1272 |
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 1273 |
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 1274 |
ce_avg: 0.24203190207481384, mse_avg: 0.011826912872493267
|
| 1275 |
+
base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step2500
|
| 1276 |
+
Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
|
| 1277 |
+
[eval debug] first 3 batch fingerprints:
|
| 1278 |
+
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 1279 |
+
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 1280 |
+
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 1281 |
+
ce_avg: 0.20800399780273438, mse_avg: 0.011797062121331692
|
| 1282 |
+
[[34m2026-01-26 22:14:06[39m] (step=0001057) Train Loss mse: 0.0142, Train Loss ce: 0.0901, Train Steps/Sec: 0.68,
|
| 1283 |
+
[[34m2026-01-26 22:14:07[39m] (step=0001058) Train Loss mse: 0.0084, Train Loss ce: 0.0419, Train Steps/Sec: 0.67,
|
| 1284 |
+
[[34m2026-01-26 22:14:09[39m] (step=0001059) Train Loss mse: 0.0127, Train Loss ce: 0.0692, Train Steps/Sec: 0.56,
|
| 1285 |
+
[[34m2026-01-26 22:14:11[39m] (step=0001060) Train Loss mse: 0.0078, Train Loss ce: 0.0673, Train Steps/Sec: 0.56,
|
| 1286 |
+
[[34m2026-01-26 22:14:12[39m] (step=0001061) Train Loss mse: 0.0140, Train Loss ce: 0.0492, Train Steps/Sec: 0.58,
|
| 1287 |
+
[[34m2026-01-26 22:14:14[39m] (step=0001062) Train Loss mse: 0.0124, Train Loss ce: 0.0609, Train Steps/Sec: 0.68,
|
| 1288 |
+
[[34m2026-01-26 22:14:15[39m] (step=0001063) Train Loss mse: 0.0176, Train Loss ce: 0.0481, Train Steps/Sec: 0.69,
|
| 1289 |
+
[[34m2026-01-26 22:14:17[39m] (step=0001064) Train Loss mse: 0.0090, Train Loss ce: 0.0420, Train Steps/Sec: 0.68,
|
| 1290 |
+
[[34m2026-01-26 22:14:18[39m] (step=0001065) Train Loss mse: 0.0176, Train Loss ce: 0.0697, Train Steps/Sec: 0.68,
|
| 1291 |
+
[[34m2026-01-26 22:14:20[39m] (step=0001066) Train Loss mse: 0.0142, Train Loss ce: 0.0573, Train Steps/Sec: 0.68,
|
| 1292 |
+
[[34m2026-01-26 22:14:21[39m] (step=0001067) Train Loss mse: 0.0099, Train Loss ce: 0.0944, Train Steps/Sec: 0.68,
|
| 1293 |
+
[[34m2026-01-26 22:14:23[39m] (step=0001068) Train Loss mse: 0.0228, Train Loss ce: 0.0403, Train Steps/Sec: 0.56,
|
| 1294 |
+
[[34m2026-01-26 22:14:25[39m] (step=0001069) Train Loss mse: 0.0124, Train Loss ce: 0.0557, Train Steps/Sec: 0.56,
|
| 1295 |
+
[[34m2026-01-26 22:14:26[39m] (step=0001070) Train Loss mse: 0.0111, Train Loss ce: 0.1486, Train Steps/Sec: 0.58,
|
| 1296 |
+
[[34m2026-01-26 22:14:28[39m] (step=0001071) Train Loss mse: 0.0107, Train Loss ce: 0.0601, Train Steps/Sec: 0.68,
|
| 1297 |
+
[[34m2026-01-26 22:14:29[39m] (step=0001072) Train Loss mse: 0.0119, Train Loss ce: 0.0674, Train Steps/Sec: 0.68,
|
| 1298 |
+
[[34m2026-01-26 22:14:31[39m] (step=0001073) Train Loss mse: 0.0131, Train Loss ce: 0.0523, Train Steps/Sec: 0.68,
|
| 1299 |
+
[[34m2026-01-26 22:14:32[39m] (step=0001074) Train Loss mse: 0.0094, Train Loss ce: 0.0564, Train Steps/Sec: 0.68,
|
| 1300 |
+
[[34m2026-01-26 22:14:34[39m] (step=0001075) Train Loss mse: 0.0093, Train Loss ce: 0.0724, Train Steps/Sec: 0.68,
|
| 1301 |
+
[[34m2026-01-26 22:14:36[39m] (step=0001076) Train Loss mse: 0.0124, Train Loss ce: 0.0541, Train Steps/Sec: 0.56,
|
| 1302 |
+
[[34m2026-01-26 22:14:37[39m] (step=0001077) Train Loss mse: 0.0127, Train Loss ce: 0.0760, Train Steps/Sec: 0.56,
|
| 1303 |
+
[[34m2026-01-26 22:14:39[39m] (step=0001078) Train Loss mse: 0.0197, Train Loss ce: 0.0570, Train Steps/Sec: 0.59,
|
| 1304 |
+
[[34m2026-01-26 22:14:41[39m] (step=0001079) Train Loss mse: 0.0101, Train Loss ce: 0.0584, Train Steps/Sec: 0.67,
|
| 1305 |
+
[[34m2026-01-26 22:14:42[39m] (step=0001080) Train Loss mse: 0.0209, Train Loss ce: 0.0491, Train Steps/Sec: 0.69,
|
| 1306 |
+
[[34m2026-01-26 22:14:43[39m] (step=0001081) Train Loss mse: 0.0376, Train Loss ce: 0.0515, Train Steps/Sec: 0.68,
|
| 1307 |
+
[[34m2026-01-26 22:14:45[39m] (step=0001082) Train Loss mse: 0.0092, Train Loss ce: 0.0488, Train Steps/Sec: 0.68,
|
| 1308 |
+
[[34m2026-01-26 22:14:47[39m] (step=0001083) Train Loss mse: 0.0144, Train Loss ce: 0.0308, Train Steps/Sec: 0.57,
|
| 1309 |
+
[[34m2026-01-26 22:14:48[39m] (step=0001084) Train Loss mse: 0.0096, Train Loss ce: 0.0897, Train Steps/Sec: 0.68,
|
| 1310 |
+
[[34m2026-01-26 22:14:50[39m] (step=0001085) Train Loss mse: 0.0127, Train Loss ce: 0.0523, Train Steps/Sec: 0.57,
|
| 1311 |
+
[[34m2026-01-26 22:14:52[39m] (step=0001086) Train Loss mse: 0.0295, Train Loss ce: 0.0351, Train Steps/Sec: 0.59,
|
| 1312 |
+
[[34m2026-01-26 22:14:53[39m] (step=0001087) Train Loss mse: 0.0099, Train Loss ce: 0.0597, Train Steps/Sec: 0.68,
|
| 1313 |
+
[[34m2026-01-26 22:14:55[39m] (step=0001088) Train Loss mse: 0.0112, Train Loss ce: 0.0307, Train Steps/Sec: 0.68,
|
| 1314 |
+
[[34m2026-01-26 22:14:56[39m] (step=0001089) Train Loss mse: 0.0141, Train Loss ce: 0.0282, Train Steps/Sec: 0.68,
|
| 1315 |
+
[[34m2026-01-26 22:14:57[39m] (step=0001090) Train Loss mse: 0.0187, Train Loss ce: 0.0744, Train Steps/Sec: 0.68,
|
| 1316 |
+
[[34m2026-01-26 22:14:59[39m] (step=0001091) Train Loss mse: 0.0144, Train Loss ce: 0.0371, Train Steps/Sec: 0.56,
|
| 1317 |
+
[[34m2026-01-26 22:15:01[39m] (step=0001092) Train Loss mse: 0.0181, Train Loss ce: 0.0742, Train Steps/Sec: 0.69,
|
| 1318 |
+
[[34m2026-01-26 22:15:02[39m] (step=0001093) Train Loss mse: 0.0119, Train Loss ce: 0.0614, Train Steps/Sec: 0.56,
|
| 1319 |
+
[[34m2026-01-26 22:15:04[39m] (step=0001094) Train Loss mse: 0.0140, Train Loss ce: 0.0572, Train Steps/Sec: 0.59,
|
| 1320 |
+
[[34m2026-01-26 22:15:06[39m] (step=0001095) Train Loss mse: 0.0155, Train Loss ce: 0.0332, Train Steps/Sec: 0.69,
|
| 1321 |
+
[[34m2026-01-26 22:15:07[39m] (step=0001096) Train Loss mse: 0.0166, Train Loss ce: 0.0692, Train Steps/Sec: 0.68,
|
| 1322 |
+
[[34m2026-01-26 22:15:09[39m] (step=0001097) Train Loss mse: 0.0175, Train Loss ce: 0.0863, Train Steps/Sec: 0.68,
|
| 1323 |
+
[[34m2026-01-26 22:15:10[39m] (step=0001098) Train Loss mse: 0.0082, Train Loss ce: 0.0662, Train Steps/Sec: 0.57,
|
| 1324 |
+
[[34m2026-01-26 22:15:12[39m] (step=0001099) Train Loss mse: 0.0139, Train Loss ce: 0.0561, Train Steps/Sec: 0.68,
|
| 1325 |
+
[[34m2026-01-26 22:15:13[39m] (step=0001100) Train Loss mse: 0.0106, Train Loss ce: 0.0682, Train Steps/Sec: 0.68,
|
| 1326 |
+
[[34m2026-01-26 22:15:15[39m] (step=0001101) Train Loss mse: 0.0105, Train Loss ce: 0.0531, Train Steps/Sec: 0.56,
|
| 1327 |
+
[[34m2026-01-26 22:15:17[39m] (step=0001102) Train Loss mse: 0.0123, Train Loss ce: 0.0902, Train Steps/Sec: 0.59,
|
| 1328 |
+
[[34m2026-01-26 22:15:18[39m] (step=0001103) Train Loss mse: 0.0096, Train Loss ce: 0.0452, Train Steps/Sec: 0.68,
|
| 1329 |
[[34m2026-01-26 22:15:20[39m] (step=0001104) Train Loss mse: 0.0135, Train Loss ce: 0.0637, Train Steps/Sec: 0.68,
|
| 1330 |
[[34m2026-01-26 22:15:21[39m] (step=0001105) Train Loss mse: 0.0077, Train Loss ce: 0.0449, Train Steps/Sec: 0.68,
|
| 1331 |
[[34m2026-01-26 22:15:23[39m] (step=0001106) Train Loss mse: 0.0119, Train Loss ce: 0.0483, Train Steps/Sec: 0.56,
|
|
|
|
| 2718 |
[[34m2026-01-26 22:52:50[39m] (step=0002493) Train Loss mse: 0.0070, Train Loss ce: 0.0249, Train Steps/Sec: 0.68,
|
| 2719 |
[[34m2026-01-26 22:52:52[39m] (step=0002494) Train Loss mse: 0.0089, Train Loss ce: 0.0502, Train Steps/Sec: 0.59,
|
| 2720 |
[[34m2026-01-26 22:52:54[39m] (step=0002495) Train Loss mse: 0.0080, Train Loss ce: 0.0490, Train Steps/Sec: 0.43,
|
| 2721 |
[[34m2026-01-26 22:52:55[39m] (step=0002496) Train Loss mse: 0.0109, Train Loss ce: 0.0463, Train Steps/Sec: 0.68,
|
| 2722 |
[[34m2026-01-26 22:52:57[39m] (step=0002497) Train Loss mse: 0.0087, Train Loss ce: 0.0305, Train Steps/Sec: 0.59,
|
| 2723 |
[[34m2026-01-26 22:52:59[39m] (step=0002498) Train Loss mse: 0.0078, Train Loss ce: 0.0502, Train Steps/Sec: 0.68,
|
|
|
|
| 2732 |
[[34m2026-01-26 22:55:50[39m] (step=0002504) Train Loss mse: 0.0089, Train Loss ce: 0.0733, Train Steps/Sec: 0.68,
|
| 2733 |
[[34m2026-01-26 22:55:52[39m] (step=0002505) Train Loss mse: 0.0061, Train Loss ce: 0.0662, Train Steps/Sec: 0.68,
|
| 2734 |
[[34m2026-01-26 22:55:53[39m] (step=0002506) Train Loss mse: 0.0068, Train Loss ce: 0.0412, Train Steps/Sec: 0.68,
|
| 2735 |
+
base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step3000
|
| 2736 |
+
Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
|
| 2737 |
+
[eval debug] first 3 batch fingerprints:
|
| 2738 |
+
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2739 |
+
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2740 |
+
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2741 |
+
ce_avg: 0.04329414293169975, mse_avg: 0.006353132426738739
|
| 2742 |
+
base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step3500
|
| 2743 |
+
Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
|
| 2744 |
+
[eval debug] first 3 batch fingerprints:
|
| 2745 |
+
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2746 |
+
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2747 |
+
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 2748 |
+
ce_avg: 0.04275057464838028, mse_avg: 0.0056571937166154385
|
| 2749 |
[[34m2026-01-26 22:55:55[39m] (step=0002507) Train Loss mse: 0.0085, Train Loss ce: 0.0475, Train Steps/Sec: 0.49,
|
| 2750 |
[[34m2026-01-26 22:55:57[39m] (step=0002508) Train Loss mse: 0.0056, Train Loss ce: 0.0384, Train Steps/Sec: 0.69,
|
| 2751 |
[[34m2026-01-26 22:55:58[39m] (step=0002509) Train Loss mse: 0.0106, Train Loss ce: 0.0332, Train Steps/Sec: 0.68,
|
|
|
|
| 3762 |
[[34m2026-01-26 23:22:58[39m] (step=0003520) Train Loss mse: 0.0051, Train Loss ce: 0.0479, Train Steps/Sec: 0.68,
|
| 3763 |
[[34m2026-01-26 23:22:59[39m] (step=0003521) Train Loss mse: 0.0087, Train Loss ce: 0.0276, Train Steps/Sec: 0.68,
|
| 3764 |
[[34m2026-01-26 23:23:01[39m] (step=0003522) Train Loss mse: 0.0087, Train Loss ce: 0.0529, Train Steps/Sec: 0.68,
|
| 3765 |
[[34m2026-01-26 23:23:02[39m] (step=0003523) Train Loss mse: 0.0064, Train Loss ce: 0.0316, Train Steps/Sec: 0.68,
|
| 3766 |
[[34m2026-01-26 23:23:04[39m] (step=0003524) Train Loss mse: 0.0067, Train Loss ce: 0.0446, Train Steps/Sec: 0.49,
|
| 3767 |
[[34m2026-01-26 23:23:06[39m] (step=0003525) Train Loss mse: 0.0094, Train Loss ce: 0.0456, Train Steps/Sec: 0.69,
|
|
|
|
| 3792 |
[[34m2026-01-26 23:23:44[39m] (step=0003550) Train Loss mse: 0.0066, Train Loss ce: 0.0326, Train Steps/Sec: 0.68,
|
| 3793 |
[[34m2026-01-26 23:23:46[39m] (step=0003551) Train Loss mse: 0.0086, Train Loss ce: 0.0259, Train Steps/Sec: 0.48,
|
| 3794 |
[[34m2026-01-26 23:23:48[39m] (step=0003552) Train Loss mse: 0.0122, Train Loss ce: 0.0249, Train Steps/Sec: 0.59,
|
| 3795 |
+
base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step4000
|
| 3796 |
+
Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
|
| 3797 |
+
[eval debug] first 3 batch fingerprints:
|
| 3798 |
+
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3799 |
+
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3800 |
+
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3801 |
+
ce_avg: 0.03896063566207886, mse_avg: 0.005920059513300657
|
| 3802 |
+
base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step4500
|
| 3803 |
+
Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
|
| 3804 |
+
[eval debug] first 3 batch fingerprints:
|
| 3805 |
+
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3806 |
+
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3807 |
+
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3808 |
+
ce_avg: 0.03593315929174423, mse_avg: 0.005641990341246128
|
| 3809 |
+
base_dir is /dev/shm/models/checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_match_equation_sos_one_image_lr2e_5_ce_ins_step5000
|
| 3810 |
+
Preparing Dataset vlm_gym_match_equation_sos_celoss_evalonce/vlm_gym_match_equation_sos_val
|
| 3811 |
+
[eval debug] first 3 batch fingerprints:
|
| 3812 |
+
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3813 |
+
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3814 |
+
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_match_equation_sos_celoss_evalonce'}]
|
| 3815 |
+
ce_avg: 0.03604895621538162, mse_avg: 0.005358231253921986
|
| 3816 |
[[34m2026-01-26 23:23:49[39m] (step=0003553) Train Loss mse: 0.0082, Train Loss ce: 0.0280, Train Steps/Sec: 0.68,
|
| 3817 |
[[34m2026-01-26 23:23:51[39m] (step=0003554) Train Loss mse: 0.0045, Train Loss ce: 0.0642, Train Steps/Sec: 0.68,
|
| 3818 |
[[34m2026-01-26 23:23:52[39m] (step=0003555) Train Loss mse: 0.0075, Train Loss ce: 0.0468, Train Steps/Sec: 0.68,
|
|
|
|
| 5211 |
[[34m2026-01-27 00:01:03[39m] (step=0004948) Train Loss mse: 0.0052, Train Loss ce: 0.0313, Train Steps/Sec: 0.57,
|
| 5212 |
[[34m2026-01-27 00:01:05[39m] (step=0004949) Train Loss mse: 0.0078, Train Loss ce: 0.0787, Train Steps/Sec: 0.57,
|
| 5213 |
[[34m2026-01-27 00:01:07[39m] (step=0004950) Train Loss mse: 0.0046, Train Loss ce: 0.0306, Train Steps/Sec: 0.59,
|
| 5214 |
[[34m2026-01-27 00:01:08[39m] (step=0004951) Train Loss mse: 0.0074, Train Loss ce: 0.0149, Train Steps/Sec: 0.68,
|
| 5215 |
[[34m2026-01-27 00:01:10[39m] (step=0004952) Train Loss mse: 0.0051, Train Loss ce: 0.0171, Train Steps/Sec: 0.58,
|
| 5216 |
[[34m2026-01-27 00:01:12[39m] (step=0004953) Train Loss mse: 0.0094, Train Loss ce: 0.0230, Train Steps/Sec: 0.68,
|