Junyi42 committed on
Commit e7f6e5b · verified · 1 Parent(s): a6f22c7

Upload checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins
checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/0005000/ema.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:73b567c0952d33306f37941298efd623616a326df8cd00072efe55db64f1592d
+ size 58429204680
checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/wandb/offline-run-20260129_223919-checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins-run0/files/config.yaml CHANGED
@@ -0,0 +1,457 @@
+ wandb_version: 1
+
+ _wandb:
+ desc: null
+ value:
+ python_version: 3.11.10
+ cli_version: 0.23.1
+ framework: huggingface
+ huggingface_version: 4.49.0
+ is_jupyter_run: false
+ is_kaggle_kernel: false
+ start_time: 1769726360
+ t:
+ 1:
+ - 1
+ - 5
+ - 11
+ - 41
+ - 49
+ - 53
+ - 71
+ - 105
+ 2:
+ - 1
+ - 5
+ - 11
+ - 41
+ - 49
+ - 53
+ - 71
+ - 105
+ 3:
+ - 2
+ - 4
+ - 13
+ - 14
+ - 37
+ - 42
+ - 61
+ 4: 3.11.10
+ 5: 0.23.1
+ 6: 4.49.0
+ 13: linux-x86_64
+ e:
+ pfbo5dtoypojh5ejq637djdym70xh66v:
+ os: Linux-6.6.93+-x86_64-with-glibc2.35
+ python: CPython 3.11.10
+ started_at: '2026-01-29T22:39:19.822640Z'
+ args:
+ - --dataset_config_file
+ - ./data/configs/vlm_gym_patch_reassembly_alt_train_celoss.yaml
+ - --eval_dataset_config_file
+ - ./data/configs/vlm_gym_patch_reassembly_alt_eval_celoss.yaml
+ - --viz_dataset_config_file
+ - ./data/configs/vlm_gym_patch_reassembly_alt_eval_celoss.yaml
+ - --inference_hash_file
+ - /home/clouduser/Code/Github/launch_new/hashes_test_set_v10.json
+ - --task_name
+ - patch_reassembly_v5
+ - --instructions_dir
+ - ./data/instructions
+ - --train_data_dir
+ - /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/train/
+ - --train_jsonl_path
+ - /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/train/
+ - --eval_data_dir
+ - /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/val/
+ - --eval_jsonl_path
+ - /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/val/
+ - --model_path
+ - /home/clouduser/Code/Models/BAGEL-7B-MoT
+ - --layer_module
+ - Qwen2MoTDecoderLayer
+ - --max_latent_size
+ - '64'
+ - --resume-from
+ - /home/clouduser/Code/Models/BAGEL-7B-MoT
+ - --finetune_from_hf
+ - 'True'
+ - --auto_resume
+ - 'False'
+ - --resume-model-only
+ - 'True'
+ - --finetune-from-ema
+ - 'True'
+ - --log_every
+ - '1'
+ - --lr
+ - 2e-5
+ - --warmup_steps
+ - '300'
+ - --lr_scheduler
+ - cosine
+ - --num_worker
+ - '1'
+ - --expected_num_tokens
+ - '30000'
+ - --max_num_tokens
+ - '30000'
+ - --max_num_tokens_per_sample
+ - '30000'
+ - --visual_und
+ - 'True'
+ - --save_every
+ - '2500'
+ - --total_steps
+ - '5000'
+ - --text_cond_dropout_prob
+ - '0.0'
+ - --vae_cond_dropout_prob
+ - '0.3'
+ - --vit_cond_dropout_prob
+ - '0.0'
+ - --ema
+ - '0.993'
+ - --checkpoint_dir
+ - /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins
+ - --wandb_project
+ - bagel
+ - --wandb_name
+ - checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins
+ - --wandb_dir
+ - /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins
+ - --wandb_offline
+ - 'True'
+ program: /home/clouduser/Code/Github/unified_world_model/train/pretrain_unified_navit.py
+ code_path: train/pretrain_unified_navit.py
+ code_path_local: train/pretrain_unified_navit.py
+ git:
+ remote_url: https://github.com/para-lost/unified_world_model
+ commit: 8d7b26b7e552fc87b592cf3be94d93be7aeca2a9
+ root: /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins
+ host: junyizhang-launch-new-226785731-1-0
+ executable: /opt/conda/bin/python3.11
+ cpu_count: 48
+ cpu_count_logical: 96
+ gpu_type: NVIDIA A100-SXM4-80GB
+ gpu_count: 8
+ disk:
+ /:
+ total: '1052461830144'
+ used: '179558117376'
+ memory:
+ total: '1437332606976'
+ gpu_nvidia:
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-871da72d-3f3a-ff37-fc78-177139a0eeab
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-68b93783-6768-a263-778d-fc8d1d8c5cec
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-fb1c8d7d-0ecc-acab-6ba2-def6b47110f7
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-0bac335f-a2a0-f393-7ac3-edbd8ca00890
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-235c5877-2112-1bf2-5697-9a6bb2c1bdc6
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-53b3bf28-8aed-e7f3-3dd3-92d1557f1088
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-1b085487-7123-2309-c8da-1e209c198264
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-e2dcec15-6e8e-e6a6-5845-bbf312b788d7
+ cuda_version: '12.2'
+ writer_id: pfbo5dtoypojh5ejq637djdym70xh66v
+ visual_gen:
+ desc: null
+ value: true
+ visual_und:
+ desc: null
+ value: true
+ results_dir:
+ desc: null
+ value: results
+ checkpoint_dir:
+ desc: null
+ value: /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins
+ wandb_project:
+ desc: null
+ value: bagel
+ wandb_name:
+ desc: null
+ value: checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins
+ wandb_runid:
+ desc: null
+ value: '0'
+ wandb_resume:
+ desc: null
+ value: allow
+ wandb_offline:
+ desc: null
+ value: true
+ wandb_dir:
+ desc: null
+ value: /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins
+ global_seed:
+ desc: null
+ value: 4396
+ auto_resume:
+ desc: null
+ value: false
+ resume_from:
+ desc: null
+ value: /home/clouduser/Code/Models/BAGEL-7B-MoT
+ resume_model_only:
+ desc: null
+ value: true
+ finetune_from_ema:
+ desc: null
+ value: true
+ finetune_from_hf:
+ desc: null
+ value: true
+ log_every:
+ desc: null
+ value: 1
+ save_every:
+ desc: null
+ value: 2500
+ total_steps:
+ desc: null
+ value: 5000
+ warmup_steps:
+ desc: null
+ value: 300
+ lr_scheduler:
+ desc: null
+ value: cosine
+ lr:
+ desc: null
+ value: 2.0e-05
+ min_lr:
+ desc: null
+ value: 1.0e-07
+ beta1:
+ desc: null
+ value: 0.9
+ beta2:
+ desc: null
+ value: 0.95
+ eps:
+ desc: null
+ value: 1.0e-15
+ ema:
+ desc: null
+ value: 0.993
+ max_grad_norm:
+ desc: null
+ value: 1.0
+ timestep_shift:
+ desc: null
+ value: 1.0
+ mse_weight:
+ desc: null
+ value: 1.0
+ ce_weight:
+ desc: null
+ value: 1.0
+ ce_loss_reweighting:
+ desc: null
+ value: false
+ expected_num_tokens:
+ desc: null
+ value: 30000
+ num_replicate:
+ desc: null
+ value: 1
+ num_shard:
+ desc: null
+ value: 8
+ sharding_strategy:
+ desc: null
+ value: HYBRID_SHARD
+ backward_prefetch:
+ desc: null
+ value: BACKWARD_PRE
+ cpu_offload:
+ desc: null
+ value: false
+ freeze_llm:
+ desc: null
+ value: false
+ freeze_vit:
+ desc: null
+ value: false
+ freeze_vae:
+ desc: null
+ value: true
+ freeze_und:
+ desc: null
+ value: false
+ copy_init_moe:
+ desc: null
+ value: true
+ use_flex:
+ desc: null
+ value: false
+ eval_every:
+ desc: null
+ value: 500
+ num_eval_batches:
+ desc: null
+ value: 20
+ use_ema_for_eval:
+ desc: null
+ value: true
+ eval_log_dir:
+ desc: null
+ value: null
+ eval_run_tag:
+ desc: null
+ value: ''
+ viz_every:
+ desc: null
+ value: 500
+ viz_n:
+ desc: null
+ value: 8
+ viz_outdir:
+ desc: null
+ value: results/viz
+ eval_dataset_config_file:
+ desc: null
+ value: ./data/configs/vlm_gym_patch_reassembly_alt_eval_celoss.yaml
+ viz_dataset_config_file:
+ desc: null
+ value: ./data/configs/vlm_gym_patch_reassembly_alt_eval_celoss.yaml
+ eval_print_n:
+ desc: null
+ value: 3
+ save_ema_only:
+ desc: null
+ value: true
+ save_optimizer:
+ desc: null
+ value: false
+ model_path:
+ desc: null
+ value: /home/clouduser/Code/Models/BAGEL-7B-MoT
+ llm_path:
+ desc: null
+ value: hf/Qwen2.5-0.5B-Instruct/
+ llm_qk_norm:
+ desc: null
+ value: true
+ tie_word_embeddings:
+ desc: null
+ value: false
+ layer_module:
+ desc: null
+ value: Qwen2MoTDecoderLayer
+ vae_path:
+ desc: null
+ value: flux/vae/ae.safetensors
+ vit_path:
+ desc: null
+ value: hf/siglip-so400m-14-980-flash-attn2-navit/
+ max_latent_size:
+ desc: null
+ value: 64
+ latent_patch_size:
+ desc: null
+ value: 2
+ vit_patch_size:
+ desc: null
+ value: 14
+ vit_max_num_patch_per_side:
+ desc: null
+ value: 70
+ connector_act:
+ desc: null
+ value: gelu_pytorch_tanh
+ interpolate_pos:
+ desc: null
+ value: false
+ vit_select_layer:
+ desc: null
+ value: -2
+ vit_rope:
+ desc: null
+ value: false
+ text_cond_dropout_prob:
+ desc: null
+ value: 0.0
+ vae_cond_dropout_prob:
+ desc: null
+ value: 0.3
+ vit_cond_dropout_prob:
+ desc: null
+ value: 0.0
+ dataset_config_file:
+ desc: null
+ value: ./data/configs/vlm_gym_patch_reassembly_alt_train_celoss.yaml
+ train_data_dir:
+ desc: null
+ value: /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/train/
+ train_jsonl_path:
+ desc: null
+ value: /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/train/
+ eval_data_dir:
+ desc: null
+ value: /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/val/
+ eval_jsonl_path:
+ desc: null
+ value: /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/val/
+ inference_hash_file:
+ desc: null
+ value: /home/clouduser/Code/Github/launch_new/hashes_test_set_v10.json
+ task_name:
+ desc: null
+ value: patch_reassembly_v5
+ instructions_dir:
+ desc: null
+ value: ./data/instructions
+ prefetch_factor:
+ desc: null
+ value: 2
+ num_workers:
+ desc: null
+ value: 1
+ max_num_tokens_per_sample:
+ desc: null
+ value: 30000
+ max_num_tokens:
+ desc: null
+ value: 30000
+ prefer_buffer_before:
+ desc: null
+ value: 16384
+ max_buffer_size:
+ desc: null
+ value: 50
+ data_seed:
+ desc: null
+ value: 42
checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/wandb/offline-run-20260129_223919-checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins-run0/files/output.log CHANGED
@@ -829,6 +829,22 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
  [2026-01-30 03:09:56] (step=0000818) Train Loss mse: 0.0246, Train Loss ce: 0.1153, Train Steps/Sec: 0.06,
  [2026-01-30 03:10:18] (step=0000819) Train Loss mse: 0.0270, Train Loss ce: 0.0949, Train Steps/Sec: 0.05,
  [2026-01-30 03:10:38] (step=0000820) Train Loss mse: 0.0203, Train Loss ce: 0.0989, Train Steps/Sec: 0.05,
  FullyShardedDataParallel(
  (_fsdp_wrapped_module): Bagel(
  (language_model): Qwen2ForCausalLM(
@@ -1029,29 +1045,6 @@ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_rea
  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
  ce_avg: 0.08397598564624786, mse_avg: 0.015734810382127762
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins_step2000
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- ce_avg: 0.08462081849575043, mse_avg: 0.015557804144918919
- [2026-01-30 03:10:51] (step=0000821) Train Loss mse: 0.0239, Train Loss ce: 0.0780, Train Steps/Sec: 0.07,
- [2026-01-30 03:11:08] (step=0000822) Train Loss mse: 0.0224, Train Loss ce: 0.0904, Train Steps/Sec: 0.06,
- [2026-01-30 03:11:25] (step=0000823) Train Loss mse: 0.0222, Train Loss ce: 0.1022, Train Steps/Sec: 0.06,
- [2026-01-30 03:11:44] (step=0000824) Train Loss mse: 0.0243, Train Loss ce: 0.1039, Train Steps/Sec: 0.05,
- [2026-01-30 03:12:02] (step=0000825) Train Loss mse: 0.0204, Train Loss ce: 0.0916, Train Steps/Sec: 0.06,
- [2026-01-30 03:12:16] (step=0000826) Train Loss mse: 0.0201, Train Loss ce: 0.0970, Train Steps/Sec: 0.07,
- [2026-01-30 03:12:37] (step=0000827) Train Loss mse: 0.0192, Train Loss ce: 0.1047, Train Steps/Sec: 0.05,
- [2026-01-30 03:12:57] (step=0000828) Train Loss mse: 0.0214, Train Loss ce: 0.1020, Train Steps/Sec: 0.05,
- [2026-01-30 03:13:18] (step=0000829) Train Loss mse: 0.0250, Train Loss ce: 0.0784, Train Steps/Sec: 0.05,
- [2026-01-30 03:13:36] (step=0000830) Train Loss mse: 0.0287, Train Loss ce: 0.0897, Train Steps/Sec: 0.06,
- [2026-01-30 03:13:57] (step=0000831) Train Loss mse: 0.0190, Train Loss ce: 0.0986, Train Steps/Sec: 0.05,
- [2026-01-30 03:14:13] (step=0000832) Train Loss mse: 0.0256, Train Loss ce: 0.0924, Train Steps/Sec: 0.06,
- [2026-01-30 03:14:25] (step=0000833) Train Loss mse: 0.0232, Train Loss ce: 0.0824, Train Steps/Sec: 0.08,
- [2026-01-30 03:14:42] (step=0000834) Train Loss mse: 0.0249, Train Loss ce: 0.0951, Train Steps/Sec: 0.06,
- [2026-01-30 03:14:58] (step=0000835) Train Loss mse: 0.0275, Train Loss ce: 0.0966, Train Steps/Sec: 0.06,
- [2026-01-30 03:15:19] (step=0000836) Train Loss mse: 0.0208, Train Loss ce: 0.1061, Train Steps/Sec: 0.05,
  [2026-01-30 03:15:35] (step=0000837) Train Loss mse: 0.0218, Train Loss ce: 0.0916, Train Steps/Sec: 0.06,
  [2026-01-30 03:15:53] (step=0000838) Train Loss mse: 0.0238, Train Loss ce: 0.1028, Train Steps/Sec: 0.05,
  [2026-01-30 03:16:12] (step=0000839) Train Loss mse: 0.0211, Train Loss ce: 0.1001, Train Steps/Sec: 0.05,
@@ -2162,6 +2155,20 @@ ce_avg: 0.08462081849575043, mse_avg: 0.015557804144918919
  [2026-01-30 09:06:52] (step=0001944) Train Loss mse: 0.0175, Train Loss ce: 0.0748, Train Steps/Sec: 0.06,
  [2026-01-30 09:07:12] (step=0001945) Train Loss mse: 0.0267, Train Loss ce: 0.0836, Train Steps/Sec: 0.05,
  [2026-01-30 09:07:26] (step=0001946) Train Loss mse: 0.0200, Train Loss ce: 0.0730, Train Steps/Sec: 0.07,
  [2026-01-30 09:07:46] (step=0001947) Train Loss mse: 0.0181, Train Loss ce: 0.0834, Train Steps/Sec: 0.05,
  [2026-01-30 09:08:06] (step=0001948) Train Loss mse: 0.0197, Train Loss ce: 0.0996, Train Steps/Sec: 0.05,
  [2026-01-30 09:08:26] (step=0001949) Train Loss mse: 0.0215, Train Loss ce: 0.0908, Train Steps/Sec: 0.05,
@@ -2199,13 +2206,6 @@ ce_avg: 0.08462081849575043, mse_avg: 0.015557804144918919
  [2026-01-30 09:18:42] (step=0001981) Train Loss mse: 0.0167, Train Loss ce: 0.0742, Train Steps/Sec: 0.06,
  [2026-01-30 09:19:04] (step=0001982) Train Loss mse: 0.0220, Train Loss ce: 0.0905, Train Steps/Sec: 0.05,
  [2026-01-30 09:19:20] (step=0001983) Train Loss mse: 0.0224, Train Loss ce: 0.0735, Train Steps/Sec: 0.06,
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins_step2500
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- ce_avg: 0.3942395746707916, mse_avg: 0.015481945127248764
  [2026-01-30 09:19:42] (step=0001984) Train Loss mse: 0.0192, Train Loss ce: 0.0788, Train Steps/Sec: 0.05,
  [2026-01-30 09:20:04] (step=0001985) Train Loss mse: 0.0141, Train Loss ce: 0.0955, Train Steps/Sec: 0.05,
  [2026-01-30 09:20:25] (step=0001986) Train Loss mse: 0.0198, Train Loss ce: 0.0789, Train Steps/Sec: 0.05,
@@ -3072,27 +3072,6 @@ ce_avg: 0.3942395746707916, mse_avg: 0.015481945127248764
  [2026-01-30 13:55:59] (step=0002844) Train Loss mse: 0.0198, Train Loss ce: 0.0788, Train Steps/Sec: 0.05,
  [2026-01-30 13:56:17] (step=0002845) Train Loss mse: 0.0168, Train Loss ce: 0.0757, Train Steps/Sec: 0.05,
  [2026-01-30 13:56:37] (step=0002846) Train Loss mse: 0.0185, Train Loss ce: 0.0793, Train Steps/Sec: 0.05,
- [2026-01-30 13:56:59
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins_step3000
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- ce_avg: 0.07592044025659561, mse_avg: 0.013209872879087925
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins_step3500
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- ce_avg: 0.07461568713188171, mse_avg: 0.013009372167289257
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins_step4000
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
  [2026-01-30 13:56:59] (step=0002847) Train Loss mse: 0.0204, Train Loss ce: 0.0870, Train Steps/Sec: 0.05,
  [2026-01-30 13:57:16] (step=0002848) Train Loss mse: 0.0180, Train Loss ce: 0.0854, Train Steps/Sec: 0.06,
  [2026-01-30 13:57:38] (step=0002849) Train Loss mse: 0.0181, Train Loss ce: 0.0848, Train Steps/Sec: 0.05,
@@ -3332,17 +3311,27 @@ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_rea
  [2026-01-30 15:14:49] (step=0003083) Train Loss mse: 0.0170, Train Loss ce: 0.0682, Train Steps/Sec: 0.05,
  [2026-01-30 15:15:11] (step=0003084) Train Loss mse: 0.0136, Train Loss ce: 0.0838, Train Steps/Sec: 0.04,
  [2026-01-30 15:15:34] (step=0003085) Train Loss mse: 0.0167, Train Loss ce: 0.0730, Train Steps/Sec: 0.04,
- [2026-01-30 15:15:58] (step=0003086) Train Loss mse: 0.0185, Train Loss ce: 0.0791, Train Steps/Sec: 0.04,
- [2026-01-30 15:16:17] (step=0003087) Train Loss mse: 0.0131, Train Loss ce: 0.0739, Train Steps/Sec: 0.05,
- [2026-01-30 15:16:38] (step=0003088) Train Loss mse: 0.0154, Train Loss ce: 0.0845, Train Steps/Sec: 0.05,
- [2026-01-30 15:17:00] (step=0003089) Train Loss mse: 0.0206, Train Loss ce: 0.0767, Train Steps/Sec: 0.05,
- [2026-01-30 15:17:16] (step=0003090) Train Loss mse: 0.0185, Train Loss ce: 0.0730, Train Steps/Sec: 0.06,
- [2026-01-30 15:17:34] (step=0003091) Train Loss mse: 0.0181, Train Loss ce: 0.0746, Train Steps/Sec: 0.05,
- [2026-01-30 15:17:55] (step=0003092) Train Loss mse: 0.0198, Train Loss ce: 0.0845, Train Steps/Sec: 0.05,
- [2026-01-30 15:18:16] (step=0003093) Train Loss mse: 0.0185, Train Loss ce: 0.0748, Train Steps/Sec: 0.05,
- [2026-01-30 15:18:33] (step=0003094) Train Loss mse: 0.0219, Train Loss ce: 0.0700, Train Steps/Sec: 0.06,
- [2026-01-30 15:18:52] (step=0003095) Train Loss mse: 0.0138, Train Loss ce: 0.0704, Train Steps/Sec: 0.05,
- [2026-01-30 15:19:12] (step=0003096) Train Loss mse: 0.0226, Train Loss ce: 0.0813, Train Steps/Sec: 0.05,
  [2026-01-30 15:19:30] (step=0003097) Train Loss mse: 0.0185, Train Loss ce: 0.0726, Train Steps/Sec: 0.05,
  [2026-01-30 15:19:48] (step=0003098) Train Loss mse: 0.0155, Train Loss ce: 0.0671, Train Steps/Sec: 0.06,
  [2026-01-30 15:20:07] (step=0003099) Train Loss mse: 0.0187, Train Loss ce: 0.0632, Train Steps/Sec: 0.05,
@@ -4255,53 +4244,17 @@ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_rea
  [2026-01-30 20:11:11] (step=0004006) Train Loss mse: 0.0208, Train Loss ce: 0.0885, Train Steps/Sec: 0.04,
  [2026-01-30 20:11:29] (step=0004007) Train Loss mse: 0.0143, Train Loss ce: 0.0732, Train Steps/Sec: 0.05,
  [2026-01-30 20:11:49] (step=0004008) Train Loss mse: 0.0147, Train Loss ce: 0.0707, Train Steps/Sec: 0.05,
- [2026-01-30 20:12:06] (step=0004009) Train Loss mse: 0.0179, Train Loss ce: 0.0831, Train Steps/Sec: 0.06,
- [2026-01-30 20:12:24] (step=0004010) Train Loss mse: 0.0206, Train Loss ce: 0.0700, Train Steps/Sec: 0.05,
- [2026-01-30 20:12:41] (step=0004011) Train Loss mse: 0.0176, Train Loss ce: 0.0685, Train Steps/Sec: 0.06,
- [2026-01-30 20:13:00] (step=0004012) Train Loss mse: 0.0187, Train Loss ce: 0.0644, Train Steps/Sec: 0.05,
- [2026-01-30 20:13:21] (step=0004013) Train Loss mse: 0.0137, Train Loss ce: 0.0782, Train Steps/Sec: 0.05,
- [2026-01-30 20:13:36] (step=0004014) Train Loss mse: 0.0164, Train Loss ce: 0.0727, Train Steps/Sec: 0.07,
- [2026-01-30 20:13:51] (step=0004015) Train Loss mse: 0.0166, Train Loss ce: 0.0657, Train Steps/Sec: 0.07,
- [2026-01-30 20:14:07] (step=0004016) Train Loss mse: 0.0183, Train Loss ce: 0.0631, Train Steps/Sec: 0.06,
- [2026-01-30 20:14:29] (step=0004017) Train Loss mse: 0.0177, Train Loss ce: 0.0605, Train Steps/Sec: 0.05,
- [2026-01-30 20:14:52] (step=0004018) Train Loss mse: 0.0166, Train Loss ce: 0.0792, Train Steps/Sec: 0.04,
- [2026-01-30 20:15:14] (step=0004019) Train Loss mse: 0.0147, Train Loss ce: 0.0865, Train Steps/Sec: 0.05,
- [2026-01-30 20:15:30] (step=0004020) Train Loss mse: 0.0176, Train Loss ce: 0.0630, Train Steps/Sec: 0.06,
- [2026-01-30 20:15:48] (step=0004021) Train Loss mse: 0.0191, Train Loss ce: 0.0688, Train Steps/Sec: 0.06,
- [2026-01-30 20:16:10] (step=0004022) Train Loss mse: 0.0132, Train Loss ce: 0.0891, Train Steps/Sec: 0.05,
- [2026-01-30 20:16:32] (step=0004023) Train Loss mse: 0.0154, Train Loss ce: 0.0771, Train Steps/Sec: 0.04,
- [2026-01-30 20:16:54] (step=0004024) Train Loss mse: 0.0163, Train Loss ce: 0.0733, Train Steps/Sec: 0.05,
- [2026-01-30 20:17:09] (step=0004025) Train Loss mse: 0.0140, Train Loss ce: 0.0760, Train Steps/Sec: 0.06,
- [2026-01-30 20:17:26] (step=0004026) Train Loss mse: 0.0225, Train Loss ce: 0.0853, Train Steps/Sec: 0.06,
- [2026-01-30 20:17:48] (step=0004027) Train Loss mse: 0.0173, Train Loss ce: 0.0815, Train Steps/Sec: 0.04,
- [2026-01-30 20:18:03] (step=0004028) Train Loss mse: 0.0156, Train Loss ce: 0.0705, Train Steps/Sec: 0.07,
- [2026-01-30 20:18:25] (step=0004029) Train Loss mse: 0.0211, Train Loss ce: 0.0749, Train Steps/Sec: 0.05,
- [2026-01-30 20:18:44] (step=0004030) Train Loss mse: 0.0201, Train Loss ce: 0.0623, Train Steps/Sec: 0.05,
- [2026-01-30 20:19:01] (step=0004031) Train Loss mse: 0.0188, Train Loss ce: 0.0554, Train Steps/Sec: 0.06,
- [2026-01-30 20:19:20] (step=0004032) Train Loss mse: 0.0122, Train Loss ce: 0.0831, Train Steps/Sec: 0.05,
- [2026-01-30 20:19:37] (step=0004033) Train Loss mse: 0.0150, Train Loss ce: 0.0704, Train Steps/Sec: 0.06,
- [2026-01-30 20:19:53] (step=0004034) Train Loss mse: 0.0151, Train Loss ce: 0.0717, Train Steps/Sec: 0.06,
- [2026-01-30 20:20:11] (step=0004035) Train Loss mse: 0.0177, Train Loss ce: 0.0796, Train Steps/Sec: 0.06,
- [2026-01-30 20:20:31] (step=0004036) Train Loss mse: 0.0131, Train Loss ce: 0.0839, Train Steps/Sec: 0.05,
- [2026-01-30 20:20:50] (step=0004037) Train Loss mse: 0.0204, Train Loss ce: 0.0773, Train Steps/Sec: 0.05,
- [2026-01-30 20:21:10] (step=0004038) Train Loss mse: 0.0142, Train Loss ce: 0.0768, Train Steps/Sec: 0.05,
- [2026-01-30 20:21:32] (step=0004039) Train Loss mse: 0.0164, Train Loss ce: 0.0707, Train Steps/Sec: 0.05,
- [2026-01-30 20:21:49] (step=0004040) Train Loss mse: 0.0159, Train Loss ce: 0.0670, Train Steps/Sec: 0.06,
- [2026-01-30 20:22:11] (step=0004041) Train Loss mse: 0.0131, Train Loss ce: 0.0709, Train Steps/Sec: 0.05,
- [2026-01-30 20:22:31] (step=0004042) Train Loss mse: 0.0176, Train Loss ce: 0.0648, Train Steps/Sec: 0.05,
- [2026-01-30 20:22:50] (step=0004043) Train Loss mse: 0.0205, Train Loss ce: 0.0746, Train Steps/Sec: 0.05,
- [2026-01-30 20:23:10] (step=0004044) Train Loss mse: 0.0144, Train Loss ce: 0.0847, Train Steps/Sec: 0.05,
- [2026-01-30 20:23:24] (step=0004045) Train Loss mse: 0.0179, Train Loss ce: 0.0575, Train Steps/Sec: 0.07,
- [2026-01-30 20:23:41] (step=0004046) Train Loss mse: 0.0134, Train Loss ce: 0.0550, Train Steps/Sec: 0.06,
- [2026-01-30 20:23:57] (step=0004047) Train Loss mse: 0.0185, Train Loss ce: 0.0675, Train Steps/Sec: 0.06,
- [2026-01-30 20:24:14] (step=0004048) Train Loss mse: 0.0152, Train Loss ce: 0.0608, Train Steps/Sec: 0.06,
- [2026-01-30 20:24:30] (step=0004049) Train Loss mse: 0.0157, Train Loss ce: 0.0600, Train Steps/Sec: 0.06,
- [2026-01-30 20:24:52] (step=0004050) Train Loss mse: 0.0159, Train Loss ce: 0.0711, Train Steps/Sec: 0.05,
- [2026-01-30 20:25:11] (step=0004051) Train Loss mse: 0.0189, Train Loss ce: 0.0743, Train Steps/Sec: 0.05,
- [2026-01-30 20:25:31] (step=0004052) Train Loss mse: 0.0175, Train Loss ce: 0.0711, Train Steps/Sec: 0.05,
- [2026-01-30 20:25:47] (step=0004053) Train Loss mse: 0.0159, Train Loss ce: 0.0579, Train Steps/Sec: 0.06,
- [2026-01-30 20:26:05] (step=0004054) Train Loss mse: 0.0185, Train Loss ce: 0.0781, Train Steps/Sec: 0.06,
- [2026-01-30 20:26:23] (step=0004055) Train Loss mse: 0.0159, Train Loss ce: 0.0670, Train Steps/Sec: 0.06,
  base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins_step4500
  Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_reassembly_alt_val
  [eval debug] first 3 batch fingerprints:
@@ -4315,59 +4268,6 @@ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_rea
  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
- ce_avg: 0.815917432308197, mse_avg: 0.012591413222253323
- [2026-01-30 20:26:41] (step=0004056) Train Loss mse: 0.0173, Train Loss ce: 0.0698, Train Steps/Sec: 0.06,
- [2026-01-30 20:26:59] (step=0004057) Train Loss mse: 0.0169, Train Loss ce: 0.0774, Train Steps/Sec: 0.05,
- [2026-01-30 20:27:18] (step=0004058) Train Loss mse: 0.0154, Train Loss ce: 0.0729, Train Steps/Sec: 0.06,
- [2026-01-30 20:27:35] (step=0004059) Train Loss mse: 0.0184, Train Loss ce: 0.0678, Train Steps/Sec: 0.06,
- [2026-01-30 20:27:52] (step=0004060) Train Loss mse: 0.0179, Train Loss ce: 0.0634, Train Steps/Sec: 0.06,
- [2026-01-30 20:28:11] (step=0004061) Train Loss mse: 0.0187, Train Loss ce: 0.0742, Train Steps/Sec: 0.05,
- [2026-01-30 20:28:28] (step=0004062) Train Loss mse: 0.0162, Train Loss ce: 0.0733, Train Steps/Sec: 0.06,
- [2026-01-30 20:28:44] (step=0004063) Train Loss mse: 0.0147, Train Loss ce: 0.0743, Train Steps/Sec: 0.06,
- [2026-01-30 20:29:04] (step=0004064) Train Loss mse: 0.0178, Train Loss ce: 0.0793, Train Steps/Sec: 0.05,
- [2026-01-30 20:29:17] (step=0004065) Train Loss mse: 0.0224, Train Loss ce: 0.0725, Train Steps/Sec: 0.08,
- [2026-01-30 20:29:42] (step=0004066) Train Loss mse: 0.0126, Train Loss ce: 0.0976, Train Steps/Sec: 0.04,
- [2026-01-30 20:30:02] (step=0004067) Train Loss mse: 0.0172, Train Loss ce: 0.0725, Train Steps/Sec: 0.05,
- [2026-01-30 20:30:22] (step=0004068) Train Loss mse: 0.0154, Train Loss ce: 0.0704, Train Steps/Sec: 0.05,
- [2026-01-30 20:30:42] (step=0004069) Train Loss mse: 0.0148, Train Loss ce: 0.0818, Train Steps/Sec: 0.05,
- [2026-01-30 20:30:59] (step=0004070) Train Loss mse: 0.0164, Train Loss ce: 0.0676, Train Steps/Sec: 0.06,
- [2026-01-30 20:31:14] (step=0004071) Train Loss mse: 0.0175, Train Loss ce: 0.0627, Train Steps/Sec: 0.06,
- [2026-01-30 20:31:35] (step=0004072) Train Loss mse: 0.0143, Train Loss ce: 0.0685, Train Steps/Sec: 0.05,
- [2026-01-30 20:31:51] (step=0004073) Train Loss mse: 0.0224, Train Loss ce: 0.0662, Train Steps/Sec: 0.06,
- [2026-01-30 20:32:07] (step=0004074) Train Loss mse: 0.0171, Train Loss ce: 0.0672, Train Steps/Sec: 0.06,
- [2026-01-30 20:32:24] (step=0004075) Train Loss mse: 0.0154, Train Loss ce: 0.0681, Train Steps/Sec: 0.06,
- [2026-01-30 20:32:40] (step=0004076) Train Loss mse: 0.0194, Train Loss ce: 0.0653, Train Steps/Sec: 0.06,
- [2026-01-30 20:32:55] (step=0004077) Train Loss mse: 0.0226, Train Loss ce: 0.0734, Train Steps/Sec: 0.07,
- [2026-01-30 20:33:13] (step=0004078) Train Loss mse: 0.0161, Train Loss ce: 0.0689, Train Steps/Sec: 0.06,
- [2026-01-30 20:33:28] (step=0004079) Train Loss mse: 0.0197, Train Loss ce: 0.0657, Train Steps/Sec: 0.07,
- [2026-01-30 20:33:48] (step=0004080) Train Loss mse: 0.0163, Train Loss ce: 0.0757, Train Steps/Sec: 0.05,
- [2026-01-30 20:34:05] (step=0004081) Train Loss mse: 0.0170, Train Loss ce: 0.0834, Train Steps/Sec: 0.06,
4345
- [2026-01-30 20:34:18] (step=0004082) Train Loss mse: 0.0207, Train Loss ce: 0.0604, Train Steps/Sec: 0.08,
4346
- [2026-01-30 20:34:33] (step=0004083) Train Loss mse: 0.0176, Train Loss ce: 0.0622, Train Steps/Sec: 0.07,
4347
- [2026-01-30 20:34:45] (step=0004084) Train Loss mse: 0.0172, Train Loss ce: 0.0625, Train Steps/Sec: 0.08,
4348
- [2026-01-30 20:35:07] (step=0004085) Train Loss mse: 0.0128, Train Loss ce: 0.0653, Train Steps/Sec: 0.05,
4349
- [2026-01-30 20:35:27] (step=0004086) Train Loss mse: 0.0206, Train Loss ce: 0.0769, Train Steps/Sec: 0.05,
4350
- [2026-01-30 20:35:47] (step=0004087) Train Loss mse: 0.0148, Train Loss ce: 0.0588, Train Steps/Sec: 0.05,
4351
- [2026-01-30 20:36:08] (step=0004088) Train Loss mse: 0.0203, Train Loss ce: 0.0695, Train Steps/Sec: 0.05,
4352
- [2026-01-30 20:36:26] (step=0004089) Train Loss mse: 0.0194, Train Loss ce: 0.0745, Train Steps/Sec: 0.06,
4353
- [2026-01-30 20:36:45] (step=0004090) Train Loss mse: 0.0174, Train Loss ce: 0.0675, Train Steps/Sec: 0.05,
4354
- [2026-01-30 20:37:08] (step=0004091) Train Loss mse: 0.0142, Train Loss ce: 0.0756, Train Steps/Sec: 0.04,
4355
- [2026-01-30 20:37:27] (step=0004092) Train Loss mse: 0.0145, Train Loss ce: 0.0764, Train Steps/Sec: 0.05,
4356
- [2026-01-30 20:37:44] (step=0004093) Train Loss mse: 0.0168, Train Loss ce: 0.0729, Train Steps/Sec: 0.06,
4357
- [2026-01-30 20:37:59] (step=0004094) Train Loss mse: 0.0173, Train Loss ce: 0.0667, Train Steps/Sec: 0.07,
4358
- [2026-01-30 20:38:10] (step=0004095) Train Loss mse: 0.0189, Train Loss ce: 0.0711, Train Steps/Sec: 0.09,
4359
- [2026-01-30 20:38:33] (step=0004096) Train Loss mse: 0.0143, Train Loss ce: 0.0711, Train Steps/Sec: 0.04,
4360
- [2026-01-30 20:38:56] (step=0004097) Train Loss mse: 0.0149, Train Loss ce: 0.0693, Train Steps/Sec: 0.04,
4361
- [2026-01-30 20:39:12] (step=0004098) Train Loss mse: 0.0169, Train Loss ce: 0.0657, Train Steps/Sec: 0.06,
4362
- [2026-01-30 20:39:33] (step=0004099) Train Loss mse: 0.0142, Train Loss ce: 0.0780, Train Steps/Sec: 0.05,
4363
- [2026-01-30 20:39:52] (step=0004100) Train Loss mse: 0.0160, Train Loss ce: 0.0752, Train Steps/Sec: 0.05,
4364
- [2026-01-30 20:40:15] (step=0004101) Train Loss mse: 0.0160, Train Loss ce: 0.0739, Train Steps/Sec: 0.04,
4365
- [2026-01-30 20:40:36] (step=0004102) Train Loss mse: 0.0144, Train Loss ce: 0.0622, Train Steps/Sec: 0.05,
4366
- [2026-01-30 20:40:57] (step=0004103) Train Loss mse: 0.0170, Train Loss ce: 0.0804, Train Steps/Sec: 0.05,
4367
- [2026-01-30 20:41:17] (step=0004104) Train Loss mse: 0.0181, Train Loss ce: 0.0598, Train Steps/Sec: 0.05,
4368
- [2026-01-30 20:41:38] (step=0004105) Train Loss mse: 0.0121, Train Loss ce: 0.0770, Train Steps/Sec: 0.05,
4369
- [2026-01-30 20:41:55] (step=0004106) Train Loss mse: 0.0152, Train Loss ce: 0.0821, Train Steps/Sec: 0.06,
4370
- [2026-01-30 20:42:13] (step=0004107) Train Loss mse: 0.0151, Train Loss ce: 0.0657, Train Steps/Sec: 0.06,
4371
  [2026-01-30 20:42:28] (step=0004108) Train Loss mse: 0.0194, Train Loss ce: 0.0764, Train Steps/Sec: 0.07,
4372
  [2026-01-30 20:42:48] (step=0004109) Train Loss mse: 0.0146, Train Loss ce: 0.0788, Train Steps/Sec: 0.05,
4373
  [2026-01-30 20:43:07] (step=0004110) Train Loss mse: 0.0180, Train Loss ce: 0.0578, Train Steps/Sec: 0.05,
 
  [2026-01-30 03:09:56] (step=0000818) Train Loss mse: 0.0246, Train Loss ce: 0.1153, Train Steps/Sec: 0.06,
  [2026-01-30 03:10:18] (step=0000819) Train Loss mse: 0.0270, Train Loss ce: 0.0949, Train Steps/Sec: 0.05,
  [2026-01-30 03:10:38] (step=0000820) Train Loss mse: 0.0203, Train Loss ce: 0.0989, Train Steps/Sec: 0.05,
+ [2026-01-30 03:10:51] (step=0000821) Train Loss mse: 0.0239, Train Loss ce: 0.0780, Train Steps/Sec: 0.07,
+ [2026-01-30 03:11:08] (step=0000822) Train Loss mse: 0.0224, Train Loss ce: 0.0904, Train Steps/Sec: 0.06,
+ [2026-01-30 03:11:25] (step=0000823) Train Loss mse: 0.0222, Train Loss ce: 0.1022, Train Steps/Sec: 0.06,
+ [2026-01-30 03:11:44] (step=0000824) Train Loss mse: 0.0243, Train Loss ce: 0.1039, Train Steps/Sec: 0.05,
+ [2026-01-30 03:12:02] (step=0000825) Train Loss mse: 0.0204, Train Loss ce: 0.0916, Train Steps/Sec: 0.06,
+ [2026-01-30 03:12:16] (step=0000826) Train Loss mse: 0.0201, Train Loss ce: 0.0970, Train Steps/Sec: 0.07,
+ [2026-01-30 03:12:37] (step=0000827) Train Loss mse: 0.0192, Train Loss ce: 0.1047, Train Steps/Sec: 0.05,
+ [2026-01-30 03:12:57] (step=0000828) Train Loss mse: 0.0214, Train Loss ce: 0.1020, Train Steps/Sec: 0.05,
+ [2026-01-30 03:13:18] (step=0000829) Train Loss mse: 0.0250, Train Loss ce: 0.0784, Train Steps/Sec: 0.05,
+ [2026-01-30 03:13:36] (step=0000830) Train Loss mse: 0.0287, Train Loss ce: 0.0897, Train Steps/Sec: 0.06,
+ [2026-01-30 03:13:57] (step=0000831) Train Loss mse: 0.0190, Train Loss ce: 0.0986, Train Steps/Sec: 0.05,
+ [2026-01-30 03:14:13] (step=0000832) Train Loss mse: 0.0256, Train Loss ce: 0.0924, Train Steps/Sec: 0.06,
+ [2026-01-30 03:14:25] (step=0000833) Train Loss mse: 0.0232, Train Loss ce: 0.0824, Train Steps/Sec: 0.08,
+ [2026-01-30 03:14:42] (step=0000834) Train Loss mse: 0.0249, Train Loss ce: 0.0951, Train Steps/Sec: 0.06,
+ [2026-01-30 03:14:58] (step=0000835) Train Loss mse: 0.0275, Train Loss ce: 0.0966, Train Steps/Sec: 0.06,
+ [2026-01-30 03:15:19] (step=0000836) Train Loss mse: 0.0208, Train Loss ce: 0.1061, Train Steps/Sec: 0.05,
  FullyShardedDataParallel(
  (_fsdp_wrapped_module): Bagel(
  (language_model): Qwen2ForCausalLM(
 
  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
  ce_avg: 0.08397598564624786, mse_avg: 0.015734810382127762
  [2026-01-30 03:15:35] (step=0000837) Train Loss mse: 0.0218, Train Loss ce: 0.0916, Train Steps/Sec: 0.06,
  [2026-01-30 03:15:53] (step=0000838) Train Loss mse: 0.0238, Train Loss ce: 0.1028, Train Steps/Sec: 0.05,
  [2026-01-30 03:16:12] (step=0000839) Train Loss mse: 0.0211, Train Loss ce: 0.1001, Train Steps/Sec: 0.05,
 
  [2026-01-30 09:06:52] (step=0001944) Train Loss mse: 0.0175, Train Loss ce: 0.0748, Train Steps/Sec: 0.06,
  [2026-01-30 09:07:12] (step=0001945) Train Loss mse: 0.0267, Train Loss ce: 0.0836, Train Steps/Sec: 0.05,
  [2026-01-30 09:07:26] (step=0001946) Train Loss mse: 0.0200, Train Loss ce: 0.0730, Train Steps/Sec: 0.07,
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins_step2000
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ ce_avg: 0.08462081849575043, mse_avg: 0.015557804144918919
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins_step2500
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ ce_avg: 0.3942395746707916, mse_avg: 0.015481945127248764
  [2026-01-30 09:07:46] (step=0001947) Train Loss mse: 0.0181, Train Loss ce: 0.0834, Train Steps/Sec: 0.05,
  [2026-01-30 09:08:06] (step=0001948) Train Loss mse: 0.0197, Train Loss ce: 0.0996, Train Steps/Sec: 0.05,
  [2026-01-30 09:08:26] (step=0001949) Train Loss mse: 0.0215, Train Loss ce: 0.0908, Train Steps/Sec: 0.05,
 
  [2026-01-30 09:18:42] (step=0001981) Train Loss mse: 0.0167, Train Loss ce: 0.0742, Train Steps/Sec: 0.06,
  [2026-01-30 09:19:04] (step=0001982) Train Loss mse: 0.0220, Train Loss ce: 0.0905, Train Steps/Sec: 0.05,
  [2026-01-30 09:19:20] (step=0001983) Train Loss mse: 0.0224, Train Loss ce: 0.0735, Train Steps/Sec: 0.06,
  [2026-01-30 09:19:42] (step=0001984) Train Loss mse: 0.0192, Train Loss ce: 0.0788, Train Steps/Sec: 0.05,
  [2026-01-30 09:20:04] (step=0001985) Train Loss mse: 0.0141, Train Loss ce: 0.0955, Train Steps/Sec: 0.05,
  [2026-01-30 09:20:25] (step=0001986) Train Loss mse: 0.0198, Train Loss ce: 0.0789, Train Steps/Sec: 0.05,
 
  [2026-01-30 13:55:59] (step=0002844) Train Loss mse: 0.0198, Train Loss ce: 0.0788, Train Steps/Sec: 0.05,
  [2026-01-30 13:56:17] (step=0002845) Train Loss mse: 0.0168, Train Loss ce: 0.0757, Train Steps/Sec: 0.05,
  [2026-01-30 13:56:37] (step=0002846) Train Loss mse: 0.0185, Train Loss ce: 0.0793, Train Steps/Sec: 0.05,
  [2026-01-30 13:56:59] (step=0002847) Train Loss mse: 0.0204, Train Loss ce: 0.0870, Train Steps/Sec: 0.05,
  [2026-01-30 13:57:16] (step=0002848) Train Loss mse: 0.0180, Train Loss ce: 0.0854, Train Steps/Sec: 0.06,
  [2026-01-30 13:57:38] (step=0002849) Train Loss mse: 0.0181, Train Loss ce: 0.0848, Train Steps/Sec: 0.05,
 
  [2026-01-30 15:14:49] (step=0003083) Train Loss mse: 0.0170, Train Loss ce: 0.0682, Train Steps/Sec: 0.05,
  [2026-01-30 15:15:11] (step=0003084) Train Loss mse: 0.0136, Train Loss ce: 0.0838, Train Steps/Sec: 0.04,
  [2026-01-30 15:15:34] (step=0003085) Train Loss mse: 0.0167, Train Loss ce: 0.0730, Train Steps/Sec: 0.04,
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins_step3000
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ ce_avg: 0.07592044025659561, mse_avg: 0.013209872879087925
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins_step3500
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ ce_avg: 0.07461568713188171, mse_avg: 0.013009372167289257
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins_step4000
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
+ ce_avg: 0.4290580153465271, mse_avg: 0.012517130933701992
  [2026-01-30 15:19:30] (step=0003097) Train Loss mse: 0.0185, Train Loss ce: 0.0726, Train Steps/Sec: 0.05,
  [2026-01-30 15:19:48] (step=0003098) Train Loss mse: 0.0155, Train Loss ce: 0.0671, Train Steps/Sec: 0.06,
  [2026-01-30 15:20:07] (step=0003099) Train Loss mse: 0.0187, Train Loss ce: 0.0632, Train Steps/Sec: 0.05,
 
  [2026-01-30 20:11:11] (step=0004006) Train Loss mse: 0.0208, Train Loss ce: 0.0885, Train Steps/Sec: 0.04,
  [2026-01-30 20:11:29] (step=0004007) Train Loss mse: 0.0143, Train Loss ce: 0.0732, Train Steps/Sec: 0.05,
  [2026-01-30 20:11:49] (step=0004008) Train Loss mse: 0.0147, Train Loss ce: 0.0707, Train Steps/Sec: 0.05,
+ [2026-01-30 15:15:58] (step=0003086) Train Loss mse: 0.0185, Train Loss ce: 0.0791, Train Steps/Sec: 0.04,
+ [2026-01-30 15:16:17] (step=0003087) Train Loss mse: 0.0131, Train Loss ce: 0.0739, Train Steps/Sec: 0.05,
+ [2026-01-30 15:16:38] (step=0003088) Train Loss mse: 0.0154, Train Loss ce: 0.0845, Train Steps/Sec: 0.05,
+ [2026-01-30 15:17:00] (step=0003089) Train Loss mse: 0.0206, Train Loss ce: 0.0767, Train Steps/Sec: 0.05,
+ [2026-01-30 15:17:16] (step=0003090) Train Loss mse: 0.0185, Train Loss ce: 0.0730, Train Steps/Sec: 0.06,
+ [2026-01-30 15:17:34] (step=0003091) Train Loss mse: 0.0181, Train Loss ce: 0.0746, Train Steps/Sec: 0.05,
+ [2026-01-30 15:17:55] (step=0003092) Train Loss mse: 0.0198, Train Loss ce: 0.0845, Train Steps/Sec: 0.05,
+ [2026-01-30 15:18:16] (step=0003093) Train Loss mse: 0.0185, Train Loss ce: 0.0748, Train Steps/Sec: 0.05,
+ [2026-01-30 15:18:33] (step=0003094) Train Loss mse: 0.0219, Train Loss ce: 0.0700, Train Steps/Sec: 0.06,
+ [2026-01-30 15:18:52] (step=0003095) Train Loss mse: 0.0138, Train Loss ce: 0.0704, Train Steps/Sec: 0.05,
+ [2026-01-30 15:19:12] (step=0003096) Train Loss mse: 0.0226, Train Loss ce: 0.0813, Train Steps/Sec: 0.05,
  base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_ins_step4500
  Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_evalonce/vlm_gym_patch_reassembly_alt_val
  [eval debug] first 3 batch fingerprints:
 
  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_evalonce'}]
  [2026-01-30 20:42:28] (step=0004108) Train Loss mse: 0.0194, Train Loss ce: 0.0764, Train Steps/Sec: 0.07,
  [2026-01-30 20:42:48] (step=0004109) Train Loss mse: 0.0146, Train Loss ce: 0.0788, Train Steps/Sec: 0.05,
  [2026-01-30 20:43:07] (step=0004110) Train Loss mse: 0.0180, Train Loss ce: 0.0578, Train Steps/Sec: 0.05,