nohup: ignoring input
2026-02-25 18:05:41,969 [INFO] __main__: === WORLD MODEL TRAINING ===
2026-02-25 18:05:41,969 [INFO] __main__: Trajectories: data/training/tutoring_trajectories_merged.pt
2026-02-25 18:05:41,969 [INFO] __main__: Device: cuda
2026-02-25 18:05:41,969 [INFO] __main__: Config: obs=20, act=8, latent=128, hidden=512
2026-02-25 18:05:41,969 [INFO] __main__: Rollout: horizon=5, discount=0.95, weight=0.50
2026-02-25 18:05:42,158 [INFO] __main__: Loaded trajectory dataset: 100901 trajectories, seq_len=20
2026-02-25 18:05:42,172 [INFO] __main__: Train: 95856 trajectories, Eval: 5045 trajectories
2026-02-25 18:05:42,196 [INFO] __main__: TutoringRSSM initialized: 2802838 trainable params (obs=20, act=8, latent=128, hidden=512)
2026-02-25 18:05:43,302 [INFO] __main__: AMP: enabled (dtype=torch.bfloat16)
2026-02-25 18:06:54,815 [INFO] __main__: Epoch 1/100 | train_loss=1.1062 (recon=0.8257 kl=0.0119 rew=0.1221 done=0.2374 rollout=1.0153) | eval_loss=0.5283 | lr=1.00e-04 | 71.5s (1340 samples/s) | gpu_mem=1.3GB
2026-02-25 18:06:54,842 [INFO] __main__: * New best eval loss: 0.5283 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:08:05,197 [INFO] __main__: Epoch 2/100 | train_loss=0.5135 (recon=0.2962 kl=0.0162 rew=0.1142 done=0.1189 rollout=0.4816) | eval_loss=0.4655 | lr=9.99e-05 | 70.4s (1362 samples/s) | gpu_mem=1.3GB
2026-02-25 18:08:05,217 [INFO] __main__: * New best eval loss: 0.4655 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:09:15,732 [INFO] __main__: Epoch 3/100 | train_loss=0.4439 (recon=0.2452 kl=0.0068 rew=0.1086 done=0.0963 rollout=0.4309) | eval_loss=0.4277 | lr=9.98e-05 | 70.5s (1359 samples/s) | gpu_mem=1.3GB
2026-02-25 18:09:15,753 [INFO] __main__: * New best eval loss: 0.4277 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:10:25,717 [INFO] __main__: Epoch 4/100 | train_loss=0.4088 (recon=0.2179 kl=0.0087 rew=0.1034 done=0.0865 rollout=0.4011) | eval_loss=0.3946 | lr=9.96e-05 | 70.0s (1370 samples/s) | gpu_mem=1.3GB
2026-02-25 18:10:25,739 [INFO] __main__: * New best eval loss: 0.3946 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:11:36,483 [INFO] __main__: Epoch 5/100 | train_loss=0.3867 (recon=0.2010 kl=0.0095 rew=0.0995 done=0.0816 rollout=0.3817) | eval_loss=0.3807 | lr=9.94e-05 | 70.7s (1355 samples/s) | gpu_mem=1.3GB
2026-02-25 18:11:36,506 [INFO] __main__: * New best eval loss: 0.3807 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:12:47,250 [INFO] __main__: Epoch 6/100 | train_loss=0.3736 (recon=0.1909 kl=0.0102 rew=0.0966 done=0.0785 rollout=0.3709) | eval_loss=0.3709 | lr=9.91e-05 | 70.7s (1355 samples/s) | gpu_mem=1.3GB
2026-02-25 18:12:47,274 [INFO] __main__: * New best eval loss: 0.3709 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:13:58,025 [INFO] __main__: Epoch 7/100 | train_loss=0.3653 (recon=0.1835 kl=0.0108 rew=0.0947 done=0.0765 rollout=0.3652) | eval_loss=0.3697 | lr=9.88e-05 | 70.8s (1355 samples/s) | gpu_mem=1.3GB
2026-02-25 18:13:58,046 [INFO] __main__: * New best eval loss: 0.3697 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:15:08,628 [INFO] __main__: Epoch 8/100 | train_loss=0.3587 (recon=0.1779 kl=0.0113 rew=0.0928 done=0.0748 rollout=0.3606) | eval_loss=0.3572 | lr=9.84e-05 | 70.6s (1358 samples/s) | gpu_mem=1.3GB
2026-02-25 18:15:08,651 [INFO] __main__: * New best eval loss: 0.3572 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:16:19,315 [INFO] __main__: Epoch 9/100 | train_loss=0.3522 (recon=0.1725 kl=0.0115 rew=0.0910 done=0.0731 rollout=0.3563) | eval_loss=0.3507 | lr=9.80e-05 | 70.7s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 18:16:19,340 [INFO] __main__: * New best eval loss: 0.3507 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:17:30,150 [INFO] __main__: Epoch 10/100 | train_loss=0.3475 (recon=0.1685 kl=0.0114 rew=0.0898 done=0.0719 rollout=0.3534) | eval_loss=0.3452 | lr=9.76e-05 | 70.8s (1354 samples/s) | gpu_mem=1.3GB
2026-02-25 18:17:30,171 [INFO] __main__: * New best eval loss: 0.3452 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:18:41,124 [INFO] __main__: Epoch 11/100 | train_loss=0.3426 (recon=0.1645 kl=0.0112 rew=0.0886 done=0.0707 rollout=0.3503) | eval_loss=0.3483 | lr=9.70e-05 | 70.9s (1351 samples/s) | gpu_mem=1.3GB
2026-02-25 18:19:51,548 [INFO] __main__: Epoch 12/100 | train_loss=0.3404 (recon=0.1625 kl=0.0110 rew=0.0879 done=0.0701 rollout=0.3492) | eval_loss=0.3401 | lr=9.65e-05 | 70.4s (1361 samples/s) | gpu_mem=1.3GB
2026-02-25 18:19:51,571 [INFO] __main__: * New best eval loss: 0.3401 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:21:02,429 [INFO] __main__: Epoch 13/100 | train_loss=0.3379 (recon=0.1607 kl=0.0111 rew=0.0871 done=0.0693 rollout=0.3476) | eval_loss=0.3385 | lr=9.59e-05 | 70.9s (1353 samples/s) | gpu_mem=1.3GB
2026-02-25 18:21:02,450 [INFO] __main__: * New best eval loss: 0.3385 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:22:12,961 [INFO] __main__: Epoch 14/100 | train_loss=0.3375 (recon=0.1606 kl=0.0112 rew=0.0868 done=0.0690 rollout=0.3473) | eval_loss=0.3408 | lr=9.52e-05 | 70.5s (1359 samples/s) | gpu_mem=1.3GB
2026-02-25 18:23:23,462 [INFO] __main__: Epoch 15/100 | train_loss=0.3363 (recon=0.1591 kl=0.0114 rew=0.0866 done=0.0688 rollout=0.3467) | eval_loss=0.3414 | lr=9.46e-05 | 70.5s (1360 samples/s) | gpu_mem=1.3GB
2026-02-25 18:24:33,788 [INFO] __main__: Epoch 16/100 | train_loss=0.3351 (recon=0.1586 kl=0.0111 rew=0.0862 done=0.0685 rollout=0.3456) | eval_loss=0.3473 | lr=9.38e-05 | 70.3s (1363 samples/s) | gpu_mem=1.3GB
2026-02-25 18:25:44,746 [INFO] __main__: Epoch 17/100 | train_loss=0.5437 (recon=0.1957 kl=0.3120 rew=0.0954 done=0.0791 rollout=0.4052) | eval_loss=0.4109 | lr=9.30e-05 | 71.0s (1351 samples/s) | gpu_mem=1.3GB
2026-02-25 18:26:55,420 [INFO] __main__: Epoch 18/100 | train_loss=0.3521 (recon=0.1768 kl=0.0077 rew=0.0899 done=0.0727 rollout=0.3571) | eval_loss=0.3392 | lr=9.22e-05 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 18:28:05,836 [INFO] __main__: Epoch 19/100 | train_loss=0.3347 (recon=0.1594 kl=0.0092 rew=0.0868 done=0.0689 rollout=0.3450) | eval_loss=0.3335 | lr=9.14e-05 | 70.4s (1361 samples/s) | gpu_mem=1.3GB
2026-02-25 18:28:05,858 [INFO] __main__: * New best eval loss: 0.3335 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:29:16,516 [INFO] __main__: Epoch 20/100 | train_loss=0.3308 (recon=0.1559 kl=0.0098 rew=0.0856 done=0.0679 rollout=0.3425) | eval_loss=0.3300 | lr=9.05e-05 | 70.7s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 18:29:16,539 [INFO] __main__: * New best eval loss: 0.3300 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:30:27,172 [INFO] __main__: Epoch 21/100 | train_loss=0.3289 (recon=0.1543 kl=0.0101 rew=0.0850 done=0.0672 rollout=0.3412) | eval_loss=0.3289 | lr=8.95e-05 | 70.6s (1358 samples/s) | gpu_mem=1.3GB
2026-02-25 18:30:27,194 [INFO] __main__: * New best eval loss: 0.3289 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:31:37,839 [INFO] __main__: Epoch 22/100 | train_loss=0.3281 (recon=0.1536 kl=0.0103 rew=0.0846 done=0.0669 rollout=0.3406) | eval_loss=0.3292 | lr=8.85e-05 | 70.6s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 18:32:48,010 [INFO] __main__: Epoch 23/100 | train_loss=0.3272 (recon=0.1531 kl=0.0104 rew=0.0843 done=0.0665 rollout=0.3400) | eval_loss=0.3296 | lr=8.75e-05 | 70.2s (1366 samples/s) | gpu_mem=1.3GB
2026-02-25 18:33:58,113 [INFO] __main__: Epoch 24/100 | train_loss=0.3269 (recon=0.1525 kl=0.0105 rew=0.0841 done=0.0664 rollout=0.3401) | eval_loss=0.3279 | lr=8.64e-05 | 70.1s (1367 samples/s) | gpu_mem=1.3GB
2026-02-25 18:33:58,135 [INFO] __main__: * New best eval loss: 0.3279 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:35:09,021 [INFO] __main__: Epoch 25/100 | train_loss=0.3263 (recon=0.1523 kl=0.0105 rew=0.0840 done=0.0663 rollout=0.3396) | eval_loss=0.3275 | lr=8.54e-05 | 70.9s (1352 samples/s) | gpu_mem=1.3GB
2026-02-25 18:35:09,044 [INFO] __main__: * New best eval loss: 0.3275 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:36:19,718 [INFO] __main__: Epoch 26/100 | train_loss=0.3260 (recon=0.1522 kl=0.0106 rew=0.0837 done=0.0660 rollout=0.3395) | eval_loss=0.3315 | lr=8.42e-05 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 18:37:29,992 [INFO] __main__: Epoch 27/100 | train_loss=0.3259 (recon=0.1518 kl=0.0107 rew=0.0837 done=0.0660 rollout=0.3395) | eval_loss=0.3270 | lr=8.31e-05 | 70.3s (1364 samples/s) | gpu_mem=1.3GB
2026-02-25 18:37:30,015 [INFO] __main__: * New best eval loss: 0.3270 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:38:40,921 [INFO] __main__: Epoch 28/100 | train_loss=0.3266 (recon=0.1520 kl=0.0110 rew=0.0839 done=0.0661 rollout=0.3402) | eval_loss=0.3265 | lr=8.19e-05 | 70.9s (1352 samples/s) | gpu_mem=1.3GB
2026-02-25 18:38:40,942 [INFO] __main__: * New best eval loss: 0.3265 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:39:51,355 [INFO] __main__: Epoch 29/100 | train_loss=0.3256 (recon=0.1513 kl=0.0110 rew=0.0836 done=0.0658 rollout=0.3395) | eval_loss=0.3274 | lr=8.06e-05 | 70.4s (1361 samples/s) | gpu_mem=1.3GB
2026-02-25 18:41:02,495 [INFO] __main__: Epoch 30/100 | train_loss=0.3250 (recon=0.1509 kl=0.0111 rew=0.0834 done=0.0656 rollout=0.3390) | eval_loss=0.3284 | lr=7.94e-05 | 71.1s (1347 samples/s) | gpu_mem=1.3GB
2026-02-25 18:42:12,904 [INFO] __main__: Epoch 31/100 | train_loss=0.3251 (recon=0.1508 kl=0.0111 rew=0.0834 done=0.0656 rollout=0.3392) | eval_loss=0.3278 | lr=7.81e-05 | 70.4s (1362 samples/s) | gpu_mem=1.3GB
2026-02-25 18:43:23,731 [INFO] __main__: Epoch 32/100 | train_loss=0.3253 (recon=0.1507 kl=0.0113 rew=0.0836 done=0.0658 rollout=0.3392) | eval_loss=0.3256 | lr=7.68e-05 | 70.8s (1353 samples/s) | gpu_mem=1.3GB
2026-02-25 18:43:23,754 [INFO] __main__: * New best eval loss: 0.3256 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:44:34,007 [INFO] __main__: Epoch 33/100 | train_loss=0.3250 (recon=0.1503 kl=0.0113 rew=0.0835 done=0.0657 rollout=0.3392) | eval_loss=0.3246 | lr=7.55e-05 | 70.3s (1364 samples/s) | gpu_mem=1.3GB
2026-02-25 18:44:34,030 [INFO] __main__: * New best eval loss: 0.3246 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:45:45,357 [INFO] __main__: Epoch 34/100 | train_loss=0.3250 (recon=0.1502 kl=0.0116 rew=0.0835 done=0.0657 rollout=0.3390) | eval_loss=0.3235 | lr=7.41e-05 | 71.3s (1344 samples/s) | gpu_mem=1.3GB
2026-02-25 18:45:45,380 [INFO] __main__: * New best eval loss: 0.3235 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:46:56,106 [INFO] __main__: Epoch 35/100 | train_loss=0.3236 (recon=0.1495 kl=0.0113 rew=0.0833 done=0.0655 rollout=0.3377) | eval_loss=0.3261 | lr=7.27e-05 | 70.7s (1355 samples/s) | gpu_mem=1.3GB
2026-02-25 18:48:06,339 [INFO] __main__: Epoch 36/100 | train_loss=0.3235 (recon=0.1490 kl=0.0114 rew=0.0833 done=0.0655 rollout=0.3377) | eval_loss=0.3237 | lr=7.13e-05 | 70.2s (1365 samples/s) | gpu_mem=1.3GB
2026-02-25 18:49:16,519 [INFO] __main__: Epoch 37/100 | train_loss=0.3236 (recon=0.1495 kl=0.0115 rew=0.0831 done=0.0653 rollout=0.3377) | eval_loss=0.3267 | lr=6.99e-05 | 70.2s (1366 samples/s) | gpu_mem=1.3GB
2026-02-25 18:50:27,556 [INFO] __main__: Epoch 38/100 | train_loss=0.3527 (recon=0.1496 kl=0.0665 rew=0.0836 done=0.0659 rollout=0.3398) | eval_loss=2.2169 | lr=6.84e-05 | 71.0s (1349 samples/s) | gpu_mem=1.3GB
2026-02-25 18:51:38,153 [INFO] __main__: Epoch 39/100 | train_loss=0.3815 (recon=0.1745 kl=0.0569 rew=0.0906 done=0.0711 rollout=0.3697) | eval_loss=0.3257 | lr=6.69e-05 | 70.6s (1358 samples/s) | gpu_mem=1.3GB
2026-02-25 18:52:49,003 [INFO] __main__: Epoch 40/100 | train_loss=0.3221 (recon=0.1484 kl=0.0096 rew=0.0837 done=0.0659 rollout=0.3367) | eval_loss=0.3214 | lr=6.55e-05 | 70.8s (1353 samples/s) | gpu_mem=1.3GB
2026-02-25 18:52:49,026 [INFO] __main__: * New best eval loss: 0.3214 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:53:59,507 [INFO] __main__: Epoch 41/100 | train_loss=0.3204 (recon=0.1467 kl=0.0101 rew=0.0829 done=0.0652 rollout=0.3358) | eval_loss=0.3207 | lr=6.39e-05 | 70.5s (1360 samples/s) | gpu_mem=1.3GB
2026-02-25 18:53:59,530 [INFO] __main__: * New best eval loss: 0.3207 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:55:10,159 [INFO] __main__: Epoch 42/100 | train_loss=0.3198 (recon=0.1463 kl=0.0105 rew=0.0826 done=0.0649 rollout=0.3353) | eval_loss=0.3206 | lr=6.24e-05 | 70.6s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 18:55:10,182 [INFO] __main__: * New best eval loss: 0.3206 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:56:20,740 [INFO] __main__: Epoch 43/100 | train_loss=0.3191 (recon=0.1458 kl=0.0105 rew=0.0825 done=0.0647 rollout=0.3348) | eval_loss=0.3209 | lr=6.09e-05 | 70.6s (1359 samples/s) | gpu_mem=1.3GB
2026-02-25 18:57:31,289 [INFO] __main__: Epoch 44/100 | train_loss=0.3191 (recon=0.1458 kl=0.0108 rew=0.0822 done=0.0645 rollout=0.3350) | eval_loss=0.3205 | lr=5.94e-05 | 70.5s (1359 samples/s) | gpu_mem=1.3GB
2026-02-25 18:57:31,312 [INFO] __main__: * New best eval loss: 0.3205 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:58:42,262 [INFO] __main__: Epoch 45/100 | train_loss=0.3190 (recon=0.1455 kl=0.0109 rew=0.0823 done=0.0644 rollout=0.3349) | eval_loss=0.3199 | lr=5.78e-05 | 70.9s (1351 samples/s) | gpu_mem=1.3GB
2026-02-25 18:58:42,284 [INFO] __main__: * New best eval loss: 0.3199 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:59:53,374 [INFO] __main__: Epoch 46/100 | train_loss=0.3185 (recon=0.1452 kl=0.0108 rew=0.0822 done=0.0643 rollout=0.3346) | eval_loss=0.3209 | lr=5.63e-05 | 71.1s (1348 samples/s) | gpu_mem=1.3GB
2026-02-25 19:01:04,213 [INFO] __main__: Epoch 47/100 | train_loss=0.3188 (recon=0.1451 kl=0.0110 rew=0.0824 done=0.0644 rollout=0.3347) | eval_loss=0.3196 | lr=5.47e-05 | 70.8s (1353 samples/s) | gpu_mem=1.3GB
2026-02-25 19:01:04,236 [INFO] __main__: * New best eval loss: 0.3196 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:02:14,681 [INFO] __main__: Epoch 48/100 | train_loss=0.3182 (recon=0.1448 kl=0.0110 rew=0.0822 done=0.0642 rollout=0.3341) | eval_loss=0.3195 | lr=5.31e-05 | 70.4s (1361 samples/s) | gpu_mem=1.3GB
2026-02-25 19:02:14,704 [INFO] __main__: * New best eval loss: 0.3195 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:03:25,389 [INFO] __main__: Epoch 49/100 | train_loss=0.3182 (recon=0.1448 kl=0.0110 rew=0.0822 done=0.0642 rollout=0.3342) | eval_loss=0.3294 | lr=5.16e-05 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 19:04:36,190 [INFO] __main__: Epoch 50/100 | train_loss=0.3184 (recon=0.1445 kl=0.0111 rew=0.0822 done=0.0643 rollout=0.3346) | eval_loss=0.3213 | lr=5.00e-05 | 70.8s (1354 samples/s) | gpu_mem=1.3GB
2026-02-25 19:05:46,967 [INFO] __main__: Epoch 51/100 | train_loss=0.3177 (recon=0.1442 kl=0.0110 rew=0.0821 done=0.0642 rollout=0.3339) | eval_loss=0.3190 | lr=4.84e-05 | 70.8s (1355 samples/s) | gpu_mem=1.3GB
2026-02-25 19:05:46,990 [INFO] __main__: * New best eval loss: 0.3190 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:06:57,321 [INFO] __main__: Epoch 52/100 | train_loss=0.3180 (recon=0.1442 kl=0.0111 rew=0.0821 done=0.0642 rollout=0.3344) | eval_loss=0.3201 | lr=4.69e-05 | 70.3s (1363 samples/s) | gpu_mem=1.3GB
2026-02-25 19:08:07,968 [INFO] __main__: Epoch 53/100 | train_loss=0.3179 (recon=0.1437 kl=0.0112 rew=0.0824 done=0.0644 rollout=0.3342) | eval_loss=0.3172 | lr=4.53e-05 | 70.6s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 19:08:07,991 [INFO] __main__: * New best eval loss: 0.3172 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:09:18,618 [INFO] __main__: Epoch 54/100 | train_loss=0.3170 (recon=0.1433 kl=0.0111 rew=0.0820 done=0.0641 rollout=0.3334) | eval_loss=0.3191 | lr=4.37e-05 | 70.6s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 19:10:29,306 [INFO] __main__: Epoch 55/100 | train_loss=0.3167 (recon=0.1430 kl=0.0113 rew=0.0820 done=0.0641 rollout=0.3331) | eval_loss=0.3181 | lr=4.22e-05 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 19:11:40,099 [INFO] __main__: Epoch 56/100 | train_loss=0.3168 (recon=0.1429 kl=0.0113 rew=0.0820 done=0.0642 rollout=0.3332) | eval_loss=0.3191 | lr=4.06e-05 | 70.8s (1354 samples/s) | gpu_mem=1.3GB
2026-02-25 19:12:50,815 [INFO] __main__: Epoch 57/100 | train_loss=0.3163 (recon=0.1424 kl=0.0112 rew=0.0819 done=0.0641 rollout=0.3329) | eval_loss=0.3188 | lr=3.91e-05 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 19:14:01,170 [INFO] __main__: Epoch 58/100 | train_loss=0.3168 (recon=0.1426 kl=0.0114 rew=0.0820 done=0.0641 rollout=0.3335) | eval_loss=0.3182 | lr=3.76e-05 | 70.4s (1362 samples/s) | gpu_mem=1.3GB
2026-02-25 19:15:12,063 [INFO] __main__: Epoch 59/100 | train_loss=0.3163 (recon=0.1425 kl=0.0113 rew=0.0820 done=0.0640 rollout=0.3327) | eval_loss=0.3188 | lr=3.61e-05 | 70.9s (1352 samples/s) | gpu_mem=1.3GB
2026-02-25 19:16:22,721 [INFO] __main__: Epoch 60/100 | train_loss=0.3157 (recon=0.1421 kl=0.0113 rew=0.0818 done=0.0639 rollout=0.3322) | eval_loss=0.3179 | lr=3.45e-05 | 70.7s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 19:17:33,459 [INFO] __main__: Epoch 61/100 | train_loss=0.3162 (recon=0.1420 kl=0.0114 rew=0.0820 done=0.0641 rollout=0.3328) | eval_loss=0.3165 | lr=3.31e-05 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 19:17:33,480 [INFO] __main__: * New best eval loss: 0.3165 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:18:44,368 [INFO] __main__: Epoch 62/100 | train_loss=0.3155 (recon=0.1415 kl=0.0113 rew=0.0820 done=0.0640 rollout=0.3321) | eval_loss=0.3156 | lr=3.16e-05 | 70.9s (1352 samples/s) | gpu_mem=1.3GB
2026-02-25 19:18:44,389 [INFO] __main__: * New best eval loss: 0.3156 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:19:55,957 [INFO] __main__: Epoch 63/100 | train_loss=0.3151 (recon=0.1414 kl=0.0112 rew=0.0819 done=0.0640 rollout=0.3317) | eval_loss=0.3181 | lr=3.01e-05 | 71.6s (1339 samples/s) | gpu_mem=1.3GB
2026-02-25 19:21:06,500 [INFO] __main__: Epoch 64/100 | train_loss=0.3146 (recon=0.1412 kl=0.0112 rew=0.0817 done=0.0639 rollout=0.3313) | eval_loss=0.3156 | lr=2.87e-05 | 70.5s (1359 samples/s) | gpu_mem=1.3GB
2026-02-25 19:22:18,147 [INFO] __main__: Epoch 65/100 | train_loss=0.3152 (recon=0.1415 kl=0.0114 rew=0.0819 done=0.0640 rollout=0.3317) | eval_loss=0.3259 | lr=2.73e-05 | 71.6s (1338 samples/s) | gpu_mem=1.3GB
2026-02-25 19:23:29,450 [INFO] __main__: Epoch 66/100 | train_loss=0.3153 (recon=0.1414 kl=0.0113 rew=0.0820 done=0.0641 rollout=0.3318) | eval_loss=0.3175 | lr=2.59e-05 | 71.3s (1344 samples/s) | gpu_mem=1.3GB
2026-02-25 19:24:40,964 [INFO] __main__: Epoch 67/100 | train_loss=0.3145 (recon=0.1408 kl=0.0112 rew=0.0819 done=0.0641 rollout=0.3310) | eval_loss=0.3169 | lr=2.45e-05 | 71.5s (1340 samples/s) | gpu_mem=1.3GB
2026-02-25 19:25:51,897 [INFO] __main__: Epoch 68/100 | train_loss=0.3149 (recon=0.1411 kl=0.0114 rew=0.0819 done=0.0640 rollout=0.3313) | eval_loss=0.3191 | lr=2.32e-05 | 70.9s (1351 samples/s) | gpu_mem=1.3GB
2026-02-25 19:27:02,722 [INFO] __main__: Epoch 69/100 | train_loss=0.3148 (recon=0.1408 kl=0.0112 rew=0.0821 done=0.0642 rollout=0.3313) | eval_loss=0.3160 | lr=2.19e-05 | 70.8s (1353 samples/s) | gpu_mem=1.3GB
2026-02-25 19:28:14,130 [INFO] __main__: Epoch 70/100 | train_loss=0.3139 (recon=0.1406 kl=0.0110 rew=0.0819 done=0.0640 rollout=0.3303) | eval_loss=0.3164 | lr=2.06e-05 | 71.4s (1342 samples/s) | gpu_mem=1.3GB
2026-02-25 19:29:25,313 [INFO] __main__: Epoch 71/100 | train_loss=0.3142 (recon=0.1406 kl=0.0111 rew=0.0819 done=0.0640 rollout=0.3307) | eval_loss=0.3176 | lr=1.94e-05 | 71.2s (1347 samples/s) | gpu_mem=1.3GB
2026-02-25 19:30:36,305 [INFO] __main__: Epoch 72/100 | train_loss=0.3141 (recon=0.1407 kl=0.0111 rew=0.0819 done=0.0640 rollout=0.3307) | eval_loss=0.3148 | lr=1.81e-05 | 71.0s (1350 samples/s) | gpu_mem=1.3GB
2026-02-25 19:30:36,326 [INFO] __main__: * New best eval loss: 0.3148 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:31:47,498 [INFO] __main__: Epoch 73/100 | train_loss=0.3139 (recon=0.1402 kl=0.0111 rew=0.0820 done=0.0640 rollout=0.3305) | eval_loss=0.3138 | lr=1.69e-05 | 71.2s (1347 samples/s) | gpu_mem=1.3GB
2026-02-25 19:31:47,521 [INFO] __main__: * New best eval loss: 0.3138 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:32:58,167 [INFO] __main__: Epoch 74/100 | train_loss=0.3135 (recon=0.1400 kl=0.0109 rew=0.0820 done=0.0640 rollout=0.3301) | eval_loss=0.3154 | lr=1.58e-05 | 70.6s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 19:34:09,526 [INFO] __main__: Epoch 75/100 | train_loss=0.3139 (recon=0.1399 kl=0.0112 rew=0.0821 done=0.0641 rollout=0.3304) | eval_loss=0.3162 | lr=1.46e-05 | 71.4s (1343 samples/s) | gpu_mem=1.3GB
2026-02-25 19:35:20,593 [INFO] __main__: Epoch 76/100 | train_loss=0.3137 (recon=0.1399 kl=0.0110 rew=0.0820 done=0.0641 rollout=0.3304) | eval_loss=0.3144 | lr=1.36e-05 | 71.1s (1349 samples/s) | gpu_mem=1.3GB
2026-02-25 19:36:31,515 [INFO] __main__: Epoch 77/100 | train_loss=0.3132 (recon=0.1397 kl=0.0109 rew=0.0820 done=0.0640 rollout=0.3299) | eval_loss=0.3146 | lr=1.25e-05 | 70.9s (1352 samples/s) | gpu_mem=1.3GB
2026-02-25 19:37:43,067 [INFO] __main__: Epoch 78/100 | train_loss=0.3128 (recon=0.1395 kl=0.0109 rew=0.0818 done=0.0639 rollout=0.3295) | eval_loss=0.3158 | lr=1.15e-05 | 71.6s (1340 samples/s) | gpu_mem=1.3GB
2026-02-25 19:38:54,333 [INFO] __main__: Epoch 79/100 | train_loss=0.3132 (recon=0.1397 kl=0.0110 rew=0.0819 done=0.0640 rollout=0.3299) | eval_loss=0.3141 | lr=1.05e-05 | 71.3s (1345 samples/s) | gpu_mem=1.3GB
2026-02-25 19:40:05,333 [INFO] __main__: Epoch 80/100 | train_loss=0.3131 (recon=0.1394 kl=0.0109 rew=0.0821 done=0.0641 rollout=0.3297) | eval_loss=0.3148 | lr=9.55e-06 | 71.0s (1350 samples/s) | gpu_mem=1.3GB
2026-02-25 19:41:16,170 [INFO] __main__: Epoch 81/100 | train_loss=0.3127 (recon=0.1395 kl=0.0109 rew=0.0818 done=0.0639 rollout=0.3294) | eval_loss=0.3149 | lr=8.65e-06 | 70.8s (1354 samples/s) | gpu_mem=1.3GB
2026-02-25 19:42:26,882 [INFO] __main__: Epoch 82/100 | train_loss=0.3132 (recon=0.1394 kl=0.0109 rew=0.0820 done=0.0641 rollout=0.3299) | eval_loss=0.3134 | lr=7.78e-06 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 19:42:26,903 [INFO] __main__: * New best eval loss: 0.3134 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:43:38,250 [INFO] __main__: Epoch 83/100 | train_loss=0.3129 (recon=0.1394 kl=0.0109 rew=0.0820 done=0.0641 rollout=0.3295) | eval_loss=0.3135 | lr=6.96e-06 | 71.3s (1344 samples/s) | gpu_mem=1.3GB
2026-02-25 19:44:48,938 [INFO] __main__: Epoch 84/100 | train_loss=0.3129 (recon=0.1393 kl=0.0109 rew=0.0821 done=0.0641 rollout=0.3296) | eval_loss=0.3134 | lr=6.18e-06 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 19:44:48,960 [INFO] __main__: * New best eval loss: 0.3134 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:45:59,739 [INFO] __main__: Epoch 85/100 | train_loss=0.3127 (recon=0.1391 kl=0.0108 rew=0.0819 done=0.0639 rollout=0.3295) | eval_loss=0.3146 | lr=5.45e-06 | 70.8s (1354 samples/s) | gpu_mem=1.3GB
2026-02-25 19:47:11,503 [INFO] __main__: Epoch 86/100 | train_loss=0.3126 (recon=0.1391 kl=0.0108 rew=0.0820 done=0.0640 rollout=0.3292) | eval_loss=0.3152 | lr=4.76e-06 | 71.8s (1336 samples/s) | gpu_mem=1.3GB
2026-02-25 19:48:22,493 [INFO] __main__: Epoch 87/100 | train_loss=0.3125 (recon=0.1392 kl=0.0108 rew=0.0819 done=0.0639 rollout=0.3293) | eval_loss=0.3145 | lr=4.11e-06 | 71.0s (1350 samples/s) | gpu_mem=1.3GB
2026-02-25 19:49:34,161 [INFO] __main__: Epoch 88/100 | train_loss=0.3124 (recon=0.1391 kl=0.0107 rew=0.0819 done=0.0640 rollout=0.3291) | eval_loss=0.3147 | lr=3.51e-06 | 71.7s (1338 samples/s) | gpu_mem=1.3GB
2026-02-25 19:50:45,579 [INFO] __main__: Epoch 89/100 | train_loss=0.3123 (recon=0.1391 kl=0.0109 rew=0.0818 done=0.0639 rollout=0.3291) | eval_loss=0.3132 | lr=2.96e-06 | 71.4s (1342 samples/s) | gpu_mem=1.3GB
2026-02-25 19:50:45,600 [INFO] __main__: * New best eval loss: 0.3132 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:51:57,816 [INFO] __main__: Epoch 90/100 | train_loss=0.3123 (recon=0.1390 kl=0.0108 rew=0.0819 done=0.0638 rollout=0.3290) | eval_loss=0.3142 | lr=2.45e-06 | 72.2s (1327 samples/s) | gpu_mem=1.3GB
2026-02-25 19:53:09,370 [INFO] __main__: Epoch 91/100 | train_loss=0.3123 (recon=0.1390 kl=0.0108 rew=0.0819 done=0.0638 rollout=0.3290) | eval_loss=0.3145 | lr=1.99e-06 | 71.5s (1340 samples/s) | gpu_mem=1.3GB
2026-02-25 19:54:20,932 [INFO] __main__: Epoch 92/100 | train_loss=0.3124 (recon=0.1389 kl=0.0108 rew=0.0820 done=0.0641 rollout=0.3291) | eval_loss=0.3143 | lr=1.57e-06 | 71.6s (1339 samples/s) | gpu_mem=1.3GB
2026-02-25 19:55:32,652 [INFO] __main__: Epoch 93/100 | train_loss=0.3122 (recon=0.1391 kl=0.0107 rew=0.0819 done=0.0639 rollout=0.3288) | eval_loss=0.3124 | lr=1.20e-06 | 71.7s (1337 samples/s) | gpu_mem=1.3GB
2026-02-25 19:55:32,682 [INFO] __main__: * New best eval loss: 0.3124 -> checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:56:45,681 [INFO] __main__: Epoch 94/100 | train_loss=0.3124 (recon=0.1390 kl=0.0109 rew=0.0820 done=0.0640 rollout=0.3291) | eval_loss=0.3139 | lr=8.86e-07 | 73.0s (1313 samples/s) | gpu_mem=1.3GB
2026-02-25 19:57:57,869 [INFO] __main__: Epoch 95/100 | train_loss=0.3125 (recon=0.1390 kl=0.0108 rew=0.0819 done=0.0639 rollout=0.3293) | eval_loss=0.3136 | lr=6.16e-07 | 72.2s (1328 samples/s) | gpu_mem=1.3GB
2026-02-25 19:59:10,503 [INFO] __main__: Epoch 96/100 | train_loss=0.3121 (recon=0.1390 kl=0.0108 rew=0.0818 done=0.0638 rollout=0.3289) | eval_loss=0.3130 | lr=3.94e-07 | 72.6s (1320 samples/s) | gpu_mem=1.3GB
2026-02-25 20:00:23,114 [INFO] __main__: Epoch 97/100 | train_loss=0.3125 (recon=0.1389 kl=0.0108 rew=0.0820 done=0.0640 rollout=0.3293) | eval_loss=0.3127 | lr=2.22e-07 | 72.6s (1320 samples/s) | gpu_mem=1.3GB
2026-02-25 20:01:35,276 [INFO] __main__: Epoch 98/100 | train_loss=0.3121 (recon=0.1389 kl=0.0107 rew=0.0819 done=0.0639 rollout=0.3288) | eval_loss=0.3136 | lr=9.87e-08 | 72.2s (1328 samples/s) | gpu_mem=1.3GB
2026-02-25 20:02:47,305 [INFO] __main__: Epoch 99/100 | train_loss=0.3118 (recon=0.1388 kl=0.0107 rew=0.0818 done=0.0639 rollout=0.3285) | eval_loss=0.3140 | lr=2.47e-08 | 72.0s (1331 samples/s) | gpu_mem=1.3GB
2026-02-25 20:03:59,255 [INFO] __main__: Epoch 100/100 | train_loss=0.3119 (recon=0.1389 kl=0.0108 rew=0.0818 done=0.0638 rollout=0.3286) | eval_loss=0.3145 | lr=0.00e+00 | 71.9s (1332 samples/s) | gpu_mem=1.3GB
2026-02-25 20:03:59,299 [INFO] __main__: === WORLD MODEL TRAINING COMPLETE ===
2026-02-25 20:03:59,299 [INFO] __main__: Best eval loss: 0.3124
2026-02-25 20:03:59,299 [INFO] __main__: Best checkpoint: checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 20:03:59,299 [INFO] __main__: Final checkpoint: checkpoints/world-model/tutoring_rssm_final.pt

============================================================
World Model Training Complete
============================================================
Best checkpoint: checkpoints/world-model/tutoring_rssm_best.pt
============================================================