nohup: ignoring input
2026-02-25 18:05:41,969 [INFO] __main__: ═══ WORLD MODEL TRAINING ═══
2026-02-25 18:05:41,969 [INFO] __main__:   Trajectories: data/training/tutoring_trajectories_merged.pt
2026-02-25 18:05:41,969 [INFO] __main__:   Device: cuda
2026-02-25 18:05:41,969 [INFO] __main__:   Config: obs=20, act=8, latent=128, hidden=512
2026-02-25 18:05:41,969 [INFO] __main__:   Rollout: horizon=5, discount=0.95, weight=0.50
2026-02-25 18:05:42,158 [INFO] __main__: Loaded trajectory dataset: 100901 trajectories, seq_len=20
2026-02-25 18:05:42,172 [INFO] __main__:   Train: 95856 trajectories, Eval: 5045 trajectories
2026-02-25 18:05:42,196 [INFO] __main__: TutoringRSSM initialized: 2802838 trainable params (obs=20, act=8, latent=128, hidden=512)
2026-02-25 18:05:43,302 [INFO] __main__:   AMP: enabled (dtype=torch.bfloat16)
2026-02-25 18:06:54,815 [INFO] __main__: Epoch   1/100 | train_loss=1.1062 (recon=0.8257 kl=0.0119 rew=0.1221 done=0.2374 rollout=1.0153) | eval_loss=0.5283 | lr=1.00e-04 | 71.5s (1340 samples/s) | gpu_mem=1.3GB
2026-02-25 18:06:54,842 [INFO] __main__:   ★ New best eval loss: 0.5283 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:08:05,197 [INFO] __main__: Epoch   2/100 | train_loss=0.5135 (recon=0.2962 kl=0.0162 rew=0.1142 done=0.1189 rollout=0.4816) | eval_loss=0.4655 | lr=9.99e-05 | 70.4s (1362 samples/s) | gpu_mem=1.3GB
2026-02-25 18:08:05,217 [INFO] __main__:   ★ New best eval loss: 0.4655 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:09:15,732 [INFO] __main__: Epoch   3/100 | train_loss=0.4439 (recon=0.2452 kl=0.0068 rew=0.1086 done=0.0963 rollout=0.4309) | eval_loss=0.4277 | lr=9.98e-05 | 70.5s (1359 samples/s) | gpu_mem=1.3GB
2026-02-25 18:09:15,753 [INFO] __main__:   ★ New best eval loss: 0.4277 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:10:25,717 [INFO] __main__: Epoch   4/100 | train_loss=0.4088 (recon=0.2179 kl=0.0087 rew=0.1034 done=0.0865 rollout=0.4011) | eval_loss=0.3946 | lr=9.96e-05 | 70.0s (1370 samples/s) | gpu_mem=1.3GB
2026-02-25 18:10:25,739 [INFO] __main__:   ★ New best eval loss: 0.3946 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:11:36,483 [INFO] __main__: Epoch   5/100 | train_loss=0.3867 (recon=0.2010 kl=0.0095 rew=0.0995 done=0.0816 rollout=0.3817) | eval_loss=0.3807 | lr=9.94e-05 | 70.7s (1355 samples/s) | gpu_mem=1.3GB
2026-02-25 18:11:36,506 [INFO] __main__:   ★ New best eval loss: 0.3807 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:12:47,250 [INFO] __main__: Epoch   6/100 | train_loss=0.3736 (recon=0.1909 kl=0.0102 rew=0.0966 done=0.0785 rollout=0.3709) | eval_loss=0.3709 | lr=9.91e-05 | 70.7s (1355 samples/s) | gpu_mem=1.3GB
2026-02-25 18:12:47,274 [INFO] __main__:   ★ New best eval loss: 0.3709 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:13:58,025 [INFO] __main__: Epoch   7/100 | train_loss=0.3653 (recon=0.1835 kl=0.0108 rew=0.0947 done=0.0765 rollout=0.3652) | eval_loss=0.3697 | lr=9.88e-05 | 70.8s (1355 samples/s) | gpu_mem=1.3GB
2026-02-25 18:13:58,046 [INFO] __main__:   ★ New best eval loss: 0.3697 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:15:08,628 [INFO] __main__: Epoch   8/100 | train_loss=0.3587 (recon=0.1779 kl=0.0113 rew=0.0928 done=0.0748 rollout=0.3606) | eval_loss=0.3572 | lr=9.84e-05 | 70.6s (1358 samples/s) | gpu_mem=1.3GB
2026-02-25 18:15:08,651 [INFO] __main__:   ★ New best eval loss: 0.3572 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:16:19,315 [INFO] __main__: Epoch   9/100 | train_loss=0.3522 (recon=0.1725 kl=0.0115 rew=0.0910 done=0.0731 rollout=0.3563) | eval_loss=0.3507 | lr=9.80e-05 | 70.7s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 18:16:19,340 [INFO] __main__:   ★ New best eval loss: 0.3507 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:17:30,150 [INFO] __main__: Epoch  10/100 | train_loss=0.3475 (recon=0.1685 kl=0.0114 rew=0.0898 done=0.0719 rollout=0.3534) | eval_loss=0.3452 | lr=9.76e-05 | 70.8s (1354 samples/s) | gpu_mem=1.3GB
2026-02-25 18:17:30,171 [INFO] __main__:   ★ New best eval loss: 0.3452 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:18:41,124 [INFO] __main__: Epoch  11/100 | train_loss=0.3426 (recon=0.1645 kl=0.0112 rew=0.0886 done=0.0707 rollout=0.3503) | eval_loss=0.3483 | lr=9.70e-05 | 70.9s (1351 samples/s) | gpu_mem=1.3GB
2026-02-25 18:19:51,548 [INFO] __main__: Epoch  12/100 | train_loss=0.3404 (recon=0.1625 kl=0.0110 rew=0.0879 done=0.0701 rollout=0.3492) | eval_loss=0.3401 | lr=9.65e-05 | 70.4s (1361 samples/s) | gpu_mem=1.3GB
2026-02-25 18:19:51,571 [INFO] __main__:   ★ New best eval loss: 0.3401 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:21:02,429 [INFO] __main__: Epoch  13/100 | train_loss=0.3379 (recon=0.1607 kl=0.0111 rew=0.0871 done=0.0693 rollout=0.3476) | eval_loss=0.3385 | lr=9.59e-05 | 70.9s (1353 samples/s) | gpu_mem=1.3GB
2026-02-25 18:21:02,450 [INFO] __main__:   ★ New best eval loss: 0.3385 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:22:12,961 [INFO] __main__: Epoch  14/100 | train_loss=0.3375 (recon=0.1606 kl=0.0112 rew=0.0868 done=0.0690 rollout=0.3473) | eval_loss=0.3408 | lr=9.52e-05 | 70.5s (1359 samples/s) | gpu_mem=1.3GB
2026-02-25 18:23:23,462 [INFO] __main__: Epoch  15/100 | train_loss=0.3363 (recon=0.1591 kl=0.0114 rew=0.0866 done=0.0688 rollout=0.3467) | eval_loss=0.3414 | lr=9.46e-05 | 70.5s (1360 samples/s) | gpu_mem=1.3GB
2026-02-25 18:24:33,788 [INFO] __main__: Epoch  16/100 | train_loss=0.3351 (recon=0.1586 kl=0.0111 rew=0.0862 done=0.0685 rollout=0.3456) | eval_loss=0.3473 | lr=9.38e-05 | 70.3s (1363 samples/s) | gpu_mem=1.3GB
2026-02-25 18:25:44,746 [INFO] __main__: Epoch  17/100 | train_loss=0.5437 (recon=0.1957 kl=0.3120 rew=0.0954 done=0.0791 rollout=0.4052) | eval_loss=0.4109 | lr=9.30e-05 | 71.0s (1351 samples/s) | gpu_mem=1.3GB
2026-02-25 18:26:55,420 [INFO] __main__: Epoch  18/100 | train_loss=0.3521 (recon=0.1768 kl=0.0077 rew=0.0899 done=0.0727 rollout=0.3571) | eval_loss=0.3392 | lr=9.22e-05 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 18:28:05,836 [INFO] __main__: Epoch  19/100 | train_loss=0.3347 (recon=0.1594 kl=0.0092 rew=0.0868 done=0.0689 rollout=0.3450) | eval_loss=0.3335 | lr=9.14e-05 | 70.4s (1361 samples/s) | gpu_mem=1.3GB
2026-02-25 18:28:05,858 [INFO] __main__:   ★ New best eval loss: 0.3335 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:29:16,516 [INFO] __main__: Epoch  20/100 | train_loss=0.3308 (recon=0.1559 kl=0.0098 rew=0.0856 done=0.0679 rollout=0.3425) | eval_loss=0.3300 | lr=9.05e-05 | 70.7s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 18:29:16,539 [INFO] __main__:   ★ New best eval loss: 0.3300 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:30:27,172 [INFO] __main__: Epoch  21/100 | train_loss=0.3289 (recon=0.1543 kl=0.0101 rew=0.0850 done=0.0672 rollout=0.3412) | eval_loss=0.3289 | lr=8.95e-05 | 70.6s (1358 samples/s) | gpu_mem=1.3GB
2026-02-25 18:30:27,194 [INFO] __main__:   ★ New best eval loss: 0.3289 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:31:37,839 [INFO] __main__: Epoch  22/100 | train_loss=0.3281 (recon=0.1536 kl=0.0103 rew=0.0846 done=0.0669 rollout=0.3406) | eval_loss=0.3292 | lr=8.85e-05 | 70.6s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 18:32:48,010 [INFO] __main__: Epoch  23/100 | train_loss=0.3272 (recon=0.1531 kl=0.0104 rew=0.0843 done=0.0665 rollout=0.3400) | eval_loss=0.3296 | lr=8.75e-05 | 70.2s (1366 samples/s) | gpu_mem=1.3GB
2026-02-25 18:33:58,113 [INFO] __main__: Epoch  24/100 | train_loss=0.3269 (recon=0.1525 kl=0.0105 rew=0.0841 done=0.0664 rollout=0.3401) | eval_loss=0.3279 | lr=8.64e-05 | 70.1s (1367 samples/s) | gpu_mem=1.3GB
2026-02-25 18:33:58,135 [INFO] __main__:   ★ New best eval loss: 0.3279 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:35:09,021 [INFO] __main__: Epoch  25/100 | train_loss=0.3263 (recon=0.1523 kl=0.0105 rew=0.0840 done=0.0663 rollout=0.3396) | eval_loss=0.3275 | lr=8.54e-05 | 70.9s (1352 samples/s) | gpu_mem=1.3GB
2026-02-25 18:35:09,044 [INFO] __main__:   ★ New best eval loss: 0.3275 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:36:19,718 [INFO] __main__: Epoch  26/100 | train_loss=0.3260 (recon=0.1522 kl=0.0106 rew=0.0837 done=0.0660 rollout=0.3395) | eval_loss=0.3315 | lr=8.42e-05 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 18:37:29,992 [INFO] __main__: Epoch  27/100 | train_loss=0.3259 (recon=0.1518 kl=0.0107 rew=0.0837 done=0.0660 rollout=0.3395) | eval_loss=0.3270 | lr=8.31e-05 | 70.3s (1364 samples/s) | gpu_mem=1.3GB
2026-02-25 18:37:30,015 [INFO] __main__:   ★ New best eval loss: 0.3270 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:38:40,921 [INFO] __main__: Epoch  28/100 | train_loss=0.3266 (recon=0.1520 kl=0.0110 rew=0.0839 done=0.0661 rollout=0.3402) | eval_loss=0.3265 | lr=8.19e-05 | 70.9s (1352 samples/s) | gpu_mem=1.3GB
2026-02-25 18:38:40,942 [INFO] __main__:   ★ New best eval loss: 0.3265 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:39:51,355 [INFO] __main__: Epoch  29/100 | train_loss=0.3256 (recon=0.1513 kl=0.0110 rew=0.0836 done=0.0658 rollout=0.3395) | eval_loss=0.3274 | lr=8.06e-05 | 70.4s (1361 samples/s) | gpu_mem=1.3GB
2026-02-25 18:41:02,495 [INFO] __main__: Epoch  30/100 | train_loss=0.3250 (recon=0.1509 kl=0.0111 rew=0.0834 done=0.0656 rollout=0.3390) | eval_loss=0.3284 | lr=7.94e-05 | 71.1s (1347 samples/s) | gpu_mem=1.3GB
2026-02-25 18:42:12,904 [INFO] __main__: Epoch  31/100 | train_loss=0.3251 (recon=0.1508 kl=0.0111 rew=0.0834 done=0.0656 rollout=0.3392) | eval_loss=0.3278 | lr=7.81e-05 | 70.4s (1362 samples/s) | gpu_mem=1.3GB
2026-02-25 18:43:23,731 [INFO] __main__: Epoch  32/100 | train_loss=0.3253 (recon=0.1507 kl=0.0113 rew=0.0836 done=0.0658 rollout=0.3392) | eval_loss=0.3256 | lr=7.68e-05 | 70.8s (1353 samples/s) | gpu_mem=1.3GB
2026-02-25 18:43:23,754 [INFO] __main__:   ★ New best eval loss: 0.3256 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:44:34,007 [INFO] __main__: Epoch  33/100 | train_loss=0.3250 (recon=0.1503 kl=0.0113 rew=0.0835 done=0.0657 rollout=0.3392) | eval_loss=0.3246 | lr=7.55e-05 | 70.3s (1364 samples/s) | gpu_mem=1.3GB
2026-02-25 18:44:34,030 [INFO] __main__:   ★ New best eval loss: 0.3246 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:45:45,357 [INFO] __main__: Epoch  34/100 | train_loss=0.3250 (recon=0.1502 kl=0.0116 rew=0.0835 done=0.0657 rollout=0.3390) | eval_loss=0.3235 | lr=7.41e-05 | 71.3s (1344 samples/s) | gpu_mem=1.3GB
2026-02-25 18:45:45,380 [INFO] __main__:   ★ New best eval loss: 0.3235 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:46:56,106 [INFO] __main__: Epoch  35/100 | train_loss=0.3236 (recon=0.1495 kl=0.0113 rew=0.0833 done=0.0655 rollout=0.3377) | eval_loss=0.3261 | lr=7.27e-05 | 70.7s (1355 samples/s) | gpu_mem=1.3GB
2026-02-25 18:48:06,339 [INFO] __main__: Epoch  36/100 | train_loss=0.3235 (recon=0.1490 kl=0.0114 rew=0.0833 done=0.0655 rollout=0.3377) | eval_loss=0.3237 | lr=7.13e-05 | 70.2s (1365 samples/s) | gpu_mem=1.3GB
2026-02-25 18:49:16,519 [INFO] __main__: Epoch  37/100 | train_loss=0.3236 (recon=0.1495 kl=0.0115 rew=0.0831 done=0.0653 rollout=0.3377) | eval_loss=0.3267 | lr=6.99e-05 | 70.2s (1366 samples/s) | gpu_mem=1.3GB
2026-02-25 18:50:27,556 [INFO] __main__: Epoch  38/100 | train_loss=0.3527 (recon=0.1496 kl=0.0665 rew=0.0836 done=0.0659 rollout=0.3398) | eval_loss=2.2169 | lr=6.84e-05 | 71.0s (1349 samples/s) | gpu_mem=1.3GB
2026-02-25 18:51:38,153 [INFO] __main__: Epoch  39/100 | train_loss=0.3815 (recon=0.1745 kl=0.0569 rew=0.0906 done=0.0711 rollout=0.3697) | eval_loss=0.3257 | lr=6.69e-05 | 70.6s (1358 samples/s) | gpu_mem=1.3GB
2026-02-25 18:52:49,003 [INFO] __main__: Epoch  40/100 | train_loss=0.3221 (recon=0.1484 kl=0.0096 rew=0.0837 done=0.0659 rollout=0.3367) | eval_loss=0.3214 | lr=6.55e-05 | 70.8s (1353 samples/s) | gpu_mem=1.3GB
2026-02-25 18:52:49,026 [INFO] __main__:   ★ New best eval loss: 0.3214 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:53:59,507 [INFO] __main__: Epoch  41/100 | train_loss=0.3204 (recon=0.1467 kl=0.0101 rew=0.0829 done=0.0652 rollout=0.3358) | eval_loss=0.3207 | lr=6.39e-05 | 70.5s (1360 samples/s) | gpu_mem=1.3GB
2026-02-25 18:53:59,530 [INFO] __main__:   ★ New best eval loss: 0.3207 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:55:10,159 [INFO] __main__: Epoch  42/100 | train_loss=0.3198 (recon=0.1463 kl=0.0105 rew=0.0826 done=0.0649 rollout=0.3353) | eval_loss=0.3206 | lr=6.24e-05 | 70.6s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 18:55:10,182 [INFO] __main__:   ★ New best eval loss: 0.3206 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:56:20,740 [INFO] __main__: Epoch  43/100 | train_loss=0.3191 (recon=0.1458 kl=0.0105 rew=0.0825 done=0.0647 rollout=0.3348) | eval_loss=0.3209 | lr=6.09e-05 | 70.6s (1359 samples/s) | gpu_mem=1.3GB
2026-02-25 18:57:31,289 [INFO] __main__: Epoch  44/100 | train_loss=0.3191 (recon=0.1458 kl=0.0108 rew=0.0822 done=0.0645 rollout=0.3350) | eval_loss=0.3205 | lr=5.94e-05 | 70.5s (1359 samples/s) | gpu_mem=1.3GB
2026-02-25 18:57:31,312 [INFO] __main__:   ★ New best eval loss: 0.3205 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:58:42,262 [INFO] __main__: Epoch  45/100 | train_loss=0.3190 (recon=0.1455 kl=0.0109 rew=0.0823 done=0.0644 rollout=0.3349) | eval_loss=0.3199 | lr=5.78e-05 | 70.9s (1351 samples/s) | gpu_mem=1.3GB
2026-02-25 18:58:42,284 [INFO] __main__:   ★ New best eval loss: 0.3199 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 18:59:53,374 [INFO] __main__: Epoch  46/100 | train_loss=0.3185 (recon=0.1452 kl=0.0108 rew=0.0822 done=0.0643 rollout=0.3346) | eval_loss=0.3209 | lr=5.63e-05 | 71.1s (1348 samples/s) | gpu_mem=1.3GB
2026-02-25 19:01:04,213 [INFO] __main__: Epoch  47/100 | train_loss=0.3188 (recon=0.1451 kl=0.0110 rew=0.0824 done=0.0644 rollout=0.3347) | eval_loss=0.3196 | lr=5.47e-05 | 70.8s (1353 samples/s) | gpu_mem=1.3GB
2026-02-25 19:01:04,236 [INFO] __main__:   ★ New best eval loss: 0.3196 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:02:14,681 [INFO] __main__: Epoch  48/100 | train_loss=0.3182 (recon=0.1448 kl=0.0110 rew=0.0822 done=0.0642 rollout=0.3341) | eval_loss=0.3195 | lr=5.31e-05 | 70.4s (1361 samples/s) | gpu_mem=1.3GB
2026-02-25 19:02:14,704 [INFO] __main__:   ★ New best eval loss: 0.3195 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:03:25,389 [INFO] __main__: Epoch  49/100 | train_loss=0.3182 (recon=0.1448 kl=0.0110 rew=0.0822 done=0.0642 rollout=0.3342) | eval_loss=0.3294 | lr=5.16e-05 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 19:04:36,190 [INFO] __main__: Epoch  50/100 | train_loss=0.3184 (recon=0.1445 kl=0.0111 rew=0.0822 done=0.0643 rollout=0.3346) | eval_loss=0.3213 | lr=5.00e-05 | 70.8s (1354 samples/s) | gpu_mem=1.3GB
2026-02-25 19:05:46,967 [INFO] __main__: Epoch  51/100 | train_loss=0.3177 (recon=0.1442 kl=0.0110 rew=0.0821 done=0.0642 rollout=0.3339) | eval_loss=0.3190 | lr=4.84e-05 | 70.8s (1355 samples/s) | gpu_mem=1.3GB
2026-02-25 19:05:46,990 [INFO] __main__:   ★ New best eval loss: 0.3190 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:06:57,321 [INFO] __main__: Epoch  52/100 | train_loss=0.3180 (recon=0.1442 kl=0.0111 rew=0.0821 done=0.0642 rollout=0.3344) | eval_loss=0.3201 | lr=4.69e-05 | 70.3s (1363 samples/s) | gpu_mem=1.3GB
2026-02-25 19:08:07,968 [INFO] __main__: Epoch  53/100 | train_loss=0.3179 (recon=0.1437 kl=0.0112 rew=0.0824 done=0.0644 rollout=0.3342) | eval_loss=0.3172 | lr=4.53e-05 | 70.6s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 19:08:07,991 [INFO] __main__:   ★ New best eval loss: 0.3172 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:09:18,618 [INFO] __main__: Epoch  54/100 | train_loss=0.3170 (recon=0.1433 kl=0.0111 rew=0.0820 done=0.0641 rollout=0.3334) | eval_loss=0.3191 | lr=4.37e-05 | 70.6s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 19:10:29,306 [INFO] __main__: Epoch  55/100 | train_loss=0.3167 (recon=0.1430 kl=0.0113 rew=0.0820 done=0.0641 rollout=0.3331) | eval_loss=0.3181 | lr=4.22e-05 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 19:11:40,099 [INFO] __main__: Epoch  56/100 | train_loss=0.3168 (recon=0.1429 kl=0.0113 rew=0.0820 done=0.0642 rollout=0.3332) | eval_loss=0.3191 | lr=4.06e-05 | 70.8s (1354 samples/s) | gpu_mem=1.3GB
2026-02-25 19:12:50,815 [INFO] __main__: Epoch  57/100 | train_loss=0.3163 (recon=0.1424 kl=0.0112 rew=0.0819 done=0.0641 rollout=0.3329) | eval_loss=0.3188 | lr=3.91e-05 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 19:14:01,170 [INFO] __main__: Epoch  58/100 | train_loss=0.3168 (recon=0.1426 kl=0.0114 rew=0.0820 done=0.0641 rollout=0.3335) | eval_loss=0.3182 | lr=3.76e-05 | 70.4s (1362 samples/s) | gpu_mem=1.3GB
2026-02-25 19:15:12,063 [INFO] __main__: Epoch  59/100 | train_loss=0.3163 (recon=0.1425 kl=0.0113 rew=0.0820 done=0.0640 rollout=0.3327) | eval_loss=0.3188 | lr=3.61e-05 | 70.9s (1352 samples/s) | gpu_mem=1.3GB
2026-02-25 19:16:22,721 [INFO] __main__: Epoch  60/100 | train_loss=0.3157 (recon=0.1421 kl=0.0113 rew=0.0818 done=0.0639 rollout=0.3322) | eval_loss=0.3179 | lr=3.45e-05 | 70.7s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 19:17:33,459 [INFO] __main__: Epoch  61/100 | train_loss=0.3162 (recon=0.1420 kl=0.0114 rew=0.0820 done=0.0641 rollout=0.3328) | eval_loss=0.3165 | lr=3.31e-05 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 19:17:33,480 [INFO] __main__:   ★ New best eval loss: 0.3165 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:18:44,368 [INFO] __main__: Epoch  62/100 | train_loss=0.3155 (recon=0.1415 kl=0.0113 rew=0.0820 done=0.0640 rollout=0.3321) | eval_loss=0.3156 | lr=3.16e-05 | 70.9s (1352 samples/s) | gpu_mem=1.3GB
2026-02-25 19:18:44,389 [INFO] __main__:   ★ New best eval loss: 0.3156 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:19:55,957 [INFO] __main__: Epoch  63/100 | train_loss=0.3151 (recon=0.1414 kl=0.0112 rew=0.0819 done=0.0640 rollout=0.3317) | eval_loss=0.3181 | lr=3.01e-05 | 71.6s (1339 samples/s) | gpu_mem=1.3GB
2026-02-25 19:21:06,500 [INFO] __main__: Epoch  64/100 | train_loss=0.3146 (recon=0.1412 kl=0.0112 rew=0.0817 done=0.0639 rollout=0.3313) | eval_loss=0.3156 | lr=2.87e-05 | 70.5s (1359 samples/s) | gpu_mem=1.3GB
2026-02-25 19:22:18,147 [INFO] __main__: Epoch  65/100 | train_loss=0.3152 (recon=0.1415 kl=0.0114 rew=0.0819 done=0.0640 rollout=0.3317) | eval_loss=0.3259 | lr=2.73e-05 | 71.6s (1338 samples/s) | gpu_mem=1.3GB
2026-02-25 19:23:29,450 [INFO] __main__: Epoch  66/100 | train_loss=0.3153 (recon=0.1414 kl=0.0113 rew=0.0820 done=0.0641 rollout=0.3318) | eval_loss=0.3175 | lr=2.59e-05 | 71.3s (1344 samples/s) | gpu_mem=1.3GB
2026-02-25 19:24:40,964 [INFO] __main__: Epoch  67/100 | train_loss=0.3145 (recon=0.1408 kl=0.0112 rew=0.0819 done=0.0641 rollout=0.3310) | eval_loss=0.3169 | lr=2.45e-05 | 71.5s (1340 samples/s) | gpu_mem=1.3GB
2026-02-25 19:25:51,897 [INFO] __main__: Epoch  68/100 | train_loss=0.3149 (recon=0.1411 kl=0.0114 rew=0.0819 done=0.0640 rollout=0.3313) | eval_loss=0.3191 | lr=2.32e-05 | 70.9s (1351 samples/s) | gpu_mem=1.3GB
2026-02-25 19:27:02,722 [INFO] __main__: Epoch  69/100 | train_loss=0.3148 (recon=0.1408 kl=0.0112 rew=0.0821 done=0.0642 rollout=0.3313) | eval_loss=0.3160 | lr=2.19e-05 | 70.8s (1353 samples/s) | gpu_mem=1.3GB
2026-02-25 19:28:14,130 [INFO] __main__: Epoch  70/100 | train_loss=0.3139 (recon=0.1406 kl=0.0110 rew=0.0819 done=0.0640 rollout=0.3303) | eval_loss=0.3164 | lr=2.06e-05 | 71.4s (1342 samples/s) | gpu_mem=1.3GB
2026-02-25 19:29:25,313 [INFO] __main__: Epoch  71/100 | train_loss=0.3142 (recon=0.1406 kl=0.0111 rew=0.0819 done=0.0640 rollout=0.3307) | eval_loss=0.3176 | lr=1.94e-05 | 71.2s (1347 samples/s) | gpu_mem=1.3GB
2026-02-25 19:30:36,305 [INFO] __main__: Epoch  72/100 | train_loss=0.3141 (recon=0.1407 kl=0.0111 rew=0.0819 done=0.0640 rollout=0.3307) | eval_loss=0.3148 | lr=1.81e-05 | 71.0s (1350 samples/s) | gpu_mem=1.3GB
2026-02-25 19:30:36,326 [INFO] __main__:   ★ New best eval loss: 0.3148 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:31:47,498 [INFO] __main__: Epoch  73/100 | train_loss=0.3139 (recon=0.1402 kl=0.0111 rew=0.0820 done=0.0640 rollout=0.3305) | eval_loss=0.3138 | lr=1.69e-05 | 71.2s (1347 samples/s) | gpu_mem=1.3GB
2026-02-25 19:31:47,521 [INFO] __main__:   ★ New best eval loss: 0.3138 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:32:58,167 [INFO] __main__: Epoch  74/100 | train_loss=0.3135 (recon=0.1400 kl=0.0109 rew=0.0820 done=0.0640 rollout=0.3301) | eval_loss=0.3154 | lr=1.58e-05 | 70.6s (1357 samples/s) | gpu_mem=1.3GB
2026-02-25 19:34:09,526 [INFO] __main__: Epoch  75/100 | train_loss=0.3139 (recon=0.1399 kl=0.0112 rew=0.0821 done=0.0641 rollout=0.3304) | eval_loss=0.3162 | lr=1.46e-05 | 71.4s (1343 samples/s) | gpu_mem=1.3GB
2026-02-25 19:35:20,593 [INFO] __main__: Epoch  76/100 | train_loss=0.3137 (recon=0.1399 kl=0.0110 rew=0.0820 done=0.0641 rollout=0.3304) | eval_loss=0.3144 | lr=1.36e-05 | 71.1s (1349 samples/s) | gpu_mem=1.3GB
2026-02-25 19:36:31,515 [INFO] __main__: Epoch  77/100 | train_loss=0.3132 (recon=0.1397 kl=0.0109 rew=0.0820 done=0.0640 rollout=0.3299) | eval_loss=0.3146 | lr=1.25e-05 | 70.9s (1352 samples/s) | gpu_mem=1.3GB
2026-02-25 19:37:43,067 [INFO] __main__: Epoch  78/100 | train_loss=0.3128 (recon=0.1395 kl=0.0109 rew=0.0818 done=0.0639 rollout=0.3295) | eval_loss=0.3158 | lr=1.15e-05 | 71.6s (1340 samples/s) | gpu_mem=1.3GB
2026-02-25 19:38:54,333 [INFO] __main__: Epoch  79/100 | train_loss=0.3132 (recon=0.1397 kl=0.0110 rew=0.0819 done=0.0640 rollout=0.3299) | eval_loss=0.3141 | lr=1.05e-05 | 71.3s (1345 samples/s) | gpu_mem=1.3GB
2026-02-25 19:40:05,333 [INFO] __main__: Epoch  80/100 | train_loss=0.3131 (recon=0.1394 kl=0.0109 rew=0.0821 done=0.0641 rollout=0.3297) | eval_loss=0.3148 | lr=9.55e-06 | 71.0s (1350 samples/s) | gpu_mem=1.3GB
2026-02-25 19:41:16,170 [INFO] __main__: Epoch  81/100 | train_loss=0.3127 (recon=0.1395 kl=0.0109 rew=0.0818 done=0.0639 rollout=0.3294) | eval_loss=0.3149 | lr=8.65e-06 | 70.8s (1354 samples/s) | gpu_mem=1.3GB
2026-02-25 19:42:26,882 [INFO] __main__: Epoch  82/100 | train_loss=0.3132 (recon=0.1394 kl=0.0109 rew=0.0820 done=0.0641 rollout=0.3299) | eval_loss=0.3134 | lr=7.78e-06 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 19:42:26,903 [INFO] __main__:   ★ New best eval loss: 0.3134 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:43:38,250 [INFO] __main__: Epoch  83/100 | train_loss=0.3129 (recon=0.1394 kl=0.0109 rew=0.0820 done=0.0641 rollout=0.3295) | eval_loss=0.3135 | lr=6.96e-06 | 71.3s (1344 samples/s) | gpu_mem=1.3GB
2026-02-25 19:44:48,938 [INFO] __main__: Epoch  84/100 | train_loss=0.3129 (recon=0.1393 kl=0.0109 rew=0.0821 done=0.0641 rollout=0.3296) | eval_loss=0.3134 | lr=6.18e-06 | 70.7s (1356 samples/s) | gpu_mem=1.3GB
2026-02-25 19:44:48,960 [INFO] __main__:   ★ New best eval loss: 0.3134 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:45:59,739 [INFO] __main__: Epoch  85/100 | train_loss=0.3127 (recon=0.1391 kl=0.0108 rew=0.0819 done=0.0639 rollout=0.3295) | eval_loss=0.3146 | lr=5.45e-06 | 70.8s (1354 samples/s) | gpu_mem=1.3GB
2026-02-25 19:47:11,503 [INFO] __main__: Epoch  86/100 | train_loss=0.3126 (recon=0.1391 kl=0.0108 rew=0.0820 done=0.0640 rollout=0.3292) | eval_loss=0.3152 | lr=4.76e-06 | 71.8s (1336 samples/s) | gpu_mem=1.3GB
2026-02-25 19:48:22,493 [INFO] __main__: Epoch  87/100 | train_loss=0.3125 (recon=0.1392 kl=0.0108 rew=0.0819 done=0.0639 rollout=0.3293) | eval_loss=0.3145 | lr=4.11e-06 | 71.0s (1350 samples/s) | gpu_mem=1.3GB
2026-02-25 19:49:34,161 [INFO] __main__: Epoch  88/100 | train_loss=0.3124 (recon=0.1391 kl=0.0107 rew=0.0819 done=0.0640 rollout=0.3291) | eval_loss=0.3147 | lr=3.51e-06 | 71.7s (1338 samples/s) | gpu_mem=1.3GB
2026-02-25 19:50:45,579 [INFO] __main__: Epoch  89/100 | train_loss=0.3123 (recon=0.1391 kl=0.0109 rew=0.0818 done=0.0639 rollout=0.3291) | eval_loss=0.3132 | lr=2.96e-06 | 71.4s (1342 samples/s) | gpu_mem=1.3GB
2026-02-25 19:50:45,600 [INFO] __main__:   ★ New best eval loss: 0.3132 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:51:57,816 [INFO] __main__: Epoch  90/100 | train_loss=0.3123 (recon=0.1390 kl=0.0108 rew=0.0819 done=0.0638 rollout=0.3290) | eval_loss=0.3142 | lr=2.45e-06 | 72.2s (1327 samples/s) | gpu_mem=1.3GB
2026-02-25 19:53:09,370 [INFO] __main__: Epoch  91/100 | train_loss=0.3123 (recon=0.1390 kl=0.0108 rew=0.0819 done=0.0638 rollout=0.3290) | eval_loss=0.3145 | lr=1.99e-06 | 71.5s (1340 samples/s) | gpu_mem=1.3GB
2026-02-25 19:54:20,932 [INFO] __main__: Epoch  92/100 | train_loss=0.3124 (recon=0.1389 kl=0.0108 rew=0.0820 done=0.0641 rollout=0.3291) | eval_loss=0.3143 | lr=1.57e-06 | 71.6s (1339 samples/s) | gpu_mem=1.3GB
2026-02-25 19:55:32,652 [INFO] __main__: Epoch  93/100 | train_loss=0.3122 (recon=0.1391 kl=0.0107 rew=0.0819 done=0.0639 rollout=0.3288) | eval_loss=0.3124 | lr=1.20e-06 | 71.7s (1337 samples/s) | gpu_mem=1.3GB
2026-02-25 19:55:32,682 [INFO] __main__:   ★ New best eval loss: 0.3124 → checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 19:56:45,681 [INFO] __main__: Epoch  94/100 | train_loss=0.3124 (recon=0.1390 kl=0.0109 rew=0.0820 done=0.0640 rollout=0.3291) | eval_loss=0.3139 | lr=8.86e-07 | 73.0s (1313 samples/s) | gpu_mem=1.3GB
2026-02-25 19:57:57,869 [INFO] __main__: Epoch  95/100 | train_loss=0.3125 (recon=0.1390 kl=0.0108 rew=0.0819 done=0.0639 rollout=0.3293) | eval_loss=0.3136 | lr=6.16e-07 | 72.2s (1328 samples/s) | gpu_mem=1.3GB
2026-02-25 19:59:10,503 [INFO] __main__: Epoch  96/100 | train_loss=0.3121 (recon=0.1390 kl=0.0108 rew=0.0818 done=0.0638 rollout=0.3289) | eval_loss=0.3130 | lr=3.94e-07 | 72.6s (1320 samples/s) | gpu_mem=1.3GB
2026-02-25 20:00:23,114 [INFO] __main__: Epoch  97/100 | train_loss=0.3125 (recon=0.1389 kl=0.0108 rew=0.0820 done=0.0640 rollout=0.3293) | eval_loss=0.3127 | lr=2.22e-07 | 72.6s (1320 samples/s) | gpu_mem=1.3GB
2026-02-25 20:01:35,276 [INFO] __main__: Epoch  98/100 | train_loss=0.3121 (recon=0.1389 kl=0.0107 rew=0.0819 done=0.0639 rollout=0.3288) | eval_loss=0.3136 | lr=9.87e-08 | 72.2s (1328 samples/s) | gpu_mem=1.3GB
2026-02-25 20:02:47,305 [INFO] __main__: Epoch  99/100 | train_loss=0.3118 (recon=0.1388 kl=0.0107 rew=0.0818 done=0.0639 rollout=0.3285) | eval_loss=0.3140 | lr=2.47e-08 | 72.0s (1331 samples/s) | gpu_mem=1.3GB
2026-02-25 20:03:59,255 [INFO] __main__: Epoch 100/100 | train_loss=0.3119 (recon=0.1389 kl=0.0108 rew=0.0818 done=0.0638 rollout=0.3286) | eval_loss=0.3145 | lr=0.00e+00 | 71.9s (1332 samples/s) | gpu_mem=1.3GB
2026-02-25 20:03:59,299 [INFO] __main__: ═══ WORLD MODEL TRAINING COMPLETE ═══
2026-02-25 20:03:59,299 [INFO] __main__:   Best eval loss: 0.3124
2026-02-25 20:03:59,299 [INFO] __main__:   Best checkpoint: checkpoints/world-model/tutoring_rssm_best.pt
2026-02-25 20:03:59,299 [INFO] __main__:   Final checkpoint: checkpoints/world-model/tutoring_rssm_final.pt

════════════════════════════════════════════════════════════
  World Model Training Complete
════════════════════════════════════════════════════════════
  Best checkpoint: checkpoints/world-model/tutoring_rssm_best.pt
════════════════════════════════════════════════════════════
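
Note: the per-epoch lines in this log follow a regular layout, so they can be pulled into structured rows for plotting or for spotting instabilities such as the kl and eval_loss jumps around epochs 17 and 38. The following is a minimal sketch, not part of the training script. The names `EPOCH_RE`, `parse_epoch_line`, and `find_spikes`, and the 1.15 spike threshold, are hypothetical choices made for illustration.

```python
import re

# Matches the "Epoch N/M | train_loss=... (...) | eval_loss=... | lr=..." lines of this log.
# Assumption: the field order and separators stay exactly as they appear in this file.
EPOCH_RE = re.compile(
    r"Epoch\s+(?P<epoch>\d+)/(?P<total>\d+) \| "
    r"train_loss=(?P<train_loss>[\d.]+) \((?P<parts>[^)]*)\) \| "
    r"eval_loss=(?P<eval_loss>[\d.]+) \| lr=(?P<lr>[\d.e+-]+)"
)


def parse_epoch_line(line):
    """Return a dict of metrics for an epoch line, or None for any other line."""
    m = EPOCH_RE.search(line)
    if m is None:
        return None
    row = {
        "epoch": int(m["epoch"]),
        "train_loss": float(m["train_loss"]),
        "eval_loss": float(m["eval_loss"]),
        "lr": float(m["lr"]),
    }
    # The parenthesised part holds space-separated name=value pairs (recon, kl, rew, ...).
    for part in m["parts"].split():
        name, value = part.split("=")
        row[name] = float(value)
    return row


def find_spikes(rows, key="eval_loss", factor=1.15):
    """Epochs whose `key` exceeds the previous epoch's value by more than `factor` (hypothetical threshold)."""
    return [cur["epoch"] for prev, cur in zip(rows, rows[1:]) if cur[key] > factor * prev[key]]
```

Feeding every line of the log through `parse_epoch_line`, dropping the `None` results, and passing the rows to `find_spikes` is one way to surface the epochs worth a closer look. A different threshold or a moving-average baseline may suit other runs better.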