projecti7 committed on
Commit f13657f · verified · 1 Parent(s): a95309a

Auto-sync checkpoint during training

log/log-train-2026-01-13-11-15-39 CHANGED
@@ -164,5 +164,50 @@
  2026-01-13 11:21:33,034 INFO [zipformer.py:1188] warmup_begin=1333.3, warmup_end=2000.0, batch_count=439.0, num_to_drop=2, layers_to_drop={0, 2}
  2026-01-13 11:21:36,909 INFO [zipformer.py:1188] warmup_begin=3333.3, warmup_end=4000.0, batch_count=448.0, num_to_drop=2, layers_to_drop={1, 3}
  2026-01-13 11:21:38,077 INFO [train.py:895] Epoch 1, batch 450, loss[loss=0.9799, simple_loss=0.7873, pruned_loss=0.9287, over 1340.00 frames. ], tot_loss[loss=1.02, simple_loss=0.853, pruned_loss=0.9918, over 233614.14 frames. ], batch size: 4, lr: 4.75e-02, grad_scale: 4.0
- batch_count=448.0, num_to_drop=2, layers_to_drop={0, 1}
- 2026-01-13 11:21:36,807 INFO [train.py:895] Epoch 1, batch 450, loss[loss=0.9186, simple_loss=0.7324, pruned_loss=0.887, over 1332.00 frames. ], tot_loss[loss=1.02, simple_loss=0.8523, pruned_loss=0.9903, over 233825.80 frames. ], batch size: 4, lr: 4.75e-02, grad_scale: 4.0
+ 2026-01-13 11:21:49,271 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=12.09 vs. limit=2.0
+ 2026-01-13 11:21:52,525 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=4.05 vs. limit=2.0
+ 2026-01-13 11:21:54,995 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=9.42 vs. limit=2.0
+ 2026-01-13 11:21:58,741 INFO [optim.py:365] Clipping_scale=2.0, grad-norm quartiles 1.640e+01 2.240e+01 2.443e+01 2.816e+01 5.056e+01, threshold=4.885e+01, percent-clipped=1.0
+ 2026-01-13 11:21:58,815 INFO [train.py:895] Epoch 1, batch 500, loss[loss=0.9501, simple_loss=0.7581, pruned_loss=0.8815, over 1402.00 frames. ], tot_loss[loss=1.007, simple_loss=0.8332, pruned_loss=0.9714, over 240701.63 frames. ], batch size: 5, lr: 4.99e-02, grad_scale: 4.0
+ 2026-01-13 11:22:17,984 INFO [train.py:895] Epoch 1, batch 550, loss[loss=1.27, simple_loss=0.9931, pruned_loss=1.19, over 1349
+ 2026-01-13 11:22:19,432 INFO [train.py:895] Epoch 1, batch 550, loss[loss=0.9545, simple_loss=0.7527, pruned_loss=0.8779, over 1360.00 frames. ], t
+ 2026-01-13 11:22:22,721 INFO [zipformer.py:1188] warmup_begin=2666.7, warmup_end=3333.3, batch_count=562.0, num_to_drop=1, layers_to_drop={1}
+ 2026-01-13 11:22:23,190 INFO [zipformer.py:2441] attn_weights_entropy = tensor([4.3523, 4.3526, 4.3531,
+ 2026-01-13 11:22:24,121 INFO [zipformer.py:1188] warmup_begin=2666.7, warmup_end=3333.3, batch_count=562.0, num_to_drop=1, layers_to_drop={1}
+ 2026-01-13 11:22:24,210 INFO [scaling.py:681] Whitening: num_groups=1, num_channels=384, metric=30.06 vs. limit=5.0
+ 2026-01-13 11:22:26,659 INFO [zipformer.py:1188] warmup_begin=2666.7, warmup_end=3333.3, batch_count=568.0, num_to_drop=1, layers_to_drop={0}
+ 2026-01-13 11:22:35,666 INFO [zipformer.py:1188] warmup_begin=2666.7, warmup_end=3333.3, batch_count=590.0, num_to_drop=1, layers_to_drop={0}
+ 2026-01-13 11:22:36,243 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=2.71 vs. limit=2.0
+ 2026-01-13 11:22:40,168 INFO [zipformer.py:1188] warmup_begin=2000.0, warmup_end=2666.7, batch_count=600.0, num_to_drop=2, layers_to_drop={1, 2}
+ 2026-01-13 11:22:40,394 INFO [optim.py:365] Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.259e+01 2.444e+01 2.796e+01 5.546e+01, threshold=4.889e+01, percent-clipped=1.0
+ 2026-01-13 11:22:40,467 INFO [train.py:895] Epoch 1, batch 600, loss[loss=0.9216, simple_loss=0.7192, pruned_loss=0.8379, over 1245.00 frames. ], tot_loss[loss=0.9881, simple_loss=0.8028, pruned_loss=0.9325, over 249086.99 frames. ], batch size: 5, lr: 4.98e-02, grad_scale: 4.0
+ 2026-01-13 11:22:49,627 INFO [zipformer.py:1188] warmup_begin=3333.3, warmup_end=4000.0, batch_count=623.0, num_to_drop=2, layers_to_drop={0, 2}
+ 2026-01-13 11:22:51,067 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=17.23 vs. limit=2.0
+ 2026-01-13 11:22:52,102 INFO [zipformer.py:1188] warmup_begin=3333.3, warmup_end=4000.0, batch_count=629.0, num_to_drop=2, layers_to_drop={1, 2}
+ 2026-01-13 11:22:59,940 INFO [zipformer.py:1188] warmup_begin=666.7, warmup_end=1333.3, batch_count=648.0, num_to_drop=1, layers_to_drop={1}
+ 2026-01-13 11:23:01,109 INFO [train.py:895] Epoch 1, batch 650, loss[loss=1.035, simple_loss=0.7986, pruned_loss=0.9314, over 1368.00 frames. ], tot_loss[loss=0.9818, simple_loss=0.79, pruned_loss=0.9164, over 252792.16 frames. ], batch size: 4, lr: 4.98e-02, grad_scale: 4.0
+ 2026-01-13 11:23:01,188 INFO [zipformer.py:1188] warmup_begin=3333.3, warmup_end=4000.0, batch_count=651.0, num_to_drop=2, layers_to_drop={1, 2}
+ 2026-01-13 11:23:01,559 INFO [zipformer.py:1188] warmup_begin=1333.3, warmup_end=2000.0, batch_count=652.0, num_to_drop=2, layers_to_drop={0, 1}
+ 2026-01-13 11:23:04,278 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=4.69 vs. limit=2.0
+ 2026-01-13 11:23:16,824 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=16.62 vs. limit=2.0
+ 2026-01-13 11:23:21,355 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=18.34 vs. limit=2.0
+ 2026-01-13 11:23:21,923 INFO [optim.py:365] Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.404e+01 2.549e+01 2.939e+01 2.087e+02, threshold=5.099e+01, percent-clipped=1.0
+ 2026-01-13 11:23:21,994 INFO [train.py:895] Epoch 1, batch 700, loss[loss=0.8437, simple_loss=0.6385, pruned_loss=0.7633, over 1446.00 frames. ], tot_loss[loss=0.9797, simple_loss=0.7806, pruned_loss=0.905, over 254941.09 frames. ], batch size: 4, lr: 4.98e-02, grad_scale: 4.0
+ 2026-01-13 11:23:37,476 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=7.37 vs. limit=2.0
+ 2026-01-13 11:23:37,738 INFO [zipformer.py:1188] warmup_begin=2000.0, warmup_end=2666.7, batch_count=739.0, num_to_drop=2, layers_to_drop={1, 2}
+ 2026-01-13 11:23:38,462 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=3.71 vs. limit=2.0
+ 2026-01-13 11:23:39,428 INFO [zipformer.py:1188] warmup_begin=1333.3, warmup_end=2000.0, batch_count=743.0, num_to_drop=2, layers_to_drop={0, 3}
+ 2026-01-13 11:23:42,594 INFO [train.py:895] Epoch 1, batch 750, loss[loss=0.9901, simple_loss=0.7447, pruned_loss=0.8798, over 1348.00 frames. ], tot_loss[loss=0.9787, simple_loss=0.7722, pruned_loss=0.8945, over 256006.06 frames. ], batch size: 4, lr: 4.97e-02, grad_scale: 4.0
+ 2026-01-13 11:23:51,533 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=5.40 vs. limit=2.0
+ 2026-01-13 11:23:52,776 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=3.20 vs. limit=2.0
+ 2026-01-13 11:23:57,735 INFO [zipformer.py:1188] warmup_begin=666.7, warmup_end=1333.3, batch_count=787.0, num_to_drop=1, layers_to_drop={1}
+ 2026-01-13 11:24:03,503 INFO [optim.py:365] Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.448e+01 2.745e+01 3.247e+01 1.125e+02, threshold=5.490e+01, percent-clipped=4.0
+ 2026-01-13 11:24:03,578 INFO [train.py:895] Epoch 1, batch 800, loss[loss=0.9385, simple_loss=0.6966, pruned_loss=0.8282, over 1138.00 frames. ], tot_loss[loss=0.9765, simple_loss=0.7626, pruned_loss=0.8838, over 257371.31 frames. ], batch size: 3, lr: 4.97e-02, grad_scale: 8.0
+ 2026-01-13 11:24:07,262 INFO [train.py:1204] Saving batch to /kaggle/working/amharic_training/exp_amharic_streaming/batch-bdd640fb-0667-1ad1-1c80-317fa3b1799d.pt
+ 2026-01-13 11:24:07,269 INFO [train.py:1210] features shape: torch.Size([4, 1357, 80])
+ 2026-01-13 11:24:07,270 INFO [train.py:1214] num tokens: 203
+ 5],
+ device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0023, 0.0018, 0.0023, 0.0021, 0.0021, 0.0015, 0.0019],
+ device='cuda:0'), out_proj_covar=tensor([1.6519e-05, 2.2933e-05, 1.5402e-05, 2.1749e-05, 1.9232e-05, 2.0288e-05,
+ 1.3570e-05, 1.7327e-05], device='cuda:0')
+ 2026-01-13 11:23:55,720 INFO [zipformer.py:1188] warmup_begin=666.7, warmup_end=1333.3, batch_count=787.0, num_to_drop=1, layers_to_drop={0}
+ 2026-01-13 11:24:02,086 INFO [optim.py:365] Clipping_scale=2.0, grad-norm quartiles 2.169e+01 2.544e+01 2.866e+01 3.372e+01 1.237e+02, threshold=5.733e+01, percent-clipped=4.0
+ 2026-01-13 11:24:02,164 INFO [train.py:895] Epoch 1, batch 800, loss[loss=1.057, simple_loss=0.7852, pruned_loss=0.9327, over 1138.00 frames. ], tot_loss[loss=0.9867, simple_loss=0.7702, pruned_loss=0.8932, over 257287.78 frames. ], batch size: 3, lr: 4.97e-02, grad_scale: 8.0
+ 2026-01-13 11:24:07,129 INFO [train.py:1204] Saving batch to /kaggle/working/amharic_training/exp_amharic_streaming/batch-bdd640fb-0667-1ad1-1c80-317fa3b1799d.pt
+ 2026-01-13 11:24:07,134 INFO [train.py:1210] features shape: torch.Size([4, 1455, 80])
+ 2026-01-13 11:24:07,138 INFO [train.py:1214] num tokens: 259
tensorboard/events.out.tfevents.1768302939.8e64ffbd666a.73099.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:009de64d9947e993c86d0943b6de2d8e3283c94c1e100e9bce04aa63184a2e6a
- size 2171
+ oid sha256:95d58b05b29a447b5f6a258451eaead74e206bf9906d71e9a239f89ecb21bd18
+ size 7352
tensorboard/events.out.tfevents.1768302939.8e64ffbd666a.73100.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c35e1721129a26ae993933db8a6a37e9a457eb31db0bba3cbd759e1e468f5460
- size 2171
+ oid sha256:24943379a034b1b7c7d000c126585067d64d89c2e9675300d5c40f5bf04d0141
+ size 7352