Auto-sync checkpoint during training
checkpoint-5000.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2b7b22f784c2bf43d319bf81ab42ad2d2ebe42203086a5e49e290c3c22692ab7
+size 1141963947
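The three lines above are a standard Git LFS pointer: the 1.1 GB checkpoint itself lives in LFS storage, and the repository tracks only its SHA-256 and byte size. As a minimal sketch (the helper names `parse_lfs_pointer` and `verify_lfs_pointer` are hypothetical, not part of any LFS tooling), a downloaded blob can be checked against such a pointer like this:

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_lfs_pointer(pointer_text: str, blob_path: Path) -> bool:
    """Check that a local file matches the oid/size recorded in a pointer."""
    fields = parse_lfs_pointer(pointer_text)
    data = blob_path.read_bytes()
    if len(data) != int(fields["size"]):
        return False
    oid = fields["oid"].split(":", 1)[1]  # strip the "sha256:" prefix
    return hashlib.sha256(data).hexdigest() == oid
```

A check like this is what `git lfs fsck` does internally; running it by hand is useful when a checkpoint was pulled outside of git (e.g. via a direct resolve URL) and you want to confirm it is the object the pointer names.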
log/log-train-2026-01-13-10-02-58
CHANGED
@@ -1152,3 +1152,63 @@
 2026-01-13 10:40:55,544 INFO [checkpoint.py:74] Saving checkpoint to /kaggle/working/amharic_training/exp_amharic_streaming/checkpoint-5000.pt
 2026-01-13 10:40:57,736 INFO [train.py:895] Epoch 1, batch 5000, loss[loss=1.344, simple_loss=0.8505, pruned_loss=0.9187, over 1226.00 frames. ], tot_loss[loss=1.183, simple_loss=0.7445, pruned_loss=0.811, over 261199.81 frames. ], batch size: 4, lr: 4.20e-02, grad_scale: 8.0
 2026-01-13 10:40:58,505 INFO [optim.py:365] Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.550e+02 1.948e+02 2.658e+02 4.998e+02, threshold=3.897e+02, percent-clipped=2.0
+2026-01-13 10:41:10,589 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=5.09 vs. limit=2.0
+2026-01-13 10:41:13,232 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=4.03 vs. limit=2.0
+2026-01-13 10:41:14,234 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=5.10 vs. limit=2.0
+2026-01-13 10:41:16,760 INFO [train.py:895] Epoch 1, batch 5050, loss[loss=1.217, simple_loss=0.7619, pruned_loss=0.8363, over 1138.00 frames. ], tot_loss[loss=1.179, simple_loss=0.7419, pruned_loss=0.8077, over 261392.37 frames. ], batch size: 3, lr: 4.19e-02, grad_scale: 8.0
+2026-01-13 10:41:23,867 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=11.20 vs. limit=2.0
+2026-01-13 10:41:31,426 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=10.02 vs. limit=2.0
+2026-01-13 10:41:31,459 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=3.78 vs. limit=2.0
+2026-01-13 10:41:33,337 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=8.05 vs. limit=2.0
+2026-01-13 10:41:35,187 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=5.20 vs. limit=2.0
+2026-01-13 10:41:35,316 INFO [train.py:895] Epoch 1, batch 5100, loss[loss=1.113, simple_loss=0.7103, pruned_loss=0.7575, over 1121.00 frames. ], tot_loss[loss=1.187, simple_loss=0.7486, pruned_loss=0.8123, over 261489.15 frames. ], batch size: 3, lr: 4.18e-02, grad_scale: 8.0
+2026-01-13 10:41:36,059 INFO [optim.py:365] Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.510e+02 1.867e+02 2.320e+02 4.308e+02, threshold=3.734e+02, percent-clipped=2.0
+2026-01-13 10:41:36,670 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=4.60 vs. limit=2.0
+2026-01-13 10:41:41,070 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=10.95 vs. limit=2.0
+2026-01-13 10:41:41,448 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=4.70 vs. limit=2.0
+2026-01-13 10:41:50,847 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=5.33 vs. limit=2.0
+2026-01-13 10:41:51,239 INFO [scaling.py:681] Whitening: num_groups=1, num_channels=384, metric=123.93 vs. limit=5.0
+2026-01-13 10:41:53,518 INFO [train.py:895] Epoch 1, batch 5150, loss[loss=1.11, simple_loss=0.7024, pruned_loss=0.759, over 1357.00 frames. ], tot_loss[loss=1.19, simple_loss=0.7509, pruned_loss=0.8141, over 261747.26 frames. ], batch size: 5, lr: 4.17e-02, grad_scale: 8.0
+2026-01-13 10:41:56,145 INFO [zipformer.py:2441] attn_weights_entropy = tensor([5.9237, 5.8775, 5.8967, 5.9382, 5.9365, 5.9035, 5.8902, 5.9402],
+       device='cuda:0'), covar=tensor([0.0004, 0.0005, 0.0006, 0.0004, 0.0002, 0.0004, 0.0004, 0.0007],
+       device='cuda:0'), in_proj_covar=tensor([0.0027, 0.0025, 0.0028, 0.0025, 0.0026, 0.0027, 0.0028, 0.0030],
+       device='cuda:0'), out_proj_covar=tensor([2.0564e-05, 1.9347e-05, 2.0468e-05, 1.9058e-05, 1.9749e-05, 1.8507e-05,
+       1.8818e-05, 2.0944e-05], device='cuda:0')
+2026-01-13 10:41:57,297 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=19.79 vs. limit=2.0
+2026-01-13 10:41:58,349 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=13.13 vs. limit=2.0
+2026-01-13 10:42:08,394 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=3.47 vs. limit=2.0
+2026-01-13 10:42:12,024 INFO [train.py:895] Epoch 1, batch 5200, loss[loss=1.175, simple_loss=0.7546, pruned_loss=0.7974, over 1223.00 frames. ], tot_loss[loss=1.181, simple_loss=0.7473, pruned_loss=0.8077, over 262728.41 frames. ], batch size: 4, lr: 4.16e-02, grad_scale: 16.0
+2026-01-13 10:42:12,719 INFO [optim.py:365] Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.647e+02 2.032e+02 2.475e+02 5.041e+02, threshold=4.063e+02, percent-clipped=3.0
+2026-01-13 10:42:16,184 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=10.65 vs. limit=2.0
+2026-01-13 10:42:18,696 INFO [zipformer.py:2441] attn_weights_entropy = tensor([4.3020, 4.3072, 4.4812, 4.4995, 4.5392, 4.1825, 3.6442, 4.4599],
+       device='cuda:0'), covar=tensor([3.1556e-04, 4.4409e-04, 6.7422e-05, 2.5028e-04, 6.1217e-05, 1.7915e-03,
+       1.4611e-03, 9.5898e-05], device='cuda:0'), in_proj_covar=tensor([0.0011, 0.0010, 0.0010, 0.0010, 0.0011, 0.0011, 0.0009, 0.0010],
+       device='cuda:0'), out_proj_covar=tensor([6.5464e-06, 6.8163e-06, 5.8527e-06, 6.1450e-06, 6.2692e-06, 6.2807e-06,
+       5.5706e-06, 6.3424e-06], device='cuda:0')
+2026-01-13 10:42:30,426 INFO [train.py:895] Epoch 1, batch 5250, loss[loss=1.194, simple_loss=0.7537, pruned_loss=0.8171, over 1408.00 frames. ], tot_loss[loss=1.181, simple_loss=0.7473, pruned_loss=0.8076, over 261861.99 frames. ], batch size: 4, lr: 4.15e-02, grad_scale: 16.0
+2026-01-13 10:42:33,144 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=6.35 vs. limit=2.0
+2026-01-13 10:42:33,429 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=3.99 vs. limit=2.0
+2026-01-13 10:42:35,749 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=10.96 vs. limit=2.0
+2026-01-13 10:42:43,525 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=7.47 vs. limit=2.0
+2026-01-13 10:42:48,634 INFO [train.py:895] Epoch 1, batch 5300, loss[loss=1.3, simple_loss=0.8249, pruned_loss=0.8877, over 1252.00 frames. ], tot_loss[loss=1.181, simple_loss=0.7462, pruned_loss=0.8077, over 260419.74 frames. ], batch size: 5, lr: 4.14e-02, grad_scale: 16.0
+2026-01-13 10:42:49,359 INFO [optim.py:365] Clipping_scale=2.0, grad-norm quartiles 1.194e+02 1.666e+02 2.002e+02 2.441e+02 5.439e+02, threshold=4.004e+02, percent-clipped=2.0
+2026-01-13 10:42:51,422 INFO [scaling.py:681] Whitening: num_groups=1, num_channels=384, metric=76.90 vs. limit=5.0
+2026-01-13 10:42:56,233 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=9.81 vs. limit=2.0
+2026-01-13 10:42:56,244 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=4.08 vs. limit=2.0
+2026-01-13 10:42:58,711 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=4.68 vs. limit=2.0
+2026-01-13 10:43:00,420 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=8.49 vs. limit=2.0
+2026-01-13 10:43:00,451 INFO [scaling.py:681] Whitening: num_groups=1, num_channels=384, metric=116.02 vs. limit=5.0
+2026-01-13 10:43:02,604 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=4.57 vs. limit=2.0
+2026-01-13 10:43:07,089 INFO [train.py:895] Epoch 1, batch 5350, loss[loss=1.109, simple_loss=0.692, pruned_loss=0.7634, over 960.00 frames. ], tot_loss[loss=1.18, simple_loss=0.7459, pruned_loss=0.8072, over 261111.35 frames. ], batch size: 2, lr: 4.13e-02, grad_scale: 16.0
+2026-01-13 10:43:18,711 INFO [zipformer.py:2441] attn_weights_entropy = tensor([3.9821, 3.4176, 3.7392, 4.1860, 4.1892, 3.3098, 2.8594, 3.3125],
+       device='cuda:0'), covar=tensor([0.0008, 0.0008, 0.0008, 0.0003, 0.0005, 0.0008, 0.0012, 0.0010],
+       device='cuda:0'), in_proj_covar=tensor([0.0029, 0.0028, 0.0028, 0.0027, 0.0028, 0.0025, 0.0027, 0.0027],
+       device='cuda:0'), out_proj_covar=tensor([2.0674e-05, 1.9950e-05, 2.0654e-05, 1.8624e-05, 1.9121e-05, 1.7109e-05,
+       1.9009e-05, 1.9238e-05], device='cuda:0')
+2026-01-13 10:43:20,063 INFO [scaling.py:681] Whitening: num_groups=1, num_channels=384, metric=36.53 vs. limit=5.0
+2026-01-13 10:43:25,856 INFO [train.py:895] Epoch 1, batch 5400, loss[loss=1.398, simple_loss=0.8767, pruned_loss=0.9593, over 1262.00 frames. ], tot_loss[loss=1.196, simple_loss=0.7544, pruned_loss=0.8186, over 261659.25 frames. ], batch size: 10, lr: 4.12e-02, grad_scale: 16.0
+2026-01-13 10:43:26,639 INFO [optim.py:365] Clipping_scale=2.0, grad-norm quartiles 1.272e+02 1.850e+02 2.331e+02 2.967e+02 6.736e+02, threshold=4.661e+02, percent-clipped=10.0
+2026-01-13 10:43:34,081 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=5.25 vs. limit=2.0
+2026-01-13 10:43:40,196 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=192, metric=12.08 vs. limit=2.0
+2026-01-13 10:43:45,521 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=3.92 vs. limit=2.0
+2026-01-13 10:43:46,862 INFO [scaling.py:681] Whitening: num_groups=8, num_channels=96, metric=4.27 vs. limit=2.0
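The `[train.py:895]` progress lines above are the ones worth extracting for monitoring: each carries the epoch, batch number, running total loss, and learning rate. A small sketch of pulling those fields out of a line (the regex is written against the exact format shown in this log; the `parse_train_line` helper is hypothetical, not part of the training code):

```python
import re

# Matches progress lines of the form seen in this log, e.g.
# "... INFO [train.py:895] Epoch 1, batch 5000, loss[...],
#  tot_loss[loss=1.183, ...], batch size: 4, lr: 4.20e-02, grad_scale: 8.0"
TRAIN_RE = re.compile(
    r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+),.*?"
    r"tot_loss\[loss=(?P<tot_loss>[\d.]+).*?"
    r"lr: (?P<lr>[\d.e+-]+)"
)


def parse_train_line(line: str):
    """Return (epoch, batch, tot_loss, lr), or None for non-progress lines."""
    m = TRAIN_RE.search(line)
    if m is None:
        return None
    return (int(m["epoch"]), int(m["batch"]),
            float(m["tot_loss"]), float(m["lr"]))
```

Filtering a log through this turns the diff above into a plottable series: between batches 5000 and 5400 the total loss stays near 1.18-1.20 while the learning rate decays from 4.20e-02 to 4.12e-02.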
tensorboard/events.out.tfevents.1768298578.8e64ffbd666a.24203.0
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:dc5abb682e99a718aa419d5d49c883c284686fa54df3c35c359ff76112775962
+size 51221