Syncing latest checkpoint
epoch-2.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3863dae24a68c1b08a60ab9da21ea9f0bd66b5dccb12e41a13fe0a01dfb6ad97
+size 1141949587
log/log-train-2026-01-13-11-44-05-0
CHANGED
@@ -548,3 +548,73 @@
 6.1886e-05, 8.6025e-05], device='cuda:0')
 2026-01-13 12:16:06,116 INFO [train.py:895] (0/2) Epoch 2, batch 1600, loss[loss=0.4407, simple_loss=0.4219, pruned_loss=0.2298, over 2673.00 frames. ], tot_loss[loss=0.455, simple_loss=0.4335, pruned_loss=0.2383, over 546325.13 frames. ], batch size: 10, lr: 4.49e-02, grad_scale: 8.0
 2026-01-13 12:16:06,117 INFO [train.py:920] (0/2) Computing validation loss
+2026-01-13 12:16:49,558 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.1992, 1.4505, 1.2953, 1.2141, 1.5568, 1.7720, 1.4270, 0.4766],
+       device='cuda:0'), covar=tensor([0.5525, 0.5068, 0.5483, 0.7259, 0.3983, 0.2023, 0.2831, 0.8611],
+       device='cuda:0'), in_proj_covar=tensor([0.0051, 0.0048, 0.0039, 0.0056, 0.0038, 0.0029, 0.0032, 0.0053],
+       device='cuda:0'), out_proj_covar=tensor([4.5247e-05, 4.0525e-05, 3.2784e-05, 4.8518e-05, 3.2402e-05, 2.0377e-05,
+        2.5383e-05, 4.6920e-05], device='cuda:0')
+2026-01-13 12:16:56,200 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.6384, 0.3722, 1.7899, 1.9367, 1.9887, 0.1859, 0.9336, 0.8558],
+       device='cuda:0'), covar=tensor([0.1352, 0.5904, 0.0864, 0.0880, 0.0763, 0.3290, 0.3137, 0.3269],
+       device='cuda:0'), in_proj_covar=tensor([0.0059, 0.0090, 0.0054, 0.0058, 0.0050, 0.0063, 0.0074, 0.0075],
+       device='cuda:0'), out_proj_covar=tensor([4.3494e-05, 8.7914e-05, 3.5095e-05, 3.8224e-05, 3.5978e-05, 5.3630e-05,
+        6.7384e-05, 6.3150e-05], device='cuda:0')
+2026-01-13 12:17:27,375 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.2908, 1.9416, 2.2443, 1.8000, 3.0639, 1.6369, 2.8469, 2.2828],
+       device='cuda:0'), covar=tensor([0.6780, 0.4786, 0.4490, 0.4767, 0.1142, 1.1813, 0.1340, 0.3560],
+       device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0078, 0.0074, 0.0071, 0.0054, 0.0119, 0.0058, 0.0067],
+       device='cuda:0'), out_proj_covar=tensor([1.1083e-04, 8.1938e-05, 7.7162e-05, 7.0450e-05, 4.8952e-05, 1.1398e-04,
+        6.0951e-05, 7.1292e-05], device='cuda:0')
+2026-01-13 12:17:39,956 INFO [train.py:929] (0/2) Epoch 2, validation: loss=0.9869, simple_loss=0.879, pruned_loss=0.5474, over 1639044.00 frames.
+2026-01-13 12:17:39,957 INFO [train.py:930] (0/2) Maximum memory allocated so far is 4333MB
+2026-01-13 12:17:53,197 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=2.39 vs. limit=2.0
+2026-01-13 12:17:53,561 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.9106, 2.7323, 2.6331, 1.5799, 3.7219, 2.2751, 3.8299, 2.8924],
+       device='cuda:0'), covar=tensor([0.3811, 0.2678, 0.3280, 0.3162, 0.0452, 0.6758, 0.0591, 0.2206],
+       device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0078, 0.0074, 0.0072, 0.0055, 0.0119, 0.0057, 0.0068],
+       device='cuda:0'), out_proj_covar=tensor([1.1164e-04, 8.1546e-05, 7.7837e-05, 7.1484e-05, 4.9784e-05, 1.1395e-04,
+        6.0680e-05, 7.1599e-05], device='cuda:0')
+2026-01-13 12:17:57,710 INFO [zipformer.py:1188] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3288.0, num_to_drop=1, layers_to_drop={1}
+2026-01-13 12:18:00,294 INFO [zipformer.py:1188] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3293.0, num_to_drop=0, layers_to_drop=set()
+2026-01-13 12:18:04,793 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 2.651e+02 3.519e+02 5.029e+02 1.907e+03, threshold=7.037e+02, percent-clipped=19.0
+2026-01-13 12:18:06,340 INFO [train.py:895] (0/2) Epoch 2, batch 1650, loss[loss=0.6221, simple_loss=0.5369, pruned_loss=0.3536, over 2349.00 frames. ], tot_loss[loss=0.4808, simple_loss=0.4506, pruned_loss=0.2555, over 541340.05 frames. ], batch size: 26, lr: 4.48e-02, grad_scale: 8.0
+2026-01-13 12:18:09,959 INFO [checkpoint.py:74] (0/2) Saving checkpoint to /kaggle/working/amharic_training/exp_amharic_streaming/epoch-2.pt
+2026-01-13 12:18:26,974 INFO [train.py:895] (0/2) Epoch 3, batch 0, loss[loss=0.4633, simple_loss=0.4466, pruned_loss=0.24, over 2650.00 frames. ], tot_loss[loss=0.4633, simple_loss=0.4466, pruned_loss=0.24, over 2650.00 frames. ], batch size: 7, lr: 4.25e-02, grad_scale: 8.0
+2026-01-13 12:18:26,975 INFO [train.py:920] (0/2) Computing validation loss
+2026-01-13 12:18:48,046 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.0920, 1.9317, 0.9376, 2.0186, 1.1835, 0.8949, 2.2622, 1.6909],
+       device='cuda:0'), covar=tensor([0.0857, 0.1097, 0.2288, 0.1400, 0.2316, 0.3145, 0.0698, 0.1144],
+       device='cuda:0'), in_proj_covar=tensor([0.0038, 0.0040, 0.0044, 0.0046, 0.0048, 0.0056, 0.0034, 0.0036],
+       device='cuda:0'), out_proj_covar=tensor([3.3563e-05, 3.8454e-05, 4.3896e-05, 4.5658e-05, 4.6807e-05, 5.7549e-05,
+        2.7868e-05, 3.2040e-05], device='cuda:0')
+2026-01-13 12:18:55,236 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.6960, 1.7882, 1.5275, 0.5671, 1.4164, 1.3172, 1.6279, 1.5227],
+       device='cuda:0'), covar=tensor([0.0137, 0.0106, 0.0145, 0.0709, 0.0291, 0.0272, 0.0199, 0.0162],
+       device='cuda:0'), in_proj_covar=tensor([0.0007, 0.0006, 0.0007, 0.0009, 0.0007, 0.0007, 0.0007, 0.0007],
+       device='cuda:0'), out_proj_covar=tensor([4.4370e-06, 4.5701e-06, 4.9872e-06, 8.2076e-06, 4.8747e-06, 6.0600e-06,
+        4.9387e-06, 4.6421e-06], device='cuda:0')
+2026-01-13 12:19:10,117 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.2481, 1.9855, 1.8480, 1.8893, 1.9768, 1.5113, 2.1403, 1.8290],
+       device='cuda:0'), covar=tensor([0.5527, 0.1198, 0.4753, 0.2514, 0.0872, 0.4651, 0.1911, 0.2715],
+       device='cuda:0'), in_proj_covar=tensor([0.0089, 0.0064, 0.0112, 0.0081, 0.0059, 0.0099, 0.0074, 0.0090],
+       device='cuda:0'), out_proj_covar=tensor([8.4958e-05, 5.2896e-05, 1.1296e-04, 6.9797e-05, 4.6790e-05, 9.5343e-05,
+        6.6042e-05, 8.8251e-05], device='cuda:0')
+2026-01-13 12:19:12,913 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.4864, 1.6791, 2.7130, 2.0300, 3.1274, 1.6252, 3.2107, 2.6742],
+       device='cuda:0'), covar=tensor([0.5708, 0.4357, 0.3065, 0.3433, 0.0959, 1.0186, 0.1032, 0.2606],
+       device='cuda:0'), in_proj_covar=tensor([0.0109, 0.0077, 0.0071, 0.0072, 0.0055, 0.0118, 0.0057, 0.0065],
+       device='cuda:0'), out_proj_covar=tensor([1.0978e-04, 8.0342e-05, 7.5508e-05, 7.1236e-05, 5.0657e-05, 1.1358e-04,
+        6.0147e-05, 6.9622e-05], device='cuda:0')
+2026-01-13 12:19:22,467 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.8906, 1.5682, 0.7498, 0.8287, 1.2622, 1.3813, 1.0183, 1.5815],
+       device='cuda:0'), covar=tensor([0.1028, 0.1419, 0.3800, 0.3535, 0.3906, 0.1885, 0.2930, 0.2083],
+       device='cuda:0'), in_proj_covar=tensor([0.0029, 0.0027, 0.0041, 0.0041, 0.0040, 0.0030, 0.0039, 0.0036],
+       device='cuda:0'), out_proj_covar=tensor([1.7068e-05, 1.7129e-05, 3.4890e-05, 3.4915e-05, 3.1967e-05, 2.1068e-05,
+        3.0575e-05, 2.7976e-05], device='cuda:0')
+2026-01-13 12:19:39,947 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.7264, 0.3373, 1.5665, 1.7616, 1.9456, 0.4955, 1.0538, 0.5351],
+       device='cuda:0'), covar=tensor([0.1301, 0.7124, 0.1200, 0.0916, 0.0772, 0.4003, 0.3257, 0.4494],
+       device='cuda:0'), in_proj_covar=tensor([0.0060, 0.0093, 0.0056, 0.0057, 0.0051, 0.0066, 0.0073, 0.0077],
+       device='cuda:0'), out_proj_covar=tensor([4.4511e-05, 8.9485e-05, 3.7603e-05, 3.8290e-05, 3.7331e-05, 5.6253e-05,
+        6.6072e-05, 6.4817e-05], device='cuda:0')
+2026-01-13 12:19:59,139 INFO [train.py:929] (0/2) Epoch 3, validation: loss=0.9187, simple_loss=0.8283, pruned_loss=0.5045, over 1639044.00 frames.
+2026-01-13 12:19:59,140 INFO [train.py:930] (0/2) Maximum memory allocated so far is 4333MB
+2026-01-13 12:20:14,611 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=2.00 vs. limit=2.0
+2026-01-13 12:20:16,442 INFO [scaling.py:681] (0/2) Whitening: num_groups=1, num_channels=384, metric=5.07 vs. limit=5.0
+2026-01-13 12:20:17,388 INFO [zipformer.py:1188] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3341.0, num_to_drop=0, layers_to_drop=set()
+2026-01-13 12:20:22,435 INFO [zipformer.py:1188] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3349.0, num_to_drop=2, layers_to_drop={2, 3}
+2026-01-13 12:20:28,363 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=2.32 vs. limit=2.0
+2026-01-13 12:20:28,555 INFO [train.py:895] (0/2) Epoch 3, batch 50, loss[loss=0.375, simple_loss=0.398, pruned_loss=0.176, over 2766.00 frames. ], tot_loss[loss=0.4646, simple_loss=0.4427, pruned_loss=0.2433, over 123227.63 frames. ], batch size: 7, lr: 4.24e-02, grad_scale: 8.0
+2026-01-13 12:20:30,181 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=2.26 vs. limit=2.0
+2026-01-13 12:20:31,372 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0
log/log-train-2026-01-13-11-44-05-1
CHANGED
@@ -581,3 +581,59 @@
 2026-01-13 12:15:38,449 INFO [train.py:895] (1/2) Epoch 2, batch 1550, loss[loss=0.3899, simple_loss=0.4005, pruned_loss=0.1896, over 2889.00 frames. ], tot_loss[loss=0.4516, simple_loss=0.4314, pruned_loss=0.2359, over 547620.80 frames. ], batch size: 10, lr: 4.50e-02, grad_scale: 8.0
 2026-01-13 12:16:06,126 INFO [train.py:895] (1/2) Epoch 2, batch 1600, loss[loss=0.4453, simple_loss=0.4435, pruned_loss=0.2235, over 2681.00 frames. ], tot_loss[loss=0.4555, simple_loss=0.4339, pruned_loss=0.2386, over 545000.44 frames. ], batch size: 10, lr: 4.49e-02, grad_scale: 8.0
 2026-01-13 12:16:06,127 INFO [train.py:920] (1/2) Computing validation loss
+2026-01-13 12:16:31,846 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([0.9146, 1.0261, 1.5539, 1.1242, 1.5726, 0.9312, 0.8308, 0.5649],
+       device='cuda:1'), covar=tensor([0.0074, 0.0085, 0.0033, 0.0049, 0.0026, 0.0077, 0.0057, 0.0076],
+       device='cuda:1'), in_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002],
+       device='cuda:1'), out_proj_covar=tensor([2.0987e-06, 1.9267e-06, 1.8224e-06, 1.8397e-06, 1.4468e-06, 2.5334e-06,
+        1.7914e-06, 2.0480e-06], device='cuda:1')
+2026-01-13 12:17:12,889 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.1592, 1.8355, 1.6518, 2.3459, 1.3080, 0.7570, 0.8814, 2.6382],
+       device='cuda:1'), covar=tensor([0.2002, 0.0977, 0.1122, 0.0610, 0.1817, 0.1639, 0.2137, 0.0418],
+       device='cuda:1'), in_proj_covar=tensor([0.0037, 0.0028, 0.0029, 0.0027, 0.0036, 0.0033, 0.0034, 0.0027],
+       device='cuda:1'), out_proj_covar=tensor([3.5683e-05, 1.9351e-05, 2.5173e-05, 1.8383e-05, 3.0958e-05, 2.9819e-05,
+        3.5024e-05, 1.7139e-05], device='cuda:1')
+2026-01-13 12:17:33,431 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.1622, 0.1187, 1.0880, 0.5059, 1.1932, 0.9509, 1.0623, 1.2777],
+       device='cuda:1'), covar=tensor([0.0028, 0.0071, 0.0035, 0.0051, 0.0025, 0.0034, 0.0032, 0.0022],
+       device='cuda:1'), in_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002],
+       device='cuda:1'), out_proj_covar=tensor([1.2177e-06, 1.7327e-06, 1.3247e-06, 1.2895e-06, 1.4067e-06, 1.2497e-06,
+        1.3766e-06, 1.3145e-06], device='cuda:1')
+2026-01-13 12:17:39,957 INFO [train.py:929] (1/2) Epoch 2, validation: loss=0.9869, simple_loss=0.879, pruned_loss=0.5474, over 1639044.00 frames.
+2026-01-13 12:17:39,957 INFO [train.py:930] (1/2) Maximum memory allocated so far is 4462MB
+2026-01-13 12:17:47,633 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.8254, 1.9520, 1.5732, 3.8457, 2.6316, 4.0628, 3.9863, 2.3172],
+       device='cuda:1'), covar=tensor([0.1003, 0.4758, 0.5018, 0.0807, 0.2587, 0.0658, 0.0624, 0.4646],
+       device='cuda:1'), in_proj_covar=tensor([0.0050, 0.0083, 0.0092, 0.0069, 0.0075, 0.0067, 0.0061, 0.0087],
+       device='cuda:1'), out_proj_covar=tensor([5.0716e-05, 8.1120e-05, 9.1148e-05, 6.0760e-05, 7.9381e-05, 5.9552e-05,
+        5.4842e-05, 8.5480e-05], device='cuda:1')
+2026-01-13 12:17:53,571 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.9208, 2.0699, 1.9968, 1.7628, 2.5171, 1.6452, 1.7896, 2.8162],
+       device='cuda:1'), covar=tensor([0.0098, 0.0444, 0.0329, 0.0616, 0.0179, 0.0561, 0.0469, 0.0194],
+       device='cuda:1'), in_proj_covar=tensor([0.0014, 0.0020, 0.0019, 0.0022, 0.0016, 0.0023, 0.0020, 0.0014],
+       device='cuda:1'), out_proj_covar=tensor([1.3304e-05, 1.9381e-05, 1.7908e-05, 2.3097e-05, 1.3664e-05, 2.4161e-05,
+        1.8719e-05, 1.3050e-05], device='cuda:1')
+2026-01-13 12:17:57,712 INFO [zipformer.py:1188] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3288.0, num_to_drop=1, layers_to_drop={1}
+2026-01-13 12:18:00,300 INFO [zipformer.py:1188] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3293.0, num_to_drop=0, layers_to_drop=set()
+2026-01-13 12:18:01,711 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=2.39 vs. limit=2.0
+2026-01-13 12:18:04,792 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 2.651e+02 3.519e+02 5.029e+02 1.907e+03, threshold=7.037e+02, percent-clipped=19.0
+2026-01-13 12:18:06,340 INFO [train.py:895] (1/2) Epoch 2, batch 1650, loss[loss=0.6584, simple_loss=0.5576, pruned_loss=0.3796, over 2343.00 frames. ], tot_loss[loss=0.4792, simple_loss=0.4506, pruned_loss=0.2539, over 541005.14 frames. ], batch size: 26, lr: 4.48e-02, grad_scale: 8.0
+2026-01-13 12:18:26,974 INFO [train.py:895] (1/2) Epoch 3, batch 0, loss[loss=0.3878, simple_loss=0.3925, pruned_loss=0.1916, over 2652.00 frames. ], tot_loss[loss=0.3878, simple_loss=0.3925, pruned_loss=0.1916, over 2652.00 frames. ], batch size: 7, lr: 4.25e-02, grad_scale: 8.0
+2026-01-13 12:18:26,975 INFO [train.py:920] (1/2) Computing validation loss
+2026-01-13 12:18:56,591 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([0.4170, 0.7865, 1.1271, 1.1866, 0.7548, 1.0538, 0.7846, 0.3964],
+       device='cuda:1'), covar=tensor([1.3924, 0.3929, 0.3793, 0.3908, 0.4075, 0.4388, 1.3222, 1.2108],
+       device='cuda:1'), in_proj_covar=tensor([0.0044, 0.0027, 0.0027, 0.0029, 0.0026, 0.0028, 0.0037, 0.0049],
+       device='cuda:1'), out_proj_covar=tensor([3.4328e-05, 1.3727e-05, 1.4365e-05, 1.7017e-05, 1.2756e-05, 1.5116e-05,
+        2.8031e-05, 4.3420e-05], device='cuda:1')
+2026-01-13 12:19:30,330 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.7149, 0.3511, 1.5486, 1.7802, 1.9875, 0.4806, 1.0176, 0.5029],
+       device='cuda:1'), covar=tensor([0.1266, 0.6684, 0.1064, 0.0905, 0.0703, 0.3639, 0.3178, 0.4251],
+       device='cuda:1'), in_proj_covar=tensor([0.0060, 0.0093, 0.0056, 0.0057, 0.0051, 0.0066, 0.0073, 0.0077],
+       device='cuda:1'), out_proj_covar=tensor([4.4511e-05, 8.9485e-05, 3.7603e-05, 3.8290e-05, 3.7331e-05, 5.6253e-05,
+        6.6072e-05, 6.4817e-05], device='cuda:1')
+2026-01-13 12:19:42,239 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.2313, 1.7646, 1.6322, 1.1614, 1.7257, 1.0846, 1.1184, 2.0305],
+       device='cuda:1'), covar=tensor([0.0227, 0.0633, 0.0571, 0.1077, 0.0348, 0.1010, 0.0794, 0.0293],
+       device='cuda:1'), in_proj_covar=tensor([0.0015, 0.0021, 0.0020, 0.0024, 0.0016, 0.0024, 0.0021, 0.0015],
+       device='cuda:1'), out_proj_covar=tensor([1.3409e-05, 2.0347e-05, 1.8681e-05, 2.4442e-05, 1.3800e-05, 2.5326e-05,
+        1.9742e-05, 1.3644e-05], device='cuda:1')
+2026-01-13 12:19:59,139 INFO [train.py:929] (1/2) Epoch 3, validation: loss=0.9187, simple_loss=0.8283, pruned_loss=0.5045, over 1639044.00 frames.
+2026-01-13 12:19:59,140 INFO [train.py:930] (1/2) Maximum memory allocated so far is 4462MB
+2026-01-13 12:20:01,111 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=5.65 vs. limit=5.0
+2026-01-13 12:20:12,821 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=2.05 vs. limit=2.0
+2026-01-13 12:20:17,388 INFO [zipformer.py:1188] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3341.0, num_to_drop=0, layers_to_drop=set()
+2026-01-13 12:20:22,397 INFO [zipformer.py:1188] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3349.0, num_to_drop=2, layers_to_drop={0, 3}
+2026-01-13 12:20:28,553 INFO [train.py:895] (1/2) Epoch 3, batch 50, loss[loss=0.3994, simple_loss=0.4089, pruned_loss=0.1949, over 2774.00 frames. ], tot_loss[loss=0.4582, simple_loss=0.4392, pruned_loss=0.2386, over 122660.88 frames. ], batch size: 7, lr: 4.24e-02, grad_scale: 8.0
tensorboard/events.out.tfevents.1768304645.8e64ffbd666a.97184.0
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:2ae7c3a8fc15f53961d25383af2d3ed6588a038d2fdde10ce638137ec0fbc47f
+size 33602