projecti7 committed on
Commit 3513247 · verified · 1 Parent(s): 24b5f18

Syncing latest checkpoint
epoch-2.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3863dae24a68c1b08a60ab9da21ea9f0bd66b5dccb12e41a13fe0a01dfb6ad97
+ size 1141949587
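The added epoch-2.pt is a Git LFS pointer rather than the checkpoint itself: the actual blob is fetched separately and can be validated against the pointer's `oid` and `size` fields. A minimal sketch of that check (the helper names here are mine, not part of git-lfs):

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def verify_lfs_object(pointer_text, blob):
    """Check a downloaded blob against the pointer's sha256 oid and size."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unexpected oid algorithm: {algo}")
    return (len(blob) == int(fields["size"])
            and hashlib.sha256(blob).hexdigest() == expected)
```

If the digest or size mismatches after download, the local copy is truncated or corrupt and should be re-fetched.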
log/log-train-2026-01-13-11-44-05-0 CHANGED
@@ -548,3 +548,73 @@
  6.1886e-05, 8.6025e-05], device='cuda:0')
  2026-01-13 12:16:06,116 INFO [train.py:895] (0/2) Epoch 2, batch 1600, loss[loss=0.4407, simple_loss=0.4219, pruned_loss=0.2298, over 2673.00 frames. ], tot_loss[loss=0.455, simple_loss=0.4335, pruned_loss=0.2383, over 546325.13 frames. ], batch size: 10, lr: 4.49e-02, grad_scale: 8.0
  2026-01-13 12:16:06,117 INFO [train.py:920] (0/2) Computing validation loss
+ 2026-01-13 12:16:49,558 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.1992, 1.4505, 1.2953, 1.2141, 1.5568, 1.7720, 1.4270, 0.4766],
+ device='cuda:0'), covar=tensor([0.5525, 0.5068, 0.5483, 0.7259, 0.3983, 0.2023, 0.2831, 0.8611],
+ device='cuda:0'), in_proj_covar=tensor([0.0051, 0.0048, 0.0039, 0.0056, 0.0038, 0.0029, 0.0032, 0.0053],
+ device='cuda:0'), out_proj_covar=tensor([4.5247e-05, 4.0525e-05, 3.2784e-05, 4.8518e-05, 3.2402e-05, 2.0377e-05,
+ 2.5383e-05, 4.6920e-05], device='cuda:0')
+ 2026-01-13 12:16:56,200 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.6384, 0.3722, 1.7899, 1.9367, 1.9887, 0.1859, 0.9336, 0.8558],
+ device='cuda:0'), covar=tensor([0.1352, 0.5904, 0.0864, 0.0880, 0.0763, 0.3290, 0.3137, 0.3269],
+ device='cuda:0'), in_proj_covar=tensor([0.0059, 0.0090, 0.0054, 0.0058, 0.0050, 0.0063, 0.0074, 0.0075],
+ device='cuda:0'), out_proj_covar=tensor([4.3494e-05, 8.7914e-05, 3.5095e-05, 3.8224e-05, 3.5978e-05, 5.3630e-05,
+ 6.7384e-05, 6.3150e-05], device='cuda:0')
+ 2026-01-13 12:17:27,375 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.2908, 1.9416, 2.2443, 1.8000, 3.0639, 1.6369, 2.8469, 2.2828],
+ device='cuda:0'), covar=tensor([0.6780, 0.4786, 0.4490, 0.4767, 0.1142, 1.1813, 0.1340, 0.3560],
+ device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0078, 0.0074, 0.0071, 0.0054, 0.0119, 0.0058, 0.0067],
+ device='cuda:0'), out_proj_covar=tensor([1.1083e-04, 8.1938e-05, 7.7162e-05, 7.0450e-05, 4.8952e-05, 1.1398e-04,
+ 6.0951e-05, 7.1292e-05], device='cuda:0')
+ 2026-01-13 12:17:39,956 INFO [train.py:929] (0/2) Epoch 2, validation: loss=0.9869, simple_loss=0.879, pruned_loss=0.5474, over 1639044.00 frames.
+ 2026-01-13 12:17:39,957 INFO [train.py:930] (0/2) Maximum memory allocated so far is 4333MB
+ 2026-01-13 12:17:53,197 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=2.39 vs. limit=2.0
+ 2026-01-13 12:17:53,561 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.9106, 2.7323, 2.6331, 1.5799, 3.7219, 2.2751, 3.8299, 2.8924],
+ device='cuda:0'), covar=tensor([0.3811, 0.2678, 0.3280, 0.3162, 0.0452, 0.6758, 0.0591, 0.2206],
+ device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0078, 0.0074, 0.0072, 0.0055, 0.0119, 0.0057, 0.0068],
+ device='cuda:0'), out_proj_covar=tensor([1.1164e-04, 8.1546e-05, 7.7837e-05, 7.1484e-05, 4.9784e-05, 1.1395e-04,
+ 6.0680e-05, 7.1599e-05], device='cuda:0')
+ 2026-01-13 12:17:57,710 INFO [zipformer.py:1188] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3288.0, num_to_drop=1, layers_to_drop={1}
+ 2026-01-13 12:18:00,294 INFO [zipformer.py:1188] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3293.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 12:18:04,793 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 2.651e+02 3.519e+02 5.029e+02 1.907e+03, threshold=7.037e+02, percent-clipped=19.0
+ 2026-01-13 12:18:06,340 INFO [train.py:895] (0/2) Epoch 2, batch 1650, loss[loss=0.6221, simple_loss=0.5369, pruned_loss=0.3536, over 2349.00 frames. ], tot_loss[loss=0.4808, simple_loss=0.4506, pruned_loss=0.2555, over 541340.05 frames. ], batch size: 26, lr: 4.48e-02, grad_scale: 8.0
+ 2026-01-13 12:18:09,959 INFO [checkpoint.py:74] (0/2) Saving checkpoint to /kaggle/working/amharic_training/exp_amharic_streaming/epoch-2.pt
+ 2026-01-13 12:18:26,974 INFO [train.py:895] (0/2) Epoch 3, batch 0, loss[loss=0.4633, simple_loss=0.4466, pruned_loss=0.24, over 2650.00 frames. ], tot_loss[loss=0.4633, simple_loss=0.4466, pruned_loss=0.24, over 2650.00 frames. ], batch size: 7, lr: 4.25e-02, grad_scale: 8.0
+ 2026-01-13 12:18:26,975 INFO [train.py:920] (0/2) Computing validation loss
+ 2026-01-13 12:18:48,046 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.0920, 1.9317, 0.9376, 2.0186, 1.1835, 0.8949, 2.2622, 1.6909],
+ device='cuda:0'), covar=tensor([0.0857, 0.1097, 0.2288, 0.1400, 0.2316, 0.3145, 0.0698, 0.1144],
+ device='cuda:0'), in_proj_covar=tensor([0.0038, 0.0040, 0.0044, 0.0046, 0.0048, 0.0056, 0.0034, 0.0036],
+ device='cuda:0'), out_proj_covar=tensor([3.3563e-05, 3.8454e-05, 4.3896e-05, 4.5658e-05, 4.6807e-05, 5.7549e-05,
+ 2.7868e-05, 3.2040e-05], device='cuda:0')
+ 2026-01-13 12:18:55,236 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.6960, 1.7882, 1.5275, 0.5671, 1.4164, 1.3172, 1.6279, 1.5227],
+ device='cuda:0'), covar=tensor([0.0137, 0.0106, 0.0145, 0.0709, 0.0291, 0.0272, 0.0199, 0.0162],
+ device='cuda:0'), in_proj_covar=tensor([0.0007, 0.0006, 0.0007, 0.0009, 0.0007, 0.0007, 0.0007, 0.0007],
+ device='cuda:0'), out_proj_covar=tensor([4.4370e-06, 4.5701e-06, 4.9872e-06, 8.2076e-06, 4.8747e-06, 6.0600e-06,
+ 4.9387e-06, 4.6421e-06], device='cuda:0')
+ 2026-01-13 12:19:10,117 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.2481, 1.9855, 1.8480, 1.8893, 1.9768, 1.5113, 2.1403, 1.8290],
+ device='cuda:0'), covar=tensor([0.5527, 0.1198, 0.4753, 0.2514, 0.0872, 0.4651, 0.1911, 0.2715],
+ device='cuda:0'), in_proj_covar=tensor([0.0089, 0.0064, 0.0112, 0.0081, 0.0059, 0.0099, 0.0074, 0.0090],
+ device='cuda:0'), out_proj_covar=tensor([8.4958e-05, 5.2896e-05, 1.1296e-04, 6.9797e-05, 4.6790e-05, 9.5343e-05,
+ 6.6042e-05, 8.8251e-05], device='cuda:0')
+ 2026-01-13 12:19:12,913 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.4864, 1.6791, 2.7130, 2.0300, 3.1274, 1.6252, 3.2107, 2.6742],
+ device='cuda:0'), covar=tensor([0.5708, 0.4357, 0.3065, 0.3433, 0.0959, 1.0186, 0.1032, 0.2606],
+ device='cuda:0'), in_proj_covar=tensor([0.0109, 0.0077, 0.0071, 0.0072, 0.0055, 0.0118, 0.0057, 0.0065],
+ device='cuda:0'), out_proj_covar=tensor([1.0978e-04, 8.0342e-05, 7.5508e-05, 7.1236e-05, 5.0657e-05, 1.1358e-04,
+ 6.0147e-05, 6.9622e-05], device='cuda:0')
+ 2026-01-13 12:19:22,467 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.8906, 1.5682, 0.7498, 0.8287, 1.2622, 1.3813, 1.0183, 1.5815],
+ device='cuda:0'), covar=tensor([0.1028, 0.1419, 0.3800, 0.3535, 0.3906, 0.1885, 0.2930, 0.2083],
+ device='cuda:0'), in_proj_covar=tensor([0.0029, 0.0027, 0.0041, 0.0041, 0.0040, 0.0030, 0.0039, 0.0036],
+ device='cuda:0'), out_proj_covar=tensor([1.7068e-05, 1.7129e-05, 3.4890e-05, 3.4915e-05, 3.1967e-05, 2.1068e-05,
+ 3.0575e-05, 2.7976e-05], device='cuda:0')
+ 2026-01-13 12:19:39,947 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.7264, 0.3373, 1.5665, 1.7616, 1.9456, 0.4955, 1.0538, 0.5351],
+ device='cuda:0'), covar=tensor([0.1301, 0.7124, 0.1200, 0.0916, 0.0772, 0.4003, 0.3257, 0.4494],
+ device='cuda:0'), in_proj_covar=tensor([0.0060, 0.0093, 0.0056, 0.0057, 0.0051, 0.0066, 0.0073, 0.0077],
+ device='cuda:0'), out_proj_covar=tensor([4.4511e-05, 8.9485e-05, 3.7603e-05, 3.8290e-05, 3.7331e-05, 5.6253e-05,
+ 6.6072e-05, 6.4817e-05], device='cuda:0')
+ 2026-01-13 12:19:59,139 INFO [train.py:929] (0/2) Epoch 3, validation: loss=0.9187, simple_loss=0.8283, pruned_loss=0.5045, over 1639044.00 frames.
+ 2026-01-13 12:19:59,140 INFO [train.py:930] (0/2) Maximum memory allocated so far is 4333MB
+ 2026-01-13 12:20:14,611 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=2.00 vs. limit=2.0
+ 2026-01-13 12:20:16,442 INFO [scaling.py:681] (0/2) Whitening: num_groups=1, num_channels=384, metric=5.07 vs. limit=5.0
+ 2026-01-13 12:20:17,388 INFO [zipformer.py:1188] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3341.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 12:20:22,435 INFO [zipformer.py:1188] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3349.0, num_to_drop=2, layers_to_drop={2, 3}
+ 2026-01-13 12:20:28,363 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=2.32 vs. limit=2.0
+ 2026-01-13 12:20:28,555 INFO [train.py:895] (0/2) Epoch 3, batch 50, loss[loss=0.375, simple_loss=0.398, pruned_loss=0.176, over 2766.00 frames. ], tot_loss[loss=0.4646, simple_loss=0.4427, pruned_loss=0.2433, over 123227.63 frames. ], batch size: 7, lr: 4.24e-02, grad_scale: 8.0
+ 2026-01-13 12:20:30,181 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=2.26 vs. limit=2.0
+ 2026-01-13 12:20:31,372 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0
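The appended lines follow the icefall-style train.py format, where the running average appears in the `tot_loss[loss=...]` field of each batch line. A minimal sketch for pulling those values out of such logs (the regex and the name `extract_tot_loss` are mine, not part of the project's tooling):

```python
import re

# Matches lines like:
#   ... Epoch 2, batch 1650, loss[...], tot_loss[loss=0.4808, ...] ...
LOSS_RE = re.compile(r"Epoch (\d+), batch (\d+), .*?tot_loss\[loss=([\d.]+)")

def extract_tot_loss(lines):
    """Yield (epoch, batch, tot_loss) tuples from training-log lines."""
    for line in lines:
        m = LOSS_RE.search(line)
        if m:
            yield int(m.group(1)), int(m.group(2)), float(m.group(3))
```

Feeding the log file through this generator gives a per-batch loss curve that can be plotted or compared across the two ranks.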
log/log-train-2026-01-13-11-44-05-1 CHANGED
@@ -581,3 +581,59 @@
  2026-01-13 12:15:38,449 INFO [train.py:895] (1/2) Epoch 2, batch 1550, loss[loss=0.3899, simple_loss=0.4005, pruned_loss=0.1896, over 2889.00 frames. ], tot_loss[loss=0.4516, simple_loss=0.4314, pruned_loss=0.2359, over 547620.80 frames. ], batch size: 10, lr: 4.50e-02, grad_scale: 8.0
  2026-01-13 12:16:06,126 INFO [train.py:895] (1/2) Epoch 2, batch 1600, loss[loss=0.4453, simple_loss=0.4435, pruned_loss=0.2235, over 2681.00 frames. ], tot_loss[loss=0.4555, simple_loss=0.4339, pruned_loss=0.2386, over 545000.44 frames. ], batch size: 10, lr: 4.49e-02, grad_scale: 8.0
  2026-01-13 12:16:06,127 INFO [train.py:920] (1/2) Computing validation loss
+ 2026-01-13 12:16:31,846 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([0.9146, 1.0261, 1.5539, 1.1242, 1.5726, 0.9312, 0.8308, 0.5649],
+ device='cuda:1'), covar=tensor([0.0074, 0.0085, 0.0033, 0.0049, 0.0026, 0.0077, 0.0057, 0.0076],
+ device='cuda:1'), in_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002],
+ device='cuda:1'), out_proj_covar=tensor([2.0987e-06, 1.9267e-06, 1.8224e-06, 1.8397e-06, 1.4468e-06, 2.5334e-06,
+ 1.7914e-06, 2.0480e-06], device='cuda:1')
+ 2026-01-13 12:17:12,889 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.1592, 1.8355, 1.6518, 2.3459, 1.3080, 0.7570, 0.8814, 2.6382],
+ device='cuda:1'), covar=tensor([0.2002, 0.0977, 0.1122, 0.0610, 0.1817, 0.1639, 0.2137, 0.0418],
+ device='cuda:1'), in_proj_covar=tensor([0.0037, 0.0028, 0.0029, 0.0027, 0.0036, 0.0033, 0.0034, 0.0027],
+ device='cuda:1'), out_proj_covar=tensor([3.5683e-05, 1.9351e-05, 2.5173e-05, 1.8383e-05, 3.0958e-05, 2.9819e-05,
+ 3.5024e-05, 1.7139e-05], device='cuda:1')
+ 2026-01-13 12:17:33,431 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.1622, 0.1187, 1.0880, 0.5059, 1.1932, 0.9509, 1.0623, 1.2777],
+ device='cuda:1'), covar=tensor([0.0028, 0.0071, 0.0035, 0.0051, 0.0025, 0.0034, 0.0032, 0.0022],
+ device='cuda:1'), in_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002],
+ device='cuda:1'), out_proj_covar=tensor([1.2177e-06, 1.7327e-06, 1.3247e-06, 1.2895e-06, 1.4067e-06, 1.2497e-06,
+ 1.3766e-06, 1.3145e-06], device='cuda:1')
+ 2026-01-13 12:17:39,957 INFO [train.py:929] (1/2) Epoch 2, validation: loss=0.9869, simple_loss=0.879, pruned_loss=0.5474, over 1639044.00 frames.
+ 2026-01-13 12:17:39,957 INFO [train.py:930] (1/2) Maximum memory allocated so far is 4462MB
+ 2026-01-13 12:17:47,633 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.8254, 1.9520, 1.5732, 3.8457, 2.6316, 4.0628, 3.9863, 2.3172],
+ device='cuda:1'), covar=tensor([0.1003, 0.4758, 0.5018, 0.0807, 0.2587, 0.0658, 0.0624, 0.4646],
+ device='cuda:1'), in_proj_covar=tensor([0.0050, 0.0083, 0.0092, 0.0069, 0.0075, 0.0067, 0.0061, 0.0087],
+ device='cuda:1'), out_proj_covar=tensor([5.0716e-05, 8.1120e-05, 9.1148e-05, 6.0760e-05, 7.9381e-05, 5.9552e-05,
+ 5.4842e-05, 8.5480e-05], device='cuda:1')
+ 2026-01-13 12:17:53,571 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.9208, 2.0699, 1.9968, 1.7628, 2.5171, 1.6452, 1.7896, 2.8162],
+ device='cuda:1'), covar=tensor([0.0098, 0.0444, 0.0329, 0.0616, 0.0179, 0.0561, 0.0469, 0.0194],
+ device='cuda:1'), in_proj_covar=tensor([0.0014, 0.0020, 0.0019, 0.0022, 0.0016, 0.0023, 0.0020, 0.0014],
+ device='cuda:1'), out_proj_covar=tensor([1.3304e-05, 1.9381e-05, 1.7908e-05, 2.3097e-05, 1.3664e-05, 2.4161e-05,
+ 1.8719e-05, 1.3050e-05], device='cuda:1')
+ 2026-01-13 12:17:57,712 INFO [zipformer.py:1188] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3288.0, num_to_drop=1, layers_to_drop={1}
+ 2026-01-13 12:18:00,300 INFO [zipformer.py:1188] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3293.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 12:18:01,711 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=2.39 vs. limit=2.0
+ 2026-01-13 12:18:04,792 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 2.651e+02 3.519e+02 5.029e+02 1.907e+03, threshold=7.037e+02, percent-clipped=19.0
+ 2026-01-13 12:18:06,340 INFO [train.py:895] (1/2) Epoch 2, batch 1650, loss[loss=0.6584, simple_loss=0.5576, pruned_loss=0.3796, over 2343.00 frames. ], tot_loss[loss=0.4792, simple_loss=0.4506, pruned_loss=0.2539, over 541005.14 frames. ], batch size: 26, lr: 4.48e-02, grad_scale: 8.0
+ 2026-01-13 12:18:26,974 INFO [train.py:895] (1/2) Epoch 3, batch 0, loss[loss=0.3878, simple_loss=0.3925, pruned_loss=0.1916, over 2652.00 frames. ], tot_loss[loss=0.3878, simple_loss=0.3925, pruned_loss=0.1916, over 2652.00 frames. ], batch size: 7, lr: 4.25e-02, grad_scale: 8.0
+ 2026-01-13 12:18:26,975 INFO [train.py:920] (1/2) Computing validation loss
+ 2026-01-13 12:18:56,591 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([0.4170, 0.7865, 1.1271, 1.1866, 0.7548, 1.0538, 0.7846, 0.3964],
+ device='cuda:1'), covar=tensor([1.3924, 0.3929, 0.3793, 0.3908, 0.4075, 0.4388, 1.3222, 1.2108],
+ device='cuda:1'), in_proj_covar=tensor([0.0044, 0.0027, 0.0027, 0.0029, 0.0026, 0.0028, 0.0037, 0.0049],
+ device='cuda:1'), out_proj_covar=tensor([3.4328e-05, 1.3727e-05, 1.4365e-05, 1.7017e-05, 1.2756e-05, 1.5116e-05,
+ 2.8031e-05, 4.3420e-05], device='cuda:1')
+ 2026-01-13 12:19:30,330 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.7149, 0.3511, 1.5486, 1.7802, 1.9875, 0.4806, 1.0176, 0.5029],
+ device='cuda:1'), covar=tensor([0.1266, 0.6684, 0.1064, 0.0905, 0.0703, 0.3639, 0.3178, 0.4251],
+ device='cuda:1'), in_proj_covar=tensor([0.0060, 0.0093, 0.0056, 0.0057, 0.0051, 0.0066, 0.0073, 0.0077],
+ device='cuda:1'), out_proj_covar=tensor([4.4511e-05, 8.9485e-05, 3.7603e-05, 3.8290e-05, 3.7331e-05, 5.6253e-05,
+ 6.6072e-05, 6.4817e-05], device='cuda:1')
+ 2026-01-13 12:19:42,239 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.2313, 1.7646, 1.6322, 1.1614, 1.7257, 1.0846, 1.1184, 2.0305],
+ device='cuda:1'), covar=tensor([0.0227, 0.0633, 0.0571, 0.1077, 0.0348, 0.1010, 0.0794, 0.0293],
+ device='cuda:1'), in_proj_covar=tensor([0.0015, 0.0021, 0.0020, 0.0024, 0.0016, 0.0024, 0.0021, 0.0015],
+ device='cuda:1'), out_proj_covar=tensor([1.3409e-05, 2.0347e-05, 1.8681e-05, 2.4442e-05, 1.3800e-05, 2.5326e-05,
+ 1.9742e-05, 1.3644e-05], device='cuda:1')
+ 2026-01-13 12:19:59,139 INFO [train.py:929] (1/2) Epoch 3, validation: loss=0.9187, simple_loss=0.8283, pruned_loss=0.5045, over 1639044.00 frames.
+ 2026-01-13 12:19:59,140 INFO [train.py:930] (1/2) Maximum memory allocated so far is 4462MB
+ 2026-01-13 12:20:01,111 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=5.65 vs. limit=5.0
+ 2026-01-13 12:20:12,821 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=2.05 vs. limit=2.0
+ 2026-01-13 12:20:17,388 INFO [zipformer.py:1188] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3341.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 12:20:22,397 INFO [zipformer.py:1188] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3349.0, num_to_drop=2, layers_to_drop={0, 3}
+ 2026-01-13 12:20:28,553 INFO [train.py:895] (1/2) Epoch 3, batch 50, loss[loss=0.3994, simple_loss=0.4089, pruned_loss=0.1949, over 2774.00 frames. ], tot_loss[loss=0.4582, simple_loss=0.4392, pruned_loss=0.2386, over 122660.88 frames. ], batch size: 7, lr: 4.24e-02, grad_scale: 8.0
tensorboard/events.out.tfevents.1768304645.8e64ffbd666a.97184.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:022227bc99d994bb62aeb9bc3ae92fbb5add2129a9fff8c868156f9d570d6c2e
- size 29897
+ oid sha256:2ae7c3a8fc15f53961d25383af2d3ed6588a038d2fdde10ce638137ec0fbc47f
+ size 33602