Syncing latest checkpoint
epoch-3.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:391897c29397506f2b490ba89f3647151e3f484fdfabc6d74724a2a1a299175f
+size 1141949651
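The epoch-3.pt added above is tracked with Git LFS, so the diff records only a three-line pointer (version, oid, size) rather than the ~1.1 GB checkpoint itself. A minimal sketch of reading such a pointer; `parse_lfs_pointer` is a hypothetical helper, not part of Git LFS or huggingface_hub:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields.

    Hypothetical helper: each pointer line is "<key> <value>",
    e.g. "size 1141949651".
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# The pointer content from the diff above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:391897c29397506f2b490ba89f3647151e3f484fdfabc6d74724a2a1a299175f
size 1141949651
"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # byte size of the real checkpoint blob, not the pointer
```

The `size` field here (about 1.1 GB) is why the repository stores a pointer: Git itself never sees the checkpoint bytes, only the sha256 identity of the blob.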
log/log-train-2026-01-13-11-44-05-0
CHANGED
@@ -868,3 +868,54 @@
 2026-01-13 12:36:01,524 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.360e+02 2.400e+02 3.068e+02 4.251e+02 1.911e+03, threshold=6.137e+02, percent-clipped=9.0
 2026-01-13 12:36:05,808 INFO [train.py:895] (0/2) Epoch 3, batch 1600, loss[loss=0.3106, simple_loss=0.3476, pruned_loss=0.1368, over 2667.00 frames. ], tot_loss[loss=0.3992, simple_loss=0.403, pruned_loss=0.1977, over 544943.32 frames. ], batch size: 10, lr: 3.94e-02, grad_scale: 16.0
 2026-01-13 12:36:05,809 INFO [train.py:920] (0/2) Computing validation loss
+2026-01-13 12:36:29,270 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([0.7969, 1.9097, 1.9930, 1.9088, 2.3991, 2.3107, 1.1164, 1.0260],
+device='cuda:0'), covar=tensor([0.2164, 0.0578, 0.0585, 0.0809, 0.0331, 0.0549, 0.3067, 0.1736],
+device='cuda:0'), in_proj_covar=tensor([0.0056, 0.0036, 0.0036, 0.0039, 0.0034, 0.0038, 0.0059, 0.0058],
+device='cuda:0'), out_proj_covar=tensor([3.8466e-05, 1.9248e-05, 1.8529e-05, 2.3375e-05, 1.6828e-05, 2.0714e-05,
+4.6686e-05, 4.1827e-05], device='cuda:0')
+2026-01-13 12:36:37,395 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.3299, 2.1623, 2.2183, 2.0612, 2.4720, 1.1585, 2.4155, 1.4380],
+device='cuda:0'), covar=tensor([0.0438, 0.0423, 0.0631, 0.0421, 0.0356, 0.2474, 0.0554, 0.2739],
+device='cuda:0'), in_proj_covar=tensor([0.0064, 0.0060, 0.0071, 0.0066, 0.0060, 0.0116, 0.0084, 0.0127],
+device='cuda:0'), out_proj_covar=tensor([4.0235e-05, 3.8104e-05, 4.5732e-05, 4.0709e-05, 3.7559e-05, 9.2753e-05,
+6.0715e-05, 1.0149e-04], device='cuda:0')
+2026-01-13 12:36:57,586 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.4703, 1.5898, 1.5631, 1.5399, 0.8925, 0.4816, 1.3927, 1.6913],
+device='cuda:0'), covar=tensor([0.0023, 0.0023, 0.0027, 0.0018, 0.0042, 0.0049, 0.0019, 0.0013],
+device='cuda:0'), in_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0003],
+device='cuda:0'), out_proj_covar=tensor([2.4716e-06, 2.7715e-06, 2.5053e-06, 2.5432e-06, 2.4521e-06, 4.8403e-06,
+1.9234e-06, 2.0327e-06], device='cuda:0')
+2026-01-13 12:37:39,012 INFO [train.py:929] (0/2) Epoch 3, validation: loss=0.8167, simple_loss=0.7421, pruned_loss=0.4457, over 1639044.00 frames.
+2026-01-13 12:37:39,013 INFO [train.py:930] (0/2) Maximum memory allocated so far is 5225MB
+2026-01-13 12:37:54,036 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0
+2026-01-13 12:37:59,022 INFO [zipformer.py:1188] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=4944.0, num_to_drop=0, layers_to_drop=set()
+2026-01-13 12:38:04,898 INFO [scaling.py:681] (0/2) Whitening: num_groups=1, num_channels=384, metric=5.89 vs. limit=5.0
+2026-01-13 12:38:07,236 INFO [train.py:895] (0/2) Epoch 3, batch 1650, loss[loss=0.5517, simple_loss=0.4848, pruned_loss=0.3093, over 2450.00 frames. ], tot_loss[loss=0.4193, simple_loss=0.4173, pruned_loss=0.2107, over 539789.69 frames. ], batch size: 26, lr: 3.93e-02, grad_scale: 16.0
+2026-01-13 12:38:10,413 INFO [checkpoint.py:74] (0/2) Saving checkpoint to /kaggle/working/amharic_training/exp_amharic_streaming/epoch-3.pt
+2026-01-13 12:38:27,699 INFO [train.py:895] (0/2) Epoch 4, batch 0, loss[loss=0.3933, simple_loss=0.404, pruned_loss=0.1913, over 2650.00 frames. ], tot_loss[loss=0.3933, simple_loss=0.404, pruned_loss=0.1913, over 2650.00 frames. ], batch size: 7, lr: 3.67e-02, grad_scale: 16.0
+2026-01-13 12:38:27,700 INFO [train.py:920] (0/2) Computing validation loss
+2026-01-13 12:38:55,712 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.2438, 0.6257, 1.1505, 1.3714, 1.0718, 1.2120, 0.5520, 0.5126],
+device='cuda:0'), covar=tensor([0.0030, 0.0068, 0.0037, 0.0025, 0.0034, 0.0031, 0.0063, 0.0062],
+device='cuda:0'), in_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004],
+device='cuda:0'), out_proj_covar=tensor([2.2384e-06, 3.6934e-06, 2.4731e-06, 1.8118e-06, 2.1465e-06, 2.0979e-06,
+3.2621e-06, 2.6457e-06], device='cuda:0')
+2026-01-13 12:38:59,695 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.5109, 1.7756, 1.9885, 1.4528, 2.1507, 2.3608, 2.3564, 0.6120],
+device='cuda:0'), covar=tensor([0.1606, 0.1221, 0.1806, 0.2030, 0.0561, 0.0304, 0.0435, 0.2235],
+device='cuda:0'), in_proj_covar=tensor([0.0078, 0.0071, 0.0075, 0.0089, 0.0053, 0.0040, 0.0044, 0.0069],
+device='cuda:0'), out_proj_covar=tensor([6.8439e-05, 6.3791e-05, 6.9510e-05, 8.0742e-05, 4.7031e-05, 3.0455e-05,
+3.7064e-05, 6.1342e-05], device='cuda:0')
+2026-01-13 12:39:31,732 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.3319, 1.7930, 1.9896, 1.9778, 2.2409, 1.1565, 2.2062, 1.1881],
+device='cuda:0'), covar=tensor([0.0549, 0.0628, 0.0951, 0.0549, 0.0492, 0.2844, 0.0773, 0.3470],
+device='cuda:0'), in_proj_covar=tensor([0.0064, 0.0061, 0.0073, 0.0064, 0.0060, 0.0117, 0.0084, 0.0131],
+device='cuda:0'), out_proj_covar=tensor([4.0488e-05, 3.8970e-05, 4.7216e-05, 3.9881e-05, 3.8209e-05, 9.3525e-05,
+6.0317e-05, 1.0385e-04], device='cuda:0')
+2026-01-13 12:39:43,054 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.5837, 1.9708, 1.7139, 1.6529, 1.2756, 1.8318, 1.7432, 1.5568],
+device='cuda:0'), covar=tensor([0.0897, 0.0823, 0.1776, 0.1828, 0.5074, 0.1027, 0.2064, 0.1725],
+device='cuda:0'), in_proj_covar=tensor([0.0064, 0.0060, 0.0090, 0.0092, 0.0120, 0.0068, 0.0094, 0.0094],
+device='cuda:0'), out_proj_covar=tensor([4.1958e-05, 4.1000e-05, 7.3342e-05, 7.7269e-05, 1.1163e-04, 5.2853e-05,
+7.7509e-05, 7.5175e-05], device='cuda:0')
+2026-01-13 12:39:59,863 INFO [train.py:929] (0/2) Epoch 4, validation: loss=0.8124, simple_loss=0.7414, pruned_loss=0.4417, over 1639044.00 frames.
+2026-01-13 12:39:59,863 INFO [train.py:930] (0/2) Maximum memory allocated so far is 5225MB
+2026-01-13 12:40:13,967 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.99 vs. limit=2.0
+2026-01-13 12:40:17,738 INFO [zipformer.py:1188] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=4992.0, num_to_drop=0, layers_to_drop=set()
+2026-01-13 12:40:22,821 INFO [checkpoint.py:74] (0/2) Saving checkpoint to /kaggle/working/amharic_training/exp_amharic_streaming/checkpoint-5000.pt
+2026-01-13 12:40:25,675 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.232e+02 2.595e+02 3.557e+02 4.732e+02 1.336e+03, threshold=7.114e+02, percent-clipped=11.0
+2026-01-13 12:40:31,738 INFO [train.py:895] (0/2) Epoch 4, batch 50, loss[loss=0.3729, simple_loss=0.3885, pruned_loss=0.1787, over 2767.00 frames. ], tot_loss[loss=0.4101, simple_loss=0.4112, pruned_loss=0.2045, over 123089.28 frames. ], batch size: 7, lr: 3.66e-02, grad_scale: 16.0
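The training lines in the log above pack epoch, batch, losses, and learning rate into a single INFO record. A hedged sketch of pulling those fields back out; the regex and the `parse_train_line` helper are assumptions based only on the line shape shown here, not part of the training scripts:

```python
import re

# Assumed format of the icefall-style train.py lines shown above:
# "... Epoch N, batch M, loss[...], tot_loss[loss=X, ...], batch size: B, lr: L, ..."
LINE_RE = re.compile(
    r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+).*?"
    r"tot_loss\[loss=(?P<tot_loss>[\d.]+).*?"
    r"lr: (?P<lr>[\d.e+-]+)"
)


def parse_train_line(line: str):
    """Return (epoch, batch, tot_loss, lr), or None if the line doesn't match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return (int(m["epoch"]), int(m["batch"]),
            float(m["tot_loss"]), float(m["lr"]))


# A line copied verbatim from the log above.
sample = ("2026-01-13 12:38:07,236 INFO [train.py:895] (0/2) Epoch 3, batch 1650, "
          "loss[loss=0.5517, simple_loss=0.4848, pruned_loss=0.3093, over 2450.00 frames. ], "
          "tot_loss[loss=0.4193, simple_loss=0.4173, pruned_loss=0.2107, over 539789.69 frames. ], "
          "batch size: 26, lr: 3.93e-02, grad_scale: 16.0")

print(parse_train_line(sample))  # → (3, 1650, 0.4193, 0.0393)
```

Running this over the whole file gives a quick loss curve without needing the tensorboard events file that the commit also updates.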
log/log-train-2026-01-13-11-44-05-1
CHANGED
@@ -898,3 +898,102 @@
 2026-01-13 12:36:01,520 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.360e+02 2.400e+02 3.068e+02 4.251e+02 1.911e+03, threshold=6.137e+02, percent-clipped=9.0
 2026-01-13 12:36:05,808 INFO [train.py:895] (1/2) Epoch 3, batch 1600, loss[loss=0.3744, simple_loss=0.3878, pruned_loss=0.1805, over 2677.00 frames. ], tot_loss[loss=0.4041, simple_loss=0.4067, pruned_loss=0.2008, over 546057.94 frames. ], batch size: 10, lr: 3.94e-02, grad_scale: 16.0
 2026-01-13 12:36:05,808 INFO [train.py:920] (1/2) Computing validation loss
+2026-01-13 12:36:36,586 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.2908, 0.3842, 0.9465, 1.3874, 0.9901, 0.4683, 0.8977, 0.6127],
+device='cuda:1'), covar=tensor([0.1025, 0.5527, 0.1236, 0.0914, 0.1418, 0.2927, 0.2877, 0.3058],
+device='cuda:1'), in_proj_covar=tensor([0.0091, 0.0141, 0.0085, 0.0090, 0.0092, 0.0099, 0.0128, 0.0130],
+device='cuda:1'), out_proj_covar=tensor([7.4834e-05, 1.2499e-04, 6.7178e-05, 7.0397e-05, 7.3545e-05, 8.8308e-05,
+1.1269e-04, 1.1256e-04], device='cuda:1')
+2026-01-13 12:36:43,035 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.1910, 2.0285, 1.1164, 1.9523, 0.7550, 1.7068, 2.0450, 1.4068],
+device='cuda:1'), covar=tensor([0.0437, 0.0792, 0.1647, 0.0915, 0.2173, 0.1419, 0.0579, 0.0866],
+device='cuda:1'), in_proj_covar=tensor([0.0052, 0.0061, 0.0070, 0.0070, 0.0078, 0.0075, 0.0050, 0.0054],
+device='cuda:1'), out_proj_covar=tensor([5.2286e-05, 6.3066e-05, 7.4302e-05, 7.3464e-05, 7.9594e-05, 7.8236e-05,
+5.2499e-05, 5.4387e-05], device='cuda:1')
+2026-01-13 12:36:43,626 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.5589, 2.2440, 2.0960, 1.9169, 1.3603, 2.1980, 1.7216, 2.1413],
+device='cuda:1'), covar=tensor([0.0849, 0.0513, 0.1009, 0.1572, 0.3571, 0.0739, 0.1785, 0.1357],
+device='cuda:1'), in_proj_covar=tensor([0.0057, 0.0054, 0.0085, 0.0087, 0.0104, 0.0063, 0.0085, 0.0088],
+device='cuda:1'), out_proj_covar=tensor([3.7026e-05, 3.6405e-05, 6.8484e-05, 7.2775e-05, 9.7131e-05, 4.9044e-05,
+6.9859e-05, 6.9962e-05], device='cuda:1')
+2026-01-13 12:37:13,170 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.1140, 1.1477, 2.1848, 1.7130, 2.8386, 1.1333, 2.9608, 2.2788],
+device='cuda:1'), covar=tensor([0.4436, 0.4053, 0.3423, 0.3212, 0.0843, 0.7656, 0.0664, 0.2833],
+device='cuda:1'), in_proj_covar=tensor([0.0102, 0.0078, 0.0080, 0.0075, 0.0055, 0.0111, 0.0050, 0.0071],
+device='cuda:1'), out_proj_covar=tensor([1.1514e-04, 8.9821e-05, 9.0734e-05, 8.2935e-05, 5.3638e-05, 1.1691e-04,
+5.9586e-05, 8.3831e-05], device='cuda:1')
+2026-01-13 12:37:39,012 INFO [train.py:929] (1/2) Epoch 3, validation: loss=0.8167, simple_loss=0.7421, pruned_loss=0.4457, over 1639044.00 frames.
+2026-01-13 12:37:39,013 INFO [train.py:930] (1/2) Maximum memory allocated so far is 4802MB
+2026-01-13 12:37:50,526 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.0551, 1.9250, 2.3895, 1.5809, 2.1993, 2.8317, 2.6309, 0.8988],
+device='cuda:1'), covar=tensor([0.1172, 0.1472, 0.1547, 0.2293, 0.0663, 0.0317, 0.0618, 0.2153],
+device='cuda:1'), in_proj_covar=tensor([0.0078, 0.0069, 0.0071, 0.0088, 0.0052, 0.0040, 0.0043, 0.0069],
+device='cuda:1'), out_proj_covar=tensor([6.8005e-05, 6.2532e-05, 6.5659e-05, 7.9264e-05, 4.6448e-05, 3.0838e-05,
+3.6162e-05, 6.1024e-05], device='cuda:1')
+2026-01-13 12:37:50,656 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0
+2026-01-13 12:37:58,977 INFO [zipformer.py:1188] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=4944.0, num_to_drop=0, layers_to_drop=set()
+2026-01-13 12:38:07,235 INFO [train.py:895] (1/2) Epoch 3, batch 1650, loss[loss=0.5533, simple_loss=0.4914, pruned_loss=0.3076, over 2498.00 frames. ], tot_loss[loss=0.4251, simple_loss=0.4211, pruned_loss=0.2145, over 540357.11 frames. ], batch size: 26, lr: 3.93e-02, grad_scale: 16.0
+2026-01-13 12:38:27,699 INFO [train.py:895] (1/2) Epoch 4, batch 0, loss[loss=0.3967, simple_loss=0.3885, pruned_loss=0.2025, over 2643.00 frames. ], tot_loss[loss=0.3967, simple_loss=0.3885, pruned_loss=0.2025, over 2643.00 frames. ], batch size: 7, lr: 3.67e-02, grad_scale: 16.0
+2026-01-13 12:38:27,700 INFO [train.py:920] (1/2) Computing validation loss
+2026-01-13 12:38:47,408 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.3524, 1.7948, 2.0307, 2.0186, 2.1958, 1.1016, 2.2413, 1.2050],
+device='cuda:1'), covar=tensor([0.0623, 0.0732, 0.0878, 0.0604, 0.0520, 0.2847, 0.0867, 0.3512],
+device='cuda:1'), in_proj_covar=tensor([0.0064, 0.0061, 0.0073, 0.0064, 0.0060, 0.0117, 0.0084, 0.0131],
+device='cuda:1'), out_proj_covar=tensor([4.0488e-05, 3.8970e-05, 4.7216e-05, 3.9881e-05, 3.8209e-05, 9.3525e-05,
+6.0317e-05, 1.0385e-04], device='cuda:1')
+2026-01-13 12:38:47,440 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.5568, 1.7576, 1.8864, 1.3968, 2.1039, 2.4357, 2.3552, 0.6386],
+device='cuda:1'), covar=tensor([0.1762, 0.1611, 0.2353, 0.2499, 0.0654, 0.0337, 0.0480, 0.2474],
+device='cuda:1'), in_proj_covar=tensor([0.0078, 0.0071, 0.0075, 0.0089, 0.0053, 0.0040, 0.0044, 0.0069],
+device='cuda:1'), out_proj_covar=tensor([6.8439e-05, 6.3791e-05, 6.9510e-05, 8.0742e-05, 4.7031e-05, 3.0455e-05,
+3.7064e-05, 6.1342e-05], device='cuda:1')
+2026-01-13 12:38:59,811 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.1910, 1.7135, 1.9363, 1.9356, 2.0831, 1.0516, 2.0635, 1.1508],
+device='cuda:1'), covar=tensor([0.0363, 0.0498, 0.0580, 0.0416, 0.0333, 0.1889, 0.0561, 0.2594],
+device='cuda:1'), in_proj_covar=tensor([0.0064, 0.0061, 0.0073, 0.0064, 0.0060, 0.0117, 0.0084, 0.0131],
+device='cuda:1'), out_proj_covar=tensor([4.0488e-05, 3.8970e-05, 4.7216e-05, 3.9881e-05, 3.8209e-05, 9.3525e-05,
+6.0317e-05, 1.0385e-04], device='cuda:1')
+2026-01-13 12:39:07,542 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.2342, 0.5710, 1.2351, 1.4419, 1.1177, 1.2032, 0.5340, 0.5298],
+device='cuda:1'), covar=tensor([0.0033, 0.0079, 0.0036, 0.0024, 0.0035, 0.0029, 0.0061, 0.0058],
+device='cuda:1'), in_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004],
+device='cuda:1'), out_proj_covar=tensor([2.2384e-06, 3.6934e-06, 2.4731e-06, 1.8118e-06, 2.1465e-06, 2.0979e-06,
+3.2621e-06, 2.6457e-06], device='cuda:1')
+2026-01-13 12:39:08,817 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.9719, 1.7721, 1.3160, 1.6701, 0.7730, 1.6160, 2.1978, 1.6817],
+device='cuda:1'), covar=tensor([0.0716, 0.1247, 0.2017, 0.1683, 0.3089, 0.1909, 0.0682, 0.1086],
+device='cuda:1'), in_proj_covar=tensor([0.0053, 0.0061, 0.0073, 0.0074, 0.0081, 0.0080, 0.0051, 0.0056],
+device='cuda:1'), out_proj_covar=tensor([5.3396e-05, 6.4062e-05, 7.7408e-05, 7.7910e-05, 8.2198e-05, 8.2826e-05,
+5.3088e-05, 5.6501e-05], device='cuda:1')
+2026-01-13 12:39:20,437 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.2938, 2.2480, 1.6380, 1.7504, 2.6295, 1.2563, 1.6842, 1.6525],
+device='cuda:1'), covar=tensor([0.5369, 0.0731, 0.4785, 0.3389, 0.0518, 0.4746, 0.2988, 0.3548],
+device='cuda:1'), in_proj_covar=tensor([0.0134, 0.0077, 0.0149, 0.0123, 0.0069, 0.0132, 0.0112, 0.0131],
+device='cuda:1'), out_proj_covar=tensor([1.3018e-04, 7.3140e-05, 1.4545e-04, 1.1898e-04, 6.9222e-05, 1.3073e-04,
+1.1018e-04, 1.3280e-04], device='cuda:1')
+2026-01-13 12:39:20,627 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.0369, 1.5831, 1.0740, 2.2887, 1.7588, 2.5650, 2.3067, 1.6291],
+device='cuda:1'), covar=tensor([0.1611, 0.5858, 0.7271, 0.2284, 0.3265, 0.1631, 0.2965, 0.7333],
+device='cuda:1'), in_proj_covar=tensor([0.0055, 0.0088, 0.0093, 0.0070, 0.0069, 0.0070, 0.0072, 0.0092],
+device='cuda:1'), out_proj_covar=tensor([5.6537e-05, 8.5812e-05, 9.1480e-05, 6.6171e-05, 7.5444e-05, 6.5282e-05,
+6.7314e-05, 9.0299e-05], device='cuda:1')
+2026-01-13 12:39:28,605 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.7841, 1.6862, 1.6585, 1.0411, 1.9444, 1.1284, 1.4045, 1.9772],
+device='cuda:1'), covar=tensor([0.0522, 0.0726, 0.0482, 0.1215, 0.0318, 0.0970, 0.0675, 0.0294],
+device='cuda:1'), in_proj_covar=tensor([0.0021, 0.0036, 0.0030, 0.0039, 0.0025, 0.0039, 0.0033, 0.0021],
+device='cuda:1'), out_proj_covar=tensor([1.9861e-05, 3.6596e-05, 2.9809e-05, 4.0775e-05, 2.2825e-05, 4.1181e-05,
+3.3351e-05, 2.0242e-05], device='cuda:1')
+2026-01-13 12:39:34,796 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([0.6436, 0.4864, 1.8832, 2.1219, 1.7241, 1.1220, 1.1490, 0.9529],
+device='cuda:1'), covar=tensor([0.0651, 0.0699, 0.0186, 0.0149, 0.0263, 0.0418, 0.0457, 0.0510],
+device='cuda:1'), in_proj_covar=tensor([0.0028, 0.0027, 0.0021, 0.0019, 0.0022, 0.0023, 0.0025, 0.0028],
+device='cuda:1'), out_proj_covar=tensor([2.5085e-05, 2.6605e-05, 1.5945e-05, 1.5316e-05, 1.5734e-05, 1.8041e-05,
+2.1878e-05, 2.6637e-05], device='cuda:1')
+2026-01-13 12:39:38,497 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.2816, 1.3474, 1.2817, 1.3493, 1.0627, 0.5852, 1.2865, 1.4857],
+device='cuda:1'), covar=tensor([0.0033, 0.0042, 0.0037, 0.0033, 0.0063, 0.0077, 0.0027, 0.0020],
+device='cuda:1'), in_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003],
+device='cuda:1'), out_proj_covar=tensor([2.6638e-06, 2.9662e-06, 2.7170e-06, 2.8313e-06, 2.7990e-06, 5.3002e-06,
+2.0117e-06, 2.1323e-06], device='cuda:1')
+2026-01-13 12:39:54,242 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.3979, 1.8733, 2.0951, 2.0229, 2.2591, 1.1783, 2.2777, 1.2767],
+device='cuda:1'), covar=tensor([0.0506, 0.0600, 0.0820, 0.0544, 0.0472, 0.2703, 0.0718, 0.3465],
+device='cuda:1'), in_proj_covar=tensor([0.0064, 0.0061, 0.0073, 0.0064, 0.0060, 0.0117, 0.0084, 0.0131],
+device='cuda:1'), out_proj_covar=tensor([4.0488e-05, 3.8970e-05, 4.7216e-05, 3.9881e-05, 3.8209e-05, 9.3525e-05,
+6.0317e-05, 1.0385e-04], device='cuda:1')
+2026-01-13 12:39:59,863 INFO [train.py:929] (1/2) Epoch 4, validation: loss=0.8124, simple_loss=0.7414, pruned_loss=0.4417, over 1639044.00 frames.
+2026-01-13 12:39:59,863 INFO [train.py:930] (1/2) Maximum memory allocated so far is 4802MB
+2026-01-13 12:40:07,459 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=96, metric=2.18 vs. limit=2.0
+2026-01-13 12:40:17,736 INFO [zipformer.py:1188] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=4992.0, num_to_drop=0, layers_to_drop=set()
+2026-01-13 12:40:25,675 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.232e+02 2.595e+02 3.557e+02 4.732e+02 1.336e+03, threshold=7.114e+02, percent-clipped=11.0
+2026-01-13 12:40:31,739 INFO [train.py:895] (1/2) Epoch 4, batch 50, loss[loss=0.4493, simple_loss=0.4416, pruned_loss=0.2285, over 2765.00 frames. ], tot_loss[loss=0.4144, simple_loss=0.418, pruned_loss=0.2054, over 122701.88 frames. ], batch size: 7, lr: 3.66e-02, grad_scale: 16.0
+2026-01-13 12:40:48,251 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.99 vs. limit=2.0
+2026-01-13 12:40:51,804 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([3.0651, 1.5650, 2.5684, 2.6059, 2.8750, 1.8216, 2.6394, 1.7553],
+device='cuda:1'), covar=tensor([0.0234, 0.0691, 0.0554, 0.0218, 0.0232, 0.2862, 0.0797, 0.2556],
+device='cuda:1'), in_proj_covar=tensor([0.0061, 0.0060, 0.0071, 0.0062, 0.0059, 0.0118, 0.0082, 0.0128],
+device='cuda:1'), out_proj_covar=tensor([3.8205e-05, 3.9480e-05, 4.6615e-05, 3.8175e-05, 3.7666e-05, 9.3861e-05,
+5.8656e-05, 1.0144e-04], device='cuda:1')
tensorboard/events.out.tfevents.1768304645.8e64ffbd666a.97184.0
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:5d140f935357b104738007bd1189f7d6ab1140036b3a04a251303e73205efae9
+size 50024