Syncing latest checkpoint
epoch-8.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0976679d04b49a41e644ed8324ce78584028245143d278cb7eb201c6520ca4b0
+size 1141949907
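The checkpoint itself is stored via Git LFS, so the repository only carries the three-line pointer shown in the diff (version, oid, size). A minimal sketch of validating a downloaded blob against such a pointer, assuming the pointer format above; the helper name is hypothetical:

```python
import hashlib

def lfs_pointer_matches(pointer_text: str, blob: bytes) -> bool:
    # Parse the "key value" lines of a git-lfs pointer file into a dict.
    fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
    # The oid line carries a "sha256:" prefix before the hex digest.
    expected_oid = fields["oid"].split(":", 1)[1]
    return (hashlib.sha256(blob).hexdigest() == expected_oid
            and int(fields["size"]) == len(blob))

# Self-check against a synthetic blob (not the real epoch-8.pt).
blob = b"dummy checkpoint bytes"
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(blob).hexdigest()}\n"
    f"size {len(blob)}\n"
)
print(lfs_pointer_matches(pointer, blob))  # True
```

This catches both a truncated download (size mismatch) and a corrupted one (digest mismatch).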
log/log-train-2026-01-13-11-44-05-0 CHANGED
@@ -2497,3 +2497,76 @@
 device='cuda:0'), in_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003],
 device='cuda:0'), out_proj_covar=tensor([1.8853e-06, 2.2781e-06, 1.9978e-06, 1.8390e-06, 1.9671e-06, 1.8451e-06,
 2.3447e-06, 2.1284e-06], device='cuda:0')
+2026-01-13 14:17:49,129 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([0.6393, 2.5399, 2.5533, 1.7404, 2.4464, 2.3230, 0.9139, 0.8368],
+device='cuda:0'), covar=tensor([0.1168, 0.0151, 0.0124, 0.0371, 0.0158, 0.0192, 0.2028, 0.0842],
+device='cuda:0'), in_proj_covar=tensor([0.0137, 0.0082, 0.0079, 0.0110, 0.0087, 0.0082, 0.0182, 0.0136],
+device='cuda:0'), out_proj_covar=tensor([8.3280e-05, 3.2959e-05, 3.0598e-05, 5.2449e-05, 3.3355e-05, 3.2258e-05,
+1.2301e-04, 7.3572e-05], device='cuda:0')
+2026-01-13 14:17:54,772 INFO [train.py:929] (0/2) Epoch 8, validation: loss=0.7084, simple_loss=0.668, pruned_loss=0.3744, over 1639044.00 frames.
+2026-01-13 14:17:54,772 INFO [train.py:930] (0/2) Maximum memory allocated so far is 5712MB
+2026-01-13 14:18:14,047 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.921e+01 1.609e+02 2.212e+02 3.013e+02 8.937e+02, threshold=4.425e+02, percent-clipped=10.0
+2026-01-13 14:18:18,215 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([0.8760, 0.9436, 1.5177, 2.2698, 2.4522, 1.3634, 1.0347, 0.7155],
+device='cuda:0'), covar=tensor([0.0468, 0.0376, 0.0228, 0.0186, 0.0085, 0.0221, 0.0530, 0.0422],
+device='cuda:0'), in_proj_covar=tensor([0.0050, 0.0046, 0.0037, 0.0036, 0.0034, 0.0037, 0.0046, 0.0049],
+device='cuda:0'), out_proj_covar=tensor([5.1635e-05, 4.7296e-05, 3.5267e-05, 3.4166e-05, 3.1647e-05, 3.6796e-05,
+4.8916e-05, 4.9860e-05], device='cuda:0')
+2026-01-13 14:18:23,278 INFO [train.py:895] (0/2) Epoch 8, batch 1650, loss[loss=0.3682, simple_loss=0.3802, pruned_loss=0.1781, over 2514.00 frames. ], tot_loss[loss=0.2985, simple_loss=0.338, pruned_loss=0.1295, over 540783.37 frames. ], batch size: 26, lr: 1.99e-02, grad_scale: 16.0
+2026-01-13 14:18:26,999 INFO [checkpoint.py:74] (0/2) Saving checkpoint to /kaggle/working/amharic_training/exp_amharic_streaming/epoch-8.pt
+2026-01-13 14:18:44,254 INFO [train.py:895] (0/2) Epoch 9, batch 0, loss[loss=0.2181, simple_loss=0.3001, pruned_loss=0.06807, over 2652.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.3001, pruned_loss=0.06807, over 2652.00 frames. ], batch size: 7, lr: 1.88e-02, grad_scale: 16.0
+2026-01-13 14:18:44,255 INFO [train.py:920] (0/2) Computing validation loss
+2026-01-13 14:19:35,357 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([0.4456, 0.8764, 1.3074, 1.8786, 1.5928, 1.6558, 0.8408, 1.1361],
+device='cuda:0'), covar=tensor([0.0711, 0.0489, 0.0268, 0.0217, 0.0206, 0.0191, 0.0619, 0.0426],
+device='cuda:0'), in_proj_covar=tensor([0.0049, 0.0046, 0.0037, 0.0036, 0.0034, 0.0037, 0.0046, 0.0048],
+device='cuda:0'), out_proj_covar=tensor([5.1087e-05, 4.7042e-05, 3.5134e-05, 3.4022e-05, 3.1743e-05, 3.6845e-05,
+4.8657e-05, 4.9294e-05], device='cuda:0')
+2026-01-13 14:19:43,132 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.0770, 2.3124, 2.0293, 1.9715, 1.0384, 2.5519, 1.8173, 1.9444],
+device='cuda:0'), covar=tensor([0.0244, 0.0231, 0.0930, 0.0923, 0.3535, 0.0217, 0.1301, 0.0867],
+device='cuda:0'), in_proj_covar=tensor([0.0078, 0.0076, 0.0162, 0.0164, 0.0229, 0.0099, 0.0175, 0.0168],
+device='cuda:0'), out_proj_covar=tensor([6.8349e-05, 6.7994e-05, 1.4084e-04, 1.4533e-04, 1.9786e-04, 9.1039e-05,
+1.4997e-04, 1.4696e-04], device='cuda:0')
+2026-01-13 14:19:50,699 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.9533, 1.7645, 1.6132, 2.3766, 1.4142, 1.0068, 0.5147, 2.6883],
+device='cuda:0'), covar=tensor([0.1325, 0.1486, 0.1098, 0.0336, 0.1984, 0.1600, 0.2810, 0.0188],
+device='cuda:0'), in_proj_covar=tensor([0.0135, 0.0113, 0.0098, 0.0072, 0.0134, 0.0123, 0.0127, 0.0059],
+device='cuda:0'), out_proj_covar=tensor([1.2570e-04, 1.0765e-04, 9.2941e-05, 6.0261e-05, 1.2556e-04, 1.1399e-04,
+1.2709e-04, 5.0190e-05], device='cuda:0')
+2026-01-13 14:19:51,456 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([0.9945, 0.9105, 1.0833, 1.4351, 0.8528, 0.9356, 0.5715, 1.0658],
+device='cuda:0'), covar=tensor([0.0009, 0.0010, 0.0008, 0.0008, 0.0014, 0.0009, 0.0014, 0.0007],
+device='cuda:0'), in_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004],
+device='cuda:0'), out_proj_covar=tensor([2.1044e-06, 2.4972e-06, 1.9804e-06, 1.9893e-06, 3.0872e-06, 2.2925e-06,
+2.2069e-06, 1.9874e-06], device='cuda:0')
+2026-01-13 14:20:12,974 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.6826, 1.0947, 1.9665, 1.3147, 2.7888, 1.1623, 2.8894, 2.0075],
+device='cuda:0'), covar=tensor([0.2694, 0.2461, 0.1965, 0.2431, 0.0421, 0.3355, 0.0559, 0.1811],
+device='cuda:0'), in_proj_covar=tensor([0.0071, 0.0070, 0.0068, 0.0069, 0.0052, 0.0085, 0.0043, 0.0063],
+device='cuda:0'), out_proj_covar=tensor([9.9151e-05, 9.2898e-05, 9.0328e-05, 9.0101e-05, 6.1147e-05, 1.0806e-04,
+6.0337e-05, 8.5554e-05], device='cuda:0')
+2026-01-13 14:20:16,507 INFO [train.py:929] (0/2) Epoch 9, validation: loss=0.6728, simple_loss=0.6428, pruned_loss=0.3513, over 1639044.00 frames.
+2026-01-13 14:20:16,507 INFO [train.py:930] (0/2) Maximum memory allocated so far is 5712MB
+2026-01-13 14:20:46,472 INFO [train.py:895] (0/2) Epoch 9, batch 50, loss[loss=0.2421, simple_loss=0.2949, pruned_loss=0.09465, over 2760.00 frames. ], tot_loss[loss=0.2784, simple_loss=0.3252, pruned_loss=0.1158, over 123593.05 frames. ], batch size: 7, lr: 1.88e-02, grad_scale: 16.0
+2026-01-13 14:21:03,900 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 1.648e+02 2.356e+02 3.038e+02 6.133e+02, threshold=4.712e+02, percent-clipped=9.0
+2026-01-13 14:21:13,043 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=2.19 vs. limit=2.0
+2026-01-13 14:21:15,783 INFO [train.py:895] (0/2) Epoch 9, batch 100, loss[loss=0.2319, simple_loss=0.2811, pruned_loss=0.09135, over 2890.00 frames. ], tot_loss[loss=0.2707, simple_loss=0.3186, pruned_loss=0.1114, over 217501.94 frames. ], batch size: 8, lr: 1.88e-02, grad_scale: 16.0
+2026-01-13 14:21:19,257 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([3.1692, 1.5831, 1.2613, 3.7359, 2.3049, 3.7928, 3.5123, 2.5079],
+device='cuda:0'), covar=tensor([0.0651, 0.3848, 0.3800, 0.0716, 0.1444, 0.0511, 0.0779, 0.3288],
+device='cuda:0'), in_proj_covar=tensor([0.0060, 0.0100, 0.0097, 0.0082, 0.0071, 0.0075, 0.0086, 0.0101],
+device='cuda:0'), out_proj_covar=tensor([5.9119e-05, 9.5635e-05, 9.3682e-05, 7.4570e-05, 7.4981e-05, 6.9629e-05,
+8.0637e-05, 9.6769e-05], device='cuda:0')
+2026-01-13 14:21:41,307 INFO [scaling.py:681] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.83 vs. limit=5.0
+2026-01-13 14:21:45,204 INFO [train.py:895] (0/2) Epoch 9, batch 150, loss[loss=0.2277, simple_loss=0.2746, pruned_loss=0.09042, over 2769.00 frames. ], tot_loss[loss=0.2626, simple_loss=0.3124, pruned_loss=0.1064, over 291387.72 frames. ], batch size: 7, lr: 1.87e-02, grad_scale: 16.0
+2026-01-13 14:22:03,243 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.894e+01 1.434e+02 1.799e+02 2.533e+02 7.964e+02, threshold=3.597e+02, percent-clipped=3.0
+2026-01-13 14:22:09,274 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.7662, 1.7222, 3.6637, 3.2857, 3.7754, 2.5872, 3.4766, 1.8934],
+device='cuda:0'), covar=tensor([0.1001, 1.3317, 0.0254, 0.3785, 0.0192, 0.4647, 0.5974, 2.2380],
+device='cuda:0'), in_proj_covar=tensor([0.0099, 0.0197, 0.0093, 0.0088, 0.0090, 0.0134, 0.0176, 0.0164],
+device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002],
+device='cuda:0')
+2026-01-13 14:22:11,680 INFO [zipformer.py:1188] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=13419.0, num_to_drop=0, layers_to_drop=set()
+2026-01-13 14:22:15,150 INFO [train.py:895] (0/2) Epoch 9, batch 200, loss[loss=0.3099, simple_loss=0.343, pruned_loss=0.1384, over 2591.00 frames. ], tot_loss[loss=0.2667, simple_loss=0.3144, pruned_loss=0.1095, over 348617.27 frames. ], batch size: 16, lr: 1.87e-02, grad_scale: 16.0
+2026-01-13 14:22:24,205 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.6536, 2.9709, 2.2098, 2.6199, 3.1646, 3.3986, 3.4147, 1.2221],
+device='cuda:0'), covar=tensor([0.0852, 0.0460, 0.1650, 0.1022, 0.0310, 0.0051, 0.0139, 0.1490],
+device='cuda:0'), in_proj_covar=tensor([0.0176, 0.0143, 0.0208, 0.0180, 0.0125, 0.0067, 0.0108, 0.0150],
+device='cuda:0'), out_proj_covar=tensor([1.6523e-04, 1.3617e-04, 1.9761e-04, 1.6774e-04, 1.1806e-04, 6.5236e-05,
+1.0030e-04, 1.3814e-04], device='cuda:0')
+2026-01-13 14:22:32,965 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([2.1200, 2.9080, 1.7594, 3.3860, 2.0659, 1.7680, 0.9248, 3.4570],
+device='cuda:0'), covar=tensor([0.1520, 0.1026, 0.1323, 0.0161, 0.2888, 0.1700, 0.3225, 0.0102],
+device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0112, 0.0098, 0.0070, 0.0133, 0.0117, 0.0122, 0.0060],
+device='cuda:0'), out_proj_covar=tensor([1.2320e-04, 1.0805e-04, 9.2213e-05, 5.9169e-05, 1.2461e-04, 1.0957e-04,
+1.2238e-04, 5.0440e-05], device='cuda:0')
|
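The appended log lines follow icefall's train.py format, so the running tot_loss can be scraped with a small regex. A sketch under the assumption that the line layout matches the excerpts above exactly:

```python
import re

# Capture epoch, batch index, and the running tot_loss from an
# icefall-style train.py log line (pattern inferred from the log
# excerpts in this commit; adjust if the format differs).
LINE = re.compile(r"Epoch (\d+), batch (\d+),.*?tot_loss\[loss=([\d.]+)")

log = (
    "2026-01-13 14:21:15,783 INFO [train.py:895] (0/2) Epoch 9, batch 100, "
    "loss[loss=0.2319, simple_loss=0.2811, pruned_loss=0.09135, over 2890.00 frames. ], "
    "tot_loss[loss=0.2707, simple_loss=0.3186, pruned_loss=0.1114, "
    "over 217501.94 frames. ], batch size: 8, lr: 1.88e-02, grad_scale: 16.0"
)
m = LINE.search(log)
print(m.groups())  # ('9', '100', '0.2707')
```

Run over both rank logs (`-0` and `-1`), this gives per-rank loss curves without waiting for the tensorboard events file to sync.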
log/log-train-2026-01-13-11-44-05-1 CHANGED
@@ -2511,3 +2511,52 @@
 device='cuda:1'), in_proj_covar=tensor([0.0180, 0.0148, 0.0213, 0.0185, 0.0130, 0.0067, 0.0111, 0.0154],
 device='cuda:1'), out_proj_covar=tensor([1.7000e-04, 1.4119e-04, 2.0353e-04, 1.7227e-04, 1.2346e-04, 6.7704e-05,
 1.0250e-04, 1.4216e-04], device='cuda:1')
+2026-01-13 14:17:46,978 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([0.3519, 0.8884, 1.1128, 1.5761, 1.7578, 1.8505, 1.0071, 0.9184],
+device='cuda:1'), covar=tensor([0.0572, 0.0338, 0.0238, 0.0282, 0.0144, 0.0145, 0.0508, 0.0394],
+device='cuda:1'), in_proj_covar=tensor([0.0047, 0.0044, 0.0036, 0.0034, 0.0033, 0.0035, 0.0044, 0.0046],
+device='cuda:1'), out_proj_covar=tensor([4.9283e-05, 4.5033e-05, 3.3782e-05, 3.2267e-05, 3.0558e-05, 3.4422e-05,
+4.6994e-05, 4.7524e-05], device='cuda:1')
+2026-01-13 14:17:54,772 INFO [train.py:929] (1/2) Epoch 8, validation: loss=0.7084, simple_loss=0.668, pruned_loss=0.3744, over 1639044.00 frames.
+2026-01-13 14:17:54,772 INFO [train.py:930] (1/2) Maximum memory allocated so far is 5734MB
+2026-01-13 14:18:14,045 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.921e+01 1.609e+02 2.212e+02 3.013e+02 8.937e+02, threshold=4.425e+02, percent-clipped=10.0
+2026-01-13 14:18:18,950 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.60 vs. limit=2.0
+2026-01-13 14:18:23,282 INFO [train.py:895] (1/2) Epoch 8, batch 1650, loss[loss=0.386, simple_loss=0.4131, pruned_loss=0.1795, over 2368.00 frames. ], tot_loss[loss=0.3013, simple_loss=0.3393, pruned_loss=0.1316, over 540616.60 frames. ], batch size: 26, lr: 1.99e-02, grad_scale: 16.0
+2026-01-13 14:18:44,251 INFO [train.py:895] (1/2) Epoch 9, batch 0, loss[loss=0.293, simple_loss=0.3513, pruned_loss=0.1174, over 2648.00 frames. ], tot_loss[loss=0.293, simple_loss=0.3513, pruned_loss=0.1174, over 2648.00 frames. ], batch size: 7, lr: 1.88e-02, grad_scale: 16.0
+2026-01-13 14:18:44,251 INFO [train.py:920] (1/2) Computing validation loss
+2026-01-13 14:19:35,471 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([0.4673, 0.9049, 1.2575, 1.8098, 1.5705, 1.6194, 0.8669, 1.1054],
+device='cuda:1'), covar=tensor([0.0750, 0.0498, 0.0298, 0.0273, 0.0199, 0.0220, 0.0672, 0.0468],
+device='cuda:1'), in_proj_covar=tensor([0.0049, 0.0046, 0.0037, 0.0036, 0.0034, 0.0037, 0.0046, 0.0048],
+device='cuda:1'), out_proj_covar=tensor([5.1087e-05, 4.7042e-05, 3.5134e-05, 3.4022e-05, 3.1743e-05, 3.6845e-05,
+4.8657e-05, 4.9294e-05], device='cuda:1')
+2026-01-13 14:19:43,176 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.1137, 2.3183, 2.0827, 2.0028, 1.1444, 2.5583, 1.8748, 2.0237],
+device='cuda:1'), covar=tensor([0.0228, 0.0212, 0.0855, 0.0843, 0.3404, 0.0190, 0.1208, 0.0824],
+device='cuda:1'), in_proj_covar=tensor([0.0078, 0.0076, 0.0162, 0.0164, 0.0229, 0.0099, 0.0175, 0.0168],
+device='cuda:1'), out_proj_covar=tensor([6.8349e-05, 6.7994e-05, 1.4084e-04, 1.4533e-04, 1.9786e-04, 9.1039e-05,
+1.4997e-04, 1.4696e-04], device='cuda:1')
+2026-01-13 14:19:50,809 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.9058, 1.8094, 1.6258, 2.4406, 1.3940, 0.9934, 0.5194, 2.7016],
+device='cuda:1'), covar=tensor([0.1542, 0.1521, 0.1209, 0.0407, 0.2216, 0.1709, 0.2995, 0.0193],
+device='cuda:1'), in_proj_covar=tensor([0.0135, 0.0113, 0.0098, 0.0072, 0.0134, 0.0123, 0.0127, 0.0059],
+device='cuda:1'), out_proj_covar=tensor([1.2570e-04, 1.0765e-04, 9.2941e-05, 6.0261e-05, 1.2556e-04, 1.1399e-04,
+1.2709e-04, 5.0190e-05], device='cuda:1')
+2026-01-13 14:19:51,606 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([0.9719, 0.8931, 1.1117, 1.3888, 0.7809, 0.9196, 0.5686, 1.1047],
+device='cuda:1'), covar=tensor([0.0010, 0.0011, 0.0009, 0.0007, 0.0014, 0.0013, 0.0014, 0.0012],
+device='cuda:1'), in_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004],
+device='cuda:1'), out_proj_covar=tensor([2.1044e-06, 2.4972e-06, 1.9804e-06, 1.9893e-06, 3.0872e-06, 2.2925e-06,
+2.2069e-06, 1.9874e-06], device='cuda:1')
+2026-01-13 14:20:13,526 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.6488, 1.1196, 1.8573, 1.3550, 2.8543, 1.1029, 2.8927, 2.1134],
+device='cuda:1'), covar=tensor([0.2863, 0.2630, 0.2274, 0.2659, 0.0478, 0.4024, 0.0611, 0.1903],
+device='cuda:1'), in_proj_covar=tensor([0.0071, 0.0070, 0.0068, 0.0069, 0.0052, 0.0085, 0.0043, 0.0063],
+device='cuda:1'), out_proj_covar=tensor([9.9151e-05, 9.2898e-05, 9.0328e-05, 9.0101e-05, 6.1147e-05, 1.0806e-04,
+6.0337e-05, 8.5554e-05], device='cuda:1')
+2026-01-13 14:20:16,507 INFO [train.py:929] (1/2) Epoch 9, validation: loss=0.6728, simple_loss=0.6428, pruned_loss=0.3513, over 1639044.00 frames.
+2026-01-13 14:20:16,507 INFO [train.py:930] (1/2) Maximum memory allocated so far is 5734MB
+2026-01-13 14:20:41,390 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.80 vs. limit=5.0
+2026-01-13 14:20:46,473 INFO [train.py:895] (1/2) Epoch 9, batch 50, loss[loss=0.1952, simple_loss=0.2615, pruned_loss=0.06449, over 2764.00 frames. ], tot_loss[loss=0.2803, simple_loss=0.3282, pruned_loss=0.1161, over 123127.70 frames. ], batch size: 7, lr: 1.88e-02, grad_scale: 16.0
+2026-01-13 14:21:00,257 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=5.50 vs. limit=5.0
+2026-01-13 14:21:03,899 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 1.648e+02 2.356e+02 3.038e+02 6.133e+02, threshold=4.712e+02, percent-clipped=9.0
+2026-01-13 14:21:15,780 INFO [train.py:895] (1/2) Epoch 9, batch 100, loss[loss=0.2461, simple_loss=0.2951, pruned_loss=0.09854, over 2902.00 frames. ], tot_loss[loss=0.2785, simple_loss=0.3245, pruned_loss=0.1162, over 217629.67 frames. ], batch size: 8, lr: 1.88e-02, grad_scale: 16.0
+2026-01-13 14:21:31,880 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.99 vs. limit=2.0
+2026-01-13 14:21:45,207 INFO [train.py:895] (1/2) Epoch 9, batch 150, loss[loss=0.2321, simple_loss=0.2869, pruned_loss=0.08863, over 2771.00 frames. ], tot_loss[loss=0.2754, simple_loss=0.321, pruned_loss=0.115, over 291481.89 frames. ], batch size: 7, lr: 1.87e-02, grad_scale: 16.0
+2026-01-13 14:22:03,240 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.894e+01 1.434e+02 1.799e+02 2.533e+02 7.964e+02, threshold=3.597e+02, percent-clipped=3.0
+2026-01-13 14:22:11,679 INFO [zipformer.py:1188] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=13419.0, num_to_drop=0, layers_to_drop=set()
+2026-01-13 14:22:15,156 INFO [train.py:895] (1/2) Epoch 9, batch 200, loss[loss=0.419, simple_loss=0.4383, pruned_loss=0.1999, over 2699.00 frames. ], tot_loss[loss=0.2749, simple_loss=0.3204, pruned_loss=0.1148, over 349551.41 frames. ], batch size: 16, lr: 1.87e-02, grad_scale: 16.0
tensorboard/events.out.tfevents.1768304645.8e64ffbd666a.97184.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:c71b38963fcd1f448296528640d8d16f102f6978550ab10823053bdad853edee
+size 133547