projecti7 committed
Commit 7263da4 · verified · 1 Parent(s): ad15236

Auto-sync latest checkpoint + BPE model

best-train-loss.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:87ddbc200560cee27b651e78ae7e7dbd116eedc6817617ee4bdd1d93640cb8ea
- size 1141949779
+ oid sha256:3d0c05fba979c3d507b7e2f7d6ed9b6e5365c18c0d3da7c612542a66fd782577
+ size 1141951950
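
The pointer updates above follow the Git LFS spec: the repository stores only a three-line pointer (version, oid, size) while the ~1.1 GB checkpoint payload lives in LFS storage. A minimal sketch for checking that a locally downloaded payload matches its pointer; the helper name is hypothetical, not part of this repo:

import hashlib

def verify_lfs_pointer(payload_path: str, expected_oid: str, expected_size: int) -> bool:
    """Compare a downloaded file against the oid/size recorded in its LFS pointer."""
    sha = hashlib.sha256()
    size = 0
    with open(payload_path, "rb") as f:
        # Stream in 1 MiB chunks so the ~1.1 GB checkpoint never sits fully in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
            size += len(chunk)
    return sha.hexdigest() == expected_oid and size == expected_size

# Values copied from the new pointer above.
assert verify_lfs_pointer(
    "best-train-loss.pt",
    "3d0c05fba979c3d507b7e2f7d6ed9b6e5365c18c0d3da7c612542a66fd782577",
    1141951950,
)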
best-valid-loss.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:87ddbc200560cee27b651e78ae7e7dbd116eedc6817617ee4bdd1d93640cb8ea
- size 1141949779
+ oid sha256:3d0c05fba979c3d507b7e2f7d6ed9b6e5365c18c0d3da7c612542a66fd782577
+ size 1141951950
epoch-11.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3d0c05fba979c3d507b7e2f7d6ed9b6e5365c18c0d3da7c612542a66fd782577
+ size 1141951950
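
All three checkpoint files in this commit share one oid and size (3d0c05fb…, 1141951950 bytes), so at this point epoch-11.pt is simultaneously the best-train-loss and best-valid-loss checkpoint. The "Auto-sync" script itself is not part of the diff; a plausible sketch of such a sync using the huggingface_hub client, with the repo id left as a placeholder and the local path taken from the logs below:

from huggingface_hub import CommitOperationAdd, HfApi

EXP_DIR = "/kaggle/working/amharic_training/exp_amharic_streaming"  # path seen in the training logs
api = HfApi()  # assumes a token from `huggingface-cli login` or the HF_TOKEN env var

# One create_commit call batches all files into a single commit,
# matching the single multi-file commit shown on this page.
ops = [
    CommitOperationAdd(path_in_repo=name, path_or_fileobj=f"{EXP_DIR}/{name}")
    for name in ("epoch-11.pt", "best-train-loss.pt", "best-valid-loss.pt")
]
api.create_commit(
    repo_id="projecti7/<model-repo>",  # hypothetical repo id
    operations=ops,
    commit_message="Auto-sync latest checkpoint + BPE model",
)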
log/log-train-2026-01-13-11-44-05-0 CHANGED
@@ -3269,3 +3269,68 @@
  2026-01-13 15:14:46,465 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0
  2026-01-13 15:14:51,282 INFO [checkpoint.py:74] (0/2) Saving checkpoint to /kaggle/working/amharic_training/exp_amharic_streaming/checkpoint-18000.pt
  2026-01-13 15:14:57,379 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.061e+01 1.580e+02 2.093e+02 2.798e+02 1.380e+03, threshold=4.187e+02, percent-clipped=21.0
+ 2026-01-13 15:15:01,097 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([3.3754, 3.2360, 1.9255, 2.6952, 1.3096, 3.3549, 2.6556, 2.4544],
+ device='cuda:0'), covar=tensor([0.0065, 0.0090, 0.1131, 0.0971, 0.3212, 0.0168, 0.1442, 0.1045],
+ device='cuda:0'), in_proj_covar=tensor([0.0081, 0.0080, 0.0175, 0.0189, 0.0265, 0.0109, 0.0197, 0.0194],
+ device='cuda:0'), out_proj_covar=tensor([7.1931e-05, 7.2121e-05, 1.5015e-04, 1.6441e-04, 2.2556e-04, 9.9280e-05,
+ 1.6711e-04, 1.6771e-04], device='cuda:0')
+ 2026-01-13 15:15:03,985 INFO [zipformer.py:1188] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18018.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:15:05,792 INFO [zipformer.py:1188] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18021.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:15:11,529 INFO [train.py:895] (0/2) Epoch 11, batch 1500, loss[loss=0.196, simple_loss=0.2562, pruned_loss=0.06796, over 2863.00 frames. ], tot_loss[loss=0.2443, simple_loss=0.2976, pruned_loss=0.09553, over 551530.58 frames. ], batch size: 9, lr: 1.49e-02, grad_scale: 16.0
+ 2026-01-13 15:15:16,227 INFO [zipformer.py:1188] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18039.0, num_to_drop=1, layers_to_drop={0}
+ 2026-01-13 15:15:23,224 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([0.8532, 2.8849, 2.8655, 1.8707, 2.7120, 2.9585, 1.3940, 1.0515],
+ device='cuda:0'), covar=tensor([0.1228, 0.0107, 0.0086, 0.0396, 0.0097, 0.0102, 0.1447, 0.0933],
+ device='cuda:0'), in_proj_covar=tensor([0.0173, 0.0099, 0.0099, 0.0137, 0.0098, 0.0097, 0.0226, 0.0167],
+ device='cuda:0'), out_proj_covar=tensor([9.5228e-05, 3.5407e-05, 3.4444e-05, 5.8641e-05, 3.5331e-05, 3.3779e-05,
+ 1.3730e-04, 8.0319e-05], device='cuda:0')
+ 2026-01-13 15:15:26,054 INFO [zipformer.py:1188] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18056.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:15:31,749 INFO [zipformer.py:1188] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=18066.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:15:40,459 INFO [train.py:895] (0/2) Epoch 11, batch 1550, loss[loss=0.209, simple_loss=0.2724, pruned_loss=0.07275, over 2916.00 frames. ], tot_loss[loss=0.2478, simple_loss=0.3001, pruned_loss=0.09782, over 549498.64 frames. ], batch size: 10, lr: 1.48e-02, grad_scale: 16.0
+ 2026-01-13 15:15:43,374 INFO [zipformer.py:1188] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18086.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:15:43,927 INFO [zipformer.py:1188] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=18087.0, num_to_drop=1, layers_to_drop={1}
+ 2026-01-13 15:15:55,405 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.747e+01 1.493e+02 2.017e+02 2.491e+02 8.387e+02, threshold=4.034e+02, percent-clipped=5.0
+ 2026-01-13 15:15:56,091 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.6401, 1.2222, 1.4903, 1.4113, 1.2600, 1.4453, 1.8040, 1.6032],
+ device='cuda:0'), covar=tensor([0.0037, 0.0039, 0.0040, 0.0081, 0.0044, 0.0054, 0.0024, 0.0032],
+ device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0014, 0.0013, 0.0014, 0.0014, 0.0015, 0.0014, 0.0013],
+ device='cuda:0'), out_proj_covar=tensor([8.6169e-06, 7.0849e-06, 7.1203e-06, 9.0661e-06, 8.1872e-06, 1.0246e-05,
+ 7.5317e-06, 6.6959e-06], device='cuda:0')
+ 2026-01-13 15:16:09,379 INFO [train.py:895] (0/2) Epoch 11, batch 1600, loss[loss=0.2729, simple_loss=0.3238, pruned_loss=0.111, over 2673.00 frames. ], tot_loss[loss=0.249, simple_loss=0.3008, pruned_loss=0.09856, over 546953.23 frames. ], batch size: 10, lr: 1.48e-02, grad_scale: 16.0
+ 2026-01-13 15:16:09,380 INFO [train.py:920] (0/2) Computing validation loss
+ 2026-01-13 15:17:04,169 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.1297, 1.1427, 1.2709, 1.0899, 1.0795, 1.3358, 1.4701, 1.3519],
+ device='cuda:0'), covar=tensor([0.0048, 0.0034, 0.0042, 0.0081, 0.0044, 0.0049, 0.0033, 0.0032],
+ device='cuda:0'), in_proj_covar=tensor([0.0015, 0.0014, 0.0013, 0.0014, 0.0014, 0.0015, 0.0014, 0.0013],
+ device='cuda:0'), out_proj_covar=tensor([8.7921e-06, 7.1547e-06, 7.2195e-06, 9.1911e-06, 8.2711e-06, 1.0419e-05,
+ 7.5945e-06, 6.7800e-06], device='cuda:0')
+ 2026-01-13 15:17:10,297 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([0.8262, 0.4307, 1.0002, 1.1787, 1.1275, 1.0877, 1.0694, 1.0616],
+ device='cuda:0'), covar=tensor([0.0007, 0.0008, 0.0006, 0.0004, 0.0005, 0.0005, 0.0004, 0.0005],
+ device='cuda:0'), in_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002],
+ device='cuda:0'), out_proj_covar=tensor([1.3149e-06, 1.4818e-06, 1.2456e-06, 1.0687e-06, 1.2599e-06, 1.1230e-06,
+ 1.4082e-06, 1.4798e-06], device='cuda:0')
+ 2026-01-13 15:17:35,507 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.7513, 1.2714, 1.6827, 0.8309, 2.2770, 0.7309, 1.1813, 1.5920],
+ device='cuda:0'), covar=tensor([0.0290, 0.0688, 0.0395, 0.0965, 0.0156, 0.1169, 0.0672, 0.0431],
+ device='cuda:0'), in_proj_covar=tensor([0.0036, 0.0054, 0.0047, 0.0065, 0.0031, 0.0075, 0.0054, 0.0037],
+ device='cuda:0'), out_proj_covar=tensor([3.8872e-05, 6.0020e-05, 5.2467e-05, 7.1910e-05, 3.2291e-05, 8.1494e-05,
+ 6.0863e-05, 3.8959e-05], device='cuda:0')
+ 2026-01-13 15:17:37,422 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([1.9756, 2.3014, 1.8785, 1.9919, 2.7530, 2.8748, 2.7916, 0.9127],
+ device='cuda:0'), covar=tensor([0.1004, 0.0395, 0.1483, 0.0872, 0.0212, 0.0069, 0.0156, 0.1064],
+ device='cuda:0'), in_proj_covar=tensor([0.0203, 0.0169, 0.0247, 0.0205, 0.0144, 0.0075, 0.0140, 0.0170],
+ device='cuda:0'), out_proj_covar=tensor([1.8583e-04, 1.5656e-04, 2.2881e-04, 1.8385e-04, 1.3176e-04, 7.1733e-05,
+ 1.2691e-04, 1.5374e-04], device='cuda:0')
+ 2026-01-13 15:17:41,924 INFO [train.py:929] (0/2) Epoch 11, validation: loss=0.6568, simple_loss=0.6351, pruned_loss=0.3392, over 1639044.00 frames.
+ 2026-01-13 15:17:41,925 INFO [train.py:930] (0/2) Maximum memory allocated so far is 5712MB
+ 2026-01-13 15:17:54,459 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([3.1757, 3.3247, 2.0026, 3.4147, 3.0110, 1.6887, 2.7488, 1.0885],
+ device='cuda:0'), covar=tensor([0.0086, 0.0056, 0.1263, 0.0092, 0.0081, 0.1657, 0.0468, 0.2255],
+ device='cuda:0'), in_proj_covar=tensor([0.0085, 0.0084, 0.0199, 0.0088, 0.0087, 0.0197, 0.0157, 0.0223],
+ device='cuda:0'), out_proj_covar=tensor([5.6477e-05, 5.5517e-05, 1.3236e-04, 5.9684e-05, 5.5819e-05, 1.3213e-04,
+ 1.0460e-04, 1.4680e-04], device='cuda:0')
+ 2026-01-13 15:17:58,496 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([0.9359, 3.0348, 2.7558, 2.0987, 2.8597, 3.0138, 1.5498, 1.2321],
+ device='cuda:0'), covar=tensor([0.1110, 0.0089, 0.0090, 0.0316, 0.0067, 0.0076, 0.1274, 0.0731],
+ device='cuda:0'), in_proj_covar=tensor([0.0173, 0.0097, 0.0098, 0.0136, 0.0096, 0.0095, 0.0223, 0.0165],
+ device='cuda:0'), out_proj_covar=tensor([9.5510e-05, 3.5131e-05, 3.4325e-05, 5.8373e-05, 3.4619e-05, 3.3538e-05,
+ 1.3469e-04, 7.9451e-05], device='cuda:0')
+ 2026-01-13 15:18:06,264 INFO [zipformer.py:1188] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18174.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:18:06,779 INFO [zipformer.py:1188] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18175.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:18:09,888 INFO [train.py:895] (0/2) Epoch 11, batch 1650, loss[loss=0.3624, simple_loss=0.3702, pruned_loss=0.1773, over 2639.00 frames. ], tot_loss[loss=0.2624, simple_loss=0.3121, pruned_loss=0.1064, over 541761.87 frames. ], batch size: 26, lr: 1.48e-02, grad_scale: 16.0
+ 2026-01-13 15:18:12,562 INFO [checkpoint.py:74] (0/2) Saving checkpoint to /kaggle/working/amharic_training/exp_amharic_streaming/epoch-11.pt
+ 2026-01-13 15:18:30,639 INFO [train.py:895] (0/2) Epoch 12, batch 0, loss[loss=0.2605, simple_loss=0.3092, pruned_loss=0.1059, over 2652.00 frames. ], tot_loss[loss=0.2605, simple_loss=0.3092, pruned_loss=0.1059, over 2652.00 frames. ], batch size: 7, lr: 1.42e-02, grad_scale: 16.0
+ 2026-01-13 15:18:30,640 INFO [train.py:920] (0/2) Computing validation loss
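
The icefall-style progress lines above have a fixed shape ("Epoch E, batch B, loss[...], tot_loss[loss=..., ...]"), so the running loss curve can be recovered from the raw log without TensorBoard. A minimal sketch; the regex is written against the lines above, not a library utility:

import re

# Matches lines like:
#   ... INFO [train.py:895] (0/2) Epoch 11, batch 1500, loss[...], tot_loss[loss=0.2443, ...]
# Validation lines ("Epoch 11, validation: loss=...") have no "batch" field and are skipped.
PATTERN = re.compile(r"Epoch (\d+), batch (\d+),.*?tot_loss\[loss=([\d.]+)")

def loss_curve(log_path):
    """Return (epoch, batch, tot_loss) triples from an icefall training log."""
    points = []
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                epoch, batch, tot = m.groups()
                points.append((int(epoch), int(batch), float(tot)))
    return points

# For the hunk above, the last point is (12, 0, 0.2605).
print(loss_curve("log/log-train-2026-01-13-11-44-05-0")[-1])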
log/log-train-2026-01-13-11-44-05-1 CHANGED
@@ -3250,3 +3250,43 @@
  2026-01-13 15:14:35,785 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.36 vs. limit=5.0
  2026-01-13 15:14:39,867 INFO [train.py:895] (1/2) Epoch 11, batch 1450, loss[loss=0.2479, simple_loss=0.3073, pruned_loss=0.0943, over 2805.00 frames. ], tot_loss[loss=0.2434, simple_loss=0.2963, pruned_loss=0.09528, over 551718.88 frames. ], batch size: 10, lr: 1.49e-02, grad_scale: 16.0
  2026-01-13 15:14:57,382 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.061e+01 1.580e+02 2.093e+02 2.798e+02 1.380e+03, threshold=4.187e+02, percent-clipped=21.0
+ 2026-01-13 15:15:04,041 INFO [zipformer.py:1188] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18018.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:15:05,743 INFO [zipformer.py:1188] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18021.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:15:11,527 INFO [train.py:895] (1/2) Epoch 11, batch 1500, loss[loss=0.2662, simple_loss=0.3097, pruned_loss=0.1114, over 2860.00 frames. ], tot_loss[loss=0.2441, simple_loss=0.2973, pruned_loss=0.09551, over 550702.71 frames. ], batch size: 9, lr: 1.49e-02, grad_scale: 16.0
+ 2026-01-13 15:15:16,219 INFO [zipformer.py:1188] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18039.0, num_to_drop=1, layers_to_drop={1}
+ 2026-01-13 15:15:16,493 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.91 vs. limit=2.0
+ 2026-01-13 15:15:26,062 INFO [zipformer.py:1188] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18056.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:15:31,752 INFO [zipformer.py:1188] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=18066.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:15:40,456 INFO [train.py:895] (1/2) Epoch 11, batch 1550, loss[loss=0.2206, simple_loss=0.2821, pruned_loss=0.07955, over 2893.00 frames. ], tot_loss[loss=0.2482, simple_loss=0.2997, pruned_loss=0.09839, over 548151.95 frames. ], batch size: 10, lr: 1.48e-02, grad_scale: 16.0
+ 2026-01-13 15:15:43,371 INFO [zipformer.py:1188] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18086.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:15:43,927 INFO [zipformer.py:1188] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=18087.0, num_to_drop=1, layers_to_drop={0}
+ 2026-01-13 15:15:49,219 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([0.9140, 2.9155, 2.7161, 2.0169, 2.8192, 2.8680, 1.4251, 1.1707],
+ device='cuda:1'), covar=tensor([0.0883, 0.0088, 0.0068, 0.0242, 0.0051, 0.0067, 0.1305, 0.0650],
+ device='cuda:1'), in_proj_covar=tensor([0.0172, 0.0098, 0.0098, 0.0138, 0.0097, 0.0096, 0.0226, 0.0165],
+ device='cuda:1'), out_proj_covar=tensor([9.4836e-05, 3.5444e-05, 3.4363e-05, 5.8766e-05, 3.4963e-05, 3.3682e-05,
+ 1.3675e-04, 7.9486e-05], device='cuda:1')
+ 2026-01-13 15:15:55,401 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.747e+01 1.493e+02 2.017e+02 2.491e+02 8.387e+02, threshold=4.034e+02, percent-clipped=5.0
+ 2026-01-13 15:16:09,378 INFO [train.py:895] (1/2) Epoch 11, batch 1600, loss[loss=0.2839, simple_loss=0.3059, pruned_loss=0.131, over 2679.00 frames. ], tot_loss[loss=0.2467, simple_loss=0.2989, pruned_loss=0.09729, over 544995.01 frames. ], batch size: 10, lr: 1.48e-02, grad_scale: 16.0
+ 2026-01-13 15:16:09,379 INFO [train.py:920] (1/2) Computing validation loss
+ 2026-01-13 15:17:08,867 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.7241, 2.5972, 1.5264, 2.7454, 2.4414, 1.2272, 2.4893, 0.9374],
+ device='cuda:1'), covar=tensor([0.0170, 0.0143, 0.1735, 0.0185, 0.0199, 0.2289, 0.0527, 0.2682],
+ device='cuda:1'), in_proj_covar=tensor([0.0084, 0.0083, 0.0195, 0.0087, 0.0086, 0.0193, 0.0154, 0.0220],
+ device='cuda:1'), out_proj_covar=tensor([5.5789e-05, 5.4920e-05, 1.3020e-04, 5.8969e-05, 5.5130e-05, 1.2985e-04,
+ 1.0298e-04, 1.4477e-04], device='cuda:1')
+ 2026-01-13 15:17:29,329 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([2.1284, 2.3831, 1.9726, 2.1175, 2.9264, 3.0499, 2.9594, 0.9502],
+ device='cuda:1'), covar=tensor([0.0945, 0.0494, 0.1567, 0.0900, 0.0256, 0.0080, 0.0153, 0.1156],
+ device='cuda:1'), in_proj_covar=tensor([0.0203, 0.0169, 0.0247, 0.0205, 0.0144, 0.0075, 0.0140, 0.0170],
+ device='cuda:1'), out_proj_covar=tensor([1.8583e-04, 1.5656e-04, 2.2881e-04, 1.8385e-04, 1.3176e-04, 7.1733e-05,
+ 1.2691e-04, 1.5374e-04], device='cuda:1')
+ 2026-01-13 15:17:41,924 INFO [train.py:929] (1/2) Epoch 11, validation: loss=0.6568, simple_loss=0.6351, pruned_loss=0.3392, over 1639044.00 frames.
+ 2026-01-13 15:17:41,925 INFO [train.py:930] (1/2) Maximum memory allocated so far is 5734MB
+ 2026-01-13 15:18:06,262 INFO [zipformer.py:1188] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18174.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:18:06,782 INFO [zipformer.py:1188] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18175.0, num_to_drop=0, layers_to_drop=set()
+ 2026-01-13 15:18:09,883 INFO [train.py:895] (1/2) Epoch 11, batch 1650, loss[loss=0.3172, simple_loss=0.3537, pruned_loss=0.1403, over 2393.00 frames. ], tot_loss[loss=0.2644, simple_loss=0.3137, pruned_loss=0.1076, over 540590.60 frames. ], batch size: 26, lr: 1.48e-02, grad_scale: 16.0
+ 2026-01-13 15:18:30,644 INFO [train.py:895] (1/2) Epoch 12, batch 0, loss[loss=0.3139, simple_loss=0.3377, pruned_loss=0.1451, over 2653.00 frames. ], tot_loss[loss=0.3139, simple_loss=0.3377, pruned_loss=0.1451, over 2653.00 frames. ], batch size: 7, lr: 1.42e-02, grad_scale: 16.0
+ 2026-01-13 15:18:30,644 INFO [train.py:920] (1/2) Computing validation loss
+ 2026-01-13 15:19:02,630 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([1.8779, 2.3499, 1.8779, 1.6737, 2.8885, 3.0297, 2.8685, 0.9130],
+ device='cuda:1'), covar=tensor([0.1259, 0.0434, 0.1722, 0.1344, 0.0237, 0.0085, 0.0178, 0.1386],
+ device='cuda:1'), in_proj_covar=tensor([0.0204, 0.0170, 0.0245, 0.0205, 0.0144, 0.0075, 0.0140, 0.0170],
+ device='cuda:1'), out_proj_covar=tensor([1.8661e-04, 1.5684e-04, 2.2741e-04, 1.8416e-04, 1.3187e-04, 7.2214e-05,
+ 1.2695e-04, 1.5414e-04], device='cuda:1')
tensorboard/events.out.tfevents.1768304645.8e64ffbd666a.97184.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:58d658294c8bd10b242a3a499e39808d8c5fe7b937d5604a34f048734842709d
- size 177950
+ oid sha256:0aac67798c6f1f04854dcb1f045d635bfc3ac6a7a9e1066cac922e66bbbba046
+ size 180527
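
The events file grew from 177,950 to 180,527 bytes with the new training steps. Once the LFS payload is fetched, the scalars behind the log lines above can be read back without launching TensorBoard; a sketch using the stock event reader, where the scalar tag name is an assumption (list acc.Tags() to find the real ones):

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point at the directory holding the events.out.tfevents.* file.
acc = EventAccumulator("tensorboard")
acc.Reload()  # parse all events from disk

print(acc.Tags()["scalars"])  # discover which scalar tags were actually logged
for ev in acc.Scalars("train/tot_loss"):  # hypothetical tag name; substitute one printed above
    print(ev.step, ev.value)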