Auto-sync checkpoint during training
- log/log-train-2026-01-13-11-39-59-0 +28 -0
- log/log-train-2026-01-13-11-39-59-1 +29 -0
- log/log-train-2026-01-13-11-44-05-0 +109 -0
- log/log-train-2026-01-13-11-44-05-1 +109 -0
- tensorboard/events.out.tfevents.1768304399.8e64ffbd666a.89842.0 +2 -2
- tensorboard/events.out.tfevents.1768304645.8e64ffbd666a.97184.0 +3 -0
log/log-train-2026-01-13-11-39-59-0
CHANGED
@@ -130,3 +130,31 @@
 2026-01-13 11:41:52,100 INFO [train.py:929] (0/2) Epoch 1, validation: loss=8.282, simple_loss=7.526, pruned_loss=7.544, over 1639044.00 frames.
 2026-01-13 11:41:52,101 INFO [train.py:930] (0/2) Maximum memory allocated so far is 2324MB
 2026-01-13 11:41:53,682 INFO [zipformer.py:1188] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5.0, num_to_drop=2, layers_to_drop={0, 3}
+2026-01-13 11:41:59,516 INFO [zipformer.py:1188] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=23.0, num_to_drop=1, layers_to_drop={1}
+2026-01-13 11:41:59,996 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=14.47 vs. limit=2.0
+2026-01-13 11:42:08,502 INFO [train.py:895] (0/2) Epoch 1, batch 50, loss[loss=1.09, simple_loss=0.967, pruned_loss=1.097, over 1183.00 frames. ], tot_loss[loss=2.095, simple_loss=1.905, pruned_loss=1.827, over 59743.46 frames. ], batch size: 3, lr: 2.75e-02, grad_scale: 2.0
+2026-01-13 11:42:13,310 INFO [scaling.py:681] (0/2) Whitening: num_groups=1, num_channels=384, metric=92.52 vs. limit=5.0
+2026-01-13 11:42:17,641 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=15.96 vs. limit=2.0
+2026-01-13 11:42:18,682 INFO [zipformer.py:1188] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83.0, num_to_drop=1, layers_to_drop={1}
+2026-01-13 11:42:23,085 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=7.36 vs. limit=2.0
+2026-01-13 11:42:24,180 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.400e+00 1.757e+01 2.611e+01 8.809e+01 1.032e+03, threshold=5.221e+01, percent-clipped=0.0
+2026-01-13 11:42:24,220 INFO [train.py:895] (0/2) Epoch 1, batch 100, loss[loss=1.012, simple_loss=0.883, pruned_loss=1.038, over 1450.00 frames. ], tot_loss[loss=1.557, simple_loss=1.399, pruned_loss=1.438, over 105524.85 frames. ], batch size: 4, lr: 3.00e-02, grad_scale: 2.0
+2026-01-13 11:42:37,820 INFO [zipformer.py:1188] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=144.0, num_to_drop=2, layers_to_drop={0, 3}
+2026-01-13 11:42:37,888 INFO [scaling.py:681] (0/2) Whitening: num_groups=1, num_channels=384, metric=57.95 vs. limit=5.0
+2026-01-13 11:42:39,992 INFO [train.py:895] (0/2) Epoch 1, batch 150, loss[loss=1.137, simple_loss=0.9777, pruned_loss=1.168, over 1189.00 frames. ], tot_loss[loss=1.345, simple_loss=1.194, pruned_loss=1.291, over 138408.39 frames. ], batch size: 3, lr: 3.25e-02, grad_scale: 2.0
+2026-01-13 11:42:50,418 INFO [scaling.py:681] (0/2) Whitening: num_groups=1, num_channels=384, metric=34.60 vs. limit=5.0
+2026-01-13 11:42:56,336 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.056e+01 1.349e+01 1.612e+01 1.843e+01 3.228e+01, threshold=3.224e+01, percent-clipped=0.0
+2026-01-13 11:42:56,376 INFO [train.py:895] (0/2) Epoch 1, batch 200, loss[loss=1.242, simple_loss=1.057, pruned_loss=1.254, over 1331.00 frames. ], tot_loss[loss=1.219, simple_loss=1.069, pruned_loss=1.198, over 165849.01 frames. ], batch size: 8, lr: 3.50e-02, grad_scale: 2.0
+2026-01-13 11:43:12,253 INFO [train.py:895] (0/2) Epoch 1, batch 250, loss[loss=0.9935, simple_loss=0.829, pruned_loss=1.017, over 1239.00 frames. ], tot_loss[loss=1.136, simple_loss=0.9853, pruned_loss=1.132, over 187031.74 frames. ], batch size: 5, lr: 3.75e-02, grad_scale: 2.0
+2026-01-13 11:43:13,949 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([4.0746, 4.0746, 4.0747, 4.0720, 4.0744, 4.0745, 4.0744, 4.0746],
+       device='cuda:0'), covar=tensor([0.0007, 0.0007, 0.0005, 0.0007, 0.0010, 0.0007, 0.0006, 0.0006],
+       device='cuda:0'), in_proj_covar=tensor([0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008],
+       device='cuda:0'), out_proj_covar=tensor([7.9819e-06, 8.1444e-06, 8.0188e-06, 8.2155e-06, 7.9850e-06, 8.0954e-06,
+       7.9568e-06, 8.0921e-06], device='cuda:0')
+2026-01-13 11:43:14,354 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=2.34 vs. limit=2.0
+2026-01-13 11:43:22,519 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=2.65 vs. limit=2.0
+2026-01-13 11:43:26,539 INFO [zipformer.py:1188] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=296.0, num_to_drop=1, layers_to_drop={1}
+2026-01-13 11:43:27,760 INFO [zipformer.py:1188] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=300.0, num_to_drop=2, layers_to_drop={1, 3}
+2026-01-13 11:43:27,774 INFO [train.py:1204] (0/2) Saving batch to /kaggle/working/amharic_training/exp_amharic_streaming/batch-bdd640fb-0667-1ad1-1c80-317fa3b1799d.pt
+2026-01-13 11:43:27,779 INFO [train.py:1210] (0/2) features shape: torch.Size([5, 1175, 80])
+2026-01-13 11:43:27,781 INFO [train.py:1214] (0/2) num tokens: 215
log/log-train-2026-01-13-11-39-59-1
CHANGED
@@ -130,3 +130,32 @@
 2026-01-13 11:41:52,102 INFO [train.py:929] (1/2) Epoch 1, validation: loss=8.282, simple_loss=7.526, pruned_loss=7.544, over 1639044.00 frames.
 2026-01-13 11:41:52,102 INFO [train.py:930] (1/2) Maximum memory allocated so far is 2315MB
 2026-01-13 11:41:53,682 INFO [zipformer.py:1188] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5.0, num_to_drop=2, layers_to_drop={0, 2}
+2026-01-13 11:41:59,517 INFO [zipformer.py:1188] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=23.0, num_to_drop=1, layers_to_drop={0}
+2026-01-13 11:41:59,996 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=12.59 vs. limit=2.0
+2026-01-13 11:42:08,502 INFO [train.py:895] (1/2) Epoch 1, batch 50, loss[loss=1.052, simple_loss=0.9326, pruned_loss=1.065, over 1185.00 frames. ], tot_loss[loss=2.132, simple_loss=1.939, pruned_loss=1.863, over 59802.58 frames. ], batch size: 3, lr: 2.75e-02, grad_scale: 2.0
+2026-01-13 11:42:13,276 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=124.44 vs. limit=5.0
+2026-01-13 11:42:17,628 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=16.82 vs. limit=2.0
+2026-01-13 11:42:18,680 INFO [zipformer.py:1188] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83.0, num_to_drop=1, layers_to_drop={0}
+2026-01-13 11:42:23,167 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=96, metric=6.24 vs. limit=2.0
+2026-01-13 11:42:24,180 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.400e+00 1.757e+01 2.611e+01 8.809e+01 1.032e+03, threshold=5.221e+01, percent-clipped=0.0
+2026-01-13 11:42:24,219 INFO [train.py:895] (1/2) Epoch 1, batch 100, loss[loss=0.9711, simple_loss=0.8458, pruned_loss=1.008, over 1447.00 frames. ], tot_loss[loss=1.558, simple_loss=1.4, pruned_loss=1.438, over 105356.72 frames. ], batch size: 4, lr: 3.00e-02, grad_scale: 2.0
+2026-01-13 11:42:33,042 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=40.38 vs. limit=5.0
+2026-01-13 11:42:35,481 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([4.4032, 4.4030, 4.4031, 4.4031, 4.4032, 4.4032, 4.4032, 4.4032],
+       device='cuda:1'), covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0004, 0.0002, 0.0002, 0.0001],
+       device='cuda:1'), in_proj_covar=tensor([0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009],
+       device='cuda:1'), out_proj_covar=tensor([9.1589e-06, 8.9273e-06, 8.9333e-06, 8.9204e-06, 8.9799e-06, 8.8728e-06,
+       8.9491e-06, 9.0549e-06], device='cuda:1')
+2026-01-13 11:42:37,791 INFO [zipformer.py:1188] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=144.0, num_to_drop=2, layers_to_drop={0, 2}
+2026-01-13 11:42:39,994 INFO [train.py:895] (1/2) Epoch 1, batch 150, loss[loss=0.9644, simple_loss=0.826, pruned_loss=1.01, over 1195.00 frames. ], tot_loss[loss=1.351, simple_loss=1.199, pruned_loss=1.296, over 138827.38 frames. ], batch size: 3, lr: 3.25e-02, grad_scale: 2.0
+2026-01-13 11:42:41,382 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=7.40 vs. limit=2.0
+2026-01-13 11:42:44,688 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=96, metric=2.97 vs. limit=2.0
+2026-01-13 11:42:56,336 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.056e+01 1.349e+01 1.612e+01 1.843e+01 3.228e+01, threshold=3.224e+01, percent-clipped=0.0
+2026-01-13 11:42:56,375 INFO [train.py:895] (1/2) Epoch 1, batch 200, loss[loss=1.097, simple_loss=0.9325, pruned_loss=1.111, over 1328.00 frames. ], tot_loss[loss=1.217, simple_loss=1.068, pruned_loss=1.195, over 165766.02 frames. ], batch size: 8, lr: 3.50e-02, grad_scale: 2.0
+2026-01-13 11:43:06,015 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=28.31 vs. limit=5.0
+2026-01-13 11:43:12,253 INFO [train.py:895] (1/2) Epoch 1, batch 250, loss[loss=0.9016, simple_loss=0.7567, pruned_loss=0.9034, over 1249.00 frames. ], tot_loss[loss=1.129, simple_loss=0.9798, pruned_loss=1.122, over 187257.17 frames. ], batch size: 5, lr: 3.75e-02, grad_scale: 2.0
+2026-01-13 11:43:20,721 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=30.61 vs. limit=5.0
+2026-01-13 11:43:26,539 INFO [zipformer.py:1188] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=296.0, num_to_drop=1, layers_to_drop={0}
+2026-01-13 11:43:27,770 INFO [zipformer.py:1188] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=300.0, num_to_drop=2, layers_to_drop={1, 2}
+2026-01-13 11:43:27,774 INFO [train.py:1204] (1/2) Saving batch to /kaggle/working/amharic_training/exp_amharic_streaming/batch-bdd640fb-0667-1ad1-1c80-317fa3b1799d.pt
+2026-01-13 11:43:27,779 INFO [train.py:1210] (1/2) features shape: torch.Size([5, 1166, 80])
+2026-01-13 11:43:27,780 INFO [train.py:1214] (1/2) num tokens: 234
|
log/log-train-2026-01-13-11-44-05-0
ADDED
@@ -0,0 +1,109 @@
+2026-01-13 11:44:05,656 INFO [train.py:967] (0/2) Training started
+2026-01-13 11:44:05,657 INFO [train.py:977] (0/2) Device: cuda:0
+2026-01-13 11:44:05,659 INFO [train.py:986] (0/2) {
+    "am_scale": 0.0,
+    "attention_dims": "192,192,192,192,192",
+    "average_period": 200,
+    "base_lr": 0.05,
+    "batch_idx_train": 0,
+    "best_train_epoch": -1,
+    "best_train_loss": Infinity,
+    "best_valid_epoch": -1,
+    "best_valid_loss": Infinity,
+    "blank_id": 0,
+    "bpe_model": "/kaggle/working/amharic_training/bpe/bpe.model",
+    "bucketing_sampler": true,
+    "cnn_module_kernels": "31,31,31,31,31",
+    "concatenate_cuts": false,
+    "context_size": 2,
+    "decode_chunk_len": 32,
+    "decoder_dim": 512,
+    "drop_last": true,
+    "duration_factor": 1.0,
+    "enable_musan": false,
+    "enable_spec_aug": true,
+    "encoder_dims": "384,384,384,384,384",
+    "encoder_unmasked_dims": "256,256,256,256,256",
+    "env_info": {
+        "IP address": "172.19.2.2",
+        "hostname": "8e64ffbd666a",
+        "icefall-git-branch": "master",
+        "icefall-git-date": "Fri Nov 28 03:42:20 2025",
+        "icefall-git-sha1": "0904e490-dirty",
+        "icefall-path": "/kaggle/working/icefall",
+        "k2-build-type": "Release",
+        "k2-git-date": "Thu Jul 25 03:34:26 2024",
+        "k2-git-sha1": "40e8d1676f6062e46458dc32ad21229c93cc9c50",
+        "k2-path": "/usr/local/lib/python3.12/dist-packages/k2/__init__.py",
+        "k2-version": "1.24.4",
+        "k2-with-cuda": true,
+        "lhotse-path": "/usr/local/lib/python3.12/dist-packages/lhotse/__init__.py",
+        "lhotse-version": "1.32.1",
+        "python-version": "3.12",
+        "torch-cuda-available": true,
+        "torch-cuda-version": "12.1",
+        "torch-version": "2.4.0+cu121"
+    },
+    "exp_dir": "/kaggle/working/amharic_training/exp_amharic_streaming",
+    "feature_dim": 80,
+    "feedforward_dims": "1024,1024,2048,2048,1024",
+    "full_libri": false,
+    "gap": 1.0,
+    "inf_check": false,
+    "input_strategy": "PrecomputedFeatures",
+    "joiner_dim": 512,
+    "keep_last_k": 5,
+    "lm_scale": 0.25,
+    "log_interval": 50,
+    "lr_batches": 5000,
+    "lr_epochs": 3.5,
+    "manifest_dir": "/kaggle/working/amharic_training/manifests",
+    "master_port": 12354,
+    "max_duration": 120,
+    "mini_libri": false,
+    "nhead": "8,8,8,8,8",
+    "num_buckets": 30,
+    "num_encoder_layers": "2,4,3,2,4",
+    "num_epochs": 50,
+    "num_left_chunks": 4,
+    "num_workers": 2,
+    "on_the_fly_feats": false,
+    "print_diagnostics": false,
+    "prune_range": 5,
+    "reset_interval": 200,
+    "return_cuts": true,
+    "save_every_n": 1000,
+    "seed": 42,
+    "short_chunk_size": 50,
+    "shuffle": true,
+    "simple_loss_scale": 0.5,
+    "spec_aug_time_warp_factor": 80,
+    "start_batch": 0,
+    "start_epoch": 1,
+    "subsampling_factor": 4,
+    "tensorboard": true,
+    "use_fp16": true,
+    "valid_interval": 1600,
+    "vocab_size": 1000,
+    "warm_step": 2000,
+    "world_size": 2,
+    "zipformer_downsampling_factors": "1,2,4,8,2"
+}
+2026-01-13 11:44:05,660 INFO [train.py:988] (0/2) About to create model
+2026-01-13 11:44:06,275 INFO [zipformer.py:405] (0/2) At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+2026-01-13 11:44:06,292 INFO [train.py:992] (0/2) Number of model parameters: 71330891
+2026-01-13 11:44:07,086 INFO [train.py:1007] (0/2) Using DDP
+2026-01-13 11:44:08,761 INFO [asr_datamodule.py:422] (0/2) About to get train-clean-100 cuts
+2026-01-13 11:44:08,762 INFO [asr_datamodule.py:239] (0/2) Disable MUSAN
+2026-01-13 11:44:08,762 INFO [asr_datamodule.py:257] (0/2) Enable SpecAugment
+2026-01-13 11:44:08,762 INFO [asr_datamodule.py:258] (0/2) Time warp factor: 80
+2026-01-13 11:44:08,762 INFO [asr_datamodule.py:268] (0/2) Num frame mask: 10
+2026-01-13 11:44:08,762 INFO [asr_datamodule.py:281] (0/2) About to create train dataset
+2026-01-13 11:44:08,762 INFO [asr_datamodule.py:308] (0/2) Using DynamicBucketingSampler.
+2026-01-13 11:44:09,150 INFO [asr_datamodule.py:324] (0/2) About to create train dataloader
+2026-01-13 11:44:09,151 INFO [asr_datamodule.py:460] (0/2) About to get dev-clean cuts
+2026-01-13 11:44:09,151 INFO [asr_datamodule.py:467] (0/2) About to get dev-other cuts
+2026-01-13 11:44:09,152 INFO [asr_datamodule.py:355] (0/2) About to create dev dataset
+2026-01-13 11:44:09,528 INFO [asr_datamodule.py:372] (0/2) About to create dev dataloader
+2026-01-13 11:44:25,314 INFO [train.py:895] (0/2) Epoch 1, batch 0, loss[loss=8.165, simple_loss=7.427, pruned_loss=7.363, over 2638.00 frames. ], tot_loss[loss=8.165, simple_loss=7.427, pruned_loss=7.363, over 2638.00 frames. ], batch size: 7, lr: 2.50e-02, grad_scale: 2.0
+2026-01-13 11:44:25,315 INFO [train.py:920] (0/2) Computing validation loss
log/log-train-2026-01-13-11-44-05-1
ADDED
@@ -0,0 +1,109 @@
+2026-01-13 11:44:05,744 INFO [train.py:967] (1/2) Training started
+2026-01-13 11:44:05,744 INFO [train.py:977] (1/2) Device: cuda:1
+2026-01-13 11:44:05,746 INFO [train.py:986] (1/2) {
+    "am_scale": 0.0,
+    "attention_dims": "192,192,192,192,192",
+    "average_period": 200,
+    "base_lr": 0.05,
+    "batch_idx_train": 0,
+    "best_train_epoch": -1,
+    "best_train_loss": Infinity,
+    "best_valid_epoch": -1,
+    "best_valid_loss": Infinity,
+    "blank_id": 0,
+    "bpe_model": "/kaggle/working/amharic_training/bpe/bpe.model",
+    "bucketing_sampler": true,
+    "cnn_module_kernels": "31,31,31,31,31",
+    "concatenate_cuts": false,
+    "context_size": 2,
+    "decode_chunk_len": 32,
+    "decoder_dim": 512,
+    "drop_last": true,
+    "duration_factor": 1.0,
+    "enable_musan": false,
+    "enable_spec_aug": true,
+    "encoder_dims": "384,384,384,384,384",
+    "encoder_unmasked_dims": "256,256,256,256,256",
+    "env_info": {
+        "IP address": "172.19.2.2",
+        "hostname": "8e64ffbd666a",
+        "icefall-git-branch": "master",
+        "icefall-git-date": "Fri Nov 28 03:42:20 2025",
+        "icefall-git-sha1": "0904e490-dirty",
+        "icefall-path": "/kaggle/working/icefall",
+        "k2-build-type": "Release",
+        "k2-git-date": "Thu Jul 25 03:34:26 2024",
+        "k2-git-sha1": "40e8d1676f6062e46458dc32ad21229c93cc9c50",
+        "k2-path": "/usr/local/lib/python3.12/dist-packages/k2/__init__.py",
+        "k2-version": "1.24.4",
+        "k2-with-cuda": true,
+        "lhotse-path": "/usr/local/lib/python3.12/dist-packages/lhotse/__init__.py",
+        "lhotse-version": "1.32.1",
+        "python-version": "3.12",
+        "torch-cuda-available": true,
+        "torch-cuda-version": "12.1",
+        "torch-version": "2.4.0+cu121"
+    },
+    "exp_dir": "/kaggle/working/amharic_training/exp_amharic_streaming",
+    "feature_dim": 80,
+    "feedforward_dims": "1024,1024,2048,2048,1024",
+    "full_libri": false,
+    "gap": 1.0,
+    "inf_check": false,
+    "input_strategy": "PrecomputedFeatures",
+    "joiner_dim": 512,
+    "keep_last_k": 5,
+    "lm_scale": 0.25,
+    "log_interval": 50,
+    "lr_batches": 5000,
+    "lr_epochs": 3.5,
+    "manifest_dir": "/kaggle/working/amharic_training/manifests",
+    "master_port": 12354,
+    "max_duration": 120,
+    "mini_libri": false,
+    "nhead": "8,8,8,8,8",
+    "num_buckets": 30,
+    "num_encoder_layers": "2,4,3,2,4",
+    "num_epochs": 50,
+    "num_left_chunks": 4,
+    "num_workers": 2,
+    "on_the_fly_feats": false,
+    "print_diagnostics": false,
+    "prune_range": 5,
+    "reset_interval": 200,
+    "return_cuts": true,
+    "save_every_n": 1000,
+    "seed": 42,
+    "short_chunk_size": 50,
+    "shuffle": true,
+    "simple_loss_scale": 0.5,
+    "spec_aug_time_warp_factor": 80,
+    "start_batch": 0,
+    "start_epoch": 1,
+    "subsampling_factor": 4,
+    "tensorboard": true,
+    "use_fp16": true,
+    "valid_interval": 1600,
+    "vocab_size": 1000,
+    "warm_step": 2000,
+    "world_size": 2,
+    "zipformer_downsampling_factors": "1,2,4,8,2"
+}
+2026-01-13 11:44:05,747 INFO [train.py:988] (1/2) About to create model
+2026-01-13 11:44:06,369 INFO [zipformer.py:405] (1/2) At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+2026-01-13 11:44:06,387 INFO [train.py:992] (1/2) Number of model parameters: 71330891
+2026-01-13 11:44:06,499 INFO [train.py:1007] (1/2) Using DDP
+2026-01-13 11:44:08,475 INFO [asr_datamodule.py:422] (1/2) About to get train-clean-100 cuts
+2026-01-13 11:44:08,477 INFO [asr_datamodule.py:239] (1/2) Disable MUSAN
+2026-01-13 11:44:08,477 INFO [asr_datamodule.py:257] (1/2) Enable SpecAugment
+2026-01-13 11:44:08,477 INFO [asr_datamodule.py:258] (1/2) Time warp factor: 80
+2026-01-13 11:44:08,477 INFO [asr_datamodule.py:268] (1/2) Num frame mask: 10
+2026-01-13 11:44:08,477 INFO [asr_datamodule.py:281] (1/2) About to create train dataset
+2026-01-13 11:44:08,477 INFO [asr_datamodule.py:308] (1/2) Using DynamicBucketingSampler.
+2026-01-13 11:44:08,786 INFO [asr_datamodule.py:324] (1/2) About to create train dataloader
+2026-01-13 11:44:08,787 INFO [asr_datamodule.py:460] (1/2) About to get dev-clean cuts
+2026-01-13 11:44:08,787 INFO [asr_datamodule.py:467] (1/2) About to get dev-other cuts
+2026-01-13 11:44:08,788 INFO [asr_datamodule.py:355] (1/2) About to create dev dataset
+2026-01-13 11:44:08,987 INFO [asr_datamodule.py:372] (1/2) About to create dev dataloader
+2026-01-13 11:44:25,300 INFO [train.py:895] (1/2) Epoch 1, batch 0, loss[loss=8.191, simple_loss=7.455, pruned_loss=7.342, over 2645.00 frames. ], tot_loss[loss=8.191, simple_loss=7.455, pruned_loss=7.342, over 2645.00 frames. ], batch size: 7, lr: 2.50e-02, grad_scale: 2.0
+2026-01-13 11:44:25,301 INFO [train.py:920] (1/2) Computing validation loss
tensorboard/events.out.tfevents.1768304399.8e64ffbd666a.89842.0
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:f2cefcf8eee1857f93ae1490b5b219b6444c207845afdfa964db6184e5d16fcf
+size 774
tensorboard/events.out.tfevents.1768304645.8e64ffbd666a.97184.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8455acb370010541dec439561e446c3686a62027b987a5c0a492e45cc6facc6d
+size 88