projecti7 committed · Commit c8a9742 · verified · 1 Parent(s): 67fe1c3

Auto-sync checkpoint during training
log/log-train-2026-01-13-11-39-59-0 CHANGED
@@ -130,3 +130,31 @@
  2026-01-13 11:41:52,100 INFO [train.py:929] (0/2) Epoch 1, validation: loss=8.282, simple_loss=7.526, pruned_loss=7.544, over 1639044.00 frames.
  2026-01-13 11:41:52,101 INFO [train.py:930] (0/2) Maximum memory allocated so far is 2324MB
  2026-01-13 11:41:53,682 INFO [zipformer.py:1188] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5.0, num_to_drop=2, layers_to_drop={0, 3}
+ 2026-01-13 11:41:59,516 INFO [zipformer.py:1188] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=23.0, num_to_drop=1, layers_to_drop={1}
+ 2026-01-13 11:41:59,996 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=14.47 vs. limit=2.0
+ 2026-01-13 11:42:08,502 INFO [train.py:895] (0/2) Epoch 1, batch 50, loss[loss=1.09, simple_loss=0.967, pruned_loss=1.097, over 1183.00 frames. ], tot_loss[loss=2.095, simple_loss=1.905, pruned_loss=1.827, over 59743.46 frames. ], batch size: 3, lr: 2.75e-02, grad_scale: 2.0
+ 2026-01-13 11:42:13,310 INFO [scaling.py:681] (0/2) Whitening: num_groups=1, num_channels=384, metric=92.52 vs. limit=5.0
+ 2026-01-13 11:42:17,641 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=15.96 vs. limit=2.0
+ 2026-01-13 11:42:18,682 INFO [zipformer.py:1188] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83.0, num_to_drop=1, layers_to_drop={1}
+ 2026-01-13 11:42:23,085 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=7.36 vs. limit=2.0
+ 2026-01-13 11:42:24,180 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.400e+00 1.757e+01 2.611e+01 8.809e+01 1.032e+03, threshold=5.221e+01, percent-clipped=0.0
+ 2026-01-13 11:42:24,220 INFO [train.py:895] (0/2) Epoch 1, batch 100, loss[loss=1.012, simple_loss=0.883, pruned_loss=1.038, over 1450.00 frames. ], tot_loss[loss=1.557, simple_loss=1.399, pruned_loss=1.438, over 105524.85 frames. ], batch size: 4, lr: 3.00e-02, grad_scale: 2.0
+ 2026-01-13 11:42:37,820 INFO [zipformer.py:1188] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=144.0, num_to_drop=2, layers_to_drop={0, 3}
+ 2026-01-13 11:42:37,888 INFO [scaling.py:681] (0/2) Whitening: num_groups=1, num_channels=384, metric=57.95 vs. limit=5.0
+ 2026-01-13 11:42:39,992 INFO [train.py:895] (0/2) Epoch 1, batch 150, loss[loss=1.137, simple_loss=0.9777, pruned_loss=1.168, over 1189.00 frames. ], tot_loss[loss=1.345, simple_loss=1.194, pruned_loss=1.291, over 138408.39 frames. ], batch size: 3, lr: 3.25e-02, grad_scale: 2.0
+ 2026-01-13 11:42:50,418 INFO [scaling.py:681] (0/2) Whitening: num_groups=1, num_channels=384, metric=34.60 vs. limit=5.0
+ 2026-01-13 11:42:56,336 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.056e+01 1.349e+01 1.612e+01 1.843e+01 3.228e+01, threshold=3.224e+01, percent-clipped=0.0
+ 2026-01-13 11:42:56,376 INFO [train.py:895] (0/2) Epoch 1, batch 200, loss[loss=1.242, simple_loss=1.057, pruned_loss=1.254, over 1331.00 frames. ], tot_loss[loss=1.219, simple_loss=1.069, pruned_loss=1.198, over 165849.01 frames. ], batch size: 8, lr: 3.50e-02, grad_scale: 2.0
+ 2026-01-13 11:43:12,253 INFO [train.py:895] (0/2) Epoch 1, batch 250, loss[loss=0.9935, simple_loss=0.829, pruned_loss=1.017, over 1239.00 frames. ], tot_loss[loss=1.136, simple_loss=0.9853, pruned_loss=1.132, over 187031.74 frames. ], batch size: 5, lr: 3.75e-02, grad_scale: 2.0
+ 2026-01-13 11:43:13,949 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([4.0746, 4.0746, 4.0747, 4.0720, 4.0744, 4.0745, 4.0744, 4.0746],
+ device='cuda:0'), covar=tensor([0.0007, 0.0007, 0.0005, 0.0007, 0.0010, 0.0007, 0.0006, 0.0006],
+ device='cuda:0'), in_proj_covar=tensor([0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008],
+ device='cuda:0'), out_proj_covar=tensor([7.9819e-06, 8.1444e-06, 8.0188e-06, 8.2155e-06, 7.9850e-06, 8.0954e-06,
+ 7.9568e-06, 8.0921e-06], device='cuda:0')
+ 2026-01-13 11:43:14,354 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=2.34 vs. limit=2.0
+ 2026-01-13 11:43:22,519 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=2.65 vs. limit=2.0
+ 2026-01-13 11:43:26,539 INFO [zipformer.py:1188] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=296.0, num_to_drop=1, layers_to_drop={1}
+ 2026-01-13 11:43:27,760 INFO [zipformer.py:1188] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=300.0, num_to_drop=2, layers_to_drop={1, 3}
+ 2026-01-13 11:43:27,774 INFO [train.py:1204] (0/2) Saving batch to /kaggle/working/amharic_training/exp_amharic_streaming/batch-bdd640fb-0667-1ad1-1c80-317fa3b1799d.pt
+ 2026-01-13 11:43:27,779 INFO [train.py:1210] (0/2) features shape: torch.Size([5, 1175, 80])
+ 2026-01-13 11:43:27,781 INFO [train.py:1214] (0/2) num tokens: 215
log/log-train-2026-01-13-11-39-59-1 CHANGED
@@ -130,3 +130,32 @@
  2026-01-13 11:41:52,102 INFO [train.py:929] (1/2) Epoch 1, validation: loss=8.282, simple_loss=7.526, pruned_loss=7.544, over 1639044.00 frames.
  2026-01-13 11:41:52,102 INFO [train.py:930] (1/2) Maximum memory allocated so far is 2315MB
  2026-01-13 11:41:53,682 INFO [zipformer.py:1188] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5.0, num_to_drop=2, layers_to_drop={0, 2}
+ 2026-01-13 11:41:59,517 INFO [zipformer.py:1188] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=23.0, num_to_drop=1, layers_to_drop={0}
+ 2026-01-13 11:41:59,996 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=12.59 vs. limit=2.0
+ 2026-01-13 11:42:08,502 INFO [train.py:895] (1/2) Epoch 1, batch 50, loss[loss=1.052, simple_loss=0.9326, pruned_loss=1.065, over 1185.00 frames. ], tot_loss[loss=2.132, simple_loss=1.939, pruned_loss=1.863, over 59802.58 frames. ], batch size: 3, lr: 2.75e-02, grad_scale: 2.0
+ 2026-01-13 11:42:13,276 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=124.44 vs. limit=5.0
+ 2026-01-13 11:42:17,628 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=16.82 vs. limit=2.0
+ 2026-01-13 11:42:18,680 INFO [zipformer.py:1188] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83.0, num_to_drop=1, layers_to_drop={0}
+ 2026-01-13 11:42:23,167 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=96, metric=6.24 vs. limit=2.0
+ 2026-01-13 11:42:24,180 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.400e+00 1.757e+01 2.611e+01 8.809e+01 1.032e+03, threshold=5.221e+01, percent-clipped=0.0
+ 2026-01-13 11:42:24,219 INFO [train.py:895] (1/2) Epoch 1, batch 100, loss[loss=0.9711, simple_loss=0.8458, pruned_loss=1.008, over 1447.00 frames. ], tot_loss[loss=1.558, simple_loss=1.4, pruned_loss=1.438, over 105356.72 frames. ], batch size: 4, lr: 3.00e-02, grad_scale: 2.0
+ 2026-01-13 11:42:33,042 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=40.38 vs. limit=5.0
+ 2026-01-13 11:42:35,481 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([4.4032, 4.4030, 4.4031, 4.4031, 4.4032, 4.4032, 4.4032, 4.4032],
+ device='cuda:1'), covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0004, 0.0002, 0.0002, 0.0001],
+ device='cuda:1'), in_proj_covar=tensor([0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009],
+ device='cuda:1'), out_proj_covar=tensor([9.1589e-06, 8.9273e-06, 8.9333e-06, 8.9204e-06, 8.9799e-06, 8.8728e-06,
+ 8.9491e-06, 9.0549e-06], device='cuda:1')
+ 2026-01-13 11:42:37,791 INFO [zipformer.py:1188] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=144.0, num_to_drop=2, layers_to_drop={0, 2}
+ 2026-01-13 11:42:39,994 INFO [train.py:895] (1/2) Epoch 1, batch 150, loss[loss=0.9644, simple_loss=0.826, pruned_loss=1.01, over 1195.00 frames. ], tot_loss[loss=1.351, simple_loss=1.199, pruned_loss=1.296, over 138827.38 frames. ], batch size: 3, lr: 3.25e-02, grad_scale: 2.0
+ 2026-01-13 11:42:41,382 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=7.40 vs. limit=2.0
+ 2026-01-13 11:42:44,688 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=96, metric=2.97 vs. limit=2.0
+ 2026-01-13 11:42:56,336 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.056e+01 1.349e+01 1.612e+01 1.843e+01 3.228e+01, threshold=3.224e+01, percent-clipped=0.0
+ 2026-01-13 11:42:56,375 INFO [train.py:895] (1/2) Epoch 1, batch 200, loss[loss=1.097, simple_loss=0.9325, pruned_loss=1.111, over 1328.00 frames. ], tot_loss[loss=1.217, simple_loss=1.068, pruned_loss=1.195, over 165766.02 frames. ], batch size: 8, lr: 3.50e-02, grad_scale: 2.0
+ 2026-01-13 11:43:06,015 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=28.31 vs. limit=5.0
+ 2026-01-13 11:43:12,253 INFO [train.py:895] (1/2) Epoch 1, batch 250, loss[loss=0.9016, simple_loss=0.7567, pruned_loss=0.9034, over 1249.00 frames. ], tot_loss[loss=1.129, simple_loss=0.9798, pruned_loss=1.122, over 187257.17 frames. ], batch size: 5, lr: 3.75e-02, grad_scale: 2.0
+ 2026-01-13 11:43:20,721 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=30.61 vs. limit=5.0
+ 2026-01-13 11:43:26,539 INFO [zipformer.py:1188] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=296.0, num_to_drop=1, layers_to_drop={0}
+ 2026-01-13 11:43:27,770 INFO [zipformer.py:1188] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=300.0, num_to_drop=2, layers_to_drop={1, 2}
+ 2026-01-13 11:43:27,774 INFO [train.py:1204] (1/2) Saving batch to /kaggle/working/amharic_training/exp_amharic_streaming/batch-bdd640fb-0667-1ad1-1c80-317fa3b1799d.pt
+ 2026-01-13 11:43:27,779 INFO [train.py:1210] (1/2) features shape: torch.Size([5, 1166, 80])
+ 2026-01-13 11:43:27,780 INFO [train.py:1214] (1/2) num tokens: 234
log/log-train-2026-01-13-11-44-05-0 ADDED
@@ -0,0 +1,109 @@
+ 2026-01-13 11:44:05,656 INFO [train.py:967] (0/2) Training started
+ 2026-01-13 11:44:05,657 INFO [train.py:977] (0/2) Device: cuda:0
+ 2026-01-13 11:44:05,659 INFO [train.py:986] (0/2) {
+ "am_scale": 0.0,
+ "attention_dims": "192,192,192,192,192",
+ "average_period": 200,
+ "base_lr": 0.05,
+ "batch_idx_train": 0,
+ "best_train_epoch": -1,
+ "best_train_loss": Infinity,
+ "best_valid_epoch": -1,
+ "best_valid_loss": Infinity,
+ "blank_id": 0,
+ "bpe_model": "/kaggle/working/amharic_training/bpe/bpe.model",
+ "bucketing_sampler": true,
+ "cnn_module_kernels": "31,31,31,31,31",
+ "concatenate_cuts": false,
+ "context_size": 2,
+ "decode_chunk_len": 32,
+ "decoder_dim": 512,
+ "drop_last": true,
+ "duration_factor": 1.0,
+ "enable_musan": false,
+ "enable_spec_aug": true,
+ "encoder_dims": "384,384,384,384,384",
+ "encoder_unmasked_dims": "256,256,256,256,256",
+ "env_info": {
+ "IP address": "172.19.2.2",
+ "hostname": "8e64ffbd666a",
+ "icefall-git-branch": "master",
+ "icefall-git-date": "Fri Nov 28 03:42:20 2025",
+ "icefall-git-sha1": "0904e490-dirty",
+ "icefall-path": "/kaggle/working/icefall",
+ "k2-build-type": "Release",
+ "k2-git-date": "Thu Jul 25 03:34:26 2024",
+ "k2-git-sha1": "40e8d1676f6062e46458dc32ad21229c93cc9c50",
+ "k2-path": "/usr/local/lib/python3.12/dist-packages/k2/__init__.py",
+ "k2-version": "1.24.4",
+ "k2-with-cuda": true,
+ "lhotse-path": "/usr/local/lib/python3.12/dist-packages/lhotse/__init__.py",
+ "lhotse-version": "1.32.1",
+ "python-version": "3.12",
+ "torch-cuda-available": true,
+ "torch-cuda-version": "12.1",
+ "torch-version": "2.4.0+cu121"
+ },
+ "exp_dir": "/kaggle/working/amharic_training/exp_amharic_streaming",
+ "feature_dim": 80,
+ "feedforward_dims": "1024,1024,2048,2048,1024",
+ "full_libri": false,
+ "gap": 1.0,
+ "inf_check": false,
+ "input_strategy": "PrecomputedFeatures",
+ "joiner_dim": 512,
+ "keep_last_k": 5,
+ "lm_scale": 0.25,
+ "log_interval": 50,
+ "lr_batches": 5000,
+ "lr_epochs": 3.5,
+ "manifest_dir": "/kaggle/working/amharic_training/manifests",
+ "master_port": 12354,
+ "max_duration": 120,
+ "mini_libri": false,
+ "nhead": "8,8,8,8,8",
+ "num_buckets": 30,
+ "num_encoder_layers": "2,4,3,2,4",
+ "num_epochs": 50,
+ "num_left_chunks": 4,
+ "num_workers": 2,
+ "on_the_fly_feats": false,
+ "print_diagnostics": false,
+ "prune_range": 5,
+ "reset_interval": 200,
+ "return_cuts": true,
+ "save_every_n": 1000,
+ "seed": 42,
+ "short_chunk_size": 50,
+ "shuffle": true,
+ "simple_loss_scale": 0.5,
+ "spec_aug_time_warp_factor": 80,
+ "start_batch": 0,
+ "start_epoch": 1,
+ "subsampling_factor": 4,
+ "tensorboard": true,
+ "use_fp16": true,
+ "valid_interval": 1600,
+ "vocab_size": 1000,
+ "warm_step": 2000,
+ "world_size": 2,
+ "zipformer_downsampling_factors": "1,2,4,8,2"
+ }
+ 2026-01-13 11:44:05,660 INFO [train.py:988] (0/2) About to create model
+ 2026-01-13 11:44:06,275 INFO [zipformer.py:405] (0/2) At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+ 2026-01-13 11:44:06,292 INFO [train.py:992] (0/2) Number of model parameters: 71330891
+ 2026-01-13 11:44:07,086 INFO [train.py:1007] (0/2) Using DDP
+ 2026-01-13 11:44:08,761 INFO [asr_datamodule.py:422] (0/2) About to get train-clean-100 cuts
+ 2026-01-13 11:44:08,762 INFO [asr_datamodule.py:239] (0/2) Disable MUSAN
+ 2026-01-13 11:44:08,762 INFO [asr_datamodule.py:257] (0/2) Enable SpecAugment
+ 2026-01-13 11:44:08,762 INFO [asr_datamodule.py:258] (0/2) Time warp factor: 80
+ 2026-01-13 11:44:08,762 INFO [asr_datamodule.py:268] (0/2) Num frame mask: 10
+ 2026-01-13 11:44:08,762 INFO [asr_datamodule.py:281] (0/2) About to create train dataset
+ 2026-01-13 11:44:08,762 INFO [asr_datamodule.py:308] (0/2) Using DynamicBucketingSampler.
+ 2026-01-13 11:44:09,150 INFO [asr_datamodule.py:324] (0/2) About to create train dataloader
+ 2026-01-13 11:44:09,151 INFO [asr_datamodule.py:460] (0/2) About to get dev-clean cuts
+ 2026-01-13 11:44:09,151 INFO [asr_datamodule.py:467] (0/2) About to get dev-other cuts
+ 2026-01-13 11:44:09,152 INFO [asr_datamodule.py:355] (0/2) About to create dev dataset
+ 2026-01-13 11:44:09,528 INFO [asr_datamodule.py:372] (0/2) About to create dev dataloader
+ 2026-01-13 11:44:25,314 INFO [train.py:895] (0/2) Epoch 1, batch 0, loss[loss=8.165, simple_loss=7.427, pruned_loss=7.363, over 2638.00 frames. ], tot_loss[loss=8.165, simple_loss=7.427, pruned_loss=7.363, over 2638.00 frames. ], batch size: 7, lr: 2.50e-02, grad_scale: 2.0
+ 2026-01-13 11:44:25,315 INFO [train.py:920] (0/2) Computing validation loss
log/log-train-2026-01-13-11-44-05-1 ADDED
@@ -0,0 +1,109 @@
+ 2026-01-13 11:44:05,744 INFO [train.py:967] (1/2) Training started
+ 2026-01-13 11:44:05,744 INFO [train.py:977] (1/2) Device: cuda:1
+ 2026-01-13 11:44:05,746 INFO [train.py:986] (1/2) {
+ "am_scale": 0.0,
+ "attention_dims": "192,192,192,192,192",
+ "average_period": 200,
+ "base_lr": 0.05,
+ "batch_idx_train": 0,
+ "best_train_epoch": -1,
+ "best_train_loss": Infinity,
+ "best_valid_epoch": -1,
+ "best_valid_loss": Infinity,
+ "blank_id": 0,
+ "bpe_model": "/kaggle/working/amharic_training/bpe/bpe.model",
+ "bucketing_sampler": true,
+ "cnn_module_kernels": "31,31,31,31,31",
+ "concatenate_cuts": false,
+ "context_size": 2,
+ "decode_chunk_len": 32,
+ "decoder_dim": 512,
+ "drop_last": true,
+ "duration_factor": 1.0,
+ "enable_musan": false,
+ "enable_spec_aug": true,
+ "encoder_dims": "384,384,384,384,384",
+ "encoder_unmasked_dims": "256,256,256,256,256",
+ "env_info": {
+ "IP address": "172.19.2.2",
+ "hostname": "8e64ffbd666a",
+ "icefall-git-branch": "master",
+ "icefall-git-date": "Fri Nov 28 03:42:20 2025",
+ "icefall-git-sha1": "0904e490-dirty",
+ "icefall-path": "/kaggle/working/icefall",
+ "k2-build-type": "Release",
+ "k2-git-date": "Thu Jul 25 03:34:26 2024",
+ "k2-git-sha1": "40e8d1676f6062e46458dc32ad21229c93cc9c50",
+ "k2-path": "/usr/local/lib/python3.12/dist-packages/k2/__init__.py",
+ "k2-version": "1.24.4",
+ "k2-with-cuda": true,
+ "lhotse-path": "/usr/local/lib/python3.12/dist-packages/lhotse/__init__.py",
+ "lhotse-version": "1.32.1",
+ "python-version": "3.12",
+ "torch-cuda-available": true,
+ "torch-cuda-version": "12.1",
+ "torch-version": "2.4.0+cu121"
+ },
+ "exp_dir": "/kaggle/working/amharic_training/exp_amharic_streaming",
+ "feature_dim": 80,
+ "feedforward_dims": "1024,1024,2048,2048,1024",
+ "full_libri": false,
+ "gap": 1.0,
+ "inf_check": false,
+ "input_strategy": "PrecomputedFeatures",
+ "joiner_dim": 512,
+ "keep_last_k": 5,
+ "lm_scale": 0.25,
+ "log_interval": 50,
+ "lr_batches": 5000,
+ "lr_epochs": 3.5,
+ "manifest_dir": "/kaggle/working/amharic_training/manifests",
+ "master_port": 12354,
+ "max_duration": 120,
+ "mini_libri": false,
+ "nhead": "8,8,8,8,8",
+ "num_buckets": 30,
+ "num_encoder_layers": "2,4,3,2,4",
+ "num_epochs": 50,
+ "num_left_chunks": 4,
+ "num_workers": 2,
+ "on_the_fly_feats": false,
+ "print_diagnostics": false,
+ "prune_range": 5,
+ "reset_interval": 200,
+ "return_cuts": true,
+ "save_every_n": 1000,
+ "seed": 42,
+ "short_chunk_size": 50,
+ "shuffle": true,
+ "simple_loss_scale": 0.5,
+ "spec_aug_time_warp_factor": 80,
+ "start_batch": 0,
+ "start_epoch": 1,
+ "subsampling_factor": 4,
+ "tensorboard": true,
+ "use_fp16": true,
+ "valid_interval": 1600,
+ "vocab_size": 1000,
+ "warm_step": 2000,
+ "world_size": 2,
+ "zipformer_downsampling_factors": "1,2,4,8,2"
+ }
+ 2026-01-13 11:44:05,747 INFO [train.py:988] (1/2) About to create model
+ 2026-01-13 11:44:06,369 INFO [zipformer.py:405] (1/2) At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+ 2026-01-13 11:44:06,387 INFO [train.py:992] (1/2) Number of model parameters: 71330891
+ 2026-01-13 11:44:06,499 INFO [train.py:1007] (1/2) Using DDP
+ 2026-01-13 11:44:08,475 INFO [asr_datamodule.py:422] (1/2) About to get train-clean-100 cuts
+ 2026-01-13 11:44:08,477 INFO [asr_datamodule.py:239] (1/2) Disable MUSAN
+ 2026-01-13 11:44:08,477 INFO [asr_datamodule.py:257] (1/2) Enable SpecAugment
+ 2026-01-13 11:44:08,477 INFO [asr_datamodule.py:258] (1/2) Time warp factor: 80
+ 2026-01-13 11:44:08,477 INFO [asr_datamodule.py:268] (1/2) Num frame mask: 10
+ 2026-01-13 11:44:08,477 INFO [asr_datamodule.py:281] (1/2) About to create train dataset
+ 2026-01-13 11:44:08,477 INFO [asr_datamodule.py:308] (1/2) Using DynamicBucketingSampler.
+ 2026-01-13 11:44:08,786 INFO [asr_datamodule.py:324] (1/2) About to create train dataloader
+ 2026-01-13 11:44:08,787 INFO [asr_datamodule.py:460] (1/2) About to get dev-clean cuts
+ 2026-01-13 11:44:08,787 INFO [asr_datamodule.py:467] (1/2) About to get dev-other cuts
+ 2026-01-13 11:44:08,788 INFO [asr_datamodule.py:355] (1/2) About to create dev dataset
+ 2026-01-13 11:44:08,987 INFO [asr_datamodule.py:372] (1/2) About to create dev dataloader
+ 2026-01-13 11:44:25,300 INFO [train.py:895] (1/2) Epoch 1, batch 0, loss[loss=8.191, simple_loss=7.455, pruned_loss=7.342, over 2645.00 frames. ], tot_loss[loss=8.191, simple_loss=7.455, pruned_loss=7.342, over 2645.00 frames. ], batch size: 7, lr: 2.50e-02, grad_scale: 2.0
+ 2026-01-13 11:44:25,301 INFO [train.py:920] (1/2) Computing validation loss
tensorboard/events.out.tfevents.1768304399.8e64ffbd666a.89842.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c442e4b3caa961aabf0cf0648922497b41c46a5d62275c905e25c979200de295
- size 88
+ oid sha256:f2cefcf8eee1857f93ae1490b5b219b6444c207845afdfa964db6184e5d16fcf
+ size 774
tensorboard/events.out.tfevents.1768304645.8e64ffbd666a.97184.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8455acb370010541dec439561e446c3686a62027b987a5c0a492e45cc6facc6d
+ size 88