Auto-sync checkpoint during training
- log/log-train-2026-01-13-11-39-59-0 +28 -0
- log/log-train-2026-01-13-11-39-59-1 +29 -0
- log/log-train-2026-01-13-11-44-05-0 +109 -0
- log/log-train-2026-01-13-11-44-05-1 +109 -0
- tensorboard/events.out.tfevents.1768304399.8e64ffbd666a.89842.0 +2 -2
- tensorboard/events.out.tfevents.1768304645.8e64ffbd666a.97184.0 +3 -0
log/log-train-2026-01-13-11-39-59-0
CHANGED
@@ -130,3 +130,31 @@
 2026-01-13 11:41:52,100 INFO [train.py:929] (0/2) Epoch 1, validation: loss=8.282, simple_loss=7.526, pruned_loss=7.544, over 1639044.00 frames.
 2026-01-13 11:41:52,101 INFO [train.py:930] (0/2) Maximum memory allocated so far is 2324MB
 2026-01-13 11:41:53,682 INFO [zipformer.py:1188] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5.0, num_to_drop=2, layers_to_drop={0, 3}
+2026-01-13 11:41:59,516 INFO [zipformer.py:1188] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=23.0, num_to_drop=1, layers_to_drop={1}
+2026-01-13 11:41:59,996 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=14.47 vs. limit=2.0
+2026-01-13 11:42:08,502 INFO [train.py:895] (0/2) Epoch 1, batch 50, loss[loss=1.09, simple_loss=0.967, pruned_loss=1.097, over 1183.00 frames. ], tot_loss[loss=2.095, simple_loss=1.905, pruned_loss=1.827, over 59743.46 frames. ], batch size: 3, lr: 2.75e-02, grad_scale: 2.0
+2026-01-13 11:42:13,310 INFO [scaling.py:681] (0/2) Whitening: num_groups=1, num_channels=384, metric=92.52 vs. limit=5.0
+2026-01-13 11:42:17,641 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=192, metric=15.96 vs. limit=2.0
+2026-01-13 11:42:18,682 INFO [zipformer.py:1188] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83.0, num_to_drop=1, layers_to_drop={1}
+2026-01-13 11:42:23,085 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=7.36 vs. limit=2.0
+2026-01-13 11:42:24,180 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.400e+00 1.757e+01 2.611e+01 8.809e+01 1.032e+03, threshold=5.221e+01, percent-clipped=0.0
+2026-01-13 11:42:24,220 INFO [train.py:895] (0/2) Epoch 1, batch 100, loss[loss=1.012, simple_loss=0.883, pruned_loss=1.038, over 1450.00 frames. ], tot_loss[loss=1.557, simple_loss=1.399, pruned_loss=1.438, over 105524.85 frames. ], batch size: 4, lr: 3.00e-02, grad_scale: 2.0
+2026-01-13 11:42:37,820 INFO [zipformer.py:1188] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=144.0, num_to_drop=2, layers_to_drop={0, 3}
+2026-01-13 11:42:37,888 INFO [scaling.py:681] (0/2) Whitening: num_groups=1, num_channels=384, metric=57.95 vs. limit=5.0
+2026-01-13 11:42:39,992 INFO [train.py:895] (0/2) Epoch 1, batch 150, loss[loss=1.137, simple_loss=0.9777, pruned_loss=1.168, over 1189.00 frames. ], tot_loss[loss=1.345, simple_loss=1.194, pruned_loss=1.291, over 138408.39 frames. ], batch size: 3, lr: 3.25e-02, grad_scale: 2.0
+2026-01-13 11:42:50,418 INFO [scaling.py:681] (0/2) Whitening: num_groups=1, num_channels=384, metric=34.60 vs. limit=5.0
+2026-01-13 11:42:56,336 INFO [optim.py:365] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.056e+01 1.349e+01 1.612e+01 1.843e+01 3.228e+01, threshold=3.224e+01, percent-clipped=0.0
+2026-01-13 11:42:56,376 INFO [train.py:895] (0/2) Epoch 1, batch 200, loss[loss=1.242, simple_loss=1.057, pruned_loss=1.254, over 1331.00 frames. ], tot_loss[loss=1.219, simple_loss=1.069, pruned_loss=1.198, over 165849.01 frames. ], batch size: 8, lr: 3.50e-02, grad_scale: 2.0
+2026-01-13 11:43:12,253 INFO [train.py:895] (0/2) Epoch 1, batch 250, loss[loss=0.9935, simple_loss=0.829, pruned_loss=1.017, over 1239.00 frames. ], tot_loss[loss=1.136, simple_loss=0.9853, pruned_loss=1.132, over 187031.74 frames. ], batch size: 5, lr: 3.75e-02, grad_scale: 2.0
+2026-01-13 11:43:13,949 INFO [zipformer.py:2441] (0/2) attn_weights_entropy = tensor([4.0746, 4.0746, 4.0747, 4.0720, 4.0744, 4.0745, 4.0744, 4.0746],
+       device='cuda:0'), covar=tensor([0.0007, 0.0007, 0.0005, 0.0007, 0.0010, 0.0007, 0.0006, 0.0006],
+       device='cuda:0'), in_proj_covar=tensor([0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008],
+       device='cuda:0'), out_proj_covar=tensor([7.9819e-06, 8.1444e-06, 8.0188e-06, 8.2155e-06, 7.9850e-06, 8.0954e-06,
+       7.9568e-06, 8.0921e-06], device='cuda:0')
+2026-01-13 11:43:14,354 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=2.34 vs. limit=2.0
+2026-01-13 11:43:22,519 INFO [scaling.py:681] (0/2) Whitening: num_groups=8, num_channels=96, metric=2.65 vs. limit=2.0
+2026-01-13 11:43:26,539 INFO [zipformer.py:1188] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=296.0, num_to_drop=1, layers_to_drop={1}
+2026-01-13 11:43:27,760 INFO [zipformer.py:1188] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=300.0, num_to_drop=2, layers_to_drop={1, 3}
+2026-01-13 11:43:27,774 INFO [train.py:1204] (0/2) Saving batch to /kaggle/working/amharic_training/exp_amharic_streaming/batch-bdd640fb-0667-1ad1-1c80-317fa3b1799d.pt
+2026-01-13 11:43:27,779 INFO [train.py:1210] (0/2) features shape: torch.Size([5, 1175, 80])
+2026-01-13 11:43:27,781 INFO [train.py:1214] (0/2) num tokens: 215
log/log-train-2026-01-13-11-39-59-1
CHANGED
@@ -130,3 +130,32 @@
 2026-01-13 11:41:52,102 INFO [train.py:929] (1/2) Epoch 1, validation: loss=8.282, simple_loss=7.526, pruned_loss=7.544, over 1639044.00 frames.
 2026-01-13 11:41:52,102 INFO [train.py:930] (1/2) Maximum memory allocated so far is 2315MB
 2026-01-13 11:41:53,682 INFO [zipformer.py:1188] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5.0, num_to_drop=2, layers_to_drop={0, 2}
+2026-01-13 11:41:59,517 INFO [zipformer.py:1188] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=23.0, num_to_drop=1, layers_to_drop={0}
+2026-01-13 11:41:59,996 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=12.59 vs. limit=2.0
+2026-01-13 11:42:08,502 INFO [train.py:895] (1/2) Epoch 1, batch 50, loss[loss=1.052, simple_loss=0.9326, pruned_loss=1.065, over 1185.00 frames. ], tot_loss[loss=2.132, simple_loss=1.939, pruned_loss=1.863, over 59802.58 frames. ], batch size: 3, lr: 2.75e-02, grad_scale: 2.0
+2026-01-13 11:42:13,276 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=124.44 vs. limit=5.0
+2026-01-13 11:42:17,628 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=16.82 vs. limit=2.0
+2026-01-13 11:42:18,680 INFO [zipformer.py:1188] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83.0, num_to_drop=1, layers_to_drop={0}
+2026-01-13 11:42:23,167 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=96, metric=6.24 vs. limit=2.0
+2026-01-13 11:42:24,180 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.400e+00 1.757e+01 2.611e+01 8.809e+01 1.032e+03, threshold=5.221e+01, percent-clipped=0.0
+2026-01-13 11:42:24,219 INFO [train.py:895] (1/2) Epoch 1, batch 100, loss[loss=0.9711, simple_loss=0.8458, pruned_loss=1.008, over 1447.00 frames. ], tot_loss[loss=1.558, simple_loss=1.4, pruned_loss=1.438, over 105356.72 frames. ], batch size: 4, lr: 3.00e-02, grad_scale: 2.0
+2026-01-13 11:42:33,042 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=40.38 vs. limit=5.0
+2026-01-13 11:42:35,481 INFO [zipformer.py:2441] (1/2) attn_weights_entropy = tensor([4.4032, 4.4030, 4.4031, 4.4031, 4.4032, 4.4032, 4.4032, 4.4032],
+       device='cuda:1'), covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0004, 0.0002, 0.0002, 0.0001],
+       device='cuda:1'), in_proj_covar=tensor([0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009],
+       device='cuda:1'), out_proj_covar=tensor([9.1589e-06, 8.9273e-06, 8.9333e-06, 8.9204e-06, 8.9799e-06, 8.8728e-06,
+       8.9491e-06, 9.0549e-06], device='cuda:1')
+2026-01-13 11:42:37,791 INFO [zipformer.py:1188] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=144.0, num_to_drop=2, layers_to_drop={0, 2}
+2026-01-13 11:42:39,994 INFO [train.py:895] (1/2) Epoch 1, batch 150, loss[loss=0.9644, simple_loss=0.826, pruned_loss=1.01, over 1195.00 frames. ], tot_loss[loss=1.351, simple_loss=1.199, pruned_loss=1.296, over 138827.38 frames. ], batch size: 3, lr: 3.25e-02, grad_scale: 2.0
+2026-01-13 11:42:41,382 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=192, metric=7.40 vs. limit=2.0
+2026-01-13 11:42:44,688 INFO [scaling.py:681] (1/2) Whitening: num_groups=8, num_channels=96, metric=2.97 vs. limit=2.0
+2026-01-13 11:42:56,336 INFO [optim.py:365] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.056e+01 1.349e+01 1.612e+01 1.843e+01 3.228e+01, threshold=3.224e+01, percent-clipped=0.0
+2026-01-13 11:42:56,375 INFO [train.py:895] (1/2) Epoch 1, batch 200, loss[loss=1.097, simple_loss=0.9325, pruned_loss=1.111, over 1328.00 frames. ], tot_loss[loss=1.217, simple_loss=1.068, pruned_loss=1.195, over 165766.02 frames. ], batch size: 8, lr: 3.50e-02, grad_scale: 2.0
+2026-01-13 11:43:06,015 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=28.31 vs. limit=5.0
+2026-01-13 11:43:12,253 INFO [train.py:895] (1/2) Epoch 1, batch 250, loss[loss=0.9016, simple_loss=0.7567, pruned_loss=0.9034, over 1249.00 frames. ], tot_loss[loss=1.129, simple_loss=0.9798, pruned_loss=1.122, over 187257.17 frames. ], batch size: 5, lr: 3.75e-02, grad_scale: 2.0
+2026-01-13 11:43:20,721 INFO [scaling.py:681] (1/2) Whitening: num_groups=1, num_channels=384, metric=30.61 vs. limit=5.0
+2026-01-13 11:43:26,539 INFO [zipformer.py:1188] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=296.0, num_to_drop=1, layers_to_drop={0}
+2026-01-13 11:43:27,770 INFO [zipformer.py:1188] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=300.0, num_to_drop=2, layers_to_drop={1, 2}
+2026-01-13 11:43:27,774 INFO [train.py:1204] (1/2) Saving batch to /kaggle/working/amharic_training/exp_amharic_streaming/batch-bdd640fb-0667-1ad1-1c80-317fa3b1799d.pt
+2026-01-13 11:43:27,779 INFO [train.py:1210] (1/2) features shape: torch.Size([5, 1166, 80])
+2026-01-13 11:43:27,780 INFO [train.py:1214] (1/2) num tokens: 234
|
log/log-train-2026-01-13-11-44-05-0
ADDED
@@ -0,0 +1,109 @@
+2026-01-13 11:44:05,656 INFO [train.py:967] (0/2) Training started
+2026-01-13 11:44:05,657 INFO [train.py:977] (0/2) Device: cuda:0
+2026-01-13 11:44:05,659 INFO [train.py:986] (0/2) {
+    "am_scale": 0.0,
+    "attention_dims": "192,192,192,192,192",
+    "average_period": 200,
+    "base_lr": 0.05,
+    "batch_idx_train": 0,
+    "best_train_epoch": -1,
+    "best_train_loss": Infinity,
+    "best_valid_epoch": -1,
+    "best_valid_loss": Infinity,
+    "blank_id": 0,
+    "bpe_model": "/kaggle/working/amharic_training/bpe/bpe.model",
+    "bucketing_sampler": true,
+    "cnn_module_kernels": "31,31,31,31,31",
+    "concatenate_cuts": false,
+    "context_size": 2,
+    "decode_chunk_len": 32,
+    "decoder_dim": 512,
+    "drop_last": true,
+    "duration_factor": 1.0,
+    "enable_musan": false,
+    "enable_spec_aug": true,
+    "encoder_dims": "384,384,384,384,384",
+    "encoder_unmasked_dims": "256,256,256,256,256",
+    "env_info": {
+        "IP address": "172.19.2.2",
+        "hostname": "8e64ffbd666a",
+        "icefall-git-branch": "master",
+        "icefall-git-date": "Fri Nov 28 03:42:20 2025",
+        "icefall-git-sha1": "0904e490-dirty",
+        "icefall-path": "/kaggle/working/icefall",
+        "k2-build-type": "Release",
+        "k2-git-date": "Thu Jul 25 03:34:26 2024",
+        "k2-git-sha1": "40e8d1676f6062e46458dc32ad21229c93cc9c50",
+        "k2-path": "/usr/local/lib/python3.12/dist-packages/k2/__init__.py",
+        "k2-version": "1.24.4",
+        "k2-with-cuda": true,
+        "lhotse-path": "/usr/local/lib/python3.12/dist-packages/lhotse/__init__.py",
+        "lhotse-version": "1.32.1",
+        "python-version": "3.12",
+        "torch-cuda-available": true,
+        "torch-cuda-version": "12.1",
+        "torch-version": "2.4.0+cu121"
+    },
+    "exp_dir": "/kaggle/working/amharic_training/exp_amharic_streaming",
+    "feature_dim": 80,
+    "feedforward_dims": "1024,1024,2048,2048,1024",
+    "full_libri": false,
+    "gap": 1.0,
+    "inf_check": false,
+    "input_strategy": "PrecomputedFeatures",
+    "joiner_dim": 512,
+    "keep_last_k": 5,
+    "lm_scale": 0.25,
+    "log_interval": 50,
+    "lr_batches": 5000,
+    "lr_epochs": 3.5,
+    "manifest_dir": "/kaggle/working/amharic_training/manifests",
+    "master_port": 12354,
+    "max_duration": 120,
+    "mini_libri": false,
+    "nhead": "8,8,8,8,8",
+    "num_buckets": 30,
+    "num_encoder_layers": "2,4,3,2,4",
+    "num_epochs": 50,
+    "num_left_chunks": 4,
+    "num_workers": 2,
+    "on_the_fly_feats": false,
+    "print_diagnostics": false,
+    "prune_range": 5,
+    "reset_interval": 200,
+    "return_cuts": true,
+    "save_every_n": 1000,
+    "seed": 42,
+    "short_chunk_size": 50,
+    "shuffle": true,
+    "simple_loss_scale": 0.5,
+    "spec_aug_time_warp_factor": 80,
+    "start_batch": 0,
+    "start_epoch": 1,
+    "subsampling_factor": 4,
+    "tensorboard": true,
+    "use_fp16": true,
+    "valid_interval": 1600,
+    "vocab_size": 1000,
+    "warm_step": 2000,
+    "world_size": 2,
+    "zipformer_downsampling_factors": "1,2,4,8,2"
+}
+2026-01-13 11:44:05,660 INFO [train.py:988] (0/2) About to create model
+2026-01-13 11:44:06,275 INFO [zipformer.py:405] (0/2) At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+2026-01-13 11:44:06,292 INFO [train.py:992] (0/2) Number of model parameters: 71330891
+2026-01-13 11:44:07,086 INFO [train.py:1007] (0/2) Using DDP
+2026-01-13 11:44:08,761 INFO [asr_datamodule.py:422] (0/2) About to get train-clean-100 cuts
+2026-01-13 11:44:08,762 INFO [asr_datamodule.py:239] (0/2) Disable MUSAN
+2026-01-13 11:44:08,762 INFO [asr_datamodule.py:257] (0/2) Enable SpecAugment
+2026-01-13 11:44:08,762 INFO [asr_datamodule.py:258] (0/2) Time warp factor: 80
+2026-01-13 11:44:08,762 INFO [asr_datamodule.py:268] (0/2) Num frame mask: 10
+2026-01-13 11:44:08,762 INFO [asr_datamodule.py:281] (0/2) About to create train dataset
+2026-01-13 11:44:08,762 INFO [asr_datamodule.py:308] (0/2) Using DynamicBucketingSampler.
+2026-01-13 11:44:09,150 INFO [asr_datamodule.py:324] (0/2) About to create train dataloader
+2026-01-13 11:44:09,151 INFO [asr_datamodule.py:460] (0/2) About to get dev-clean cuts
+2026-01-13 11:44:09,151 INFO [asr_datamodule.py:467] (0/2) About to get dev-other cuts
+2026-01-13 11:44:09,152 INFO [asr_datamodule.py:355] (0/2) About to create dev dataset
+2026-01-13 11:44:09,528 INFO [asr_datamodule.py:372] (0/2) About to create dev dataloader
+2026-01-13 11:44:25,314 INFO [train.py:895] (0/2) Epoch 1, batch 0, loss[loss=8.165, simple_loss=7.427, pruned_loss=7.363, over 2638.00 frames. ], tot_loss[loss=8.165, simple_loss=7.427, pruned_loss=7.363, over 2638.00 frames. ], batch size: 7, lr: 2.50e-02, grad_scale: 2.0
+2026-01-13 11:44:25,315 INFO [train.py:920] (0/2) Computing validation loss
log/log-train-2026-01-13-11-44-05-1
ADDED
@@ -0,0 +1,109 @@
+2026-01-13 11:44:05,744 INFO [train.py:967] (1/2) Training started
+2026-01-13 11:44:05,744 INFO [train.py:977] (1/2) Device: cuda:1
+2026-01-13 11:44:05,746 INFO [train.py:986] (1/2) {
+    "am_scale": 0.0,
+    "attention_dims": "192,192,192,192,192",
+    "average_period": 200,
+    "base_lr": 0.05,
+    "batch_idx_train": 0,
+    "best_train_epoch": -1,
+    "best_train_loss": Infinity,
+    "best_valid_epoch": -1,
+    "best_valid_loss": Infinity,
+    "blank_id": 0,
+    "bpe_model": "/kaggle/working/amharic_training/bpe/bpe.model",
+    "bucketing_sampler": true,
+    "cnn_module_kernels": "31,31,31,31,31",
+    "concatenate_cuts": false,
+    "context_size": 2,
+    "decode_chunk_len": 32,
+    "decoder_dim": 512,
+    "drop_last": true,
+    "duration_factor": 1.0,
+    "enable_musan": false,
+    "enable_spec_aug": true,
+    "encoder_dims": "384,384,384,384,384",
+    "encoder_unmasked_dims": "256,256,256,256,256",
+    "env_info": {
+        "IP address": "172.19.2.2",
+        "hostname": "8e64ffbd666a",
+        "icefall-git-branch": "master",
+        "icefall-git-date": "Fri Nov 28 03:42:20 2025",
+        "icefall-git-sha1": "0904e490-dirty",
+        "icefall-path": "/kaggle/working/icefall",
+        "k2-build-type": "Release",
+        "k2-git-date": "Thu Jul 25 03:34:26 2024",
+        "k2-git-sha1": "40e8d1676f6062e46458dc32ad21229c93cc9c50",
+        "k2-path": "/usr/local/lib/python3.12/dist-packages/k2/__init__.py",
+        "k2-version": "1.24.4",
+        "k2-with-cuda": true,
+        "lhotse-path": "/usr/local/lib/python3.12/dist-packages/lhotse/__init__.py",
+        "lhotse-version": "1.32.1",
+        "python-version": "3.12",
+        "torch-cuda-available": true,
+        "torch-cuda-version": "12.1",
+        "torch-version": "2.4.0+cu121"
+    },
+    "exp_dir": "/kaggle/working/amharic_training/exp_amharic_streaming",
+    "feature_dim": 80,
+    "feedforward_dims": "1024,1024,2048,2048,1024",
+    "full_libri": false,
+    "gap": 1.0,
+    "inf_check": false,
+    "input_strategy": "PrecomputedFeatures",
+    "joiner_dim": 512,
+    "keep_last_k": 5,
+    "lm_scale": 0.25,
+    "log_interval": 50,
+    "lr_batches": 5000,
+    "lr_epochs": 3.5,
+    "manifest_dir": "/kaggle/working/amharic_training/manifests",
+    "master_port": 12354,
+    "max_duration": 120,
+    "mini_libri": false,
+    "nhead": "8,8,8,8,8",
+    "num_buckets": 30,
+    "num_encoder_layers": "2,4,3,2,4",
+    "num_epochs": 50,
+    "num_left_chunks": 4,
+    "num_workers": 2,
+    "on_the_fly_feats": false,
+    "print_diagnostics": false,
+    "prune_range": 5,
+    "reset_interval": 200,
+    "return_cuts": true,
+    "save_every_n": 1000,
+    "seed": 42,
+    "short_chunk_size": 50,
+    "shuffle": true,
+    "simple_loss_scale": 0.5,
+    "spec_aug_time_warp_factor": 80,
+    "start_batch": 0,
+    "start_epoch": 1,
+    "subsampling_factor": 4,
+    "tensorboard": true,
+    "use_fp16": true,
+    "valid_interval": 1600,
+    "vocab_size": 1000,
+    "warm_step": 2000,
+    "world_size": 2,
+    "zipformer_downsampling_factors": "1,2,4,8,2"
+}
+2026-01-13 11:44:05,747 INFO [train.py:988] (1/2) About to create model
+2026-01-13 11:44:06,369 INFO [zipformer.py:405] (1/2) At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+2026-01-13 11:44:06,387 INFO [train.py:992] (1/2) Number of model parameters: 71330891
+2026-01-13 11:44:06,499 INFO [train.py:1007] (1/2) Using DDP
+2026-01-13 11:44:08,475 INFO [asr_datamodule.py:422] (1/2) About to get train-clean-100 cuts
+2026-01-13 11:44:08,477 INFO [asr_datamodule.py:239] (1/2) Disable MUSAN
+2026-01-13 11:44:08,477 INFO [asr_datamodule.py:257] (1/2) Enable SpecAugment
+2026-01-13 11:44:08,477 INFO [asr_datamodule.py:258] (1/2) Time warp factor: 80
+2026-01-13 11:44:08,477 INFO [asr_datamodule.py:268] (1/2) Num frame mask: 10
+2026-01-13 11:44:08,477 INFO [asr_datamodule.py:281] (1/2) About to create train dataset
+2026-01-13 11:44:08,477 INFO [asr_datamodule.py:308] (1/2) Using DynamicBucketingSampler.
+2026-01-13 11:44:08,786 INFO [asr_datamodule.py:324] (1/2) About to create train dataloader
+2026-01-13 11:44:08,787 INFO [asr_datamodule.py:460] (1/2) About to get dev-clean cuts
+2026-01-13 11:44:08,787 INFO [asr_datamodule.py:467] (1/2) About to get dev-other cuts
+2026-01-13 11:44:08,788 INFO [asr_datamodule.py:355] (1/2) About to create dev dataset
+2026-01-13 11:44:08,987 INFO [asr_datamodule.py:372] (1/2) About to create dev dataloader
+2026-01-13 11:44:25,300 INFO [train.py:895] (1/2) Epoch 1, batch 0, loss[loss=8.191, simple_loss=7.455, pruned_loss=7.342, over 2645.00 frames. ], tot_loss[loss=8.191, simple_loss=7.455, pruned_loss=7.342, over 2645.00 frames. ], batch size: 7, lr: 2.50e-02, grad_scale: 2.0
+2026-01-13 11:44:25,301 INFO [train.py:920] (1/2) Computing validation loss
tensorboard/events.out.tfevents.1768304399.8e64ffbd666a.89842.0
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:f2cefcf8eee1857f93ae1490b5b219b6444c207845afdfa964db6184e5d16fcf
+size 774
tensorboard/events.out.tfevents.1768304645.8e64ffbd666a.97184.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8455acb370010541dec439561e446c3686a62027b987a5c0a492e45cc6facc6d
+size 88