# NB2 Kaggle kernel-death fix

Versions 5 and 6 died before the first epoch print. The data/label fixes are correct (`soundscape positive labels: 3122`), so the remaining issue is memory pressure during the first training epoch.

Use these safer NB2 settings before running:
```python
class CFG:
    epochs = 2
    model_name = "b0"
    folds_to_run = [0]         # train ONE fold per Kaggle run first
    batch_size = 4             # micro-batch
    grad_accum_steps = 3       # effective batch 12 (4 x 3)
    num_workers = 0            # load data in the main process
    use_data_parallel = False  # DataParallel caused kernel death on T4x2
    max_train_audio_samples = None
    max_sc_train_samples = None
```
Then repeat the run, one fold at a time, changing only the values shown:
```python
# B0: one Kaggle run per fold
folds_to_run = [0]
folds_to_run = [1]
folds_to_run = [2]
folds_to_run = [3]
folds_to_run = [4]

# B3, even safer
model_name = "b3"
folds_to_run = [0]
batch_size = 2
grad_accum_steps = 6  # effective batch still 12 (2 x 6)
```
Also patch the optimizer loop: divide the loss by `grad_accum_steps`, call `optimizer.step()` only once every `grad_accum_steps` batches, and print progress every 100 batches.
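
The loop patch can be sketched roughly as follows. This is a minimal illustration, not the actual NB2 training loop: the function name `train_one_epoch` and the `log_every` and `device` parameters are hypothetical, and it assumes PyTorch-style objects (a callable model, an optimizer with `step()` and `zero_grad()`, and a criterion that returns a mean-reduced loss).

```python
def train_one_epoch(model, loader, optimizer, criterion,
                    grad_accum_steps=3, log_every=100, device=None):
    """One epoch with gradient accumulation; returns the mean unscaled loss.

    Hypothetical helper: names and signature are illustrative only.
    """
    model.train()
    optimizer.zero_grad()
    running_loss = 0.0
    num_batches = 0
    pending = 0  # micro-batches accumulated since the last optimizer step

    for inputs, targets in loader:
        if device is not None:
            inputs, targets = inputs.to(device), targets.to(device)

        loss = criterion(model(inputs), targets)
        # Divide by grad_accum_steps so the accumulated gradient matches
        # one large batch (assumes the criterion averages over the batch).
        (loss / grad_accum_steps).backward()

        pending += 1
        num_batches += 1
        running_loss += float(loss.item())

        # Step only once every grad_accum_steps micro-batches.
        if pending == grad_accum_steps:
            optimizer.step()
            optimizer.zero_grad()
            pending = 0

        # flush=True pushes the line out immediately, so it is visible
        # even if the kernel dies shortly afterwards.
        if num_batches % log_every == 0:
            print(f"batch {num_batches}: "
                  f"mean loss {running_loss / num_batches:.4f}", flush=True)

    # Apply any leftover gradients when the batch count is not a
    # multiple of grad_accum_steps.
    if pending:
        optimizer.step()
        optimizer.zero_grad()

    return running_loss / max(num_batches, 1)
```

With `batch_size = 4` and `grad_accum_steps = 3`, only one micro-batch of 4 samples is processed per forward/backward pass, while the weights are updated as if the batch size were 12. The final leftover step uses fewer micro-batches than the others, so its gradient is slightly smaller in scale; for a first stability check that is usually acceptable.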