# NB2 Kaggle kernel-death fix
Version 5/6 died before the first epoch print. The data/label fixes are correct (`soundscape positive labels: 3122`), so the remaining issue is memory pressure during the first training epoch.
Use these safer NB2 settings before running:
```python
class CFG:
    epochs = 2
    model_name = "b0"
    folds_to_run = [0]  # train ONE fold per Kaggle run first
    batch_size = 4  # micro-batch
    grad_accum_steps = 3  # effective batch 12
    num_workers = 0
    use_data_parallel = False  # DataParallel caused kernel death on T4x2
    max_train_audio_samples = None
    max_sc_train_samples = None
```
Then repeat runs:
```python
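# B0: one fold per Kaggle run; change only this line between runs
folds_to_run = [0]  # run 1
folds_to_run = [1]  # run 2
folds_to_run = [2]  # run 3
folds_to_run = [3]  # run 4
folds_to_run = [4]  # run 5
# B3, even safer: smaller micro-batch, same effective batch (2 * 6 = 12)
model_name = "b3"
folds_to_run = [0]
batch_size = 2
grad_accum_steps = 6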
Also patch the optimizer loop: divide the loss by `grad_accum_steps`, call the optimizer step only every `grad_accum_steps` batches, and print progress every 100 batches.
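A minimal sketch of that loop patch, not the notebook's actual code. The function name `train_one_epoch` and its arguments (`model`, `loader`, `optimizer`, `loss_fn`, `log_every`) are assumptions for illustration. It assumes the loader yields `(inputs, targets)` pairs and that the objects follow the usual PyTorch conventions (`loss.backward()`, `optimizer.step()`, `optimizer.zero_grad()`). Device transfer and mixed precision are left out.

```python
def train_one_epoch(model, loader, optimizer, loss_fn,
                    grad_accum_steps=3, log_every=100):
    """Run one epoch with gradient accumulation.

    Returns the number of optimizer steps taken.
    """
    model.train()
    optimizer.zero_grad()
    num_steps = 0
    pending = 0  # micro-batches accumulated since the last optimizer step

    for batch_idx, (inputs, targets) in enumerate(loader, start=1):
        loss = loss_fn(model(inputs), targets)
        # Scale the loss so the summed gradients match the mean over
        # the effective batch (batch_size * grad_accum_steps).
        (loss / grad_accum_steps).backward()
        pending += 1

        # Step only once every grad_accum_steps micro-batches.
        if pending == grad_accum_steps:
            optimizer.step()
            optimizer.zero_grad()
            num_steps += 1
            pending = 0

        # Periodic progress print, using the unscaled loss.
        if batch_idx % log_every == 0:
            print(f"batch {batch_idx}: loss {loss.item():.4f}", flush=True)

    # Apply a trailing partial group so its gradients are not dropped.
    if pending:
        optimizer.step()
        optimizer.zero_grad()
        num_steps += 1
    return num_steps
```

With `batch_size = 4` and `grad_accum_steps = 3`, each optimizer step sees gradients from 12 samples while only 4 are held in memory at a time. The final partial group is still scaled by `1 / grad_accum_steps`, so its update is slightly smaller than a full one; whether that matters depends on the dataset size.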