Hyperparameters — Whisper ATC Fine-tune

Model

Key	Value
Optimizer	AdamW (bitsandbytes 8-bit)
Learning rate	1e-05
LR scheduler	Linear
Warmup ratio	0.05
Adam β₁ / β₂ / ε	0.9 / 0.999 / 1e-8
Weight decay	0.01
Per-device train batch size	1
Per-device eval batch size	8
Gradient accumulation steps	16
Effective batch size	16
Gradient checkpointing	Yes (use_reentrant=False)
Mixed precision	fp16
Max grad norm	1.0
Max epochs (configured)	25
Early stop patience	5 epochs
Label smoothing	0.0
Freeze encoder	No
Seed	42

Gaussian noise (p=0.4, amplitude 0.001–0.015)
Time stretch (p=0.3, rate 0.9–1.1)
Random silence padding (p=0.5, 0–0.7s each end)
BandPassFilter (p=0.75, 300–3400 Hz, VHF radio simulation)
Clip (p=0.2, ±0.8)
Mp3Compression (p=0.3, 32–64 kbps)
SpecAugment: FrequencyMasking(freq_mask_param=27) + TimeMasking(time_mask_param=100, p=0.05)

Best checkpoint: training/output_run8/checkpoint-3774 (epoch 6, WER 0.66%)