Hyperparameters — Whisper ATC Fine-tune

Model

| Key | Value |
| --- | --- |
| Base model | openai/whisper-large-v3 |
| Architecture | Whisper Large v3 |
| d_model | 1280 |
| Encoder layers | 32 |
| Decoder layers | 32 |
| Encoder attention heads | 20 |
| Decoder attention heads | 20 |
| Mel bins | 128 |

Training

| Key | Value |
| --- | --- |
| Optimizer | AdamW (bitsandbytes 8-bit) |
| Learning rate | 1e-05 |
| LR scheduler | Linear |
| Warmup ratio | 0.05 |
| Adam β₁ / β₂ / ε | 0.9 / 0.999 / 1e-8 |
| Weight decay | 0.01 |
| Per-device train batch size | 1 |
| Per-device eval batch size | 8 |
| Gradient accumulation steps | 16 |
| Effective batch size | 16 |
| Gradient checkpointing | Yes (use_reentrant=False) |
| Mixed precision | fp16 |
| Max grad norm | 1.0 |
| Max epochs (configured) | 25 |
| Early stop patience | 5 epochs |
| Label smoothing | 0.0 |
| Freeze encoder | No |
| Seed | 42 |
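
As a rough sketch, the values above can be written with the keyword names used by Hugging Face `Seq2SeqTrainingArguments`. The mapping is an assumption, since the training script itself is not shown here; `training_kwargs` and `NUM_DEVICES` are hypothetical names, and the single-GPU device count is inferred from the effective batch size of 16.

```python
# Hypothetical mapping of the table above onto Hugging Face
# Seq2SeqTrainingArguments keyword names; the real training script may differ.
training_kwargs = {
    "optim": "adamw_bnb_8bit",  # AdamW, bitsandbytes 8-bit
    "learning_rate": 1e-5,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.05,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "gradient_accumulation_steps": 16,
    "gradient_checkpointing": True,
    "gradient_checkpointing_kwargs": {"use_reentrant": False},
    "fp16": True,
    "max_grad_norm": 1.0,
    "num_train_epochs": 25,
    "label_smoothing_factor": 0.0,
    "seed": 42,
}

NUM_DEVICES = 1  # assumption: one GPU, consistent with an effective batch of 16

# Effective batch = per-device batch x accumulation steps x devices.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * NUM_DEVICES
)
print(effective_batch_size)  # 16
```

If the run does use the Hugging Face trainer, a dict like this could be unpacked into `Seq2SeqTrainingArguments(output_dir=..., **training_kwargs)`, with the 5-epoch patience supplied separately through `EarlyStoppingCallback(early_stopping_patience=5)` and evaluation scheduled once per epoch.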

Augmentation

  • Gaussian noise (p=0.4, amplitude 0.001–0.015)
  • Time stretch (p=0.3, rate 0.9–1.1)
  • Random silence padding (p=0.5, 0–0.7s each end)
  • BandPassFilter (p=0.75, 300–3400 Hz, VHF radio simulation)
  • Clip (p=0.2, ±0.8)
  • Mp3Compression (p=0.3, 32–64 kbps)
  • SpecAugment: FrequencyMasking(freq_mask_param=27) + TimeMasking(time_mask_param=100, p=0.05)
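
To illustrate how per-transform probabilities like these behave, here is a minimal standard-library sketch of three of the listed transforms (Gaussian noise, silence padding, clip). It is not the pipeline used for this run: the transform names in the list suggest audiomentations and torchaudio, but that is an inference. All function names and the 16 kHz sample rate below are assumptions, and time stretch, band-pass filtering, MP3 compression, and SpecAugment are left out because they need a signal-processing library.

```python
import random

SAMPLE_RATE = 16_000  # assumption: Whisper's usual 16 kHz input rate


def add_gaussian_noise(wave, rng, lo=0.001, hi=0.015):
    """Add zero-mean Gaussian noise with a randomly drawn amplitude."""
    amplitude = rng.uniform(lo, hi)
    return [x + rng.gauss(0.0, amplitude) for x in wave]


def pad_silence(wave, rng, max_seconds=0.7, sample_rate=SAMPLE_RATE):
    """Prepend and append independently drawn amounts of silence."""
    head = int(rng.uniform(0.0, max_seconds) * sample_rate)
    tail = int(rng.uniform(0.0, max_seconds) * sample_rate)
    return [0.0] * head + list(wave) + [0.0] * tail


def clip(wave, limit=0.8):
    """Hard-clip every sample to the range [-limit, +limit]."""
    return [max(-limit, min(limit, x)) for x in wave]


def augment(wave, rng):
    """Apply each transform independently with its listed probability."""
    if rng.random() < 0.4:
        wave = add_gaussian_noise(wave, rng)
    if rng.random() < 0.5:
        wave = pad_silence(wave, rng)
    if rng.random() < 0.2:
        wave = clip(wave)
    return wave


rng = random.Random(42)  # seeded so repeated runs draw the same augmentations
augmented = augment([0.9, -0.95, 0.1, 0.0], rng)
```

Because every transform is sampled independently, a given training example may receive several augmentations at once or none at all.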

Early stopping

| Key | Value |
| --- | --- |
| Metric | WER (lower is better) |
| Stopped at | Step 6919 / Epoch 11 |
| Patience | 5 epochs |

Results

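| Epoch | Eval loss | WER |
| --- | --- | --- |
| 1.0 | 0.0496 | 3.46% |
| 2.0 | 0.0288 | 1.84% |
| 3.0 | 0.0239 | 0.82% |
| 4.0 | 0.0245 | 1.55% |
| 5.0 | 0.0195 | 0.92% |
| 6.0 | 0.0231 | 0.66% ← best |
| 7.0 | 0.0199 | 0.70% |
| 8.0 | 0.0211 | 2.62% |
| 9.0 | 0.0191 | 0.72% |
| 10.0 | 0.0186 | 4.43% |
| 11.0 | 0.0172 | 0.69% |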

Best checkpoint: training/output_run8/checkpoint-3774 (epoch 6, WER 0.66%)
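
The stopping point and checkpoint number can be cross-checked against the WER column. The sketch below replays a simple patience rule (stop after 5 consecutive evaluations without a new best WER) over the reported values. The function name is hypothetical and the rule is a generic one, not necessarily the exact callback logic used in training.

```python
# (epoch, WER %) pairs copied from the results table.
wer_by_epoch = [
    (1, 3.46), (2, 1.84), (3, 0.82), (4, 1.55), (5, 0.92), (6, 0.66),
    (7, 0.70), (8, 2.62), (9, 0.72), (10, 4.43), (11, 0.69),
]
PATIENCE = 5  # evaluations without a new best WER before stopping


def run_early_stopping(history, patience):
    """Return (best_epoch, best_wer, stop_epoch) under a simple patience rule."""
    best_epoch, best_wer, stale = None, float("inf"), 0
    for epoch, wer in history:
        if wer < best_wer:
            best_epoch, best_wer, stale = epoch, wer, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, best_wer, epoch
    return best_epoch, best_wer, None  # patience never exhausted


best_epoch, best_wer, stop_epoch = run_early_stopping(wer_by_epoch, PATIENCE)
print(best_epoch, best_wer, stop_epoch)  # 6 0.66 11

# 6919 steps over 11 epochs gives the per-epoch step count, which should
# reproduce the step number in the best checkpoint's directory name.
steps_per_epoch = 6919 // 11
print(steps_per_epoch, best_epoch * steps_per_epoch)  # 629 3774
```

Both checks agree with the reported numbers: the best WER falls at epoch 6, five non-improving epochs follow, and 6 × 629 = 3774 matches `checkpoint-3774`.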

Output

| Key | Value |
| --- | --- |
| Best HF checkpoint | training/output_run8/best/ |
| CTranslate2 model | training/saved_models/ct2_run8/ |
| Quantization | float16 |
| Inference backend | faster-whisper |
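
A minimal sketch of loading the CTranslate2 export with faster-whisper follows. The `transcribe` helper, the `device="cuda"` default, and `language="en"` are assumptions made for illustration; only the model path and the float16 compute type come from the table above.

```python
CT2_MODEL_DIR = "training/saved_models/ct2_run8/"  # path from the table above


def transcribe(audio_path, model_dir=CT2_MODEL_DIR, device="cuda"):
    """Transcribe one audio file with the CTranslate2 export (sketch)."""
    # Imported lazily so this module loads even without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_dir, device=device, compute_type="float16")
    # language="en" is an assumption; drop it to let the model detect the language.
    segments, _info = model.transcribe(audio_path, language="en")
    # `segments` is a lazy generator; joining it runs the actual decoding.
    return " ".join(segment.text.strip() for segment in segments)
```

A call would look like `transcribe("example.wav")`, where `example.wav` is a placeholder file name. On a machine without a GPU, `device="cpu"` with a different `compute_type` would be needed, since float16 compute generally requires GPU support.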