
# Hyperparameters: Whisper ATC Fine-tune (Run 9)

## Model

| Key | Value |
|---|---|
| Base model | `openai/whisper-large-v3` |
| Architecture | Whisper Large v3 |
| d_model | 1280 |
| Encoder layers | 32 |
| Decoder layers | 32 |
| Encoder attention heads | 20 |
| Decoder attention heads | 20 |
| Mel bins | 128 |
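The Model values fit together in ways that are easy to check. The sketch below uses a hypothetical `WhisperArch` dataclass. It is a plain-Python mirror of the table, not the `transformers` `WhisperConfig`. Splitting 1280 dimensions across 20 heads gives 64 per head. Whisper's standard front end (16 kHz audio, 160-sample hop, 30 s window) produces a 128 × 3000 log-mel input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhisperArch:
    """Hypothetical mirror of the Model table (Whisper Large v3)."""

    d_model: int = 1280
    encoder_layers: int = 32
    decoder_layers: int = 32
    encoder_attention_heads: int = 20
    decoder_attention_heads: int = 20
    num_mel_bins: int = 128

    @property
    def head_dim(self) -> int:
        # Each attention head works on d_model / heads dimensions.
        if self.d_model % self.encoder_attention_heads:
            raise ValueError("d_model must be divisible by the head count")
        return self.d_model // self.encoder_attention_heads


def log_mel_shape(arch: WhisperArch, seconds: float = 30.0,
                  sample_rate: int = 16_000, hop_length: int = 160) -> tuple[int, int]:
    """(mel bins, frames) for one padded Whisper input window."""
    frames = int(seconds * sample_rate) // hop_length
    return arch.num_mel_bins, frames


arch = WhisperArch()
print(arch.head_dim)        # 64
print(log_mel_shape(arch))  # (128, 3000)
```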

## Training

| Key | Value |
|---|---|
| Optimizer | AdamW (bitsandbytes 8-bit) |
| Learning rate | 1e-5 |
| LR scheduler | Linear |
| Warmup ratio | 0.05 |
| Adam β₁ / β₂ / ε | 0.9 / 0.999 / 1e-8 |
| Weight decay | 0.01 |
| Per-device train batch size | 1 |
| Per-device eval batch size | 8 |
| Gradient accumulation steps | 16 |
| Effective batch size | 16 (1 per device × 16 accumulation steps) |
| Gradient checkpointing | Yes (`use_reentrant=False`) |
| Mixed precision | fp16 |
| Max grad norm | 1.0 |
| Max epochs (configured) | 30 |
| Early stop patience | 7 epochs |
| Label smoothing | 0.0 |
| Freeze encoder | No |
| Seed | 42 |
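The schedule rows can be turned into concrete step counts. This sketch assumes Hugging Face Trainer semantics. The `checkpoint-N` naming suggests the Trainer was used, but that is an inference. Under those semantics, `warmup_ratio` is converted to steps with `math.ceil`, and the linear decay is planned over the configured 30 epochs rather than the 19 actually run. `STEPS_PER_EPOCH = 1115` is derived from the Results section (checkpoint 13380 at epoch 12, step 21185 at epoch 19). It is not a recorded config value.

```python
import math

PEAK_LR = 1e-5
WARMUP_RATIO = 0.05
MAX_EPOCHS = 30
# Assumption: 1,115 optimizer steps per epoch (13380 / 12 == 21185 / 19).
STEPS_PER_EPOCH = 1115
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 16

total_steps = MAX_EPOCHS * STEPS_PER_EPOCH            # 33,450
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)  # 1,673


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)


effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM  # 16 on a single device
print(total_steps, warmup_steps, effective_batch)
print(f"LR at early-stop step 21185: {linear_lr(21185):.2e}")
```

Because the decay is planned for 30 epochs, the learning rate at the early-stop step is still about 3.86e-6, roughly 39% of the peak.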

## Data Sources

| Source | Role | Size |
|---|---|---|
| `axite_all.json` | SG military ATC, synthetic (4 voices + human) | ~15,716 |
| `deepdml/conversations` | Real Singapore Changi ATC VHF radio | ~1,443 |
| `mnsc-part1-test` | MNSC SG-accented read speech | ~3,000 |
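The Size column gives approximate counts and does not state a unit. From it you can estimate each source's share of the pooled data. This assumes the three sources are simply concatenated, which the table does not confirm. A minimal sketch:

```python
# Approximate sizes copied from the Data Sources table ("~" values).
SOURCES = {
    "axite_all.json": 15_716,
    "deepdml/conversations": 1_443,
    "mnsc-part1-test": 3_000,
}


def mixture_shares(sizes: dict[str, int]) -> dict[str, float]:
    """Fraction of the pooled examples contributed by each source."""
    total = sum(sizes.values())
    return {name: n / total for name, n in sizes.items()}


total = sum(SOURCES.values())
shares = mixture_shares(SOURCES)
for name, share in shares.items():
    print(f"{name:<24} {share:6.1%}")
print("total", total)
```

Under that assumption, the synthetic military set supplies about 78% of examples and real Changi VHF audio about 7%.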

## Augmentation

- Gaussian noise (p=0.4, amplitude 0.001–0.015)
- Time stretch (p=0.3, rate 0.9–1.1)
- Random silence padding (p=0.5, 0–0.7 s at each end)
- BandPassFilter (p=0.75, 300–3400 Hz passband, simulating VHF radio)
- Clip (p=0.2, ±0.8)
- Mp3Compression (p=0.3, 32–64 kbps)
- SpecAugment: `FrequencyMasking(freq_mask_param=27)` + `TimeMasking(time_mask_param=100, p=0.05)`
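Three of the waveform-level transforms are simple enough to sketch in plain Python on a list of float samples. This is a hypothetical re-implementation for illustration, not the pipeline used in the run. It makes three assumptions: the audio is 16 kHz, the noise "amplitude" is the Gaussian standard deviation, and the silence length is drawn independently for each end. Time stretch, band-pass filtering, MP3 compression and SpecAugment need DSP or codec libraries, so they are left out.

```python
import random

SAMPLE_RATE = 16_000  # assumption: Whisper-style 16 kHz mono audio


def add_gaussian_noise(wave, rng, min_amp=0.001, max_amp=0.015):
    """Add white Gaussian noise whose std dev is drawn from [min_amp, max_amp]."""
    amp = rng.uniform(min_amp, max_amp)
    return [x + rng.gauss(0.0, amp) for x in wave]


def pad_silence(wave, rng, max_seconds=0.7):
    """Prepend and append 0 to max_seconds of zeros, drawn independently per end."""
    head = int(rng.uniform(0.0, max_seconds) * SAMPLE_RATE)
    tail = int(rng.uniform(0.0, max_seconds) * SAMPLE_RATE)
    return [0.0] * head + list(wave) + [0.0] * tail


def clip(wave, rng=None, limit=0.8):
    """Hard-clip every sample to [-limit, +limit]; rng is unused."""
    return [max(-limit, min(limit, x)) for x in wave]


# (probability, transform) pairs, using the p values listed above.
PIPELINE = [(0.4, add_gaussian_noise), (0.5, pad_silence), (0.2, clip)]


def augment(wave, rng):
    """Apply each transform independently with its own probability, in order."""
    for p, transform in PIPELINE:
        if rng.random() < p:
            wave = transform(wave, rng)
    return wave


rng = random.Random(42)
wave = [0.9 * ((i % 50) / 25 - 1) for i in range(SAMPLE_RATE)]  # 1 s sawtooth
out = augment(wave, rng)
print(len(wave), len(out))
```

Each transform is gated by its own draw, so any subset can fire on a given example. Only the padding changes the length.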

## Early stopping

| Key | Value |
|---|---|
| Metric | Eval WER (lower is better) |
| Stopped at | Step 21185 / epoch 19 |
| Patience | 7 epochs |

## Results

| Epoch | Eval loss | WER |
|---|---|---|
| 1 | 0.0838 | 11.46% |
| 2 | 0.0550 | 4.28% |
| 3 | 0.0406 | 2.79% |
| 4 | 0.0417 | 6.58% |
| 5 | 0.0381 | 5.46% |
| 6 | 0.0372 | 3.27% |
| 7 | 0.0375 | 1.39% |
| 8 | 0.0381 | 5.52% |
| 9 | 0.0188 | 0.83% |
| 10 | 0.0202 | 0.84% |
| 11 | 0.0185 | 1.05% |
| 12 | 0.0189 | 0.82% ← best |
| 13 | 0.0189 | 0.95% |
| 14 | 0.0202 | 1.19% |
| 15 | 0.0206 | 0.91% |
| 16 | 0.0191 | 1.16% |
| 17 | 0.0169 | 1.12% |
| 18 | 0.0176 | 1.19% |
| 19 | 0.0185 | 1.19% |

Best checkpoint: `training/output_run9/checkpoint-13380` (epoch 12, WER 0.82%). It was selected by eval WER; eval loss was lowest at epoch 17 (0.0169).
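The early-stopping outcome can be reproduced from the Results table. The function below is a simplified stand-in for a patience-based callback, not the Trainer's own `EarlyStoppingCallback`. Its counter resets on strict improvement, and training stops once the counter reaches 7. On this data it recovers best epoch 12 and stop epoch 19. With an assumed 1,115 steps per epoch, those map to `checkpoint-13380` and step 21185. It also shows that selecting on eval loss would have picked epoch 17 instead.

```python
# Per-epoch values copied from the Results table (epochs 1..19).
WER = [11.46, 4.28, 2.79, 6.58, 5.46, 3.27, 1.39, 5.52, 0.83, 0.84,
       1.05, 0.82, 0.95, 1.19, 0.91, 1.16, 1.12, 1.19, 1.19]
EVAL_LOSS = [0.0838, 0.0550, 0.0406, 0.0417, 0.0381, 0.0372, 0.0375,
             0.0381, 0.0188, 0.0202, 0.0185, 0.0189, 0.0189, 0.0202,
             0.0206, 0.0191, 0.0169, 0.0176, 0.0185]
STEPS_PER_EPOCH = 1115  # assumption: 13380 / 12, consistent with 21185 / 19


def replay_early_stopping(metric, patience=7):
    """Return (best_epoch, stop_epoch) for a lower-is-better metric.

    The counter resets on strict improvement and otherwise increments;
    training stops as soon as it reaches `patience`.
    """
    best_value, best_epoch, bad_epochs = float("inf"), 0, 0
    for epoch, value in enumerate(metric, start=1):
        if value < best_value:
            best_value, best_epoch, bad_epochs = value, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(metric)


best, stop = replay_early_stopping(WER)
print(best, stop)                                      # 12 19
print(best * STEPS_PER_EPOCH, stop * STEPS_PER_EPOCH)  # 13380 21185
print(EVAL_LOSS.index(min(EVAL_LOSS)) + 1)             # 17 (best by eval loss)
```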

## Output

| Key | Value |
|---|---|
| Best HF checkpoint | `training/output_run9/best/` |
| CTranslate2 model | `training/saved_models/ct2_run9/` |
| Quantization | float16 |
| Inference backend | faster-whisper |
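Here is a minimal inference sketch for the exported model with faster-whisper. It assumes `pip install faster-whisper`, a CUDA device, and the output path from the table. `beam_size=5` and `language="en"` are illustrative choices, not settings recorded for this run. `load_model`, `transcribe` and `join_segments` are hypothetical helper names. Exports like this are commonly produced with CTranslate2's `ct2-transformers-converter --quantization float16`, but the document does not record the exact command.

```python
from types import SimpleNamespace

CT2_MODEL_DIR = "training/saved_models/ct2_run9/"


def load_model(model_dir: str = CT2_MODEL_DIR, device: str = "cuda"):
    """Load the CTranslate2 export with faster-whisper in float16."""
    from faster_whisper import WhisperModel  # lazy import: needs faster-whisper installed

    return WhisperModel(model_dir, device=device, compute_type="float16")


def join_segments(segments) -> str:
    """Concatenate segment texts into one transcript, skipping empty ones."""
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def transcribe(model, audio_path: str) -> str:
    # Illustrative decoding options, not values from this run.
    segments, _info = model.transcribe(audio_path, beam_size=5, language="en")
    return join_segments(segments)  # segments is a lazy generator


# Usage (requires a GPU and the exported model on disk):
#   model = load_model()
#   print(transcribe(model, "clip.wav"))

# join_segments works on any objects with a .text attribute:
demo = [SimpleNamespace(text=" Runway two zero left, cleared for takeoff."),
        SimpleNamespace(text=" ")]
print(join_segments(demo))
```

faster-whisper decodes lazily, so no audio is processed until the segment generator is consumed inside `join_segments`.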