Hyperparameters — Whisper ATC Fine-tune (Run 9)

Model

| Key | Value |
|---|---|
| Base model | `openai/whisper-large-v3` |
| Architecture | Whisper Large v3 |
| d_model | 1280 |
| Encoder layers | 32 |
| Decoder layers | 32 |
| Encoder attention heads | 20 |
| Decoder attention heads | 20 |
| Mel bins | 128 |

Training

| Key | Value |
|---|---|
| Optimizer | AdamW (bitsandbytes 8-bit) |
| Learning rate | 1e-05 |
| LR scheduler | Linear |
| Warmup ratio | 0.05 |
| Adam β₁ / β₂ / ε | 0.9 / 0.999 / 1e-8 |
| Weight decay | 0.01 |
| Per-device train batch size | 1 |
| Per-device eval batch size | 8 |
| Gradient accumulation steps | 16 |
| Effective batch size | 16 |
| Gradient checkpointing | Yes (`use_reentrant=False`) |
| Mixed precision | fp16 |
| Max grad norm | 1.0 |
| Max epochs (configured) | 30 |
| Early stop patience | 7 epochs |
| Label smoothing | 0.0 |
| Freeze encoder | No |
| Seed | 42 |
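For reference, the table can be written as keyword arguments in the style of Hugging Face `Seq2SeqTrainingArguments`. This is a sketch only: the argument names are assumed from the `transformers` API rather than taken from the actual training script, and `NUM_DEVICES = 1` is an assumption inferred from 1 × 16 = 16.

```python
# Sketch: hyperparameters from the table as a plain dict. Key names mirror
# transformers.Seq2SeqTrainingArguments (an assumption; the real training
# script is not part of this file).
training_kwargs = {
    "optim": "adamw_bnb_8bit",          # AdamW, bitsandbytes 8-bit
    "learning_rate": 1e-5,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.05,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "gradient_accumulation_steps": 16,
    "gradient_checkpointing": True,
    "gradient_checkpointing_kwargs": {"use_reentrant": False},
    "fp16": True,
    "max_grad_norm": 1.0,
    "num_train_epochs": 30,
    "label_smoothing_factor": 0.0,
    "seed": 42,
}

NUM_DEVICES = 1  # assumption: single-GPU run


def effective_batch_size(kwargs: dict, num_devices: int) -> int:
    """Samples contributing to each optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(training_kwargs, NUM_DEVICES))
```

Encoder freezing is not a trainer argument, so it is omitted from the dict.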

Data Sources

| Source | Role | Size |
|---|---|---|
| axite_all.json | SG military ATC synthetic (4 voices + human) | ~15,716 |
| deepdml/conversations | Real Singapore Changi ATC VHF radio | ~1,443 |
| mnsc-part1-test | MNSC SG-accented read speech | ~3,000 |

Augmentation

- Gaussian noise (p=0.4, amplitude 0.001–0.015)
- Time stretch (p=0.3, rate 0.9–1.1)
- Random silence padding (p=0.5, 0–0.7 s at each end)
- BandPassFilter (p=0.75, 300–3400 Hz, VHF radio simulation)
- Clip (p=0.2, ±0.8)
- Mp3Compression (p=0.3, 32–64 kbps)
- SpecAugment: FrequencyMasking(freq_mask_param=27) + TimeMasking(time_mask_param=100, p=0.05)
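Each waveform transform above is applied independently with its own probability `p`. A minimal standard-library sketch of two of them (Gaussian noise and silence padding) is shown below. It is illustrative only: the function names are hypothetical, the 16 kHz sample rate is an assumption, and "amplitude" is interpreted as the noise standard deviation, which may differ from the library actually used in training.

```python
import random

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio


def add_gaussian_noise(samples, rng, p=0.4, min_amp=0.001, max_amp=0.015):
    """With probability p, add zero-mean Gaussian noise whose standard
    deviation is drawn uniformly from [min_amp, max_amp]."""
    if rng.random() >= p:
        return list(samples)
    sigma = rng.uniform(min_amp, max_amp)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def pad_random_silence(samples, rng, p=0.5, max_seconds=0.7,
                       sample_rate=SAMPLE_RATE):
    """With probability p, prepend and append independent random
    amounts of silence (0 to max_seconds at each end)."""
    if rng.random() >= p:
        return list(samples)
    lead = int(rng.uniform(0.0, max_seconds) * sample_rate)
    tail = int(rng.uniform(0.0, max_seconds) * sample_rate)
    return [0.0] * lead + list(samples) + [0.0] * tail


rng = random.Random(42)
clip = [0.0] * SAMPLE_RATE  # one second of silence as a stand-in waveform
augmented = pad_random_silence(add_gaussian_noise(clip, rng), rng)
print(len(clip), len(augmented))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible for a given seed.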

Early stopping

| Key | Value |
|---|---|
| Metric | WER (lower is better) |
| Stopped at | Step 21185 / Epoch 19 |
| Patience | 7 epochs |
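WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition; the function name is hypothetical, and the text normalization and scoring library used for the reported numbers are not stated in this file.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of seven reference words.
print(word_error_rate("climb to flight level one two zero",
                      "climb flight level one two zero"))
```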

Results

| Epoch | Eval loss | WER |
|---|---|---|
| 1.0 | 0.0838 | 11.46% |
| 2.0 | 0.0550 | 4.28% |
| 3.0 | 0.0406 | 2.79% |
| 4.0 | 0.0417 | 6.58% |
| 5.0 | 0.0381 | 5.46% |
| 6.0 | 0.0372 | 3.27% |
| 7.0 | 0.0375 | 1.39% |
| 8.0 | 0.0381 | 5.52% |
| 9.0 | 0.0188 | 0.83% |
| 10.0 | 0.0202 | 0.84% |
| 11.0 | 0.0185 | 1.05% |
| 12.0 | 0.0189 | 0.82% ← best |
| 13.0 | 0.0189 | 0.95% |
| 14.0 | 0.0202 | 1.19% |
| 15.0 | 0.0206 | 0.91% |
| 16.0 | 0.0191 | 1.16% |
| 17.0 | 0.0169 | 1.12% |
| 18.0 | 0.0176 | 1.19% |
| 19.0 | 0.0185 | 1.19% |

Best checkpoint: `training/output_run9/checkpoint-13380` (epoch 12, WER 0.82%)
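The stop point and checkpoint number are consistent with the table: the best WER is at epoch 12, seven non-improving epochs follow, and 21185 steps over 19 epochs gives 1115 steps per epoch, so epoch 12 ends at step 13380. A small replay of that logic, assuming one evaluation per epoch and a strict-improvement rule (the actual callback configuration is not shown in this file):

```python
# WER (%) per epoch, copied from the Results table.
wer_by_epoch = [
    11.46, 4.28, 2.79, 6.58, 5.46, 3.27, 1.39, 5.52, 0.83, 0.84,
    1.05, 0.82, 0.95, 1.19, 0.91, 1.16, 1.12, 1.19, 1.19,
]
PATIENCE = 7
STEPS_PER_EPOCH = 21185 // 19  # 1115, derived from "Step 21185 / Epoch 19"


def replay_early_stopping(wers, patience):
    """Return (best_epoch, best_wer, stop_epoch); stop_epoch is None
    if patience is never exhausted."""
    best_epoch, best_wer, since_best = 0, float("inf"), 0
    for epoch, wer in enumerate(wers, start=1):
        if wer < best_wer:
            best_epoch, best_wer, since_best = epoch, wer, 0
        else:
            since_best += 1
            if since_best >= patience:
                return best_epoch, best_wer, epoch
    return best_epoch, best_wer, None


best_epoch, best_wer, stop_epoch = replay_early_stopping(wer_by_epoch, PATIENCE)
print(best_epoch, best_wer, stop_epoch, best_epoch * STEPS_PER_EPOCH)
```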

Output

| Key | Value |
|---|---|
| Best HF checkpoint | `training/output_run9/best/` |
| CTranslate2 model | `training/saved_models/ct2_run9/` |
| Quantization | float16 |
| Inference backend | faster-whisper |
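A minimal sketch of loading the CTranslate2 export with faster-whisper. The `device="cuda"` and `language="en"` settings and the helper names are assumptions for illustration, not settings documented in this file; the import is deferred so the helper can be defined without the package installed.

```python
def join_segments(segments) -> str:
    """Concatenate the text of transcription segments into one string."""
    return " ".join(seg.text.strip() for seg in segments)


def transcribe(audio_path: str,
               model_dir: str = "training/saved_models/ct2_run9/") -> str:
    """Transcribe one audio file with the float16 CTranslate2 model."""
    # Deferred import: requires the faster-whisper package at call time.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_dir, device="cuda", compute_type="float16")
    # transcribe() returns a lazy iterator of segments plus an info object.
    segments, _info = model.transcribe(audio_path, language="en")
    return join_segments(segments)
```

Calling `transcribe("example.wav")` (a placeholder file name) returns the joined transcript as a single string.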