Whisper American ATC - Fine-tuned for US Air Traffic Control

Model Description

This model is a fine-tuned version of jlvdoorn/whisper-large-v3-atco2-asr adapted for American Air Traffic Control (ATC) radio communications. The base model was trained on European ATCO2 data; this version has been specialized for American ATC accents, phraseology, and radio characteristics.

Developed by: Jeffrey Suu
Model type: Automatic Speech Recognition (ASR)
Language: English (American ATC)
License: Apache 2.0
Finetuned from: jlvdoorn/whisper-large-v3-atco2-asr

Intended Use

Direct Use

This model is designed for transcribing American Air Traffic Control radio communications, including:

  • LiveATC recordings from US airports (IAH, JFK, SFO, etc.)
  • Pilot-controller communications
  • Ground control, tower, and approach frequencies

Out-of-Scope Use

  • Non-ATC aviation audio
  • Non-American English accents
  • General-purpose speech recognition
  • Safety-critical real-time ATC systems without human oversight

Training Details

Training Data

  • Source: LiveATC recordings from Houston IAH, New York JFK, San Francisco SFO
  • Size: 55 original clips (6 minutes), augmented to 275 samples (31 minutes)
  • Preprocessing:
    • Bandpass filtered (300-3400 Hz) to simulate ATC radio frequency response
    • Volume normalized
    • 5x data augmentation (time stretch, pitch shift, noise, gain)
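The bandpass step above can be sketched in a few lines. The actual preprocessing script is not published, so the function name below is hypothetical; this illustrative NumPy FFT version simply zeroes spectral bins outside the 300–3400 Hz voice band that narrowband ATC radios pass:

```python
import numpy as np

def bandpass_fft(audio: np.ndarray, sr: int, low: float = 300.0, high: float = 3400.0) -> np.ndarray:
    """Zero out frequency content outside low..high Hz to approximate
    the narrowband frequency response of ATC radio audio."""
    spectrum = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    mask = (freqs >= low) & (freqs <= high)
    return np.fft.irfft(spectrum * mask, n=len(audio))

# A 100 Hz tone (below the band) is almost entirely removed,
# while a 1 kHz tone (in band) passes through.
sr = 16000
t = np.arange(sr) / sr
low_tone = np.sin(2 * np.pi * 100 * t)
print(np.max(np.abs(bandpass_fft(low_tone, sr))) < 0.05)  # True
```

A real pipeline would more likely use an IIR filter (e.g. a Butterworth bandpass) to avoid the hard spectral cutoff, but the FFT mask conveys the idea.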

Training Procedure

  • Training regime: Full fine-tuning (fp32)
  • Learning rate: 5e-6
  • Batch size: 4 (effective: 16 with gradient accumulation)
  • Epochs: 5
  • Hardware: Google Colab Tesla T4 (15GB VRAM)
  • Training time: ~25 minutes
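The hyperparameters above map onto a Hugging Face `Seq2SeqTrainingArguments` configuration roughly as follows. This is a sketch, not the exact training script; the output directory name is illustrative:

```python
from transformers import Seq2SeqTrainingArguments

# Approximate configuration matching the reported setup:
# batch size 4 with 4x gradient accumulation -> effective batch 16,
# learning rate 5e-6, 5 epochs, full-precision (fp32) fine-tuning.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-american-atc",  # illustrative path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    num_train_epochs=5,
    fp16=False,                     # fp32 training as reported
    predict_with_generate=True,     # generate text during eval for WER
    eval_strategy="epoch",
)
```
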

Evaluation

Metrics

Word Error Rate (WER) on validation set:

Model                        WER
Base (European ATCO2)        30.3%
This model (American ATC)    13.7%
Improvement                  16.6 percentage points (absolute)
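WER counts word-level substitutions, insertions, and deletions against a reference transcript, divided by the reference length. A minimal pure-Python implementation (illustrative, not the evaluation script used here):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of twelve -> WER of 1/12 (about 8.3%)
ref = "united four five two climb and maintain flight level three five zero"
hyp = "united four five two climb maintain flight level three five zero"
print(round(wer(ref, hyp), 3))  # 0.083
```

In practice libraries such as `jiwer` are used, often with text normalization applied first so that, e.g., numeric formatting differences are scored consistently.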

Key Improvements

✅ Correctly transcribes American number formatting (e.g., "1503" not "Fifteen Zero Three")
✅ Better handling of American accents and speech patterns
✅ Improved recognition of US-specific callsigns and airports
✅ Preserves numeric frequencies (e.g., "135.15" not "one three five one five")

How to Use

import torch
from transformers import pipeline

# Load the fine-tuned model (GPU if available, otherwise CPU)
transcriber = pipeline(
    "automatic-speech-recognition",
    model="jeffreysuu/whisper-american-atc",
    device=0 if torch.cuda.is_available() else -1,
)

# Transcribe audio
result = transcriber("path/to/atc_audio.wav")
print(result["text"])
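Whisper expects 16 kHz mono input. The pipeline resamples automatically when given a file path, but if you feed raw arrays (e.g. decoded from a LiveATC stream) you may need to resample first. A minimal linear-interpolation sketch, assuming NumPy (the function name is illustrative):

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, orig_sr: int) -> np.ndarray:
    """Linearly resample mono audio to the 16 kHz rate Whisper expects."""
    target_sr = 16000
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_out = int(round(duration * target_sr))
    t_out = np.linspace(0.0, duration, n_out, endpoint=False)
    t_in = np.arange(len(audio)) / orig_sr
    return np.interp(t_out, t_in, audio)

# One second of 22.05 kHz audio becomes 16000 samples
audio = np.random.randn(22050)
print(len(resample_to_16k(audio, 22050)))  # 16000
```

Dedicated resamplers (e.g. `librosa.resample` or `torchaudio`) use proper anti-aliasing filters and are preferable for quality; linear interpolation is shown only for self-containedness.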

Limitations and Bias

  • Limited training data: Fine-tuned on only 275 samples from 3 US airports
  • Airport bias: Best performance on IAH, JFK, SFO; may vary on other airports
  • Accent coverage: Primarily trained on American controllers; performance on non-American accents unknown
  • Not production-ready: Requires human verification for safety-critical applications

Technical Specifications

Model Architecture

  • Base: OpenAI Whisper Large v3 (1.5B parameters)
  • Encoder: log-mel spectrogram of the audio → hidden representations
  • Decoder: Transformer-based autoregressive text generation

Compute Infrastructure

  • Hardware: NVIDIA Tesla T4 (Google Colab)
  • Software:
    • Hugging Face Transformers 4.57.3
    • Python 3.12
    • PyTorch

Citation

If you use this model, please cite the original Whisper paper and ATCO2 work:

@misc{radford2022whisper,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv}
}

Contact

Model Card Author: Jeffrey Suu
GitHub: [Your GitHub]
Email: [Your Email]

For issues or questions about this model, please open an issue on the model repository.
