---
language:
- en
license: other
tags:
- whisper
- qwen3
- ctranslate2
- automatic-speech-recognition
- text-generation
- air-traffic-control
- atc
- singapore
- military
pipeline_tag: automatic-speech-recognition
---
# ASTRA ATC Models
Fine-tuned models for Singapore military air traffic control, built for the [ASTRA](https://github.com/aether-raid) training simulator.
## Pipeline
```
Audio --> VAD (Silero) --> ASR (Whisper) --> Rule Formatter --> Display Text

ASR output:   "camel climb flight level zero nine zero"
Display text: "CAMEL climb FL090"
```
The production pipeline uses a **rule-based formatter** (23 deterministic rules, <1ms, 0 VRAM) instead of the LLM. The LLM is retained for reference.
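The 23 production rules are not listed in this README, so the following is only a minimal sketch of how a deterministic formatter of this kind could be structured. The two rules, the `CALLSIGNS` subset, and all function names are illustrative assumptions, chosen to reproduce the example above.

```python
import re

# Spoken digit words -> digits ("niner" is the ATC spoken form of 9).
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "niner": "9",
}
DIGIT_WORD = "|".join(DIGITS)

# Hypothetical subset of callsigns, taken from the Domain section.
CALLSIGNS = {"camel", "ninja", "beetle", "taipan", "maverick", "jaguar", "lancer"}

# "flight level" followed by exactly three spoken digits.
FLIGHT_LEVEL = re.compile(rf"\bflight level((?: (?:{DIGIT_WORD})){{3}})\b")


def format_flight_level(text: str) -> str:
    """Illustrative rule: 'flight level zero nine zero' -> 'FL090'."""
    def repl(match: re.Match) -> str:
        return "FL" + "".join(DIGITS[w] for w in match.group(1).split())
    return FLIGHT_LEVEL.sub(repl, text)


def uppercase_callsigns(text: str) -> str:
    """Illustrative rule: upper-case known callsigns."""
    return " ".join(w.upper() if w in CALLSIGNS else w for w in text.split())


# Rules run in a fixed order, so the output is fully deterministic.
RULES = [format_flight_level, uppercase_callsigns]


def format_display(text: str) -> str:
    for rule in RULES:
        text = rule(text)
    return text


print(format_display("camel climb flight level zero nine zero"))  # CAMEL climb FL090
```

Because each rule is a pure string-to-string function, rules can be unit-tested one at a time, which is one plausible reason to prefer this design over an LLM for a fixed phraseology.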
## Models
### [ASR/](./ASR) — Whisper Large v3 (CTranslate2 float16)
Fine-tuned for Singapore military ATC speech. Uses CTranslate2 float16 format for fast inference with [faster-whisper](https://github.com/SYSTRAN/faster-whisper).
| Metric | Value |
|--------|-------|
| WER | **0.66%** |
| Base model | `openai/whisper-large-v3` |
| Size | 2.9 GB |
| Training | Full fine-tune with enhanced VHF radio augmentation |
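For readers unfamiliar with the metric, WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below shows that definition only; the text normalisation and test set behind the figure above are not described here, so the helper `wer` is an assumption and not the evaluation script.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)


ref = "camel climb flight level zero nine zero"
hyp = "camel climb flight level zero five zero"
print(f"{wer(ref, hyp):.2%}")  # one substitution over seven words
```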
### [LLM/](./LLM) — Qwen3-1.7B Display Formatter (Legacy)
> **Legacy.** Superseded by a deterministic rule-based formatter. Retained for reference.
Converts normalized ASR output into structured ATC display text.
| Metric | Value |
|--------|-------|
| Exact match | **100%** (161/161) |
| Base model | `unsloth/Qwen3-1.7B` |
| Size | 3.3 GB |
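Exact match counts an output as correct only when it is string-identical to the expected display text. A minimal sketch follows; the function name and the whitespace stripping are assumptions, not the original evaluation code.

```python
from typing import Sequence


def exact_match_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that equal their reference after trimming whitespace."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


print(exact_match_rate(
    ["CAMEL climb FL090", "NINJA descend FL150"],
    ["CAMEL climb FL090", "NINJA descend FL100"],
))  # 0.5
```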
## Architecture
```
Audio --> VAD (Silero) --> ASR (Whisper ct2) --> Post-processing --> Rule Formatter --> Display Text
```
| Component | Technology | Latency | VRAM |
|-----------|-----------|---------|------|
| VAD | Silero VAD (ONNX) | ~50ms | <100 MB |
| ASR | Whisper Large v3 (CTranslate2) | ~500ms-2s | ~2 GB |
| Formatter | 23 deterministic rules | <1ms | 0 MB |
Total VRAM: ~2 GB (ASR only).
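The stage order above could be wired together roughly as follows. This is a sketch with hypothetical names (`Pipeline`, `run`) and stub stages; the real VAD, ASR, and post-processing steps are not defined in this README, so each stage is simply an injected callable.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Pipeline:
    vad: Callable[[bytes], Iterable[bytes]]  # audio -> speech segments
    asr: Callable[[bytes], str]              # segment -> raw transcript
    postprocess: Callable[[str], str]        # raw transcript -> normalized text
    formatter: Callable[[str], str]          # normalized text -> display text

    def run(self, audio: bytes) -> List[str]:
        """Return one display string per speech segment found by the VAD."""
        return [
            self.formatter(self.postprocess(self.asr(segment)))
            for segment in self.vad(audio)
        ]


# Stub stages for illustration only; swap in Silero VAD, faster-whisper, etc.
demo = Pipeline(
    vad=lambda audio: [audio],
    asr=lambda segment: " Camel climb flight level zero nine zero ",
    postprocess=lambda text: text.strip().lower(),
    formatter=lambda text: text.replace("flight level zero nine zero", "FL090")
                               .replace("camel", "CAMEL"),
)
print(demo.run(b"\x00\x00"))  # ['CAMEL climb FL090']
```

Keeping each stage behind a plain callable makes it straightforward to swap the formatter (rules vs. LLM) without touching the rest of the pipeline.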
## Domain
Singapore military ATC covering:
- **Airbases**: Tengah (WSAT, runway 18/36), Paya Lebar (WSAP, runway 02/20)
- **Aircraft**: F-16C/D, F-15SG, C-130 Hercules
- **Approaches**: ILS, GCA, PAR, TACAN, DVOR/DME, VOR/DME, Visual Straight-in
- **100+ callsigns**: CAMEL, NINJA, BEETLE, TAIPAN, MAVERICK, JAGUAR, LANCER, etc.
- **Categories**: departure, approach, handoff, maneuver, landing, emergency, ground, recovery, pilot reports, military-specific ops
## Training History
### ASR
| Run | WER | Base | Key Change |
|-----|-----|------|------------|
| ct2_run5 | 0.48% | jacktol/whisper-large-v3-finetuned-for-ATC | Initial fine-tune |
| ct2_run6 | 0.40% | jacktol/whisper-large-v3-finetuned-for-ATC | +augmentation, weight decay |
| ct2_run7 | 0.24% | jacktol/whisper-large-v3-finetuned-for-ATC | Frozen encoder, +50 real recordings |
| **ct2_run8** | **0.66%** | openai/whisper-large-v3 | Full retrain from base, enhanced augmentation |
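> ct2_run8 reports a higher WER than ct2_run7 (0.66% vs 0.24%), but it trains from the original Whisper base (`openai/whisper-large-v3`) rather than an ATC-specific checkpoint, for better generalisation to real-world ATC audio.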
### LLM (Legacy)
| Run | Accuracy | Key Change |
|-----|----------|------------|
| llm_run3 | 98.1% (Qwen3-8B) | QLoRA 4-bit, 871 examples |
| llm_run4 | 100% (Qwen3-1.7B) | bf16 LoRA, 1,915 examples with ASR noise augmentation |
## Quick Start
### ASR
```python
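from faster_whisper import WhisperModel

# Path to the ASR folder; use "./models/ASR" if you downloaded with the commands in the Download section.
model = WhisperModel("./ASR", device="cuda", compute_type="float16")
# transcribe() returns a lazy generator; decoding runs as the segments are iterated.
segments, info = model.transcribe("audio.wav", language="en", beam_size=5)
text = " ".join(seg.text.strip() for seg in segments)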
```
### Download
```bash
# Full repo (ASR + LLM)
huggingface-cli download aether-raid/astra-atc-models --local-dir ./models
# ASR only (recommended)
huggingface-cli download aether-raid/astra-atc-models --include "ASR/*" --local-dir ./models
# LLM only (legacy)
huggingface-cli download aether-raid/astra-atc-models --include "LLM/*" --local-dir ./models
```
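The same downloads can be done from Python with `huggingface_hub.snapshot_download`, which accepts `allow_patterns` as the counterpart of `--include`. The helper names `patterns_for` and `download` below are hypothetical wrappers for illustration.

```python
from typing import List, Optional

REPO_ID = "aether-raid/astra-atc-models"


def patterns_for(component: Optional[str]) -> Optional[List[str]]:
    """None -> full repo; 'ASR' or 'LLM' -> only that folder."""
    if component is None:
        return None
    if component not in {"ASR", "LLM"}:
        raise ValueError(f"unknown component: {component!r}")
    return [f"{component}/*"]


def download(component: Optional[str] = "ASR", local_dir: str = "./models") -> str:
    """Download the selected part of the repo and return the local directory path."""
    # Imported lazily so patterns_for() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=patterns_for(component),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(download("ASR"))  # ASR files land under ./models/ASR
```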