---
language:
  - en
license: other
tags:
  - whisper
  - qwen3
  - ctranslate2
  - automatic-speech-recognition
  - text-generation
  - air-traffic-control
  - atc
  - singapore
  - military
pipeline_tag: automatic-speech-recognition
---

# ASTRA ATC Models

Fine-tuned models for Singapore military air traffic control, built for the [ASTRA](https://github.com/aether-raid) training simulator.

## Pipeline

```
Audio  -->  VAD (Silero)  -->  ASR (Whisper)  -->  Rule Formatter  -->  Display Text
                               "camel climb flight level zero nine zero"
                                                                        "CAMEL climb FL090"
```

The production pipeline uses a **rule-based formatter** (23 deterministic rules, <1ms, 0 VRAM) instead of the LLM. The LLM is retained for reference.
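The 23 production rules are not published in this repository. As a hedged illustration only, two of the kinds of deterministic rewrites such a formatter might apply (callsign uppercasing and flight-level compaction) could look like the sketch below; the function name, rule set, and callsign list are hypothetical:

```python
# Hypothetical sketch of two deterministic formatting rules; the actual
# 23 production rules are not part of this repository.
KNOWN_CALLSIGNS = {"camel", "ninja", "taipan"}

DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}

def format_display(text: str) -> str:
    words = text.lower().split()
    out, i = [], 0
    while i < len(words):
        if words[i] in KNOWN_CALLSIGNS:
            out.append(words[i].upper())        # rule: callsigns render uppercase
            i += 1
        elif words[i:i + 2] == ["flight", "level"]:
            # rule: "flight level zero nine zero" -> "FL090"
            level, j = "", i + 2
            while j < len(words) and words[j] in DIGITS:
                level += DIGITS[words[j]]
                j += 1
            out.append("FL" + level)
            i = j
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)

print(format_display("camel climb flight level zero nine zero"))
# -> CAMEL climb FL090
```

Because every rule is a deterministic string rewrite, the formatter needs no GPU and its output is fully reproducible, which is the trade-off the card cites for replacing the LLM.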

## Models

### [ASR/](./ASR) — Whisper Large v3 (CTranslate2 float16)

Fine-tuned for Singapore military ATC speech. Uses CTranslate2 float16 format for fast inference with [faster-whisper](https://github.com/SYSTRAN/faster-whisper).

| Metric | Value |
|--------|-------|
| WER | **0.66%** |
| Base model | `openai/whisper-large-v3` |
| Size | 2.9 GB |
| Training | Full fine-tune with enhanced VHF radio augmentation |

### [LLM/](./LLM) — Qwen3-1.7B Display Formatter (Legacy)

> **Legacy.** Superseded by a deterministic rule-based formatter. Retained for reference.

Converts normalized ASR output into structured ATC display text.

| Metric | Value |
|--------|-------|
| Exact match | **100%** (161/161) |
| Base model | `unsloth/Qwen3-1.7B` |
| Size | 3.3 GB |

## Architecture

```
Audio --> VAD (Silero) --> ASR (Whisper ct2) --> Post-processing --> Rule Formatter --> Display Text
```

| Component | Technology | Latency | VRAM |
|-----------|-----------|---------|------|
| VAD | Silero VAD (ONNX) | ~50ms | <100 MB |
| ASR | Whisper Large v3 (CTranslate2) | ~500ms-2s | ~2 GB |
| Formatter | 23 deterministic rules | <1ms | 0 MB |

Total VRAM: ~2 GB (ASR only).
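The post-processing step between ASR and the rule formatter is not specified in this card. A minimal sketch of what such normalization typically covers (lowercasing, stripping punctuation, collapsing whitespace so the deterministic rules match reliably) might be; the function name is an assumption:

```python
import re

def normalize_asr(text: str) -> str:
    """Sketch of ASR post-processing before the rule formatter.

    Assumed behavior, not the production implementation: lowercase,
    drop punctuation, and collapse whitespace.
    """
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)   # drop punctuation
    text = re.sub(r"\s+", " ", text)       # collapse runs of whitespace
    return text.strip()

print(normalize_asr("  Camel, climb Flight Level zero nine zero. "))
# -> camel climb flight level zero nine zero
```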

## Domain

Singapore military ATC covering:
- **Airbases**: Tengah (WSAT, runway 18/36), Paya Lebar (WSAP, runway 02/20)
- **Aircraft**: F-16C/D, F-15SG, C-130 Hercules
- **Approaches**: ILS, GCA, PAR, TACAN, DVOR/DME, VOR/DME, Visual Straight-in
- **100+ callsigns**: CAMEL, NINJA, BEETLE, TAIPAN, MAVERICK, JAGUAR, LANCER, etc.
- **Categories**: departure, approach, handoff, maneuver, landing, emergency, ground, recovery, pilot reports, military-specific ops

## Training History

### ASR

| Run | WER | Base | Key Change |
|-----|-----|------|------------|
| ct2_run5 | 0.48% | jacktol/whisper-large-v3-finetuned-for-ATC | Initial fine-tune |
| ct2_run6 | 0.40% | jacktol/whisper-large-v3-finetuned-for-ATC | +augmentation, weight decay |
| ct2_run7 | 0.24% | jacktol/whisper-large-v3-finetuned-for-ATC | Frozen encoder, +50 real recordings |
| **ct2_run8** | **0.66%** | openai/whisper-large-v3 | Full retrain from base, enhanced augmentation |

> ct2_run8 trains from the original Whisper base for better generalisation to real-world ATC audio.

### LLM (Legacy)

| Run | Accuracy | Key Change |
|-----|----------|------------|
| llm_run3 | 98.1% (Qwen3-8B) | QLoRA 4-bit, 871 examples |
| llm_run4 | 100% (Qwen3-1.7B) | bf16 LoRA, 1,915 examples with ASR noise augmentation |

## Quick Start

### ASR

```python
from faster_whisper import WhisperModel

model = WhisperModel("./ASR", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav", language="en", beam_size=5)
text = " ".join(seg.text.strip() for seg in segments)
```

### Download

```bash
# Full repo (ASR + LLM)
huggingface-cli download aether-raid/astra-atc-models --local-dir ./models

# ASR only (recommended)
huggingface-cli download aether-raid/astra-atc-models --include "ASR/*" --local-dir ./models

# LLM only (legacy)
huggingface-cli download aether-raid/astra-atc-models --include "LLM/*" --local-dir ./models
```