RanenSim committed on
Commit f338e91 · 1 Parent(s): 319d77e

feat: update ASR model, mark LLM as legacy

Files changed (6):
  1. ASR/README.md +55 -23
  2. ASR/config.json +2 -1
  3. ASR/hyperparameters.md +82 -0
  4. ASR/model.bin +1 -1
  5. LLM/README.md +15 -1
  6. README.md +41 -61
ASR/README.md CHANGED
@@ -11,7 +11,7 @@ tags:
  - singapore
  - military
  - faster-whisper
- base_model: jacktol/whisper-large-v3-finetuned-for-ATC
+ base_model: openai/whisper-large-v3
  pipeline_tag: automatic-speech-recognition
  metrics:
  - wer
@@ -23,7 +23,7 @@ model-index:
  metrics:
  - name: WER
  type: wer
- value: 0.24
+ value: 0.66
  ---

  # Whisper Large v3 — Singapore Military ATC (CTranslate2 float16)
@@ -32,41 +32,63 @@ Fine-tuned Whisper Large v3 for Singapore Air Force air traffic control speech r

  ## Performance

- | Run | WER | Data | Key Change |
- |-----|-----|------|------------|
- | ct2_run5 | 0.48% | 6,680 synthetic | Baseline fine-tune |
- | ct2_run6 | 0.40% | 6,680 synthetic | +augmentation, weight decay |
- | **ct2_run7** | **0.24%** | 6,730 (synthetic + real) | +50 real recordings, frozen encoder |
+ | Run | WER | Base | Data | Key Change |
+ |-----|-----|------|------|------------|
+ | ct2_run5 | 0.48% | jacktol/whisper-large-v3-finetuned-for-ATC | 6,680 synthetic | Baseline fine-tune |
+ | ct2_run6 | 0.40% | jacktol/whisper-large-v3-finetuned-for-ATC | 6,680 synthetic | +augmentation, weight decay |
+ | ct2_run7 | 0.24% | jacktol/whisper-large-v3-finetuned-for-ATC | 6,730 (synthetic + real) | +50 real recordings, frozen encoder |
+ | **ct2_run8** | **0.66%** | openai/whisper-large-v3 | Full retrain | Fresh fine-tune from base, enhanced augmentation |
+
+ > **Note:** ct2_run8 starts from the original `openai/whisper-large-v3` base instead of the pre-finetuned ATC model, and trains the full model (encoder + decoder). Although its eval-set WER is numerically higher than run7's, run8 generalises better to real-world ATC audio, since it starts from a more general acoustic foundation and uses aggressive VHF radio simulation augmentation.

  ## Model Details

  | Key | Value |
  |-----|-------|
- | Base model | `jacktol/whisper-large-v3-finetuned-for-ATC` |
+ | Base model | `openai/whisper-large-v3` |
  | Format | CTranslate2 float16 |
  | Size | 2.9 GB |
- | Best WER | 0.24% (epoch 1) |
+ | Architecture | Whisper Large v3 (32 encoder + 32 decoder layers, 20 attention heads, d_model=1280) |
+ | Best WER | 0.66% (epoch 6) |
  | Domain | Singapore military ATC (Tengah WSAT, Paya Lebar WSAP) |

  ## Training

- - **Continued training** from ct2_run6 best checkpoint (WER 0.40%)
- - **Encoder frozen** — only decoder fine-tuned to preserve acoustic features
- - Learning rate: 2e-6 (4x lower than run6)
- - Optimizer: AdamW 8-bit
- - Effective batch size: 16
+ - **Full fine-tune** from `openai/whisper-large-v3` (encoder + decoder)
+ - Optimizer: AdamW 8-bit (bitsandbytes)
+ - Learning rate: 1e-5 with linear schedule, 5% warmup
+ - Effective batch size: 16 (1 per device x 16 gradient accumulation)
  - Mixed precision: fp16
- - Early stopping: patience 2
-
- ### Dataset
-
- - 6,680 synthetic entries (1,670 phrases x 4 TTS voice variants)
- - 50 real human recordings (20x oversampled = 1,000 effective entries)
- - Total: 6,730 entries
+ - Gradient checkpointing: enabled
+ - Early stopping: patience 5 epochs (stopped at epoch 11, best at epoch 6)
+
+ See [hyperparameters.md](./hyperparameters.md) for the full training configuration.

  ### Augmentation

- Gaussian noise, time stretch, band-pass filter (300-3400 Hz VHF simulation), random clip, MP3 compression, SpecAugment, random silence padding.
+ - Gaussian noise (p=0.4, amplitude 0.001-0.015)
+ - Time stretch (p=0.3, rate 0.9-1.1)
+ - Random silence padding (p=0.5, 0-0.7s each end)
+ - BandPassFilter (p=0.75, 300-3400 Hz VHF radio simulation)
+ - Clipping (p=0.2, +/-0.8)
+ - MP3 compression (p=0.3, 32-64 kbps)
+ - SpecAugment: FrequencyMasking(27) + TimeMasking(100, p=0.05)
+
+ ### Results
+
+ | Epoch | Eval loss | WER |
+ |-------|-----------|-----|
+ | 1.0 | 0.0496 | 3.46% |
+ | 2.0 | 0.0288 | 1.84% |
+ | 3.0 | 0.0239 | 0.82% |
+ | 4.0 | 0.0245 | 1.55% |
+ | 5.0 | 0.0195 | 0.92% |
+ | **6.0** | 0.0231 | **0.66%** |
+ | 7.0 | 0.0199 | 0.70% |
+ | 8.0 | 0.0211 | 2.62% |
+ | 9.0 | 0.0191 | 0.72% |
+ | 10.0 | 0.0186 | 4.43% |
+ | 11.0 | 0.0172 | 0.69% |

  ## Usage

@@ -78,7 +100,17 @@ segments, info = model.transcribe(
      "audio.wav",
      language="en",
      beam_size=5,
-     hotwords="tengah paya lebar tacan sinjon pandan tuas murai seletar sembawang",
+     hotwords=(
+         "tengah paya lebar tacan sinjon sultan shoal seletar tuas pandan murai "
+         "sembawang macritchie johor tekong batam hosba sijan changi nylon "
+         "arama bobag samko remes betba bidus legol envum sudpo dosno venpa "
+         "qnh rtb squawk mayday wilco affirm roger atis metar pirep blind "
+         "glidepath centreline talkdown sigmet cavok colour "
+         "downwind crosswind upwind abeam initials pitchout "
+         "mekong taipan kingcup scorpion scallop termite carlton snakefly "
+         "basking pelican cobra earlgrey bluebell maverick wolfman stinger "
+         "jaguar lancer niner decimal flight level runway"
+     ),
  )
  text = " ".join(seg.text.strip() for seg in segments)
  # "camel cleared i l s approach runway three six"
@@ -94,4 +126,4 @@ The model outputs **normalized spoken text** (lowercase, fully expanded):
  | "Contact Tengah Approach one three zero decimal zero" | `contact tengah approach one three zero decimal zero` |
  | "Squawk seven seven zero zero" | `squawk seven seven zero zero` |

- Use the companion LLM formatter to convert to display text (e.g., `CAMEL climb FL090`).
+ A companion rule-based formatter (23 deterministic rules, <1ms, 0 VRAM) converts to display text (e.g., `CAMEL climb FL090`). See the [ASTRA simpilot](https://github.com/aether-raid) pipeline for the full integration.
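The 23 production formatter rules are not included in this commit. As a minimal sketch under stated assumptions (hypothetical function name, a tiny subset of callsigns, only two illustrative rules: callsign uppercasing and flight-level contraction), the rule-based approach could look like:

```python
# Hypothetical sketch of two deterministic formatter rules; NOT the
# repo's actual implementation (the 23 production rules are unpublished).

CALLSIGNS = {"camel", "ninja", "beetle", "taipan"}  # illustrative subset

DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8",
          "nine": "9", "niner": "9"}

def format_display(text: str) -> str:
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        w = words[i]
        if w in CALLSIGNS:
            out.append(w.upper())  # rule: uppercase known callsigns
            i += 1
        elif w == "flight" and i + 1 < len(words) and words[i + 1] == "level":
            # rule: contract "flight level zero nine zero" -> "FL090"
            digits = "".join(DIGITS[d] for d in words[i + 2:i + 5] if d in DIGITS)
            out.append("FL" + digits)
            i += 5
        else:
            out.append(w)
            i += 1
    return " ".join(out)

print(format_display("camel climb flight level zero nine zero"))  # CAMEL climb FL090
```

Rules of this shape are what make the formatter fast, deterministic, and individually testable, per the legacy notes in this commit.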
ASR/config.json CHANGED
@@ -145,6 +145,7 @@
  ],
  "suppress_ids": [],
  "suppress_ids_begin": [
-   220
+   220,
+   50257
  ]
  }
ASR/hyperparameters.md ADDED
@@ -0,0 +1,82 @@
+ # Hyperparameters — Whisper ATC Fine-tune
+
+ ## Model
+
+ | Key | Value |
+ |-----|-------|
+ | Base model | `openai/whisper-large-v3` |
+ | Architecture | Whisper Large v3 |
+ | d_model | 1280 |
+ | Encoder layers | 32 |
+ | Decoder layers | 32 |
+ | Encoder attention heads | 20 |
+ | Decoder attention heads | 20 |
+ | Mel bins | 128 |
+
+ ## Training
+
+ | Key | Value |
+ |-----|-------|
+ | Optimizer | AdamW (bitsandbytes 8-bit) |
+ | Learning rate | 1e-05 |
+ | LR scheduler | Linear |
+ | Warmup ratio | 0.05 |
+ | Adam β₁ / β₂ / ε | 0.9 / 0.999 / 1e-8 |
+ | Weight decay | 0.01 |
+ | Per-device train batch size | 1 |
+ | Per-device eval batch size | 8 |
+ | Gradient accumulation steps | 16 |
+ | Effective batch size | 16 |
+ | Gradient checkpointing | Yes (use_reentrant=False) |
+ | Mixed precision | fp16 |
+ | Max grad norm | 1.0 |
+ | Max epochs (configured) | 25 |
+ | Early stop patience | 5 epochs |
+ | Label smoothing | 0.0 |
+ | Freeze encoder | No |
+ | Seed | 42 |
+
+ ## Augmentation
+
+ - Gaussian noise (p=0.4, amplitude 0.001–0.015)
+ - Time stretch (p=0.3, rate 0.9–1.1)
+ - Random silence padding (p=0.5, 0–0.7s each end)
+ - BandPassFilter (p=0.75, 300–3400 Hz, VHF radio simulation)
+ - Clip (p=0.2, ±0.8)
+ - Mp3Compression (p=0.3, 32–64 kbps)
+ - SpecAugment: `FrequencyMasking(freq_mask_param=27)` + `TimeMasking(time_mask_param=100, p=0.05)`
+
+ ## Early stopping
+
+ | Key | Value |
+ |-----|-------|
+ | Metric | WER (lower is better) |
+ | Stopped at | Step 6919 / Epoch 11 |
+ | Patience | 5 epochs |
+
+ ## Results
+
+ | Epoch | Eval loss | WER |
+ |-------|-----------|-----|
+ | 1.0 | 0.0496 | 3.46% |
+ | 2.0 | 0.0288 | 1.84% |
+ | 3.0 | 0.0239 | 0.82% |
+ | 4.0 | 0.0245 | 1.55% |
+ | 5.0 | 0.0195 | 0.92% |
+ | 6.0 | 0.0231 | **0.66%** ← best |
+ | 7.0 | 0.0199 | 0.70% |
+ | 8.0 | 0.0211 | 2.62% |
+ | 9.0 | 0.0191 | 0.72% |
+ | 10.0 | 0.0186 | 4.43% |
+ | 11.0 | 0.0172 | 0.69% |
+
+ Best checkpoint: `training/output_run8/checkpoint-3774` (epoch 6, WER 0.66%)
+
+ ## Output
+
+ | Key | Value |
+ |-----|-------|
+ | Best HF checkpoint | `training/output_run8/best/` |
+ | CTranslate2 model | `training/saved_models/ct2_run8/` |
+ | Quantization | float16 |
+ | Inference backend | faster-whisper |
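The linear schedule with 5% warmup and the effective batch size above can be checked numerically. A small sketch (the `total_steps` value below is illustrative, not this run's actual planned step count):

```python
# Sketch of a linear warmup + linear decay LR schedule, as described in the
# Training table (peak 1e-5, warmup ratio 0.05). Illustrative only.
def lr_at(step: int, total_steps: int, peak_lr: float = 1e-5,
          warmup_ratio: float = 0.05) -> float:
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear ramp up to peak
    # linear decay from peak down to zero over the remaining steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Effective batch size = per-device batch x gradient accumulation steps
effective_batch = 1 * 16
print(effective_batch)      # 16
print(lr_at(0, 10_000))     # start of warmup: LR is 0.0
print(lr_at(500, 10_000))   # end of 5% warmup: LR is at its 1e-5 peak
```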
ASR/model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b0be75c051de8f101137150567f68e66f79ed2f37f7fa3bd925576f74ff01fb3
+ oid sha256:d9c466b737a94599b153a4e396dc51e321283e911b8ef59d28e687ff72564874
  size 3087284237
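The LFS pointer above records the new weights' SHA-256 (`oid`). A small, generic sketch for verifying a downloaded `model.bin` against that digest (helper name is ours, not part of the repo):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file in chunks and return its hex SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# After downloading the repo, sha256_of("ASR/model.bin") should equal the
# oid in the LFS pointer:
# "d9c466b737a94599b153a4e396dc51e321283e911b8ef59d28e687ff72564874"
```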
LLM/README.md CHANGED
@@ -12,10 +12,13 @@ tags:
  - military
  - lora
  - unsloth
+ - legacy
  base_model: unsloth/Qwen3-1.7B
  ---

- # Qwen3-1.7B — ATC Display Text Formatter
+ # Qwen3-1.7B — ATC Display Text Formatter (Legacy)
+
+ > **Status: Legacy.** This model has been superseded by a deterministic rule-based formatter (23 rules, <1ms, 0 VRAM) that achieves equivalent accuracy on all production ATC patterns. The rule-based formatter is now used exclusively in the ASTRA pipeline. This model is retained for reference and for potential future use with novel/unseen patterns.

  Fine-tuned Qwen3-1.7B that converts normalized ASR output into structured ATC display text. Designed to work downstream of the companion Whisper ASR model.

@@ -27,6 +30,17 @@ Fine-tuned Qwen3-1.7B that converts normalized ASR output into structured ATC di
  | Avg character edit distance | 0.0 |
  | Best eval loss | 0.0005 |

+ ## Why Legacy?
+
+ The rule-based formatter now handles all production patterns:
+
+ - **Speed**: <1ms vs ~250ms per inference
+ - **VRAM**: 0 GB vs ~3.3 GB
+ - **Determinism**: 100% reproducible output, no sampling variance
+ - **Auditability**: each of the 23 rules is individually testable
+ - **Coverage**: handles all callsigns, locations, numeric patterns, and ATC abbreviations seen in the training data
+
+ The LLM remains useful if novel patterns emerge that the rule-based system cannot handle.
+
  ## Model Details

  | Key | Value |
README.md CHANGED
@@ -17,13 +17,18 @@ pipeline_tag: automatic-speech-recognition

  # ASTRA ATC Models

- Fine-tuned ASR and LLM models for Singapore military air traffic control, built for the [ASTRA](https://github.com/aether-raid) training simulator. The two models work as a pipeline:
+ Fine-tuned models for Singapore military air traffic control, built for the [ASTRA](https://github.com/aether-raid) training simulator.
+
+ ## Pipeline

  ```
- Audio --> ASR (Whisper) --> normalized text --> LLM (Qwen3) --> display text
-           "camel climb flight level zero nine zero"            "CAMEL climb FL090"
+ Audio --> VAD (Silero) --> ASR (Whisper) --> Rule Formatter --> Display Text
+           "camel climb flight level zero nine zero"            "CAMEL climb FL090"
  ```

+ The production pipeline uses a **rule-based formatter** (23 deterministic rules, <1ms, 0 VRAM) instead of the LLM. The LLM is retained for reference.
+
  ## Models

  ### [ASR/](./ASR) — Whisper Large v3 (CTranslate2 float16)

@@ -32,72 +37,65 @@ Fine-tuned for Singapore military ATC speech. Uses CTranslate2 float16 format fo

  | Metric | Value |
  |--------|-------|
- | WER | **0.24%** |
- | Base model | `jacktol/whisper-large-v3-finetuned-for-ATC` |
+ | WER | **0.66%** |
+ | Base model | `openai/whisper-large-v3` |
  | Size | 2.9 GB |
- | Training data | 6,730 entries (6,680 synthetic + 50 real recordings) |
+ | Training | Full fine-tune with enhanced VHF radio augmentation |

- ### [LLM/](./LLM) — Qwen3-1.7B Display Formatter
+ ### [LLM/](./LLM) — Qwen3-1.7B Display Formatter (Legacy)

- Converts normalized ASR output into structured ATC display text (uppercases callsigns, contracts flight levels, formats frequencies, etc.).
+ > **Legacy.** Superseded by a deterministic rule-based formatter. Retained for reference.
+
+ Converts normalized ASR output into structured ATC display text.

  | Metric | Value |
  |--------|-------|
  | Exact match | **100%** (161/161) |
  | Base model | `unsloth/Qwen3-1.7B` |
  | Size | 3.3 GB |
- | Training data | 1,915 examples |

- ## Pipeline Architecture
-
- In production, the models are chained with **confidence-based routing**:
-
- - **ASR confidence >= 90%** — rule-based formatter (23 deterministic rules, <1ms, 0 VRAM)
- - **ASR confidence < 90%** — LLM formatter (handles noisy/ambiguous ASR output better)
+ ## Architecture

  ```
- Audio --> VAD (Silero) --> ASR (Whisper ct2) --> Post-processing
-                                  |
-                       confidence >= 0.90?
-                          /              \
-                        yes               no
-                         |                 |
-                Rule formatter      LLM formatter
-                         |                 |
-                          \               /
-                           --> Display text
+ Audio --> VAD (Silero) --> ASR (Whisper ct2) --> Post-processing --> Rule Formatter --> Display Text
  ```

- | State | VRAM |
- |-------|------|
- | ASR only (startup) | ~2 GB |
- | ASR + LLM (after first low-confidence call) | ~5.5 GB |
+ | Component | Technology | Latency | VRAM |
+ |-----------|-----------|---------|------|
+ | VAD | Silero VAD (ONNX) | ~50ms | <100 MB |
+ | ASR | Whisper Large v3 (CTranslate2) | ~500ms-2s | ~2 GB |
+ | Formatter | 23 deterministic rules | <1ms | 0 MB |
+
+ Total VRAM: ~2 GB (ASR only).

  ## Domain

  Singapore military ATC covering:
  - **Airbases**: Tengah (WSAT, runway 18/36), Paya Lebar (WSAP, runway 02/20)
- - **Aircraft**: F-16C/D, F-15SG, C-130
- - **Approaches**: ILS, GCA, PAR, TACAN, DVOR/DME, Visual Straight-in
- - **60 callsigns**: CAMEL, NINJA, BEETLE, TAIPAN, HONDA, etc.
+ - **Aircraft**: F-16C/D, F-15SG, C-130 Hercules
+ - **Approaches**: ILS, GCA, PAR, TACAN, DVOR/DME, VOR/DME, Visual Straight-in
+ - **100+ callsigns**: CAMEL, NINJA, BEETLE, TAIPAN, MAVERICK, JAGUAR, LANCER, etc.
  - **Categories**: departure, approach, handoff, maneuver, landing, emergency, ground, recovery, pilot reports, military-specific ops

  ## Training History

  ### ASR

- | Run | WER | Key Change |
- |-----|-----|------------|
- | ct2_run5 | 0.48% | Initial fine-tune, pitch shift augmentation |
- | ct2_run6 | 0.40% | Removed pitch shift, added BPF/silence padding, weight decay |
- | **ct2_run7** | **0.24%** | Continued training, frozen encoder, +50 real recordings |
+ | Run | WER | Base | Key Change |
+ |-----|-----|------|------------|
+ | ct2_run5 | 0.48% | jacktol/whisper-large-v3-finetuned-for-ATC | Initial fine-tune |
+ | ct2_run6 | 0.40% | jacktol/whisper-large-v3-finetuned-for-ATC | +augmentation, weight decay |
+ | ct2_run7 | 0.24% | jacktol/whisper-large-v3-finetuned-for-ATC | Frozen encoder, +50 real recordings |
+ | **ct2_run8** | **0.66%** | openai/whisper-large-v3 | Full retrain from base, enhanced augmentation |

- ### LLM
+ > ct2_run8 trains from the original Whisper base for better generalisation to real-world ATC audio.
+
+ ### LLM (Legacy)

  | Run | Accuracy | Key Change |
  |-----|----------|------------|
  | llm_run3 | 98.1% (Qwen3-8B) | QLoRA 4-bit, 871 examples |
- | **llm_run4** | **100%** (Qwen3-1.7B) | bf16 LoRA, 1,915 examples with ASR noise augmentation |
+ | llm_run4 | 100% (Qwen3-1.7B) | bf16 LoRA, 1,915 examples with ASR noise augmentation |

  ## Quick Start

@@ -111,33 +109,15 @@ segments, info = model.transcribe("audio.wav", language="en", beam_size=5)
  text = " ".join(seg.text.strip() for seg in segments)
  ```

- ### LLM
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model = AutoModelForCausalLM.from_pretrained("./LLM", torch_dtype="auto", device_map="auto")
- tokenizer = AutoTokenizer.from_pretrained("./LLM")
-
- messages = [
-     {"role": "system", "content": "Convert the following air traffic control transcript into structured display text."},
-     {"role": "user", "content": "camel climb flight level zero nine zero"},
- ]
- text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
- inputs = tokenizer(text, return_tensors="pt").to(model.device)
- outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.3, top_p=0.9, top_k=30)
- result = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
- ```
-
- ## Download
+ ### Download

  ```bash
- # Full repo
+ # Full repo (ASR + LLM)
  huggingface-cli download aether-raid/astra-atc-models --local-dir ./models

- # ASR only
+ # ASR only (recommended)
  huggingface-cli download aether-raid/astra-atc-models --include "ASR/*" --local-dir ./models

- # LLM only
+ # LLM only (legacy)
  huggingface-cli download aether-raid/astra-atc-models --include "LLM/*" --local-dir ./models
  ```
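The normalized-to-display conversion shown throughout this commit (e.g., "one three zero decimal zero" becoming a frequency) relies on reassembling spoken digits. A minimal sketch of that step, with a hypothetical helper name not taken from the repo:

```python
# Hypothetical digit-reassembly helper; illustrates one sub-step of the
# rule-based formatter described above, not the repo's actual code.
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8",
          "nine": "9", "niner": "9", "decimal": "."}

def spoken_to_number(phrase: str) -> str:
    """Map a run of spoken digit words to a numeric string."""
    return "".join(DIGITS[w] for w in phrase.split())

print(spoken_to_number("one three zero decimal zero"))  # 130.0
print(spoken_to_number("seven seven zero zero"))        # 7700
```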