---
language:
- en
license: other
tags:
- whisper
- qwen3
- ctranslate2
- automatic-speech-recognition
- text-generation
- air-traffic-control
- atc
- singapore
- military
pipeline_tag: automatic-speech-recognition
---
# ASTRA ATC Models
Fine-tuned models for Singapore military air traffic control, built for the [ASTRA](https://github.com/aether-raid) training simulator.
## Pipeline
```
Audio --> VAD (Silero) --> ASR (Whisper) --> Rule Formatter --> Display Text

"camel climb flight level zero nine zero" --> "CAMEL climb FL090"
```
The production pipeline uses a **rule-based formatter** (23 deterministic rules, <1ms, 0 VRAM) instead of the LLM. The LLM is retained for reference.
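The production rule set is not reproduced here, but its flavour can be sketched with a couple of hypothetical rules (the rule names, digit mapping, and regex below are illustrative assumptions, not the production code): uppercase the leading callsign and collapse a spoken flight level into `FLxxx`.

```python
import re

# Illustrative digit vocabulary (assumption: "niner" is the ATC form of 9).
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8",
    "niner": "9", "nine": "9",
}

def format_display(text: str) -> str:
    """Apply two sketch rules: uppercase callsign, compress flight levels."""
    words = text.split()
    if words:
        words[0] = words[0].upper()  # rule: first token is the callsign
    text = " ".join(words)

    # rule: "flight level <digit> <digit> <digit>" -> "FL<ddd>"
    alts = "|".join(DIGITS)
    pattern = r"flight level ((?:%s)(?: (?:%s)){2})" % (alts, alts)

    def to_fl(m: re.Match) -> str:
        return "FL" + "".join(DIGITS[w.lower()] for w in m.group(1).split())

    return re.sub(pattern, to_fl, text, flags=re.IGNORECASE)
```

Deterministic rules like these are trivially auditable, which matters in a training simulator where the display text must never hallucinate a clearance.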
## Models
### [ASR/](./ASR) — Whisper Large v3 (CTranslate2 float16)
Fine-tuned for Singapore military ATC speech. Uses CTranslate2 float16 format for fast inference with [faster-whisper](https://github.com/SYSTRAN/faster-whisper).
| Metric | Value |
|--------|-------|
| WER | **0.66%** |
| Base model | `openai/whisper-large-v3` |
| Size | 2.9 GB |
| Training | Full fine-tune with enhanced VHF radio augmentation |
### [LLM/](./LLM) — Qwen3-1.7B Display Formatter (Legacy)
> **Legacy.** Superseded by a deterministic rule-based formatter. Retained for reference.

Converts normalized ASR output into structured ATC display text.
| Metric | Value |
|--------|-------|
| Exact match | **100%** (161/161) |
| Base model | `unsloth/Qwen3-1.7B` |
| Size | 3.3 GB |
## Architecture
```
Audio --> VAD (Silero) --> ASR (Whisper ct2) --> Post-processing --> Rule Formatter --> Display Text
```
| Component | Technology | Latency | VRAM |
|-----------|-----------|---------|------|
| VAD | Silero VAD (ONNX) | ~50ms | <100 MB |
| ASR | Whisper Large v3 (CTranslate2) | ~500ms-2s | ~2 GB |
| Formatter | 23 deterministic rules | <1ms | 0 MB |
Total VRAM: ~2 GB (ASR only).
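The composition of these stages can be sketched as follows (the callables and signatures below are stand-ins for illustration, not the ASTRA API): VAD yields speech segments, ASR transcribes each one, and the formatter rewrites each transcript for display.

```python
from typing import Callable, Iterable, List

def run_pipeline(
    audio: bytes,
    vad: Callable[[bytes], Iterable[bytes]],   # audio -> speech segments
    asr: Callable[[bytes], str],               # segment -> raw transcript
    formatter: Callable[[str], str],           # transcript -> display text
) -> List[str]:
    """Chain VAD -> ASR -> formatter over each detected speech segment."""
    return [formatter(asr(segment)) for segment in vad(audio)]
```

Because the formatter is pure string rewriting, only the VAD and ASR stages touch the GPU, which is why total VRAM stays at the ~2 GB the ASR model needs.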
## Domain
Singapore military ATC covering:
- **Airbases**: Tengah (WSAT, runway 18/36), Paya Lebar (WSAP, runway 02/20)
- **Aircraft**: F-16C/D, F-15SG, C-130 Hercules
- **Approaches**: ILS, GCA, PAR, TACAN, DVOR/DME, VOR/DME, Visual Straight-in
- **100+ callsigns**: CAMEL, NINJA, BEETLE, TAIPAN, MAVERICK, JAGUAR, LANCER, etc.
- **Categories**: departure, approach, handoff, maneuver, landing, emergency, ground, recovery, pilot reports, military-specific ops
## Training History
### ASR
| Run | WER | Base | Key Change |
|-----|-----|------|------------|
| ct2_run5 | 0.48% | jacktol/whisper-large-v3-finetuned-for-ATC | Initial fine-tune |
| ct2_run6 | 0.40% | jacktol/whisper-large-v3-finetuned-for-ATC | +augmentation, weight decay |
| ct2_run7 | 0.24% | jacktol/whisper-large-v3-finetuned-for-ATC | Frozen encoder, +50 real recordings |
| **ct2_run8** | **0.66%** | openai/whisper-large-v3 | Full retrain from base, enhanced augmentation |
> ct2_run8 retrains from the original Whisper base rather than an ATC-specialised checkpoint, trading a higher in-domain WER for better generalisation to real-world ATC audio.
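For reference, WER in the table above is the standard word-level edit distance divided by the reference word count; a minimal sketch of the metric (not the project's actual evaluation script):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / ref words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn ref[:i] into hyp[:j] (Levenshtein over words)
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

At 0.66% WER, roughly one word in 150 is wrong, inserted, or dropped relative to the reference transcripts.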
### LLM (Legacy)
| Run | Accuracy | Key Change |
|-----|----------|------------|
| llm_run3 | 98.1% (Qwen3-8B) | QLoRA 4-bit, 871 examples |
| llm_run4 | 100% (Qwen3-1.7B) | bf16 LoRA, 1,915 examples with ASR noise augmentation |
## Quick Start
### ASR
```python
from faster_whisper import WhisperModel
model = WhisperModel("./ASR", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav", language="en", beam_size=5)
text = " ".join(seg.text.strip() for seg in segments)
```
### Download
```bash
# Full repo (ASR + LLM)
huggingface-cli download aether-raid/astra-atc-models --local-dir ./models
# ASR only (recommended)
huggingface-cli download aether-raid/astra-atc-models --include "ASR/*" --local-dir ./models
# LLM only (legacy)
huggingface-cli download aether-raid/astra-atc-models --include "LLM/*" --local-dir ./models
```