Automatic Speech Recognition
NeMo
Safetensors
English
parakeet
whisper
qwen3
ctranslate2
text-generation
air-traffic-control
atc
singapore
military
feat: update ASR model to ct2_run9
- ASR/hyperparameters.md +35 -19
- ASR/model.bin +1 -1
- README.md +3 -4
ASR/hyperparameters.md
CHANGED
@@ -1,4 +1,4 @@
-# Hyperparameters — Whisper ATC Fine-tune
+# Hyperparameters — Whisper ATC Fine-tune (Run 9)

 ## Model

@@ -30,12 +30,20 @@
 | Gradient checkpointing | Yes (use_reentrant=False) |
 | Mixed precision | fp16 |
 | Max grad norm | 1.0 |
-| Max epochs (configured) |
-| Early stop patience |
+| Max epochs (configured) | 30 |
+| Early stop patience | 7 epochs |
 | Label smoothing | 0.0 |
 | Freeze encoder | No |
 | Seed | 42 |

+## Data Sources
+
+| Source | Role | Size |
+|--------|------|------|
+| axite_all.json | SG military ATC synthetic (4 voices + human) | ~15,716 |
+| deepdml/conversations | Real Singapore Changi ATC VHF radio | ~1,443 |
+| mnsc-part1-test | MNSC SG-accented read speech | ~3,000 |
+
 ## Augmentation

 - Gaussian noise (p=0.4, amplitude 0.001–0.015)
@@ -51,32 +59,40 @@
 | Key | Value |
 |-----|-------|
 | Metric | WER (lower is better) |
-| Stopped at | Step
-| Patience |
+| Stopped at | Step 21185 / Epoch 19 |
+| Patience | 7 epochs |

 ## Results

 | Epoch | Eval loss | WER |
 |-------|-----------|-----|
-| 1.0 | 0.
-| 2.0 | 0.
-| 3.0 | 0.
-| 4.0 | 0.
-| 5.0 | 0.
-| 6.0 | 0.
-| 7.0 | 0.
-| 8.0 | 0.
-| 9.0 | 0.
-| 10.0 | 0.
-| 11.0 | 0.
+| 1.0 | 0.0838 | 11.46% |
+| 2.0 | 0.0550 | 4.28% |
+| 3.0 | 0.0406 | 2.79% |
+| 4.0 | 0.0417 | 6.58% |
+| 5.0 | 0.0381 | 5.46% |
+| 6.0 | 0.0372 | 3.27% |
+| 7.0 | 0.0375 | 1.39% |
+| 8.0 | 0.0381 | 5.52% |
+| 9.0 | 0.0188 | 0.83% |
+| 10.0 | 0.0202 | 0.84% |
+| 11.0 | 0.0185 | 1.05% |
+| 12.0 | 0.0189 | **0.82%** ← best |
+| 13.0 | 0.0189 | 0.95% |
+| 14.0 | 0.0202 | 1.19% |
+| 15.0 | 0.0206 | 0.91% |
+| 16.0 | 0.0191 | 1.16% |
+| 17.0 | 0.0169 | 1.12% |
+| 18.0 | 0.0176 | 1.19% |
+| 19.0 | 0.0185 | 1.19% |

-Best checkpoint: `training/
+Best checkpoint: `training/output_run9/checkpoint-13380` (epoch 12, WER 0.82%)

 ## Output

 | Key | Value |
 |-----|-------|
-| Best HF checkpoint | `training/
-| CTranslate2 model | `training/saved_models/
+| Best HF checkpoint | `training/output_run9/best/` |
+| CTranslate2 model | `training/saved_models/ct2_run9/` |
 | Quantization | float16 |
 | Inference backend | faster-whisper |
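As a sanity check on the run 9 numbers in ASR/hyperparameters.md, the sketch below replays the per-epoch WER column through a simple patience rule. It assumes that "improvement" means a strictly lower WER with no minimum-delta threshold; the training script's actual stopping logic is not shown in this commit, so the function name and rule are illustrative only.

```python
# WER (%) for epochs 1..19, copied from the run 9 Results table.
WER_BY_EPOCH = [11.46, 4.28, 2.79, 6.58, 5.46, 3.27, 1.39, 5.52, 0.83, 0.84,
                1.05, 0.82, 0.95, 1.19, 0.91, 1.16, 1.12, 1.19, 1.19]
PATIENCE = 7        # "Early stop patience | 7 epochs"
STOP_STEP = 21185   # "Stopped at | Step 21185 / Epoch 19"
STOP_EPOCH = 19


def simulate_early_stopping(wer_by_epoch, patience):
    """Return (best_epoch, stop_epoch) for a lower-is-better metric.

    Assumption: an epoch counts as an improvement only if its WER is
    strictly lower than the best value seen so far.
    """
    best_epoch, best_wer, stale = 0, float("inf"), 0
    for epoch, wer in enumerate(wer_by_epoch, start=1):
        if wer < best_wer:
            best_epoch, best_wer, stale = epoch, wer, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, epoch
    return best_epoch, None  # patience never exhausted within the data


best_epoch, stop_epoch = simulate_early_stopping(WER_BY_EPOCH, PATIENCE)
steps_per_epoch = STOP_STEP // STOP_EPOCH
print(best_epoch, stop_epoch, steps_per_epoch, best_epoch * steps_per_epoch)
# prints: 12 19 1115 13380
```

Under these assumptions the replay agrees with the table: the best WER falls at epoch 12, seven non-improving epochs follow, and training stops at epoch 19. The implied 1115 steps per epoch also puts epoch 12 at step 13380, which matches the `checkpoint-13380` name.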
ASR/model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:aead4ca85c75d51f0386ffd4b7469f0ec4cb9fd1eba04c3d23960196a3bc851e
 size 3087284237
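ASR/model.bin is stored as a Git LFS pointer, so the diff only shows the new object ID while the size stays the same. A minimal sketch of how a downloaded copy could be checked against that pointer follows; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path is up to the reader.

```python
import hashlib
from pathlib import Path

# New-side pointer contents, copied from the diff for ASR/model.bin.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:aead4ca85c75d51f0386ffd4b7469f0ec4cb9fd1eba04c3d23960196a3bc851e
size 3087284237
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a wrong-sized file
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("model.bin", pointer)` on a local download would then confirm that the file is the run 9 export rather than an earlier one. The 3,087,284,237-byte size works out to roughly 2.9 GiB, in line with the "2.9 GB" figure in the README.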
README.md
CHANGED
@@ -37,7 +37,7 @@ Fine-tuned for Singapore military ATC speech. Uses CTranslate2 float16 format fo

 | Metric | Value |
 |--------|-------|
-| WER | **0.
+| WER | **0.82%** |
 | Base model | `openai/whisper-large-v3` |
 | Size | 2.9 GB |
 | Training | Full fine-tune with enhanced VHF radio augmentation |
@@ -86,9 +86,8 @@ Singapore military ATC covering:
 | ct2_run5 | 0.48% | jacktol/whisper-large-v3-finetuned-for-ATC | Initial fine-tune |
 | ct2_run6 | 0.40% | jacktol/whisper-large-v3-finetuned-for-ATC | +augmentation, weight decay |
 | ct2_run7 | 0.24% | jacktol/whisper-large-v3-finetuned-for-ATC | Frozen encoder, +50 real recordings |
-|
-
-> ct2_run8 trains from the original Whisper base for better generalisation to real-world ATC audio.
+| ct2_run8 | 0.66% | openai/whisper-large-v3 | Full retrain from base, enhanced augmentation |
+| **ct2_run9** | **0.82%** | openai/whisper-large-v3 | Expanded dataset (+MNSC, +deepdml, 17.8k train), 19 epochs |

 ### LLM (Legacy)

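The run history in the README compares models by WER. For readers unfamiliar with the metric, here is a minimal sketch of word error rate as word-level edit distance divided by reference length. It assumes whitespace tokenisation and no text normalisation; the commit does not say which tool or normalisation produced the reported figures, and the example phrases are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical ATC-style phrase: one substituted word out of eight.
ref = "climb and maintain flight level one two zero"
hyp = "climb and maintain flight level one too zero"
print(f"{word_error_rate(ref, hyp):.3f}")  # prints: 0.125
```

A corpus-level score is usually computed by summing edit operations over all utterances and dividing by the total number of reference words, so a figure such as 0.82% corresponds to fewer than one word error per hundred reference words on whatever evaluation set was used for that run.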