aether-raid/astra-atc-models

@@ -1,4 +1,4 @@
-# Hyperparameters — Whisper ATC Fine-tune
 ## Model
@@ -30,12 +30,20 @@
 | Gradient checkpointing | Yes (use_reentrant=False) |
 | Mixed precision | fp16 |
 | Max grad norm | 1.0 |
-| Max epochs (configured) | 25 |
-| Early stop patience | 5 epochs |
 | Label smoothing | 0.0 |
 | Freeze encoder | No |
 | Seed | 42 |
 ## Augmentation
 - Gaussian noise (p=0.4, amplitude 0.001–0.015)
@@ -51,32 +59,40 @@
 | Key | Value |
 |-----|-------|
 | Metric | WER (lower is better) |
-| Stopped at | Step 6919 / Epoch 11 |
-| Patience | 5 epochs |
 ## Results
 | Epoch | Eval loss | WER |
 |-------|-----------|-----|
-| 1.0 | 0.0496 | 3.46% |
-| 2.0 | 0.0288 | 1.84% |
-| 3.0 | 0.0239 | 0.82% |
-| 4.0 | 0.0245 | 1.55% |
-| 5.0 | 0.0195 | 0.92% |
-| 6.0 | 0.0231 | **0.66%** ← best |
-| 7.0 | 0.0199 | 0.70% |
-| 8.0 | 0.0211 | 2.62% |
-| 9.0 | 0.0191 | 0.72% |
-| 10.0 | 0.0186 | 4.43% |
-| 11.0 | 0.0172 | 0.69% |
-Best checkpoint: `training/output_run8/checkpoint-3774` (epoch 6, WER 0.66%)
 ## Output
 | Key | Value |
 |-----|-------|
-| Best HF checkpoint | `training/output_run8/best/` |
-| CTranslate2 model | `training/saved_models/ct2_run8/` |
 | Quantization | float16 |
 | Inference backend | faster-whisper |

+# Hyperparameters — Whisper ATC Fine-tune (Run 9)
 ## Model
 | Gradient checkpointing | Yes (use_reentrant=False) |
 | Mixed precision | fp16 |
 | Max grad norm | 1.0 |
+| Max epochs (configured) | 30 |
+| Early stop patience | 7 epochs |
 | Label smoothing | 0.0 |
 | Freeze encoder | No |
 | Seed | 42 |
+## Data Sources
+| Source | Role | Size |
+|--------|------|------|
+| axite_all.json | SG military ATC synthetic (4 voices + human) | ~15,716 |
+| deepdml/conversations | Real Singapore Changi ATC VHF radio | ~1,443 |
+| mnsc-part1-test | MNSC SG-accented read speech | ~3,000 |
 ## Augmentation
 - Gaussian noise (p=0.4, amplitude 0.001–0.015)
 | Key | Value |
 |-----|-------|
 | Metric | WER (lower is better) |
+| Stopped at | Step 21185 / Epoch 19 |
+| Patience | 7 epochs |
 ## Results
 | Epoch | Eval loss | WER |
 |-------|-----------|-----|
+| 1.0 | 0.0838 | 11.46% |
+| 2.0 | 0.0550 | 4.28% |
+| 3.0 | 0.0406 | 2.79% |
+| 4.0 | 0.0417 | 6.58% |
+| 5.0 | 0.0381 | 5.46% |
+| 6.0 | 0.0372 | 3.27% |
+| 7.0 | 0.0375 | 1.39% |
+| 8.0 | 0.0381 | 5.52% |
+| 9.0 | 0.0188 | 0.83% |
+| 10.0 | 0.0202 | 0.84% |
+| 11.0 | 0.0185 | 1.05% |
+| 12.0 | 0.0189 | **0.82%** ← best |
+| 13.0 | 0.0189 | 0.95% |
+| 14.0 | 0.0202 | 1.19% |
+| 15.0 | 0.0206 | 0.91% |
+| 16.0 | 0.0191 | 1.16% |
+| 17.0 | 0.0169 | 1.12% |
+| 18.0 | 0.0176 | 1.19% |
+| 19.0 | 0.0185 | 1.19% |
+Best checkpoint: `training/output_run9/checkpoint-13380` (epoch 12, WER 0.82%)
 ## Output
 | Key | Value |
 |-----|-------|
+| Best HF checkpoint | `training/output_run9/best/` |
+| CTranslate2 model | `training/saved_models/ct2_run9/` |
 | Quantization | float16 |
 | Inference backend | faster-whisper |
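
The Run 8 and Run 9 tables above can be cross-checked against the reported stop points with a small simulation. This is a minimal sketch, not the training script (which is not part of this diff): it assumes the stopping rule counts consecutive evaluations with no strict WER improvement and halts once that count reaches the patience value. The function name `simulate_early_stopping` and the constant names are hypothetical; the WER values, patience settings, and step numbers are copied from the tables.

```python
# Sketch: replay the per-epoch WER values from the diff through a simple
# patience-based early-stopping rule (an assumed rule, see lead-in).

# WER (%) per epoch, copied from the Results tables.
RUN8_WER = [3.46, 1.84, 0.82, 1.55, 0.92, 0.66, 0.70, 2.62, 0.72, 4.43, 0.69]
RUN9_WER = [11.46, 4.28, 2.79, 6.58, 5.46, 3.27, 1.39, 5.52, 0.83, 0.84,
            1.05, 0.82, 0.95, 1.19, 0.91, 1.16, 1.12, 1.19, 1.19]


def simulate_early_stopping(wer_by_epoch, patience):
    """Return (best_epoch, best_wer, stop_epoch) for a lower-is-better metric."""
    best_epoch, best_wer, stale = None, float("inf"), 0
    for epoch, wer in enumerate(wer_by_epoch, start=1):
        if wer < best_wer:
            # Strict improvement: record it and reset the stale counter.
            best_epoch, best_wer, stale = epoch, wer, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, best_wer, epoch
    # Patience never exhausted: training ran through every listed epoch.
    return best_epoch, best_wer, len(wer_by_epoch)


run8 = simulate_early_stopping(RUN8_WER, patience=5)
run9 = simulate_early_stopping(RUN9_WER, patience=7)

# Steps per epoch inferred from "Step 21185 / Epoch 19" (Run 9).
RUN9_STEPS_PER_EPOCH = 21185 // 19
run9_best_step = run9[0] * RUN9_STEPS_PER_EPOCH

print(run8, run9, run9_best_step)
```

Under that assumed rule, the simulation agrees with the diff: Run 8 peaks at epoch 6 and stops at epoch 11, Run 9 peaks at epoch 12 and stops at epoch 19, and the inferred best step matches `checkpoint-13380`.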

ASR/model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d9c466b737a94599b153a4e396dc51e321283e911b8ef59d28e687ff72564874
 size 3087284237

 version https://git-lfs.github.com/spec/v1
+oid sha256:aead4ca85c75d51f0386ffd4b7469f0ec4cb9fd1eba04c3d23960196a3bc851e
 size 3087284237

README.md CHANGED

@@ -37,7 +37,7 @@ Fine-tuned for Singapore military ATC speech. Uses CTranslate2 float16 format fo
 | Metric | Value |
 |--------|-------|
-| WER | **0.66%** |
 | Base model | `openai/whisper-large-v3` |
 | Size | 2.9 GB |
 | Training | Full fine-tune with enhanced VHF radio augmentation |
@@ -86,9 +86,8 @@ Singapore military ATC covering:
 | ct2_run5 | 0.48% | jacktol/whisper-large-v3-finetuned-for-ATC | Initial fine-tune |
 | ct2_run6 | 0.40% | jacktol/whisper-large-v3-finetuned-for-ATC | +augmentation, weight decay |
 | ct2_run7 | 0.24% | jacktol/whisper-large-v3-finetuned-for-ATC | Frozen encoder, +50 real recordings |
-| **ct2_run8** | **0.66%** | openai/whisper-large-v3 | Full retrain from base, enhanced augmentation |
-> ct2_run8 trains from the original Whisper base for better generalisation to real-world ATC audio.
 ### LLM (Legacy)

 | Metric | Value |
 |--------|-------|
+| WER | **0.82%** |
 | Base model | `openai/whisper-large-v3` |
 | Size | 2.9 GB |
 | Training | Full fine-tune with enhanced VHF radio augmentation |
 | ct2_run5 | 0.48% | jacktol/whisper-large-v3-finetuned-for-ATC | Initial fine-tune |
 | ct2_run6 | 0.40% | jacktol/whisper-large-v3-finetuned-for-ATC | +augmentation, weight decay |
 | ct2_run7 | 0.24% | jacktol/whisper-large-v3-finetuned-for-ATC | Frozen encoder, +50 real recordings |
+| ct2_run8 | 0.66% | openai/whisper-large-v3 | Full retrain from base, enhanced augmentation |
+| **ct2_run9** | **0.82%** | openai/whisper-large-v3 | Expanded dataset (+MNSC, +deepdml, 17.8k train), 19 epochs |
 ### LLM (Legacy)