RanenSim committed on
Commit 6d47469 · 1 Parent(s): f338e91

feat: update ASR model to ct2_run9

Files changed (3)
  1. ASR/hyperparameters.md +35 -19
  2. ASR/model.bin +1 -1
  3. README.md +3 -4
ASR/hyperparameters.md CHANGED
@@ -1,4 +1,4 @@
1
- # Hyperparameters — Whisper ATC Fine-tune
2
 
3
  ## Model
4
 
@@ -30,12 +30,20 @@
30
  | Gradient checkpointing | Yes (use_reentrant=False) |
31
  | Mixed precision | fp16 |
32
  | Max grad norm | 1.0 |
33
- | Max epochs (configured) | 25 |
34
- | Early stop patience | 5 epochs |
35
  | Label smoothing | 0.0 |
36
  | Freeze encoder | No |
37
  | Seed | 42 |
38
 
39
  ## Augmentation
40
 
41
  - Gaussian noise (p=0.4, amplitude 0.001–0.015)
@@ -51,32 +59,40 @@
51
  | Key | Value |
52
  |-----|-------|
53
  | Metric | WER (lower is better) |
54
- | Stopped at | Step 6919 / Epoch 11 |
55
- | Patience | 5 epochs |
56
 
57
  ## Results
58
 
59
  | Epoch | Eval loss | WER |
60
  |-------|-----------|-----|
61
- | 1.0 | 0.0496 | 3.46% |
62
- | 2.0 | 0.0288 | 1.84% |
63
- | 3.0 | 0.0239 | 0.82% |
64
- | 4.0 | 0.0245 | 1.55% |
65
- | 5.0 | 0.0195 | 0.92% |
66
- | 6.0 | 0.0231 | **0.66%** ← best |
67
- | 7.0 | 0.0199 | 0.70% |
68
- | 8.0 | 0.0211 | 2.62% |
69
- | 9.0 | 0.0191 | 0.72% |
70
- | 10.0 | 0.0186 | 4.43% |
71
- | 11.0 | 0.0172 | 0.69% |
72
 
73
- Best checkpoint: `training/output_run8/checkpoint-3774` (epoch 6, WER 0.66%)
74
 
75
  ## Output
76
 
77
  | Key | Value |
78
  |-----|-------|
79
- | Best HF checkpoint | `training/output_run8/best/` |
80
- | CTranslate2 model | `training/saved_models/ct2_run8/` |
81
  | Quantization | float16 |
82
  | Inference backend | faster-whisper |
 
1
+ # Hyperparameters — Whisper ATC Fine-tune (Run 9)
2
 
3
  ## Model
4
 
 
30
  | Gradient checkpointing | Yes (use_reentrant=False) |
31
  | Mixed precision | fp16 |
32
  | Max grad norm | 1.0 |
33
+ | Max epochs (configured) | 30 |
34
+ | Early stop patience | 7 epochs |
35
  | Label smoothing | 0.0 |
36
  | Freeze encoder | No |
37
  | Seed | 42 |
38
 
39
+ ## Data Sources
40
+
41
+ | Source | Role | Size |
42
+ |--------|------|------|
43
+ | axite_all.json | SG military ATC synthetic (4 voices + human) | ~15,716 |
44
+ | deepdml/conversations | Real Singapore Changi ATC VHF radio | ~1,443 |
45
+ | mnsc-part1-test | MNSC SG-accented read speech | ~3,000 |
46
+
47
  ## Augmentation
48
 
49
  - Gaussian noise (p=0.4, amplitude 0.001–0.015)
 
59
  | Key | Value |
60
  |-----|-------|
61
  | Metric | WER (lower is better) |
62
+ | Stopped at | Step 21185 / Epoch 19 |
63
+ | Patience | 7 epochs |
64
 
65
  ## Results
66
 
67
  | Epoch | Eval loss | WER |
68
  |-------|-----------|-----|
69
+ | 1.0 | 0.0838 | 11.46% |
70
+ | 2.0 | 0.0550 | 4.28% |
71
+ | 3.0 | 0.0406 | 2.79% |
72
+ | 4.0 | 0.0417 | 6.58% |
73
+ | 5.0 | 0.0381 | 5.46% |
74
+ | 6.0 | 0.0372 | 3.27% |
75
+ | 7.0 | 0.0375 | 1.39% |
76
+ | 8.0 | 0.0381 | 5.52% |
77
+ | 9.0 | 0.0188 | 0.83% |
78
+ | 10.0 | 0.0202 | 0.84% |
79
+ | 11.0 | 0.0185 | 1.05% |
80
+ | 12.0 | 0.0189 | **0.82%** ← best |
81
+ | 13.0 | 0.0189 | 0.95% |
82
+ | 14.0 | 0.0202 | 1.19% |
83
+ | 15.0 | 0.0206 | 0.91% |
84
+ | 16.0 | 0.0191 | 1.16% |
85
+ | 17.0 | 0.0169 | 1.12% |
86
+ | 18.0 | 0.0176 | 1.19% |
87
+ | 19.0 | 0.0185 | 1.19% |
88
 
89
+ Best checkpoint: `training/output_run9/checkpoint-13380` (epoch 12, WER 0.82%)
90
 
91
  ## Output
92
 
93
  | Key | Value |
94
  |-----|-------|
95
+ | Best HF checkpoint | `training/output_run9/best/` |
96
+ | CTranslate2 model | `training/saved_models/ct2_run9/` |
97
  | Quantization | float16 |
98
  | Inference backend | faster-whisper |
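The stopping point recorded above is consistent with the configured patience: the best WER lands at epoch 12 and training ends 7 evaluations later, at epoch 19. Below is a minimal sketch of that bookkeeping. It assumes one evaluation per epoch and that any strict decrease in WER counts as an improvement; the actual trainer callback and its threshold are not shown in this commit. `early_stop` and `STEPS_PER_EPOCH` are illustrative names, and the step count is inferred from 21185 / 19.

```python
# Per-epoch eval WER (%) copied from the Run 9 results table.
RUN9_WER = [11.46, 4.28, 2.79, 6.58, 5.46, 3.27, 1.39, 5.52, 0.83, 0.84,
            1.05, 0.82, 0.95, 1.19, 0.91, 1.16, 1.12, 1.19, 1.19]

# Assumption: inferred from "Step 21185 / Epoch 19" (21185 / 19 = 1115).
STEPS_PER_EPOCH = 1115


def early_stop(wers, patience):
    """Return (best_epoch, stop_epoch) for a lower-is-better metric.

    Training stops once `patience` consecutive evaluations fail to
    improve on the best value seen so far.
    """
    best_epoch, best_wer, stale = 0, float("inf"), 0
    for epoch, wer in enumerate(wers, start=1):
        if wer < best_wer:
            best_epoch, best_wer, stale = epoch, wer, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, epoch
    # Ran out of epochs before patience was exhausted.
    return best_epoch, len(wers)


best_epoch, stop_epoch = early_stop(RUN9_WER, patience=7)
print(f"best epoch {best_epoch} -> checkpoint-{best_epoch * STEPS_PER_EPOCH}")
print(f"stopped at epoch {stop_epoch}, step {stop_epoch * STEPS_PER_EPOCH}")
```

Under these assumptions the sketch reproduces `checkpoint-13380` as the best checkpoint and step 21185 as the stopping step.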
ASR/model.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:d9c466b737a94599b153a4e396dc51e321283e911b8ef59d28e687ff72564874
3
  size 3087284237
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aead4ca85c75d51f0386ffd4b7469f0ec4cb9fd1eba04c3d23960196a3bc851e
3
  size 3087284237
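Only the `oid` line of the Git LFS pointer changes here; the recorded size is identical for both versions. As a rough illustration, a pointer like this can be parsed and its size converted to a readable figure as follows. `parse_lfs_pointer` is a hypothetical helper written for this note, not part of any Git LFS tooling.

```python
# New pointer contents for ASR/model.bin, copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:aead4ca85c75d51f0386ffd4b7469f0ec4cb9fd1eba04c3d23960196a3bc851e
size 3087284237
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its key/value fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = parse_lfs_pointer(POINTER)
size_gib = pointer["size"] / 2**30
print(f"{pointer['algo']}:{pointer['digest'][:12]}... {size_gib:.1f} GiB")
```

The byte count works out to about 2.9 GiB, which appears to be the figure the README reports as "2.9 GB".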
README.md CHANGED
@@ -37,7 +37,7 @@ Fine-tuned for Singapore military ATC speech. Uses CTranslate2 float16 format fo
37
 
38
  | Metric | Value |
39
  |--------|-------|
40
- | WER | **0.66%** |
41
  | Base model | `openai/whisper-large-v3` |
42
  | Size | 2.9 GB |
43
  | Training | Full fine-tune with enhanced VHF radio augmentation |
@@ -86,9 +86,8 @@ Singapore military ATC covering:
86
  | ct2_run5 | 0.48% | jacktol/whisper-large-v3-finetuned-for-ATC | Initial fine-tune |
87
  | ct2_run6 | 0.40% | jacktol/whisper-large-v3-finetuned-for-ATC | +augmentation, weight decay |
88
  | ct2_run7 | 0.24% | jacktol/whisper-large-v3-finetuned-for-ATC | Frozen encoder, +50 real recordings |
89
- | **ct2_run8** | **0.66%** | openai/whisper-large-v3 | Full retrain from base, enhanced augmentation |
90
-
91
- > ct2_run8 trains from the original Whisper base for better generalisation to real-world ATC audio.
92
 
93
  ### LLM (Legacy)
94
 
 
37
 
38
  | Metric | Value |
39
  |--------|-------|
40
+ | WER | **0.82%** |
41
  | Base model | `openai/whisper-large-v3` |
42
  | Size | 2.9 GB |
43
  | Training | Full fine-tune with enhanced VHF radio augmentation |
 
86
  | ct2_run5 | 0.48% | jacktol/whisper-large-v3-finetuned-for-ATC | Initial fine-tune |
87
  | ct2_run6 | 0.40% | jacktol/whisper-large-v3-finetuned-for-ATC | +augmentation, weight decay |
88
  | ct2_run7 | 0.24% | jacktol/whisper-large-v3-finetuned-for-ATC | Frozen encoder, +50 real recordings |
89
+ | ct2_run8 | 0.66% | openai/whisper-large-v3 | Full retrain from base, enhanced augmentation |
90
+ | **ct2_run9** | **0.82%** | openai/whisper-large-v3 | Expanded dataset (+MNSC, +deepdml, 17.8k train), 19 epochs |
 
91
 
92
  ### LLM (Legacy)
93