danielrosehill committed on
Commit 6defe62 · 1 Parent(s): 30c8684
README.md CHANGED
@@ -17,7 +17,9 @@ language:

 # Whisper Hebrish: Whisper Large (Turbo V3) Fine-Tuned For English-Hebrew Immigrant Speech Patterns ("Hebrish")

- ![Demo using whisper.cpp converted model](screenshots/demos/1.png)

 Demo transcription. Whisper.cpp.

 # ASR For Mixed Speech Patterns
@@ -34,24 +36,41 @@ I recently created a personal fine-tune of Whisper.
 While I had the notebook code handy, I thought it would be worth seeing if I could fine-tune Whisper for this purpose, which is related to one of the most important use-cases for ASR fine-tuning: tuning ASR models which are inherently multilingual on underrepresented languages.

- ## Methodology

- I used Claude Code to generate a list of 500 Hebrew words which it believed English speakers may use in daily speech. I recorded a subset of these and added my own as they came to mind.

- I recorded three variations of each word in an attempt to buttress the reliability of the fine-tune. Where variations in pronunciation exist for common words, I recorded each variant.

- The dataset that this model was trained on preserves the original audio files and the ground truths - the latter in the `JSONL`.

- ## POCs

- ![Demo 2](screenshots/demos/3.png)

- ![Demo 3](screenshots/demos/4.png)

- ## Slightly Less Ridiculous

- ![Demo 3](screenshots/demos/5.png)

 ---
@@ -71,10 +90,11 @@ I used an A100 for the training run which ran across 10 epochs and lasted approx
 | **Improvement** | 63.8% reduction |

 **Baseline Performance:**
- ![Baseline WER](screenshots/baseline-wer.png)

 **Post-Training Performance:**
- ![Post-Training WER](screenshots/post-training-wer.png)

 Fine-tuning the Whisper Large V3 Turbo model on English-Hebrew code-switched data resulted in a **63.8% reduction in WER**, demonstrating significant improvement in transcribing mixed-language speech.
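The hunk context above mentions the run used an A100 across 10 epochs for roughly three hours. For readers attempting something similar, a sketch of Hugging Face `Seq2SeqTrainingArguments` for Whisper fine-tuning — only the epoch count comes from this README; every other value is a placeholder assumption, not the author's settings:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: num_train_epochs comes from the README; all other
# values are illustrative assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-hebrish",
    num_train_epochs=10,
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    warmup_steps=50,
    fp16=True,                   # mixed precision on an A100
    predict_with_generate=True,  # generate full transcripts for WER eval
    report_to="none",
)
```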
 
 
 # Whisper Hebrish: Whisper Large (Turbo V3) Fine-Tuned For English-Hebrew Immigrant Speech Patterns ("Hebrish")

+
+ ![Demo transcription via whisper.cpp](screenshots/comparisons/1.png)
+
 Demo transcription. Whisper.cpp.

 # ASR For Mixed Speech Patterns
 
 While I had the notebook code handy, I thought it would be worth seeing if I could fine-tune Whisper for this purpose, which is related to one of the most important use-cases for ASR fine-tuning: tuning ASR models which are inherently multilingual on underrepresented languages.

+ ## Example

+ OpenAI Whisper Large (V3, Turbo) vs. the fine-tune, head to head.

+ Demo with two words from the dataset: makolet (minimarket) and teudat zehut (ID card):

+ TRUTH:
+
+ ```
+ I went to the makolet today to pick up some bread, and I also got my teudat zehut.
+ ```
+
+ FINE-TUNE:
+
+ ```
+ I went to the makolet today to pick up some bread, and I also got my teudat zehut.
+ ```

+ STOCK WHISPER:

+ ```
+ I went to the Macaulay today to pick up some bread and I also got my Theodette Sahoot.
+ ```

+ ---

+ ## Methodology
+
+ I used Claude Code to generate a list of 500 Hebrew words which it believed English speakers may use in daily speech. I recorded a subset of these and added my own as they came to mind.
+
+ I recorded three variations of each word in an attempt to buttress the reliability of the fine-tune. Where variations in pronunciation exist for common words, I recorded each variant.
+
+ The dataset that this model was trained on preserves the original audio files and the ground truths - the latter in the `JSONL`.
  ---
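The README does not show the manifest format itself; a minimal sketch of how a `JSONL` ground-truth file for this kind of dataset might look and be parsed — the field names (`audio`, `text`) and file paths here are assumptions, not the dataset's actual schema:

```python
import json

# Hypothetical JSONL manifest: one audio/ground-truth pair per line.
# Field names and paths are illustrative assumptions.
manifest = """\
{"audio": "clips/makolet_01.wav", "text": "makolet"}
{"audio": "clips/teudat_zehut_01.wav", "text": "teudat zehut"}
"""

records = [json.loads(line) for line in manifest.splitlines()]
for rec in records:
    print(rec["audio"], "->", rec["text"])
```

Keeping the transcript in a sidecar JSONL file rather than in filenames lets ground truths contain spaces and punctuation unchanged.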

 | **Improvement** | 63.8% reduction |

 **Baseline Performance:**
+
+ ![Baseline WER](screenshots/params/baseline-wer.png)

 **Post-Training Performance:**
+ ![Post-Training WER](screenshots/params/post-training-wer.png)

 Fine-tuning the Whisper Large V3 Turbo model on English-Hebrew code-switched data resulted in a **63.8% reduction in WER**, demonstrating significant improvement in transcribing mixed-language speech.
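To make the WER figures concrete, the head-to-head transcripts shown earlier can be scored with a small pure-Python word error rate function — a sketch, not the author's evaluation pipeline, which likely also normalizes case and punctuation before scoring:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

truth = "I went to the makolet today to pick up some bread, and I also got my teudat zehut."
stock = "I went to the Macaulay today to pick up some bread and I also got my Theodette Sahoot."
fine_tune = truth  # the fine-tune's output matched the ground truth exactly

print(wer(truth, fine_tune))            # 0.0
print(round(wer(truth, stock), 3))      # 4 substitutions over 18 words
```

Against this single sentence the stock model mis-renders both Hebrew terms (and drops a comma), while the fine-tune scores a WER of zero; the 63.8% figure is the aggregate over the full evaluation set.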
 
screenshots/comparisons/1.png ADDED
screenshots/comparisons/example.txt ADDED
@@ -0,0 +1,7 @@
+ FINE-TUNE:
+
+ I went to the makolet today to pick up some bread, and I also got my teudat zehut.
+
+ WHISPER BASE:
+
+ I went to the Macaulay today to pick up some bread and I also got my Theodette Sahoot.
screenshots/{baseline-wer.png → params/baseline-wer.png} RENAMED
File without changes
screenshots/{image.jpg → params/image.jpg} RENAMED
File without changes
screenshots/{post-training-wer.png → params/post-training-wer.png} RENAMED
File without changes
screenshots/{training.jpg → params/training.jpg} RENAMED
File without changes