danielrosehill committed on
Commit 6defe62 · 1 Parent(s): 30c8684
README.md CHANGED
@@ -17,7 +17,9 @@ language:

 # Whisper Hebrish: Whisper Large (Turbo V3) Fine-Tuned For English-Hebrew Immigrant Speech Patterns ("Hebrish")

- ![Demo using whisper.cpp converted model](screenshots/demos/1.png)

 Demo transcription. Whisper.cpp.

 # ASR For Mixed Speech Patterns
@@ -34,24 +36,41 @@ I recently created a personal fine-tune of Whisper.
 While I had the notebook code handy, I thought it would be worth seeing if I could fine-tune Whisper for this purpose, which is related to one of the most important use-cases for ASR fine-tuning: tuning ASR models which are inherently multilingual on underrepresented languages.

- ## Methodology

- I used Claude Code to generate a list of 500 Hebrew words which it believed English speakers may use in daily speech. I recorded a subset of these and added my own as they came to mind.

- I recorded three variations of each word in an attempt to buttress the reliability of the fine-tune. Where variations in pronunciation exist for common words, I recorded each variant.

- The dataset that this model was trained on preserves the original audio files and the ground truths - the latter in the `JSONL`.

- ## POCs

- ![Demo 2](screenshots/demos/3.png)

- ![Demo 3](screenshots/demos/4.png)

- ## Slightly Less Ridiculous

- ![Demo 3](screenshots/demos/5.png)

 ---
@@ -71,10 +90,11 @@ I used an A100 for the training run which ran across 10 epochs and lasted approx
 | **Improvement** | 63.8% reduction |

 **Baseline Performance:**
- ![Baseline WER](screenshots/baseline-wer.png)

 **Post-Training Performance:**
- ![Post-Training WER](screenshots/post-training-wer.png)

 Fine-tuning the Whisper Large V3 Turbo model on English-Hebrew code-switched data resulted in a **63.8% reduction in WER**, demonstrating significant improvement in transcribing mixed-language speech.
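The hunk context above mentions the run used an A100 across 10 epochs for roughly three hours. For readers attempting something similar, a sketch of Hugging Face `Seq2SeqTrainingArguments` for Whisper fine-tuning — only the epoch count comes from this README; every other value is a placeholder assumption, not the author's settings:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: num_train_epochs comes from the README; all other
# values are illustrative assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-hebrish",
    num_train_epochs=10,
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    warmup_steps=50,
    fp16=True,                   # mixed precision on an A100
    predict_with_generate=True,  # generate full transcripts for WER eval
    report_to="none",
)
```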
 
 
 # Whisper Hebrish: Whisper Large (Turbo V3) Fine-Tuned For English-Hebrew Immigrant Speech Patterns ("Hebrish")

+
+ ![Demo transcription via whisper.cpp](screenshots/comparisons/1.png)
+
 Demo transcription. Whisper.cpp.

 # ASR For Mixed Speech Patterns
 
 While I had the notebook code handy, I thought it would be worth seeing if I could fine-tune Whisper for this purpose, which is related to one of the most important use-cases for ASR fine-tuning: tuning ASR models which are inherently multilingual on underrepresented languages.

+ ## Example

+ OpenAI Whisper Large (V3, Turbo) vs. the fine-tune, head to head.

+ Demo with two words from the dataset: makolet (minimarket) and teudat zehut (ID card):

+ TRUTH:
+
+ ```
+ I went to the makolet today to pick up some bread, and I also got my teudat zehut.
+ ```
+
+ FINE-TUNE:
+
+ ```
+ I went to the makolet today to pick up some bread, and I also got my teudat zehut.
+ ```

+ STOCK WHISPER:

+ ```
+ I went to the Macaulay today to pick up some bread and I also got my Theodette Sahoot.
+ ```

+ ---

+ ## Methodology
+
+ I used Claude Code to generate a list of 500 Hebrew words which it believed English speakers may use in daily speech. I recorded a subset of these and added my own as they came to mind.
+
+ I recorded three variations of each word in an attempt to buttress the reliability of the fine-tune. Where variations in pronunciation exist for common words, I recorded each variant.
+
+ The dataset that this model was trained on preserves the original audio files and the ground truths - the latter in the `JSONL`.
  ---
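The README does not show the manifest format itself; a minimal sketch of how a `JSONL` ground-truth file for this kind of dataset might look and be parsed — the field names (`audio`, `text`) and file paths here are assumptions, not the dataset's actual schema:

```python
import json

# Hypothetical JSONL manifest: one audio/ground-truth pair per line.
# Field names and paths are illustrative assumptions.
manifest = """\
{"audio": "clips/makolet_01.wav", "text": "makolet"}
{"audio": "clips/teudat_zehut_01.wav", "text": "teudat zehut"}
"""

records = [json.loads(line) for line in manifest.splitlines()]
for rec in records:
    print(rec["audio"], "->", rec["text"])
```

Keeping the transcript in a sidecar JSONL file rather than in filenames lets ground truths contain spaces and punctuation unchanged.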

 | **Improvement** | 63.8% reduction |

 **Baseline Performance:**
+
+ ![Baseline WER](screenshots/params/baseline-wer.png)

 **Post-Training Performance:**
+ ![Post-Training WER](screenshots/params/post-training-wer.png)

 Fine-tuning the Whisper Large V3 Turbo model on English-Hebrew code-switched data resulted in a **63.8% reduction in WER**, demonstrating significant improvement in transcribing mixed-language speech.
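To make the WER figures concrete, the head-to-head transcripts shown earlier can be scored with a small pure-Python word error rate function — a sketch, not the author's evaluation pipeline, which likely also normalizes case and punctuation before scoring:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

truth = "I went to the makolet today to pick up some bread, and I also got my teudat zehut."
stock = "I went to the Macaulay today to pick up some bread and I also got my Theodette Sahoot."
fine_tune = truth  # the fine-tune's output matched the ground truth exactly

print(wer(truth, fine_tune))            # 0.0
print(round(wer(truth, stock), 3))      # 4 substitutions over 18 words
```

Against this single sentence the stock model mis-renders both Hebrew terms (and drops a comma), while the fine-tune scores a WER of zero; the 63.8% figure is the aggregate over the full evaluation set.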
 
screenshots/comparisons/1.png ADDED
screenshots/comparisons/example.txt ADDED
@@ -0,0 +1,7 @@
+ FINE-TUNE:
+
+ I went to the makolet today to pick up some bread, and I also got my teudat zehut.
+
+ WHISPER BASE:
+
+ I went to the Macaulay today to pick up some bread and I also got my Theodette Sahoot.
screenshots/{baseline-wer.png → params/baseline-wer.png} RENAMED
File without changes
screenshots/{image.jpg → params/image.jpg} RENAMED
File without changes
screenshots/{post-training-wer.png → params/post-training-wer.png} RENAMED
File without changes
screenshots/{training.jpg → params/training.jpg} RENAMED
File without changes