How to use surafelabebe/whisper-small-am with Transformers:

```python
# Option 1: use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="surafelabebe/whisper-small-am")

# Option 2: load the processor and model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("surafelabebe/whisper-small-am")
model = AutoModelForSpeechSeq2Seq.from_pretrained("surafelabebe/whisper-small-am")
```

This model is a fine-tuned version of openai/whisper-small for Amharic speech recognition. It was fine-tuned on the Common Voice 17.0 and surafelabebe/fleurs_am (a subset of google/fleurs) datasets. The training results table below lists its evaluation-set results at each checkpoint.
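The model was trained for 10 hours on a T4 GPU. The training results indicate potential overfitting. Training loss falls from 0.0108 to 0.0001, while validation loss rises from 0.3446 to 0.4352. WER also plateaus at about 51% from step 3000 onward. Future work will try to mitigate this with a larger dataset, more training epochs, and dropout regularization.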
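You can check the overfitting claim against the values logged in the training results table. The sketch below hard-codes those four rows, copied from the table. For each checkpoint, it computes the gap between validation and training loss. It also checks whether validation loss rises monotonically and finds the best checkpoint by each metric. This is plain Python, not part of any library. The helper name `overfitting_report` is hypothetical.

```python
# Rows copied from the training results table in this model card:
# (step, epoch, training_loss, validation_loss, wer_percent)
LOG = [
    (1000, 9.6154, 0.0108, 0.3446, 54.9759),
    (2000, 19.2308, 0.0009, 0.4052, 51.7570),
    (3000, 28.8462, 0.0001, 0.4277, 50.9388),
    (4000, 38.4615, 0.0001, 0.4352, 50.9657),
]


def overfitting_report(log):
    """Summarise train/validation divergence across logged checkpoints (hypothetical helper)."""
    # Generalisation gap: validation loss minus training loss at each step.
    gaps = [(step, round(val - train, 4)) for step, _, train, val, _ in log]
    # True if validation loss increases at every successive checkpoint.
    val_rising = all(b[3] > a[3] for a, b in zip(log, log[1:]))
    best_val_step = min(log, key=lambda row: row[3])[0]
    best_wer_step = min(log, key=lambda row: row[4])[0]
    # Optimizer steps per epoch, inferred from the step/epoch ratio.
    steps_per_epoch = round(log[0][0] / log[0][1])
    return {
        "gaps": gaps,
        "validation_loss_rising": val_rising,
        "best_val_loss_step": best_val_step,
        "best_wer_step": best_wer_step,
        "steps_per_epoch": steps_per_epoch,
    }


if __name__ == "__main__":
    for key, value in overfitting_report(LOG).items():
        print(f"{key}: {value}")
```

For these rows, validation loss is lowest at step 1000, while WER is lowest at step 3000. Choosing a checkpoint by validation loss or by WER therefore gives different answers. The step/epoch ratio also implies about 104 optimizer steps per epoch.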
```python
from transformers import pipeline

pipe = pipeline(model="surafelabebe/whisper-small-am")
text = pipe("sample.wav")["text"]  # replace "sample.wav" with the path to your audio file
print(text)
```
| Input | Output Transcript |
|---|---|
| (audio not available) | አቶ ቦጋለ መብራቱ ወይዘሮ ውድነሽ በታሙም ባገቡ በሁለተኛው አመት መጫረሻ ወንድሪክ ሰውለደላቸውን |
| (audio not available) | ከሰብ ለሚሁን ከወይዘሮ ትሩ ወይም ከአብት ሺሰር ጋር ልዩሩ ጉዳይ ኖሮት አይደለም |
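When the Transformers speech-recognition pipeline receives a file path, it decodes and resamples the audio with ffmpeg, so ffmpeg must be installed. If you pass a raw sample array instead, the pipeline expects the samples to already be at the model's sampling rate, which for Whisper is 16 kHz mono. The standard-library sketch below inspects an uncompressed PCM WAV file before transcription. `describe_wav` and `needs_conversion` are hypothetical helper names.

```python
import wave

# Sampling rate Whisper's feature extractor works at.
EXPECTED_RATE = 16_000


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def needs_conversion(path):
    """True if the file is not 16 kHz mono and would need resampling/downmixing."""
    rate, channels, _ = describe_wav(path)
    return rate != EXPECTED_RATE or channels != 1


# Example (hypothetical file name):
#   print(describe_wav("sample.wav"), needs_conversion("sample.wav"))
```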
Fine-tuning followed a procedure similar to the one described in this blog post.
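Whisper fine-tuning recipes built on Transformers, including Hugging Face's own examples, commonly pad each batch's tokenized transcripts with -100. This is the value PyTorch's cross-entropy loss ignores by default, so padding positions do not contribute to the loss. These recipes also drop a leading decoder start token when the whole batch has one, because the model prepends that token itself when it shifts labels right to build decoder inputs. The pure-Python sketch below illustrates that step on lists of token ids. It is an assumption that this model's training used exactly this collator logic, and `pad_labels` is a hypothetical name.

```python
# -100 is the default ignore_index of torch.nn.CrossEntropyLoss.
IGNORE_INDEX = -100


def pad_labels(batch_labels, decoder_start_token_id=None):
    """Right-pad token-id lists to equal length with IGNORE_INDEX.

    If every sequence starts with decoder_start_token_id, that token is
    removed first, since the model re-adds it when shifting labels right.
    """
    if not batch_labels:
        return []
    if decoder_start_token_id is not None and all(
        seq and seq[0] == decoder_start_token_id for seq in batch_labels
    ):
        batch_labels = [seq[1:] for seq in batch_labels]
    max_len = max(len(seq) for seq in batch_labels)
    return [seq + [IGNORE_INDEX] * (max_len - len(seq)) for seq in batch_labels]
```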
The following hyperparameters were used during training:

Training results:

| Training Loss | Epoch | Step | Validation Loss | WER (%) |
|---|---|---|---|---|
| 0.0108 | 9.6154 | 1000 | 0.3446 | 54.9759 |
| 0.0009 | 19.2308 | 2000 | 0.4052 | 51.7570 |
| 0.0001 | 28.8462 | 3000 | 0.4277 | 50.9388 |
| 0.0001 | 38.4615 | 4000 | 0.4352 | 50.9657 |
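The WER column reports word error rate. WER is the word-level edit distance (substitutions, deletions, and insertions) between the model's transcript and the reference, divided by the number of reference words, and is shown here as a percentage. The minimal plain-Python sketch below assumes words are separated by whitespace. The card does not say which implementation produced the reported numbers, and `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row Levenshtein DP over words: before processing reference word i,
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Multiplying the result by 100 gives the percentage form used in the table. A WER of about 51 therefore means the number of word errors is roughly half the number of reference words. In practice, the `wer` metric from the `evaluate` library is a common choice.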