---
language: kl
tags:
  - whisper
  - fine-tuning
  - kalaallisut
  - speech-recognition
  - openai-whisper
license: apache-2.0
model_name: VoiceLessQ/whisper-tiny-kalaallisut
model_type: speech-to-text
widget:
  - src: path_to_sample_audio_file.wav
---

# Whisper Tiny Fine-Tuned on Kalaallisut (Greenlandic) 🌍

This is a fine-tuned version of the Whisper Tiny model by OpenAI, adapted to the Kalaallisut (Greenlandic) language. The model has been trained and optimized to handle transcriptions specifically for this language, which is historically underrepresented in speech recognition models.

## 📚 Training Process

This model was trained on a dataset of Kalaallisut audio files paired with transcriptions. Special care was taken to avoid the overfitting that occurred in earlier versions of this fine-tune: the training approach was reworked, hyperparameters were tuned, and early stopping was used to monitor validation performance. The final Word Error Rate (WER) was:

- **Final Validation WER:** 7.62%
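The exact patience and threshold values used in training are not published; as an illustration, early stopping of the kind described above can be sketched in plain Python (the `patience=3` and `threshold=0.001` defaults below are hypothetical):

```python
def should_stop(wer_history, patience=3, threshold=0.001):
    """Return True when validation WER has not improved by at least
    `threshold` for `patience` consecutive evaluations."""
    if len(wer_history) <= patience:
        return False
    best_before = min(wer_history[:-patience])
    recent_best = min(wer_history[-patience:])
    # Stop if none of the last `patience` evaluations beat the prior
    # best WER by at least `threshold`.
    return recent_best > best_before - threshold

# Example: validation WER plateaus after the third evaluation
history = [0.25, 0.12, 0.09, 0.0898, 0.0899, 0.0897]
print(should_stop(history))  # True
```

In practice the same logic is available off the shelf, e.g. via `transformers.EarlyStoppingCallback`, rather than hand-rolled.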

βš™οΈ Features and Improvements

- **Reduced overfitting:** early stopping with tuned patience and threshold settings halts training once improvements stall, so the model generalizes better to unseen data.
- **Kalaallisut language support:** Whisper's multilingual capabilities are fine-tuned for the distinct phonetics and structure of Kalaallisut.
- **Optimized for Whisper Tiny:** although this model is based on the smallest Whisper variant (Tiny), it still achieves strong transcription performance for Kalaallisut.

## 📊 Performance Metrics

- **Word Error Rate (WER):** 7.62%
- **Train Loss:** 6.69 after 5 epochs
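WER is the word-level edit distance between the model's hypothesis and the reference transcription, divided by the number of reference words. A minimal, dependency-free sketch of the metric (in practice a library such as `jiwer` or `evaluate` would be used):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("aallarpoq ullumi", "aallarpoq ullumi"))  # 0.0
```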

## How to Use

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa
import torch

# Load the processor and model
processor = WhisperProcessor.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")
model = WhisperForConditionalGeneration.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")

# Load audio as a 16 kHz waveform (Whisper expects 16 kHz mono input)
audio_file = "path_to_audio_file.wav"
audio, _ = librosa.load(audio_file, sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

# Generate transcription
with torch.no_grad():
    generated_ids = model.generate(input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```