---
language: kl
license: mit
model_name: VoiceLessQ/whisper-tiny-kalaallisut
tags:
- whisper
- fine-tuning
- kalaallisut
- speech-recognition
- openai-whisper
model_type: speech-to-text
widget:
- src: path_to_sample_audio_file.wav
---
This model still spits out gibberish and is not good enough yet. I'm going to keep adding to it for a while and see if it improves.
# Whisper Tiny Fine-Tuned on Kalaallisut (Greenlandic) 🌍
This is a fine-tuned version of the [Whisper Tiny](https://huggingface.co/openai/whisper-tiny) model by OpenAI, adapted to the **Kalaallisut** (Greenlandic) language. The model has been trained and optimized to handle transcriptions specifically for this language, which is historically underrepresented in speech recognition models.
### πŸ“š Training Process
This model was carefully trained on a dataset of **Kalaallisut** audio files paired with transcriptions. Special care was taken to avoid overfitting, which occurred in earlier versions of this fine-tuning process. After reworking the training approach, including tweaking hyperparameters and employing early stopping to monitor model performance, the final **Word Error Rate (WER)** was reduced significantly to:
**1.81%**
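For reference, early stopping in this kind of setup is typically wired up through the 🤗 `Trainer`'s `EarlyStoppingCallback`. The exact hyperparameters used for this model are not published, so everything below is a placeholder; this is a minimal sketch, not the actual training script:

```python
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer, EarlyStoppingCallback

# Placeholder hyperparameters -- the actual values used for this model are not published
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-kalaallisut",
    evaluation_strategy="epoch",       # evaluate once per epoch
    save_strategy="epoch",
    predict_with_generate=True,        # compute WER on generated text, not logits
    load_best_model_at_end=True,       # required by EarlyStoppingCallback
    metric_for_best_model="wer",
    greater_is_better=False,           # lower WER is better
    num_train_epochs=50,               # upper bound; early stopping usually halts sooner
)

trainer = Seq2SeqTrainer(
    model=model,                       # a WhisperForConditionalGeneration instance
    args=training_args,
    train_dataset=train_dataset,       # placeholders: your prepared Kalaallisut datasets
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,   # must return a dict containing a "wer" key
    # Stop if WER fails to improve by at least 0.001 for 3 consecutive evaluations
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3,
                                     early_stopping_threshold=0.001)],
)
trainer.train()
```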
### βš™οΈ Features and Improvements
- **Reduced Overfitting**: This version addresses overfitting by employing early stopping with tuned patience and threshold settings to halt training once improvements stalled, helping the model generalize better to unseen data.
- **Kalaallisut Language Support**: Whisper's multilingual capabilities have been fine-tuned specifically for the unique phonetics and structure of Kalaallisut.
- **Optimized for Whisper Tiny**: Even though this model is based on the smallest variant of Whisper (Tiny), it still achieves strong performance in transcription tasks for Kalaallisut.
### πŸ“Š Performance Metrics
- **Word Error Rate (WER)**: 1.81%
- **Train Loss**: 0.77 after 50 epochs

Training was typically halted before the epoch cap by the early-stopping criteria built into the training script.
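As a point of reference, WER can be computed with the Hugging Face `evaluate` library. The toy example below (not this model's actual evaluation code) shows the calculation on made-up strings:

```python
import evaluate

# WER = (substitutions + insertions + deletions) / number of reference words
wer_metric = evaluate.load("wer")

predictions = ["inuugujoq silarsuaq"]   # hypothetical model output
references = ["inuugujaq silarsuaq"]    # hypothetical ground truth

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2%}")  # one substituted word out of two -> 50.00%
```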
### How to Use
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa
import torch

# Load the processor and model
processor = WhisperProcessor.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")
model = WhisperForConditionalGeneration.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")

# Load the audio as a 16 kHz mono waveform (the sampling rate Whisper expects)
audio_file = "path_to_audio_file.wav"
speech, sampling_rate = librosa.load(audio_file, sr=16000)

# Convert the waveform into log-mel spectrogram input features
input_features = processor(speech, sampling_rate=sampling_rate, return_tensors="pt").input_features

# Generate transcription
with torch.no_grad():
    generated_ids = model.generate(input_features)

transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```
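Note that `WhisperProcessor` expects a raw waveform rather than a file path, which is why the example above loads the audio with `librosa` first. If you prefer to pass a file path directly, the `pipeline` API handles loading and resampling for you (it requires `ffmpeg` to be installed):

```python
from transformers import pipeline

# The ASR pipeline loads and resamples the audio file internally
asr = pipeline("automatic-speech-recognition", model="VoiceLessQ/whisper-tiny-kalaallisut")
result = asr("path_to_audio_file.wav")
print(result["text"])
```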