---
language: kl
tags:
- whisper
- fine-tuning
- kalaallisut
- speech-recognition
- openai-whisper
license: apache-2.0
model_name: VoiceLessQ/whisper-tiny-kalaallisut
model_type: speech-to-text
widget:
- src: path_to_sample_audio_file.wav
---

# Whisper Tiny Fine-Tuned on Kalaallisut (Greenlandic) 🌍

This is a fine-tuned version of the [Whisper Tiny](https://huggingface.co/openai/whisper-tiny) model by OpenAI, adapted to the **Kalaallisut** (Greenlandic) language. The model has been trained and optimized to handle transcriptions specifically for this language, which is historically underrepresented in speech recognition models.

### 📚 Training Process

This model was trained on a dataset of **Kalaallisut** audio recordings paired with transcriptions. Earlier versions of this fine-tuning process suffered from overfitting; after reworking the training approach, including tuning hyperparameters and adding early stopping to monitor model performance, the final **Word Error Rate (WER)** was reduced significantly to:

- **Final Validation WER: 7.62%**

### โš™๏ธ Features and Improvements

- **Reduced Overfitting**: Early stopping with tuned patience and threshold settings halts training once validation performance stops improving, so this version generalizes better to unseen data.
- **Kalaallisut Language Support**: Whisper's multilingual capabilities are fine-tuned for the unique phonetics and structure of Kalaallisut.
- **Optimized for Whisper Tiny**: Even though this model is based on the smallest variant of Whisper (Tiny), it still achieves strong performance in transcription tasks for Kalaallisut.
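The early-stopping mechanism described above halts training once the validation metric fails to improve by at least a threshold for a set number of consecutive evaluations. A minimal sketch of the idea (the `EarlyStopper` class and its parameter values are illustrative, not the exact training setup; in `transformers` this role is played by `EarlyStoppingCallback`):

```python
class EarlyStopper:
    """Stop training when validation WER stops improving.

    patience:  consecutive evaluations without sufficient improvement
               allowed before stopping.
    threshold: minimum absolute improvement that counts as progress.
    """

    def __init__(self, patience: int = 3, threshold: float = 0.001):
        self.patience = patience
        self.threshold = threshold
        self.best = float("inf")
        self.bad_evals = 0

    def should_stop(self, wer: float) -> bool:
        if self.best - wer > self.threshold:  # improved enough
            self.best = wer
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Hypothetical validation WERs per epoch
stopper = EarlyStopper(patience=2, threshold=0.001)
for epoch, wer in enumerate([0.20, 0.12, 0.12, 0.121, 0.1205]):
    if stopper.should_stop(wer):
        print(f"stopping at epoch {epoch}")  # prints "stopping at epoch 3"
        break
```

Keeping the best checkpoint rather than the final one pairs naturally with this: the run ends a few evaluations past the optimum, and the saved weights come from the best epoch.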

### 📊 Performance Metrics

- **Word Error Rate (WER)**: 7.62%
- **Train Loss**: 6.69 after 5 epochs
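WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal sketch of the computation (the function name `word_error_rate` is ours for illustration; evaluation libraries such as `jiwer` are normally used instead):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[-1][-1] / len(ref)


# One substituted word out of three reference words -> WER of 1/3
print(word_error_rate("one two three", "one two four"))
```

A WER of 7.62% therefore means roughly 8 word-level errors (substitutions, insertions, or deletions) per 100 reference words.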

### How to Use

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa
import torch

# Load the processor and model
processor = WhisperProcessor.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")
model = WhisperForConditionalGeneration.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")

# Load audio and resample to the 16 kHz rate Whisper expects
audio, sampling_rate = librosa.load("path_to_audio_file.wav", sr=16000)
input_features = processor(audio, sampling_rate=sampling_rate, return_tensors="pt").input_features

# Generate transcription
with torch.no_grad():
    generated_ids = model.generate(input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```