---
language: kl
license: mit
model_name: VoiceLessQ/whisper-tiny-kalaallisut
tags:
- whisper
- fine-tuning
- kalaallisut
- speech-recognition
- openai-whisper
model_type: speech-to-text
widget:
- src: path_to_sample_audio_file.wav
---

This model still produces gibberish and is not good enough yet. I will keep adding to it for a while to see whether it improves.

# Whisper Tiny Fine-Tuned on Kalaallisut (Greenlandic)

This is a fine-tuned version of the [Whisper Tiny](https://huggingface.co/openai/whisper-tiny) model by OpenAI, adapted to the **Kalaallisut** (Greenlandic) language. It has been trained to transcribe this language, which has historically been underrepresented in speech recognition models.

### Training Process

This model was trained on a dataset of **Kalaallisut** audio files paired with transcriptions. Earlier versions of this fine-tune overfit, so the training approach was reworked: hyperparameters were adjusted and early stopping was used to monitor model performance. With these changes, the final **Word Error Rate (WER)** fell to **1.81%**.

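
For readers unfamiliar with the metric, here is a minimal sketch of how WER is commonly defined: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference. This card does not say how its figure was computed or which text normalization was applied, so the function and the sample strings below are illustrative assumptions, not the evaluation code used here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Placeholder strings, used only to exercise the function.
score = word_error_rate("a b c d", "a x c")
print(f"WER: {score:.2%}")
```

Because WER is computed over whole words, casing and punctuation handling can change the result noticeably, so the same normalization should be applied to both reference and hypothesis before comparing numbers.
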
### Features and Improvements

- **Reduced Overfitting**: Early stopping, with tuned patience and threshold settings, halts training once improvement stalls so that the model generalizes better to unseen data.
- **Kalaallisut Language Support**: Whisper's multilingual capabilities are fine-tuned specifically for the phonetics and structure of Kalaallisut.
- **Optimized for Whisper Tiny**: Although it is based on the smallest Whisper variant (Tiny), the model still achieves strong performance on Kalaallisut transcription.

### Performance Metrics

- **Word Error Rate (WER)**: 1.81%
- **Train Loss**: 0.77 after 50 epochs

Training usually ends when the early stopping criteria coded into the training script are triggered.

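
The early stopping rule mentioned above can be sketched as follows. The training script is not included in this card, so the class name, the default values, and the metric history are all hypothetical; the sketch only illustrates the general patience-and-threshold idea. If the Hugging Face `Trainer` was used (an assumption), its `EarlyStoppingCallback` exposes comparable `early_stopping_patience` and `early_stopping_threshold` settings.

```python
class EarlyStopping:
    """Stop once the metric fails to improve by `threshold` for `patience` evaluations in a row."""

    def __init__(self, patience: int = 3, threshold: float = 0.0):
        self.patience = patience
        self.threshold = threshold
        self.best = float("inf")  # lower is better (e.g. validation WER or loss)
        self.bad_evals = 0

    def should_stop(self, metric: float) -> bool:
        if metric < self.best - self.threshold:
            # Meaningful improvement: record it and reset the counter.
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Hypothetical per-epoch validation WER values, for illustration only.
history = [0.40, 0.30, 0.295, 0.31, 0.29]
stopper = EarlyStopping(patience=2, threshold=0.01)
stopped_at = None
for epoch, wer in enumerate(history, start=1):
    if stopper.should_stop(wer):
        stopped_at = epoch
        break
print(f"Stopped at epoch: {stopped_at}")
```

A larger `patience` tolerates noisy evaluations at the cost of extra training time, while a larger `threshold` ignores improvements too small to matter.
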
### How to Use

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa
import torch

# Load the processor and model
processor = WhisperProcessor.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")
model = WhisperForConditionalGeneration.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")

# Load audio as a 16 kHz mono waveform; the processor expects samples, not a file path
audio_file = "path_to_audio_file.wav"
waveform, sampling_rate = librosa.load(audio_file, sr=16000)
input_features = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt").input_features

# Generate transcription
with torch.no_grad():
    generated_ids = model.generate(input_features)

# Decode token IDs back to text
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(transcription)