---
language: kl
license: mit
model_name: VoiceLessQ/whisper-tiny-kalaallisut
tags:
- whisper
- fine-tuning
- kalaallisut
- speech-recognition
- openai-whisper
model_type: speech-to-text
widget:
- src: path_to_sample_audio_file.wav
---
This model still produces gibberish and is not good enough yet. I will keep adding to it for a while to see whether it improves.


# Whisper Tiny Fine-Tuned on Kalaallisut (Greenlandic) 🌍

This is a fine-tuned version of the [Whisper Tiny](https://huggingface.co/openai/whisper-tiny) model by OpenAI, adapted to the **Kalaallisut** (Greenlandic) language. The model has been trained and optimized to handle transcriptions specifically for this language, which is historically underrepresented in speech recognition models.

### 📚 Training Process

This model was trained on a dataset of **Kalaallisut** audio files paired with transcriptions. Earlier versions of this fine-tuning process overfit, so the training approach was reworked: hyperparameters were adjusted and early stopping was added to halt training once performance stopped improving. After these changes, the final **Word Error Rate (WER)** dropped to:

1.81%
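
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the evaluation code used for this model; the function name `word_error_rate` and the example strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four reference words.
example_wer = word_error_rate("a b c d", "a b x d")
print(f"{example_wer:.2%}")
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.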

### ⚙️ Features and Improvements

- **Reduced Overfitting**: This version addresses overfitting by employing early stopping with fine-tuned patience and threshold settings to halt training when improvements stalled, ensuring the model generalized better to unseen data.
- **Kalaallisut Language Support**: Whisper's multilingual capabilities are fine-tuned specifically for the phonetics and structure of Kalaallisut.
- **Optimized for Whisper Tiny**: Even though this model is based on the smallest variant of Whisper (Tiny), it still achieves strong performance in transcription tasks for Kalaallisut.
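
The early stopping rule described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the training script used for this model: the class name `EarlyStopper`, the `patience` and `threshold` values, and the validation losses are all hypothetical. If training runs through the `transformers` `Trainer`, `EarlyStoppingCallback` exposes comparable `early_stopping_patience` and `early_stopping_threshold` settings.

```python
class EarlyStopper:
    """Stop when the monitored metric has not improved by more than
    `threshold` for `patience` consecutive evaluations (lower is better)."""

    def __init__(self, patience: int = 3, threshold: float = 0.0):
        self.patience = patience
        self.threshold = threshold
        self.best = float("inf")
        self.bad_evals = 0

    def should_stop(self, metric: float) -> bool:
        if metric < self.best - self.threshold:
            # Meaningful improvement: remember it and reset the counter.
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Hypothetical validation losses, one per epoch.
stopper = EarlyStopper(patience=2, threshold=0.01)
validation_losses = [0.90, 0.60, 0.595, 0.61, 0.40]
stopped_at = None
for epoch, loss in enumerate(validation_losses, start=1):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

A larger `patience` tolerates noisy evaluations at the cost of longer training, while a larger `threshold` ignores improvements too small to matter.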

### 📊 Performance Metrics

- **Word Error Rate (WER)**: 1.81%
- **Train Loss**: 0.77 after 50 epochs

Training is usually halted by the early stopping criteria coded into the training script.

### How to Use

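```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa  # extra dependency, used here only to read and resample the audio
import torch

# Load the processor and model
processor = WhisperProcessor.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")
model = WhisperForConditionalGeneration.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")

# Load audio (example usage). The processor expects a raw waveform,
# not a file path, and Whisper works on 16 kHz mono audio.
audio_file = "path_to_audio_file.wav"
waveform, sampling_rate = librosa.load(audio_file, sr=16000)
input_features = processor(
    waveform, sampling_rate=sampling_rate, return_tensors="pt"
).input_features

# Generate transcription
with torch.no_grad():
    generated_ids = model.generate(input_features)
# batch_decode returns a list with one string per input clip
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(transcription[0])
```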