---
language: kl
license: mit
model_name: VoiceLessQ/whisper-tiny-kalaallisut
tags:
- whisper
- fine-tuning
- kalaallisut
- speech-recognition
- openai-whisper
model_type: speech-to-text
widget:
- src: path_to_sample_audio_file.wav
---

This model still spits out gibberish and is not good enough yet. I'm still going to keep training it for a while and see whether it improves.

# Whisper Tiny Fine-Tuned on Kalaallisut (Greenlandic) 🌍

This is a fine-tuned version of the [Whisper Tiny](https://huggingface.co/openai/whisper-tiny) model by OpenAI, adapted to the **Kalaallisut** (Greenlandic) language. The model has been trained and optimized to transcribe this language, which is historically underrepresented in speech recognition models.

### 📚 Training Process

This model was carefully trained on a dataset of **Kalaallisut** audio files paired with transcriptions. Special care was taken to avoid the overfitting that occurred in earlier versions of this fine-tuning process. After reworking the training approach, including tweaking hyperparameters and employing early stopping to monitor model performance, the final **Word Error Rate (WER)** was reduced to **1.81%**.

### ⚙️ Features and Improvements

- **Reduced Overfitting**: This version addresses overfitting by employing early stopping with tuned patience and threshold settings to halt training when improvements stalled, so the model generalizes better to unseen data (an illustrative setup is sketched at the end of this card).
- **Kalaallisut Language Support**: Whisper's multilingual capabilities are fine-tuned specifically for the unique phonetics and structure of Kalaallisut.
- **Optimized for Whisper Tiny**: Even though this model is based on the smallest Whisper variant (Tiny), it still achieves strong transcription performance for Kalaallisut.

### 📊 Performance Metrics

- **Word Error Rate (WER)**: 1.81%
- **Train Loss**: 0.77 after 50 epochs, with the stopping point usually triggered by the early-stopping criteria encoded in the training script.

### How to Use

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa
import torch

# Load the processor and model
processor = WhisperProcessor.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")
model = WhisperForConditionalGeneration.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")

# Load the audio and resample it to the 16 kHz rate Whisper expects
audio_file = "path_to_audio_file.wav"
speech, sampling_rate = librosa.load(audio_file, sr=16000)
input_features = processor(speech, sampling_rate=sampling_rate, return_tensors="pt").input_features

# Generate the transcription
with torch.no_grad():
    generated_ids = model.generate(input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)

print(transcription)
```
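
### 🛠️ Early Stopping Sketch (illustrative)

The snippet below is a minimal sketch of how early stopping with patience and threshold settings can be wired into a Hugging Face `Seq2SeqTrainer` run, as described in the training notes above. All hyperparameter values, the dataset variables, and the `compute_metrics` stub are illustrative placeholders, not the exact configuration used to produce this checkpoint.

```python
from transformers import (
    WhisperForConditionalGeneration,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
    EarlyStoppingCallback,
)

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Placeholders: supply prepared Kalaallisut train/eval splits and a WER metric function.
train_dataset = None
eval_dataset = None

def compute_metrics(eval_pred):
    # Placeholder: decode predictions and return {"wer": ...}, e.g. with the `evaluate` library.
    ...

# Illustrative hyperparameters only, not the exact values used for this checkpoint.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-kalaallisut",
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    num_train_epochs=50,
    evaluation_strategy="epoch",     # evaluate once per epoch so early stopping can react
    save_strategy="epoch",
    load_best_model_at_end=True,     # required by EarlyStoppingCallback
    metric_for_best_model="wer",
    greater_is_better=False,         # lower WER is better
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    callbacks=[
        # Stop when eval WER has not improved by at least 0.01 for 3 consecutive evaluations.
        EarlyStoppingCallback(early_stopping_patience=3, early_stopping_threshold=0.01),
    ],
)

trainer.train()
```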