---
language: kl
license: mit
model_name: VoiceLessQ/whisper-tiny-kalaallisut
tags:
- whisper
- fine-tuning
- kalaallisut
- speech-recognition
- openai-whisper
model_type: speech-to-text
widget:
- src: path_to_sample_audio_file.wav
---
This model still produces gibberish and is not good enough yet. I'm going to keep adding to it for a while and see whether it improves.


# Whisper Tiny Fine-Tuned on Kalaallisut (Greenlandic) 🌍

This is a fine-tuned version of OpenAI's [Whisper Tiny](https://huggingface.co/openai/whisper-tiny) model, adapted to Kalaallisut (Greenlandic). It has been trained specifically to transcribe speech in this language, which has historically been underrepresented in speech recognition models.

### 📚 Training Process

This model was trained on a dataset of Kalaallisut audio files paired with transcriptions. Earlier versions of this fine-tuning process overfit, so special care was taken to avoid that here. After reworking the training approach, including adjusting hyperparameters and adding early stopping to monitor model performance, the final Word Error Rate (WER) was reduced significantly to:

1.81%
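
WER is the number of word substitutions, deletions, and insertions needed to turn a model transcription into its reference transcription, divided by the number of words in the reference. The function below is a minimal pure-Python sketch of that calculation for illustration only; it is not the evaluation code used for this model, and `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev holds one row of the dynamic-programming table: prev[j] is the edit
    # distance between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the")
score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {score:.2%}")
```

Because insertions count as errors, WER is not capped at 100%. Libraries such as `jiwer`, and the `wer` metric in Hugging Face `evaluate`, compute this quantity as a fraction rather than a percentage.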

### ⚙️ Features and Improvements

- Reduced Overfitting: This version addresses overfitting by using early stopping with tuned patience and threshold settings, which halts training once improvements stall and helps the model generalize better to unseen data.
- Kalaallisut Language Support: Whisper's multilingual capabilities are fine-tuned specifically for the unique phonetics and structure of Kalaallisut.
- Optimized for Whisper Tiny: Although this model is based on the smallest Whisper variant (Tiny), it still achieves strong performance in transcription tasks for Kalaallisut.
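
The patience and threshold settings described above can be illustrated with a small sketch. It assumes a lower-is-better evaluation metric such as WER; the `EarlyStopping` class, its default values, and the WER history in the usage lines are hypothetical and are not the settings used to train this model. In Hugging Face `transformers`, the comparable settings are the `early_stopping_patience` and `early_stopping_threshold` arguments of `EarlyStoppingCallback`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EarlyStopping:
    """Signal when a lower-is-better metric (e.g. WER) has stopped improving."""

    patience: int = 3        # evaluations without improvement to tolerate (hypothetical default)
    threshold: float = 0.0   # minimum decrease that counts as an improvement
    best: Optional[float] = field(default=None, init=False)
    bad_evals: int = field(default=0, init=False)

    def should_stop(self, metric: float) -> bool:
        # An evaluation counts as an improvement only if it beats the best value by more than `threshold`.
        improved = self.best is None or (self.best - metric) > self.threshold
        if improved:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        # Stop once `patience` consecutive evaluations have failed to improve.
        return self.bad_evals >= self.patience


# Usage with a hypothetical history of evaluation WERs
stopper = EarlyStopping(patience=2, threshold=0.01)
for epoch, wer in enumerate([0.90, 0.80, 0.795, 0.797, 0.70], start=1):
    if stopper.should_stop(wer):
        print(f"Stopping after epoch {epoch}; best WER so far: {stopper.best}")
        break
```

The threshold keeps tiny fluctuations (here 0.80 to 0.795) from resetting the patience counter, so training halts once the metric has plateaued rather than running until it stops changing entirely.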

### 📊 Performance Metrics

- Word Error Rate (WER): 1.81%
- Train Loss: 0.77 after 50 epochs

Training usually ends when the early-stopping criteria built into the training code are triggered.

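### How to Use

The example below uses `librosa` to load the audio file and resample it to 16 kHz, the sampling rate Whisper's feature extractor expects.

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa  # pip install librosa
import torch

# Load the processor and model
processor = WhisperProcessor.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")
model = WhisperForConditionalGeneration.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")

# Load the audio as a mono waveform, resampled to 16 kHz
audio_file = "path_to_audio_file.wav"
audio, sampling_rate = librosa.load(audio_file, sr=16000)

# Convert the raw waveform into log-Mel input features
input_features = processor(audio, sampling_rate=sampling_rate, return_tensors="pt").input_features

# Generate transcription without tracking gradients
with torch.no_grad():
    generated_ids = model.generate(input_features)

# batch_decode returns one string per input; take the first (and only) one
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```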