---
license: cc-by-nc-nd-4.0
datasets:
- openslr
language:
- gl
pipeline_tag: automatic-speech-recognition
tags:
- ITG
- PyTorch
- Transformers
- whisper
- whisper-base
---

# Whisper Base Galician

## Description

This is a fine-tuned version of the [openai/whisper-base](https://huggingface.co/openai/whisper-base) pre-trained model for ASR in Galician.

---

## Dataset

We used one of the datasets available in the OpenSLR repository, the [OpenSLR Galician](https://huggingface.co/datasets/openslr/viewer/SLR77) dataset.
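
If you want to inspect the data yourself, a minimal sketch for loading it with the 🤗 `datasets` library follows; the `"SLR77"` config name and the column names are our assumptions about the `openslr` dataset script, and recent `datasets` versions may require `trust_remote_code=True` (or no longer support script-based datasets at all):

```python
from datasets import load_dataset

# Load the Galician portion of OpenSLR (config name assumed to be "SLR77");
# trust_remote_code is needed on recent datasets versions for script-based sets.
slr77 = load_dataset("openslr", "SLR77", trust_remote_code=True)

# Each example is assumed to carry a path, the decoded audio, and a transcription
print(slr77["train"][0]["sentence"])
```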

---

## Example inference script

### Check this example script to run our model in inference mode

```python
import librosa
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

filename = "demo.wav"  # change this line to the name of your audio file
sample_rate = 16_000
processor = AutoProcessor.from_pretrained('ITG/whisper-base-gl')
model = AutoModelForSpeechSeq2Seq.from_pretrained('ITG/whisper-base-gl')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

with torch.no_grad():
    # Load the audio and resample it to the 16 kHz rate Whisper expects
    speech_array, _ = librosa.load(filename, sr=sample_rate)
    inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
    input_features = inputs.input_features
    generated_ids = model.generate(input_features=input_features, max_length=225)
    decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(f"ASR Galician whisper-base output: {decode_output}")
```
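
Alternatively, the same checkpoint can be run through the high-level `pipeline` API from Transformers; this is a minimal sketch, with `demo.wav` as a placeholder audio path:

```python
from transformers import pipeline

# Build an ASR pipeline around the fine-tuned Galician checkpoint
asr = pipeline("automatic-speech-recognition", model="ITG/whisper-base-gl")

# Transcribe a local audio file ("demo.wav" is a placeholder path)
result = asr("demo.wav")
print(result["text"])
```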

---

## Fine-tuning hyper-parameters

| **Hyper-parameter**         | **Value** |
|:---------------------------:|:---------:|
| Training batch size         | 16        |
| Evaluation batch size       | 8         |
| Learning rate               | 3e-5      |
| Gradient checkpointing      | true      |
| Gradient accumulation steps | 1         |
| Max training epochs         | 100       |
| Max steps                   | 4000      |
| Generate max length         | 225       |
| Warmup training steps (%)   | 12.5%     |
| FP16                        | true      |
| Metric for best model       | wer       |
| Greater is better           | false     |
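
For reference, here is how these values could map onto `Seq2SeqTrainingArguments`; this is a sketch under our assumptions (the output directory, the evaluation settings, and the 500 warmup steps derived as 12.5% of the 4000 max steps), not the exact training script:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the table above onto Seq2SeqTrainingArguments;
# output_dir and the evaluation settings are assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-base-gl",  # placeholder output directory
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=3e-5,
    gradient_checkpointing=True,
    gradient_accumulation_steps=1,
    num_train_epochs=100,
    max_steps=4000,                  # when set, max_steps takes precedence over epochs
    generation_max_length=225,
    warmup_steps=500,                # 12.5% of the 4000 max steps
    fp16=True,
    evaluation_strategy="steps",     # assumption: evaluate periodically to track WER
    predict_with_generate=True,      # assumption: generate during eval to compute WER
    metric_for_best_model="wer",
    greater_is_better=False,
)
```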

## Fine-tuning in a different dataset or style

If you're interested in fine-tuning your own Whisper model, we suggest starting from the [openai/whisper-base model](https://huggingface.co/openai/whisper-base). You may also find the step-by-step Transformers guide to [fine-tuning Whisper on multilingual ASR datasets](https://huggingface.co/blog/fine-tune-whisper) a valuable resource; it served as a helpful reference during the training of this Galician whisper-base model!