---
license: cc-by-nc-nd-4.0
datasets:
- openslr
language:
- gl
pipeline_tag: automatic-speech-recognition
tags:
- ITG
- PyTorch
- Transformers
- whisper
- whisper-base
---

# Whisper Base Galician

## Description

This is a fine-tuned version of the [openai/whisper-base](https://huggingface.co/openai/whisper-base) pre-trained model for ASR in Galician.

---

## Dataset

We used one of the datasets available in the OpenSLR repository, the [OpenSLR Galician](https://huggingface.co/datasets/openslr/viewer/SLR77) dataset.
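
If you want to inspect the data yourself, a minimal sketch for loading it with the 🤗 `datasets` library follows; the `"SLR77"` config name and the column names are our assumptions about the `openslr` dataset script, and recent `datasets` versions may require `trust_remote_code=True` (or no longer support script-based datasets at all):

```python
from datasets import load_dataset

# Load the Galician portion of OpenSLR (config name assumed to be "SLR77");
# trust_remote_code is needed on recent datasets versions for script-based sets.
slr77 = load_dataset("openslr", "SLR77", trust_remote_code=True)

# Each example is assumed to carry a path, the decoded audio, and a transcription
print(slr77["train"][0]["sentence"])
```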

---

## Example inference script

### Check this example script to run our model in inference mode

```python
import librosa
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

filename = "demo.wav"  # change this line to the name of your audio file
sample_rate = 16_000
processor = AutoProcessor.from_pretrained('ITG/whisper-base-gl')
model = AutoModelForSpeechSeq2Seq.from_pretrained('ITG/whisper-base-gl')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

with torch.no_grad():
    # Load the audio and resample it to the 16 kHz rate Whisper expects
    speech_array, _ = librosa.load(filename, sr=sample_rate)
    inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
    input_features = inputs.input_features
    generated_ids = model.generate(input_features=input_features, max_length=225)
    decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(f"ASR Galician whisper-base output: {decode_output}")
```
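
Alternatively, the same checkpoint can be run through the high-level `pipeline` API from Transformers; this is a minimal sketch, with `demo.wav` as a placeholder audio path:

```python
from transformers import pipeline

# Build an ASR pipeline around the fine-tuned Galician checkpoint
asr = pipeline("automatic-speech-recognition", model="ITG/whisper-base-gl")

# Transcribe a local audio file ("demo.wav" is a placeholder path)
result = asr("demo.wav")
print(result["text"])
```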

---

## Fine-tuning hyper-parameters

| **Hyper-parameter**         | **Value** |
|:---------------------------:|:---------:|
| Training batch size         | 16        |
| Evaluation batch size       | 8         |
| Learning rate               | 3e-5      |
| Gradient checkpointing      | true      |
| Gradient accumulation steps | 1         |
| Max training epochs         | 100       |
| Max steps                   | 4000      |
| Generate max length         | 225       |
| Warmup training steps (%)   | 12.5%     |
| FP16                        | true      |
| Metric for best model       | wer       |
| Greater is better           | false     |
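
For reference, here is how these values could map onto `Seq2SeqTrainingArguments`; this is a sketch under our assumptions (the output directory, the evaluation settings, and the 500 warmup steps derived as 12.5% of the 4000 max steps), not the exact training script:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the table above onto Seq2SeqTrainingArguments;
# output_dir and the evaluation settings are assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-base-gl",  # placeholder output directory
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=3e-5,
    gradient_checkpointing=True,
    gradient_accumulation_steps=1,
    num_train_epochs=100,
    max_steps=4000,                  # when set, max_steps takes precedence over epochs
    generation_max_length=225,
    warmup_steps=500,                # 12.5% of the 4000 max steps
    fp16=True,
    evaluation_strategy="steps",     # assumption: evaluate periodically to track WER
    predict_with_generate=True,      # assumption: generate during eval to compute WER
    metric_for_best_model="wer",
    greater_is_better=False,
)
```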

## Fine-tuning in a different dataset or style

If you're interested in fine-tuning your own Whisper model, we suggest starting from the [openai/whisper-base model](https://huggingface.co/openai/whisper-base). You may also find the step-by-step Transformers guide to [fine-tuning Whisper on multilingual ASR datasets](https://huggingface.co/blog/fine-tune-whisper) a valuable resource; it served as a helpful reference during the training of this Galician whisper-base model!