Whisper Small Pt-Br - RFard

This model is a fine-tuned version of openai/whisper-small on the Common Voice 17.0 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2593
  • WER: 17.3791

Model description

Whisper Small Pt-Br - RFard is an automatic speech recognition (ASR) model based on Whisper Small by OpenAI. It has been fine-tuned on Mozilla's Common Voice 17.0 dataset for Brazilian Portuguese (pt-BR), making it more accurate at transcribing speech to text in this language.

The model uses a transformer-based encoder-decoder architecture optimized for ASR tasks, leveraging Whisper's structure to improve transcription accuracy across various audio sources, including different accents and regional variations of Brazilian Portuguese.

With a Word Error Rate (WER) of 17.38%, the model performs well in transcription tasks but may struggle with noisy audio or overlapping speech.
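For context, WER counts word-level substitutions, insertions, and deletions against a reference transcript, divided by the number of reference words. The evaluation itself was done with standard tooling; the pure-Python helper below is only an illustrative sketch of the metric:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of five reference words -> WER 0.2 (20%).
print(wer("o gato subiu no telhado", "o gato subiu telhado"))  # 0.2
```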

Intended uses & limitations

This model is designed for automatic speech recognition (ASR) in Brazilian Portuguese, making it suitable for tasks such as speech-to-text transcription, voice assistants, automatic subtitles, and other applications that require converting spoken language into written text.
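A minimal transcription sketch using the Transformers `pipeline` API. It assumes the model is published under the repo ID `RodrigoFardin/whisper-small-pt-br`; the audio file name is purely illustrative:

```python
from transformers import pipeline


def transcribe(audio_path: str) -> str:
    """Transcribe Brazilian Portuguese speech with the fine-tuned checkpoint."""
    # Downloads the checkpoint from the Hugging Face Hub on first use.
    asr = pipeline(
        "automatic-speech-recognition",
        model="RodrigoFardin/whisper-small-pt-br",
    )
    return asr(audio_path)["text"]
```

Calling `transcribe("meu_audio.wav")` on a local recording would return the transcribed text.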

Limitations

The main limitation of this model is that it was not trained for more epochs due to hardware constraints. Extending the training process could further improve its accuracy and robustness, especially in challenging audio conditions such as noisy environments or overlapping speech.

If you are interested in collaborating on further development and improving the model’s performance, feel free to reach out; I am open to cooperation!

Training and evaluation data

For the experiments, the Common Voice 17.0 dataset was loaded and split into two subsets: a training set of 31,432 samples and a test set of 9,467 samples.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 2
  • mixed_precision_training: Native AMP
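The effective batch size and the step count in the results table can be cross-checked from these hyperparameters (sample count taken from the data section above; exact dataloader rounding is an assumption):

```python
train_samples = 31_432
train_batch_size = 16
gradient_accumulation_steps = 2

# Effective (total) train batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 32

# Optimizer steps per epoch (assuming no dropped last batch).
steps_per_epoch = train_samples / total_train_batch_size
print(steps_per_epoch)  # 982.25

# Step 1000 therefore lands just past one epoch, consistent with the
# reported epoch of ~1.02 in the results table.
print(round(1000 / steps_per_epoch, 3))  # 1.018
```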

Training results

| Training Loss | Epoch  | Step | Validation Loss | WER     |
|---------------|--------|------|-----------------|---------|
| 0.1831        | 1.0173 | 1000 | 0.2593          | 17.3791 |

Framework versions

  • Transformers 4.50.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.4.1
  • Tokenizers 0.21.1

Contact

For any inquiries, collaborations, or contributions to the model’s development, feel free to reach out:

📧 Email: rodrigo.correa.fardin@gmail.com

I am open to discussions and potential improvements to the model! 🚀

Model size: 0.2B parameters (F32, Safetensors)
