# Whisper Small Pt-Br - RFard
This model is a fine-tuned version of openai/whisper-small on the Common Voice 17.0 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2593
- Wer: 17.3791
## Model description
Whisper Small Pt-Br - RFard is an automatic speech recognition (ASR) model based on OpenAI's Whisper Small. It has been fine-tuned on Mozilla's Common Voice 17.0 dataset for Brazilian Portuguese (pt-BR), improving its accuracy when transcribing speech in this language.
The model uses a transformer-based encoder-decoder architecture optimized for ASR tasks, leveraging Whisper's structure to improve transcription accuracy across various audio sources, including different accents and regional variations of Brazilian Portuguese.
With a Word Error Rate (WER) of 17.38%, the model performs well in transcription tasks but may struggle with noisy audio or overlapping speech.
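As a usage sketch (assuming the model is published on the Hugging Face Hub under the repository id `RodrigoFardin/whisper-small-pt-br` from this card, and that a local audio file such as `exemplo.wav` is available):

```python
# Sketch: transcribing Brazilian Portuguese audio with the fine-tuned model.
# Requires the `transformers` library; the audio filename is illustrative.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="RodrigoFardin/whisper-small-pt-br",
    chunk_length_s=30,  # Whisper processes audio in 30-second windows
)

result = asr(
    "exemplo.wav",
    generate_kwargs={"language": "portuguese", "task": "transcribe"},
)
print(result["text"])
```

The `chunk_length_s` argument lets the pipeline handle recordings longer than Whisper's 30-second input window by transcribing them in chunks.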
## Intended uses & limitations
This model is designed for automatic speech recognition (ASR) in Brazilian Portuguese, making it suitable for tasks such as speech-to-text transcription, voice assistants, automatic subtitles, and other applications that require converting spoken language into written text.
### Limitations
The main limitation of this model is that it was trained for only two epochs due to hardware constraints. Extending the training process could further improve its accuracy and robustness, especially in challenging conditions such as noisy environments or overlapping speech.
If you are interested in collaborating on further development and improving the model's performance, feel free to reach out; I am open to cooperation!
## Training and evaluation data
For the experiments, the Common Voice 17.0 dataset was used, which was loaded and segmented into two subsets: the training set, composed of 31,432 samples, and the test set, with 9,467 samples.
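A minimal sketch of how such a split can be obtained with the `datasets` library (the exact loading code is an assumption; Common Voice 17.0 is gated on the Hub and requires authentication and acceptance of Mozilla's terms, and whether the training set combines the `train` and `validation` splits is also an assumption):

```python
from datasets import load_dataset

# Sketch: loading Common Voice 17.0 for Portuguese.
# Requires `huggingface_hub` login and accepted dataset terms.
common_voice_train = load_dataset(
    "mozilla-foundation/common_voice_17_0", "pt", split="train+validation"
)
common_voice_test = load_dataset(
    "mozilla-foundation/common_voice_17_0", "pt", split="test"
)
print(len(common_voice_train), len(common_voice_test))
```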
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 2
- mixed_precision_training: Native AMP
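These hyperparameters map onto a `Seq2SeqTrainingArguments` configuration roughly like the following (a sketch, not the exact training script; the output directory name is illustrative). Note that the total train batch size of 32 comes from 16 per-device samples × 2 gradient-accumulation steps:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-pt-br",  # illustrative path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=2,
    seed=42,
    fp16=True,  # "Native AMP" mixed-precision training
)
```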
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|---|---|---|---|---|
| 0.1831 | 1.0173 | 1000 | 0.2593 | 17.3791 |
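For reference, WER is the word-level edit distance (substitutions + insertions + deletions) between hypothesis and reference, divided by the number of reference words, so the reported 17.3791 corresponds to roughly 17 word errors per 100 reference words. A minimal self-contained implementation for illustration (evaluation here was presumably done with a library such as `evaluate` or `jiwer`):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] holds the edit distance between ref[:i] and hyp[:j] for the current row i.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev_diag, d[j] = d[j], min(
                d[j] + 1,              # deletion of the reference word
                d[j - 1] + 1,          # insertion of the hypothesis word
                prev_diag + (r != h),  # substitution (free if words match)
            )
    return d[-1] / len(ref)


print(wer("a b c", "a x c"))  # one substitution over three words
```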
### Framework versions
- Transformers 4.50.0
- Pytorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
## Contact
For any inquiries, collaborations, or contributions to the model’s development, feel free to reach out:
📧 Email: rodrigo.correa.fardin@gmail.com
I am open to discussions and potential improvements to the model! 🚀