This model is a fine-tuned version of openai/whisper-large-v3 for Afrikaans automatic speech recognition (ASR). It uses LoRA (Low-Rank Adaptation) for efficient fine-tuning, achieving strong performance on Afrikaans transcription tasks.
The model achieves a Word Error Rate (WER) of 12.85% on the evaluation set, a 46% relative reduction from the 23.92% WER measured at the first evaluation checkpoint (step 500).
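The 46% figure is the relative WER reduction between the first and final evaluation checkpoints; a quick sanity check of the arithmetic:

```python
# Relative WER improvement: (baseline - final) / baseline
baseline_wer = 23.92
final_wer = 12.85
relative_improvement = (baseline_wer - final_wer) / baseline_wer
print(f"{relative_improvement:.1%}")  # → 46.3%
```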
| Step | Epoch | Training Loss | Validation Loss | WER (%) |
|---|---|---|---|---|
| 500 | 1.65 | 0.2050 | 0.2360 | 23.92 |
| 1000 | 3.30 | 0.1476 | 0.2091 | 19.07 |
| 1500 | 4.95 | 0.1192 | 0.1995 | 14.13 |
| 2000 | 6.60 | 0.0916 | 0.2031 | 14.01 |
| 2500 | 8.25 | 0.0668 | 0.2093 | 13.10 |
| 3000 | 9.90 | 0.0566 | 0.2142 | 13.07 |
| 3500 | 11.55 | 0.0477 | 0.2226 | 13.36 |
| 4000 | 13.20 | 0.0440 | 0.2270 | 13.81 |
| 4500 | 14.85 | 0.0431 | 0.2301 | 12.85 |
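For reference, WER is the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words. A minimal pure-Python sketch (in practice a toolkit such as jiwer is typically used):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words → WER ≈ 0.167
print(wer("die kat sit op die mat", "die kat sit op mat"))
```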
```bash
pip install transformers peft accelerate torch
```
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
import torch

# Load base model and processor
base_model_name = "openai/whisper-large-v3"
processor = WhisperProcessor.from_pretrained(base_model_name)
model = WhisperForConditionalGeneration.from_pretrained(base_model_name)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "YOUR_USERNAME/whisper-large-v3-afrikaans")

# Prepare audio
# audio should be a 16 kHz mono audio array
input_features = processor(
    audio,
    sampling_rate=16000,
    return_tensors="pt",
).input_features

# Generate transcription
forced_decoder_ids = processor.get_decoder_prompt_ids(language="af", task="transcribe")
predicted_ids = model.generate(
    input_features,
    forced_decoder_ids=forced_decoder_ids,
)

# Decode transcription
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
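Whisper models operate on 30-second windows of 16 kHz audio, so longer recordings need to be split before the call above (or passed through a chunking pipeline). A minimal sketch of fixed-size chunking, assuming `samples` is a flat mono audio array:

```python
SAMPLE_RATE = 16000
CHUNK_SECONDS = 30
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per chunk

def chunk_audio(samples, chunk_samples=CHUNK_SAMPLES):
    """Split a mono audio array into fixed-length chunks (last chunk may be shorter)."""
    return [samples[i:i + chunk_samples] for i in range(0, len(samples), chunk_samples)]

# 70 seconds of audio → 3 chunks: 30 s, 30 s, 10 s
chunks = chunk_audio([0.0] * (70 * SAMPLE_RATE))
print([len(c) / SAMPLE_RATE for c in chunks])  # → [30.0, 30.0, 10.0]
```

Each chunk can then be fed through `processor` and `model.generate` as shown above, concatenating the decoded texts.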
If you use this model, please cite:
```bibtex
@misc{whisper-large-v3-afrikaans,
  author       = {Andre Oosthuizen},
  title        = {Whisper Large V3 Afrikaans},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/andreoosthuizen/whisper-large-v3-afrikaans}}
}
```
```bibtex
@article{radford2022whisper,
  title   = {Robust speech recognition via large-scale weak supervision},
  author  = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  journal = {arXiv preprint arXiv:2212.04356},
  year    = {2022}
}
```
This model was fine-tuned using OpenAI's Whisper Large V3 as the base model. Thanks to the Hugging Face team for the Transformers and PEFT libraries that made this training possible.