This model is a fine-tuned version of openai/whisper-large-v3 for Afrikaans automatic speech recognition (ASR). It uses LoRA (Low-Rank Adaptation) for efficient fine-tuning, achieving strong performance on Afrikaans transcription tasks.
The model achieves a Word Error Rate (WER) of 12.85% on the evaluation set, a 46% relative reduction from the 23.92% WER measured at the first evaluation checkpoint (step 500).
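The 46% figure is the relative WER reduction between the first and final evaluation checkpoints; a quick sanity check of the arithmetic:

```python
# Relative WER improvement: (baseline - final) / baseline
baseline_wer = 23.92
final_wer = 12.85
relative_improvement = (baseline_wer - final_wer) / baseline_wer
print(f"{relative_improvement:.1%}")  # → 46.3%
```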
| Step | Epoch | Training Loss | Validation Loss | WER (%) |
|---|---|---|---|---|
| 500 | 1.65 | 0.2050 | 0.2360 | 23.92 |
| 1000 | 3.30 | 0.1476 | 0.2091 | 19.07 |
| 1500 | 4.95 | 0.1192 | 0.1995 | 14.13 |
| 2000 | 6.60 | 0.0916 | 0.2031 | 14.01 |
| 2500 | 8.25 | 0.0668 | 0.2093 | 13.10 |
| 3000 | 9.90 | 0.0566 | 0.2142 | 13.07 |
| 3500 | 11.55 | 0.0477 | 0.2226 | 13.36 |
| 4000 | 13.20 | 0.0440 | 0.2270 | 13.81 |
| 4500 | 14.85 | 0.0431 | 0.2301 | 12.85 |
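For reference, WER is the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words. A minimal pure-Python sketch (in practice a toolkit such as jiwer is typically used):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words → WER ≈ 0.167
print(wer("die kat sit op die mat", "die kat sit op mat"))
```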
```bash
pip install transformers peft accelerate torch
```
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
import torch

# Load base model and processor
base_model_name = "openai/whisper-large-v3"
processor = WhisperProcessor.from_pretrained(base_model_name)
model = WhisperForConditionalGeneration.from_pretrained(base_model_name)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "YOUR_USERNAME/whisper-large-v3-afrikaans")

# Prepare audio
# audio should be a 16 kHz mono audio array
input_features = processor(
    audio,
    sampling_rate=16000,
    return_tensors="pt",
).input_features

# Generate transcription
forced_decoder_ids = processor.get_decoder_prompt_ids(language="af", task="transcribe")
predicted_ids = model.generate(
    input_features,
    forced_decoder_ids=forced_decoder_ids,
)

# Decode transcription
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
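Whisper models operate on 30-second windows of 16 kHz audio, so longer recordings need to be split before the call above (or passed through a chunking pipeline). A minimal sketch of fixed-size chunking, assuming `samples` is a flat mono audio array:

```python
SAMPLE_RATE = 16000
CHUNK_SECONDS = 30
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per chunk

def chunk_audio(samples, chunk_samples=CHUNK_SAMPLES):
    """Split a mono audio array into fixed-length chunks (last chunk may be shorter)."""
    return [samples[i:i + chunk_samples] for i in range(0, len(samples), chunk_samples)]

# 70 seconds of audio → 3 chunks: 30 s, 30 s, 10 s
chunks = chunk_audio([0.0] * (70 * SAMPLE_RATE))
print([len(c) / SAMPLE_RATE for c in chunks])  # → [30.0, 30.0, 10.0]
```

Each chunk can then be fed through `processor` and `model.generate` as shown above, concatenating the decoded texts.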
If you use this model, please cite:
```bibtex
@misc{whisper-large-v3-afrikaans,
  author       = {Andre Oosthuizen},
  title        = {Whisper Large V3 Afrikaans},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/andreoosthuizen/whisper-large-v3-afrikaans}}
}
```
```bibtex
@article{radford2022whisper,
  title   = {Robust speech recognition via large-scale weak supervision},
  author  = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  journal = {arXiv preprint arXiv:2212.04356},
  year    = {2022}
}
```
This model was fine-tuned using OpenAI's Whisper Large V3 as the base model. Thanks to the Hugging Face team for the Transformers and PEFT libraries that made this training possible.