Whisper Large V3 - Afrikaans

This model is a fine-tuned version of openai/whisper-large-v3 for Afrikaans automatic speech recognition (ASR). It uses LoRA (Low-Rank Adaptation) for efficient fine-tuning, achieving strong performance on Afrikaans transcription tasks.

Model Description

  • Base Model: OpenAI Whisper Large V3
  • Language: Afrikaans (af)
  • Training Method: LoRA (PEFT)
  • Task: Automatic Speech Recognition
  • License: Apache 2.0

Performance

The model achieves a Word Error Rate (WER) of 12.85% on the evaluation set, a 46% relative reduction from the first evaluation checkpoint (23.92% WER at step 500).

Training Progress

Step   Epoch   Training Loss   Validation Loss   WER (%)
500     1.65          0.2050            0.2360     23.92
1000    3.30          0.1476            0.2091     19.07
1500    4.95          0.1192            0.1995     14.13
2000    6.60          0.0916            0.2031     14.01
2500    8.25          0.0668            0.2093     13.10
3000    9.90          0.0566            0.2142     13.07
3500   11.55          0.0477            0.2226     13.36
4000   13.20          0.0440            0.2270     13.81
4500   14.85          0.0431            0.2301     12.85
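WER counts the minimum number of word-level substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the number of reference words. The training run would typically use a library such as `evaluate` or `jiwer` for this (an assumption; the card does not say), but the metric itself is a short pure-Python sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / len(ref)

# One deleted word against a 5-word reference -> 20% WER
print(wer("kat sit op die mat", "kat sit op mat"))  # 0.2
```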

Intended Uses & Limitations

Intended Uses

  • Transcribing Afrikaans speech to text
  • Building Afrikaans voice assistants and dictation systems
  • Creating subtitles for Afrikaans audio/video content
  • Accessibility applications for Afrikaans speakers
  • Research in low-resource language ASR

Limitations

  • Optimized specifically for Afrikaans; performance on other languages will vary
  • Performance may degrade on:
    • Noisy or low-quality audio
    • Strong accents or dialects not well-represented in training data
    • Domain-specific terminology
    • Multi-speaker scenarios with overlapping speech
  • As a LoRA adapter, requires the base Whisper Large V3 model to function

Usage

Requirements

pip install transformers peft accelerate torch

Inference Example

from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel, PeftConfig
import torch

# Load base model and processor
base_model_name = "openai/whisper-large-v3"
processor = WhisperProcessor.from_pretrained(base_model_name)
model = WhisperForConditionalGeneration.from_pretrained(base_model_name)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "andreoosthuizen/whisper-large-v3-afrikaans")

# Prepare audio
# `audio` must be a 16 kHz mono float array, e.g. loaded with
# librosa.load("speech.wav", sr=16000) or a datasets Audio column
input_features = processor(
    audio, 
    sampling_rate=16000, 
    return_tensors="pt"
).input_features

# Generate transcription (recent transformers releases deprecate
# forced_decoder_ids in favor of passing language/task to generate)
predicted_ids = model.generate(
    input_features,
    language="af",
    task="transcribe",
)

# Decode transcription
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)

Training Procedure

Training Hyperparameters

  • Learning Rate: 1e-4 (linear decay with 50 warmup steps)
  • Batch Size: 8 per device
  • Gradient Accumulation Steps: 2
  • Effective Batch Size: 16
  • Optimizer: AdamW (fused implementation)
    • betas: (0.9, 0.999)
    • epsilon: 1e-08
  • Training Steps: 4,500
  • Mixed Precision: Native AMP (fp16)
  • Gradient Checkpointing: Enabled
  • Evaluation Strategy: Every 500 steps
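With a per-device batch of 8 and 2 accumulation steps, each optimizer update averages gradients over 16 examples, which is why the effective batch size is 16. A toy check (illustrative only, using a one-parameter linear model) that averaging two micro-batch gradients reproduces the full-batch gradient:

```python
# Toy model y_hat = w * x with squared-error loss.
def grad(w, xs, ys):
    # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [0.5, -1.2, 0.3, 2.0, -0.7, 1.1, 0.9, -0.4,
      1.5, -2.1, 0.8, 0.2, -1.0, 0.6, 1.3, -0.9]
ys = [3.0 * x for x in xs]
w = 0.0

g_full = grad(w, xs, ys)                      # one batch of 16
g_acc = (grad(w, xs[:8], ys[:8]) +            # two micro-batches of 8,
         grad(w, xs[8:], ys[8:])) / 2         # gradients averaged
assert abs(g_full - g_acc) < 1e-12
```

Because the two micro-batches are equal-sized, the average of their mean gradients equals the mean gradient of the combined batch, so accumulation trades memory for wall-clock time without changing the update.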

Training Infrastructure

  • Framework: Transformers 4.57.3
  • PEFT Version: 0.18.0
  • PyTorch: 2.9.0+cu126
  • Datasets: 4.0.0

Training Features

  • LoRA adaptation for efficient fine-tuning
  • Gradient checkpointing for memory efficiency
  • Mixed precision training (FP16)
  • Regular evaluation and checkpoint saving
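LoRA freezes each base weight matrix W and learns only a low-rank update ΔW = B·A. With Whisper Large V3's hidden size of 1280 and a hypothetical rank of 32 (the rank used in this run is not stated on the card), the adapter for one such matrix holds 5% of the parameters of the full matrix. A minimal numpy illustration:

```python
import numpy as np

d, r = 1280, 32          # Whisper Large V3 hidden size; rank 32 is an assumption
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen base weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init

# With B initialized to zero, the adapted weight starts equal to W,
# so fine-tuning begins exactly at the pretrained model.
W_adapted = W + B @ A
assert np.allclose(W_adapted, W)

trainable = A.size + B.size          # 2 * d * r = 81,920
full = W.size                        # d * d = 1,638,400
print(f"trainable fraction: {trainable / full:.2%}")  # trainable fraction: 5.00%
```

Only A and B receive gradients during training, which is what makes fine-tuning the 1.5B-parameter base model tractable on a single GPU, and why the published artifact is an adapter rather than a full checkpoint.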

Citation

If you use this model, please cite:

@misc{whisper-large-v3-afrikaans,
  author = {Andre Oosthuizen},
  title = {Whisper Large V3 Afrikaans},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/andreoosthuizen/whisper-large-v3-afrikaans}}
}

@article{radford2022whisper,
  title={Robust speech recognition via large-scale weak supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  journal={arXiv preprint arXiv:2212.04356},
  year={2022}
}

Acknowledgments

This model was fine-tuned using OpenAI's Whisper Large V3 as the base model. Thanks to the Hugging Face team for the Transformers and PEFT libraries that made this training possible.
