asrfinetuned: Nigerian-Accented English ASR (Whisper Fine-tune)

Overview

This model is a fine-tuned version of NCAIR1/NigerianAccentedEnglish, which is itself a whisper-small model previously adapted for Nigerian-accented English. This repository continues that adaptation with an additional round of supervised fine-tuning, targeting the well-known problem that mainstream ASR systems (trained mostly on American/British English) show elevated error rates on African-accented speech. The result is a Whisper-based transcription model oriented toward Nigerian-accented English audio.

Training Details

| Detail | Value |
| --- | --- |
| Base model | NCAIR1/NigerianAccentedEnglish (a whisper-small derivative) |
| Architecture | WhisperForConditionalGeneration (12 encoder layers, 12 decoder layers, d_model=768, 12 attention heads) |
| Method | Supervised fine-tuning (Seq2Seq Trainer) |
| Sampling rate | 16,000 Hz (WhisperFeatureExtractor, 80 mel bins, 30 s chunks) |
| Learning rate | 1e-4, linear schedule, 100 warmup steps |
| Batch size | 8 per device, gradient accumulation 2 (effective batch size 16) |
| Optimizer | AdamW (betas 0.9/0.999, eps 1e-8) |
| Precision | Native AMP (mixed precision) |
| Training steps | 500 (~2.94 epochs) |
| Final training loss / validation loss | 0.2982 / 1.0108 |
| Final validation WER | 0.4772 (on the held-out split used during this training run; dataset not documented) |
| Framework versions | Transformers 4.46.3, PyTorch 2.4.1+cu121, Datasets 2.19.0, Tokenizers 0.20.3 |
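
The training script is not part of this card. As a rough sketch only, the documented values could map onto `Seq2SeqTrainingArguments` as shown here. Everything in `ASSUMED` is a guess: the card says "Native AMP" without saying whether that was fp16 or bf16, and the evaluation interval is inferred from the logged progression. `build_training_args` and its `output_dir` argument are hypothetical names introduced for illustration.

```python
# Values taken from the Training Details table.
DOCUMENTED = {
    "learning_rate": 1e-4,
    "lr_scheduler_type": "linear",
    "warmup_steps": 100,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,  # 8 x 2 = effective batch size 16
    "max_steps": 500,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
}

# Assumptions that the card does not state.
ASSUMED = {
    "fp16": True,                   # "Native AMP" could equally mean bf16
    "eval_strategy": "steps",       # the log shows one evaluation per 100 steps
    "eval_steps": 100,
    "logging_steps": 100,
    "predict_with_generate": True,  # so that WER is computed on generated text
}


def build_training_args(output_dir: str):
    """Build Seq2SeqTrainingArguments from the values above (fp16 needs a GPU)."""
    # Imported lazily so the dictionaries can be inspected without Transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **DOCUMENTED, **ASSUMED)
```

The dataset, data collator and metric function that a `Seq2SeqTrainer` would also need are left out because the card does not describe them.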

Training/validation progression logged during fine-tuning:

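| Training Loss | Epoch | Step | Validation Loss | WER |
| --- | --- | --- | --- | --- |
| 1.1563 | 0.5882 | 100 | 1.0900 | 0.5863 |
| 0.5809 | 1.1765 | 200 | 1.0982 | 0.6652 |
| 0.5527 | 1.7647 | 300 | 1.0261 | 0.5772 |
| 0.3345 | 2.3529 | 400 | 1.0422 | 0.4854 |
| 0.2982 | 2.9412 | 500 | 1.0108 | 0.4772 |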

These numbers come directly from the training run's logged metrics. The training dataset itself is not documented in this repository, so the WER above should be read as an internal validation-set result from that run rather than a benchmark on a public, named test set.
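
For readers unfamiliar with the metric, WER is (substitutions + deletions + insertions) divided by the number of reference words, so 0.4772 corresponds to roughly 48 word errors per 100 reference words. The sketch below is a minimal pure-Python version for illustration. It is not the code used in the training run, and it assumes plain whitespace tokenization because the card does not document any text normalization. The example sentences are invented.

```python
def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1], len(ref)


def wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER: total errors divided by total reference words."""
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        e, n = word_errors(ref, hyp)
        errors += e
        total += n
    if total == 0:
        raise ValueError("references contain no words")
    return errors / total


# One substitution ("the" -> "a") and one deletion ("before"): 2 errors, 6 words.
example = wer(["please send the report before friday"],
              ["please send a report friday"])
```

In practice a maintained library such as `jiwer` is a more robust choice, and casing or punctuation handling can shift the score noticeably.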

Intended Use

  • Transcribing Nigerian-accented English speech (e.g. call center audio, voice notes, interviews, educational content, accessibility/voice-to-text tools) where mainstream ASR models tend to underperform due to accent mismatch.
  • Experimentation and further fine-tuning for African-accented English ASR research.

Not intended for: languages other than English, heavily code-switched audio without further adaptation, or high-stakes decisions made without human review.

How to Use

import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_id = "Ephraimmm/asrfinetuned"

processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load audio resampled to 16 kHz (required by the feature extractor)
audio, sr = librosa.load("path/to/audio.wav", sr=16000)

input_features = processor(
    audio, sampling_rate=sr, return_tensors="pt"
).input_features

with torch.no_grad():
    predicted_ids = model.generate(input_features)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])

Or with the pipeline API:

from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="Ephraimmm/asrfinetuned")
result = asr("path/to/audio.wav")
print(result["text"])
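
The feature extractor works on 30 s windows, so longer recordings need some form of long-form handling. One option, sketched here under assumptions, is the pipeline's `chunk_length_s` argument, which splits the audio into overlapping windows and merges the results. `build_asr` is a hypothetical helper name, the GPU check is an assumption about the reader's setup, and chunked decoding has not been evaluated for this checkpoint.

```python
def build_asr(model_id: str = "Ephraimmm/asrfinetuned", chunk_length_s: float = 30.0):
    """Return an ASR pipeline that processes long audio in 30 s windows."""
    # Imported lazily so nothing is loaded or downloaded until the call.
    import torch
    from transformers import pipeline

    # Use the first GPU when one is available, otherwise fall back to CPU.
    device = 0 if torch.cuda.is_available() else -1
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=chunk_length_s,
        device=device,
    )
```

With that helper, `build_asr()("path/to/long_audio.wav")["text"]` would return the merged transcript. Window boundaries can introduce errors, so spot-checking the output against the audio is advisable.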

Limitations

  • The only WER figure available (0.4772) is from the internal validation split logged during this specific training run; the underlying dataset is not documented in this repository, so the metric should not be treated as a standardized benchmark result.
  • No independent third-party evaluation (e.g. on a named public test set) has been published for this checkpoint.
  • Because this is a further fine-tune of an already narrowly adapted Nigerian-English model, its performance on accents or dialects outside that training distribution, on noisy audio, and on non-English speech is unverified.
  • Training data provenance, size, and licensing are not documented; users should evaluate suitability for their own use case before production deployment.
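
Although the dataset size is undocumented, the logged epoch and step values give a rough bound on it. The arithmetic below is an estimate, not a reported figure. It assumes training ran on a single device (the card does not say) and that one optimizer step consumes one effective batch.

```python
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 2
NUM_DEVICES = 1  # assumption: the device count is not stated in the card

# (step, epoch) pairs copied from the logged training progression.
LOG = [(100, 0.5882), (200, 1.1765), (300, 1.7647), (400, 2.3529), (500, 2.9412)]

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES

# Each row gives steps / epochs; averaging smooths the rounding in the log.
steps_per_epoch = round(sum(step / epoch for step, epoch in LOG) / len(LOG))

# The last batch of an epoch may be partial, so treat this as approximate.
approx_examples = steps_per_epoch * effective_batch

print(effective_batch, steps_per_epoch, approx_examples)
```

Under those assumptions the training split holds on the order of 2,700 utterances, which is small for ASR fine-tuning and worth keeping in mind when judging how far the results generalize.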

Author

Developed by Ephraimmm.
