asrfinetuned — Nigerian-Accented English ASR (Whisper Fine-tune)

Overview

This model is a fine-tuned version of NCAIR1/NigerianAccentedEnglish, itself a `whisper-small` model previously adapted for Nigerian-accented English. This repository applies an additional round of supervised fine-tuning to that checkpoint. The motivation is that mainstream ASR systems, trained mostly on American and British English, tend to show elevated error rates on African-accented speech. The result is a Whisper-based transcription model oriented toward Nigerian-accented English audio.

Training Details

Detail	Value
Base model	NCAIR1/NigerianAccentedEnglish (a `whisper-small` derivative)
Architecture	`WhisperForConditionalGeneration` (12 encoder layers, 12 decoder layers, d_model=768, 12 attention heads)
Method	Supervised fine-tuning (Seq2Seq Trainer)
Sampling rate	16,000 Hz (`WhisperFeatureExtractor`, 80 mel bins, 30s chunks)
Learning rate	1e-4, linear schedule, 100 warmup steps
Batch size	8 per device, gradient accumulation 2 (effective batch size 16)
Optimizer	AdamW (betas 0.9/0.999, eps 1e-8)
Precision	Native AMP (mixed precision)
Training steps	500 (~2.94 epochs)
Final training loss / validation loss	0.2982 / 1.0108
Final validation WER	0.4772 (on the held-out split used during this training run; dataset not documented)
Framework versions	Transformers 4.46.3, PyTorch 2.4.1+cu121, Datasets 2.19.0, Tokenizers 0.20.3
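
The hyperparameters in the table map fairly directly onto keyword arguments of `Seq2SeqTrainingArguments`. The following is a minimal sketch of that mapping, not the author's training script, which is not included in this repository. The `eval_steps`, `logging_steps`, and `predict_with_generate` values and the `output_dir` path are assumptions inferred from the 100-step logging cadence, and `effective_batch_size` and `linear_warmup_lr` are hypothetical helpers that reproduce the batch-size arithmetic and the linear warmup/decay rule by hand.

```python
# Hyperparameters from the table, expressed as Seq2SeqTrainingArguments kwargs.
# Entries marked "assumed" are NOT documented in the model card.
HPARAMS = dict(
    learning_rate=1e-4,
    lr_scheduler_type="linear",
    warmup_steps=100,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    max_steps=500,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                   # "Native AMP" mixed precision; needs a CUDA device
    eval_strategy="steps",       # assumed (older releases call this evaluation_strategy)
    eval_steps=100,              # assumed: metrics appear every 100 steps
    logging_steps=100,           # assumed
    predict_with_generate=True,  # assumed: WER is usually scored on generated text
)


def build_training_args(output_dir="./asr-finetune-out"):  # hypothetical path
    """Instantiate the arguments; transformers is imported lazily."""
    from transformers import Seq2SeqTrainingArguments
    return Seq2SeqTrainingArguments(output_dir=output_dir, **HPARAMS)


def effective_batch_size(hp, num_devices=1):
    """Examples consumed per optimizer step (a single device is assumed)."""
    return (hp["per_device_train_batch_size"]
            * hp["gradient_accumulation_steps"]
            * num_devices)


def linear_warmup_lr(step, hp):
    """Learning rate at `step` under linear warmup followed by linear decay."""
    peak, warmup, total = hp["learning_rate"], hp["warmup_steps"], hp["max_steps"]
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


print(effective_batch_size(HPARAMS))
for step in (0, 50, 100, 300, 500):
    print(step, linear_warmup_lr(step, HPARAMS))
```

Under these settings the learning rate climbs to its 1e-4 peak at step 100 and then decays linearly to zero at step 500.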

Training/validation progression logged during fine-tuning:

Training Loss	Epoch	Step	Validation Loss	WER
1.1563	0.5882	100	1.0900	0.5863
0.5809	1.1765	200	1.0982	0.6652
0.5527	1.7647	300	1.0261	0.5772
0.3345	2.3529	400	1.0422	0.4854
0.2982	2.9412	500	1.0108	0.4772

These numbers come directly from the training run's logged metrics. The training dataset itself is not documented in this repository, so the WER above should be read as an internal validation-set result from that run rather than a benchmark on a public, named test set.
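
The logged rows also allow a few sanity checks by simple arithmetic. The sketch below derives the number of optimizer steps per epoch from the Step and Epoch columns and prints the gap between validation and training loss at each checkpoint. The training-set size it prints is only a rough estimate that assumes a single device and the effective batch size of 16 listed under Training Details; it is an inference from the logs, not a documented figure.

```python
# Rows copied from the logged table: (train_loss, epoch, step, val_loss, wer)
LOG = [
    (1.1563, 0.5882, 100, 1.0900, 0.5863),
    (0.5809, 1.1765, 200, 1.0982, 0.6652),
    (0.5527, 1.7647, 300, 1.0261, 0.5772),
    (0.3345, 2.3529, 400, 1.0422, 0.4854),
    (0.2982, 2.9412, 500, 1.0108, 0.4772),
]
EFFECTIVE_BATCH = 16  # 8 per device x 2 accumulation steps; single device assumed

# Steps per epoch follow from the last row: 500 steps / 2.9412 epochs
steps_per_epoch = round(LOG[-1][2] / LOG[-1][1])

# Rough estimate only: the true size depends on batching details not documented
approx_train_examples = steps_per_epoch * EFFECTIVE_BATCH

# Checkpoint with the lowest logged WER
best = min(LOG, key=lambda row: row[4])

print("steps per epoch:", steps_per_epoch)
print("approx. training examples:", approx_train_examples)
print("best WER:", best[4], "at step", best[2])
for train_loss, _, step, val_loss, _ in LOG:
    print(f"step {step}: val - train loss = {val_loss - train_loss:+.4f}")
```

Training loss falls from 1.1563 to 0.2982 while validation loss stays between roughly 1.01 and 1.10, so the gap widens over the run. That pattern is often associated with overfitting, although no firm conclusion can be drawn without the dataset.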

Intended Use

Transcribing Nigerian-accented English speech (e.g. call center audio, voice notes, interviews, educational content, accessibility/voice-to-text tools) where mainstream ASR models tend to underperform due to accent mismatch.
Experimentation and further fine-tuning for African-accented English ASR research.

Not intended for: languages other than English, heavily code-switched audio without further adaptation, or high-stakes decisions made without human review.

How to Use

```python
import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_id = "Ephraimmm/asrfinetuned"

processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load audio resampled to 16 kHz (required by the feature extractor)
audio, sr = librosa.load("path/to/audio.wav", sr=16000)

input_features = processor(
    audio, sampling_rate=sr, return_tensors="pt"
).input_features

with torch.no_grad():
    predicted_ids = model.generate(input_features)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```

Or with the pipeline API:

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="Ephraimmm/asrfinetuned")
result = asr("path/to/audio.wav")
print(result["text"])
```
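
Because the feature extractor works on 30-second windows, the first snippet above only covers the first 30 s of a longer recording. One way to handle longer files, sketched here under simple assumptions, is to slice the waveform into consecutive 30 s windows and transcribe each in turn. `window_bounds` and `transcribe_long` are hypothetical helper names, and this naive non-overlapping split can cut a word at a window boundary.

```python
def window_bounds(num_samples, sr=16000, window_s=30.0):
    """Return (start, end) sample indices of consecutive, non-overlapping windows."""
    size = int(sr * window_s)
    return [(start, min(start + size, num_samples))
            for start in range(0, num_samples, size)]


def transcribe_long(audio, processor, model, sr=16000):
    """Transcribe a 1-D waveform of any length, one 30 s window at a time."""
    import torch  # imported lazily so window_bounds stays dependency-free

    pieces = []
    for start, end in window_bounds(len(audio), sr):
        features = processor(
            audio[start:end], sampling_rate=sr, return_tensors="pt"
        ).input_features
        with torch.no_grad():
            ids = model.generate(features)
        text = processor.batch_decode(ids, skip_special_tokens=True)[0]
        pieces.append(text.strip())
    return " ".join(pieces)


# 70 s of audio at 16 kHz splits into 30 s + 30 s + 10 s
print(window_bounds(70 * 16000))
```

With the `processor`, `model`, and `audio` objects from the first snippet, the call would be `transcribe_long(audio, processor, model)`. The pipeline API offers a built-in alternative: passing `chunk_length_s=30` when constructing the pipeline enables chunked long-form inference.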

Limitations

The only WER figure available (0.4772) is from the internal validation split logged during this specific training run; the underlying dataset is not documented in this repository, so the metric should not be treated as a standardized benchmark result.
No independent third-party evaluation (e.g. on a named public test set) has been published for this checkpoint.
As a further fine-tune of an already narrowly-adapted Nigerian-English model, performance on accents/dialects outside that training distribution, noisy audio, or non-English speech is unverified.
Training data provenance, size, and licensing are not documented; users should evaluate suitability for their own use case before production deployment.
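
For such an evaluation, WER is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. If, as seems likely, the reported 0.4772 is a fraction and not a percentage, it corresponds to roughly 48 errors per 100 reference words. The sketch below is a small, dependency-free way to score model output against your own reference transcripts. The function names and the lowercase, punctuation-stripping normalization are assumptions, since the normalization behind the reported WER is not documented.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, and split into words (assumed normalization)."""
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total reference words across all pairs."""
    errors = words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = normalize(ref), normalize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        words += len(ref_tokens)
    return errors / words if words else 0.0


# Made-up example pair, for illustration only
refs = ["I am going to the market tomorrow."]
hyps = ["I am going to market tomorrow"]
print(corpus_wer(refs, hyps))
```

In the made-up pair, one of the seven reference words is missing from the hypothesis, so the score is 1/7. Scores computed this way are only comparable with other figures that use the same normalization.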

Author

Developed by Ephraimmm.

Model Details

Detail	Value
Format	Safetensors
Model size	0.2B params
Tensor type	F32
Model tree	openai/whisper-small (base model), then NCAIR1/NigerianAccentedEnglish (fine-tuned), then Ephraimmm/asrfinetuned (this model)