asrfinetuned: Nigerian-Accented English ASR (Whisper Fine-tune)
Overview
This model is a fine-tuned version of NCAIR1/NigerianAccentedEnglish, which is itself a whisper-small model previously adapted for Nigerian-accented English. This repository continues that adaptation with an additional round of supervised fine-tuning, targeting the well-known problem that mainstream ASR systems (trained mostly on American/British English) show elevated error rates on African-accented speech. The result is a Whisper-based transcription model oriented toward Nigerian-accented English audio.
Training Details
| Detail | Value |
|---|---|
| Base model | NCAIR1/NigerianAccentedEnglish (a whisper-small derivative) |
| Architecture | WhisperForConditionalGeneration (12 encoder layers, 12 decoder layers, d_model=768, 12 attention heads) |
| Method | Supervised fine-tuning (Seq2Seq Trainer) |
| Sampling rate | 16,000 Hz (WhisperFeatureExtractor, 80 mel bins, 30s chunks) |
| Learning rate | 1e-4, linear schedule, 100 warmup steps |
| Batch size | 8 per device, gradient accumulation 2 (effective batch size 16) |
| Optimizer | AdamW (betas 0.9/0.999, eps 1e-8) |
| Precision | Native AMP (mixed precision) |
| Training steps | 500 (~2.94 epochs) |
| Final training loss / validation loss | 0.2982 / 1.0108 |
| Final validation WER | 0.4772 (on the held-out split used during this training run; dataset not documented) |
| Framework versions | Transformers 4.46.3, PyTorch 2.4.1+cu121, Datasets 2.19.0, Tokenizers 0.20.3 |
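The step and epoch counts in the table also give a rough sense of the training set size, which the repository does not otherwise document. The following is a back-of-the-envelope sketch only: it assumes a single training device and treats the reported 2.94 epochs as a rounded value. The variable names are illustrative and do not come from the training script.

```python
# Rough estimate of training-set size from the hyperparameters in the table.
# Assumption: one device, so effective batch = per-device batch * accumulation.
per_device_batch = 8
grad_accum_steps = 2
total_steps = 500
epochs_completed = 2.94  # rounded value from the table

effective_batch = per_device_batch * grad_accum_steps
steps_per_epoch = round(total_steps / epochs_completed)
approx_examples = steps_per_epoch * effective_batch

print(effective_batch)   # 16
print(steps_per_epoch)   # 170
print(approx_examples)   # 2720
```

Under those assumptions the run saw roughly 2,700 utterances per epoch. Treat this as an estimate: training on more than one device would scale the figure up proportionally.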
Training/validation progression logged during fine-tuning:
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 1.1563 | 0.5882 | 100 | 1.0900 | 0.5863 |
| 0.5809 | 1.1765 | 200 | 1.0982 | 0.6652 |
| 0.5527 | 1.7647 | 300 | 1.0261 | 0.5772 |
| 0.3345 | 2.3529 | 400 | 1.0422 | 0.4854 |
| 0.2982 | 2.9412 | 500 | 1.0108 | 0.4772 |
These numbers come directly from the training run's logged metrics. The training dataset itself is not documented in this repository, so the WER above should be read as an internal validation-set result from that run rather than a benchmark on a public, named test set.
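For readers who want to score the model on their own transcripts, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below implements that standard definition in plain Python. The run's actual scoring script and text normalization are not documented, so this is not a reproduction of how 0.4772 was obtained, and the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of five reference words.
print(word_error_rate("please call me back tomorrow", "please call me tomorrow"))  # 0.2
```

Read this way, a WER of 0.4772 corresponds to roughly 48 word errors per 100 reference words on the validation split. Casing and punctuation handling can shift the number noticeably, so apply the same normalization to references and hypotheses when comparing models.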
Intended Use
- Transcribing Nigerian-accented English speech (e.g. call center audio, voice notes, interviews, educational content, accessibility/voice-to-text tools) where mainstream ASR models tend to underperform due to accent mismatch.
- Experimentation and further fine-tuning for African-accented English ASR research.
Not intended for: languages other than English, heavily code-switched audio without further adaptation, or high-stakes decisions made without human review.
How to Use
import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration
model_id = "Ephraimmm/asrfinetuned"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)
# Load audio resampled to 16 kHz (required by the feature extractor)
audio, sr = librosa.load("path/to/audio.wav", sr=16000)
input_features = processor(
    audio, sampling_rate=sr, return_tensors="pt"
).input_features
with torch.no_grad():
    predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
Or with the pipeline API:
from transformers import pipeline
asr = pipeline("automatic-speech-recognition", model="Ephraimmm/asrfinetuned")
result = asr("path/to/audio.wav")
print(result["text"])
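The feature extractor works on 30 s windows (see Training Details), so with default settings the manual example above covers only the first 30 seconds of a longer recording. One simple workaround is to split the waveform into consecutive windows and transcribe each in turn; the pipeline API also accepts a `chunk_length_s` argument for long audio. The helper below is a hypothetical sketch: the name `chunk_bounds` and the no-overlap strategy are assumptions, not part of this repository.

```python
def chunk_bounds(num_samples: int, sample_rate: int = 16000,
                 chunk_seconds: float = 30.0) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of consecutive, non-overlapping windows."""
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("chunk length must be positive")
    return [(start, min(start + chunk_len, num_samples))
            for start in range(0, num_samples, chunk_len)]


# A 65 s recording at 16 kHz yields two full windows and a 5 s remainder.
for start, end in chunk_bounds(65 * 16000):
    print(start, end)
```

Each slice `audio[start:end]` can then be passed through the processor and `generate` as in the manual example. Cutting at fixed boundaries can split a word in two, so overlapping windows or the pipeline's built-in chunking may give cleaner transcripts.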
Limitations
- The only WER figure available (0.4772) is from the internal validation split logged during this specific training run; the underlying dataset is not documented in this repository, so the metric should not be treated as a standardized benchmark result.
- No independent third-party evaluation (e.g. on a named public test set) has been published for this checkpoint.
- As a further fine-tune of an already narrowly adapted Nigerian-English model, its performance on accents or dialects outside that training distribution, on noisy audio, and on non-English speech is unverified.
- Training data provenance, size, and licensing are not documented; users should evaluate suitability for their own use case before production deployment.
Author
Developed by Ephraimmm.