How to use StephaneBah/Whisper-AfroRad-FR with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="StephaneBah/Whisper-AfroRad-FR")
```

```python
# Load model directly
# AutoModelForSpeechSeq2Seq loads Whisper with its generation head;
# plain AutoModel returns the bare encoder-decoder and cannot transcribe.
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("StephaneBah/Whisper-AfroRad-FR")
model = AutoModelForSpeechSeq2Seq.from_pretrained("StephaneBah/Whisper-AfroRad-FR")
```
Whisper Small FR - Radiologie (AfroRad)
This model is a fine-tuned version of openai/whisper-small adapted for medical radiology dictation in the Afro-French context. It was specifically optimized for French-speaking African regions.
Model Description
The model focuses on two main adaptations:
- Acoustic Adaptation: Capturing the phonetic nuances of French-speaking African regions to improve recognition of local accents.
- Medical Terminology: Stabilizing technical radiology terms (Spine, Shoulder, Thorax, Mammography, CT scans) in a dictation context.
It uses LoRA (Low-Rank Adaptation) via the adapters library, specifically targeting the first 4 layers of the Encoder (for acoustic/accent adaptation) and the full Decoder (for medical jargon and linguistic structure).
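The layer split described above can be sketched as a name filter over the model's modules. This is a minimal illustration, not the author's training code. It assumes the module naming of the Transformers Whisper implementation (`model.encoder.layers.<i>...`, `model.decoder.layers.<i>...`) and assumes that only the `q_proj` and `v_proj` attention projections receive LoRA weights, which the card does not specify. The helper `select_lora_targets` is hypothetical.

```python
import re

# From the card: only the first 4 encoder layers are adapted.
ENCODER_LAYERS_ADAPTED = 4
# Assumption: which projections get LoRA weights; the card does not say.
TARGET_PROJECTIONS = ("q_proj", "v_proj")

_LAYER_RE = re.compile(r"\.(encoder|decoder)\.layers\.(\d+)\.")


def select_lora_targets(module_names):
    """Return the module names that would receive LoRA adapters."""
    selected = []
    for name in module_names:
        if not name.endswith(TARGET_PROJECTIONS):
            continue
        match = _LAYER_RE.search(name)
        if match is None:
            continue
        stack, index = match.group(1), int(match.group(2))
        # Every decoder layer is adapted; encoder layers only below the cutoff.
        if stack == "decoder" or index < ENCODER_LAYERS_ADAPTED:
            selected.append(name)
    return selected


# Example with names shaped like those of a 12-layer Whisper-small model.
names = [
    f"model.{stack}.layers.{i}.self_attn.{proj}"
    for stack in ("encoder", "decoder")
    for i in range(12)
    for proj in ("q_proj", "k_proj", "v_proj", "out_proj")
]
targets = select_lora_targets(names)
print(len(targets))
```

Restricting the encoder to its lowest layers keeps most of the pretrained acoustic representation frozen, while the decoder, which produces the text, is adapted throughout.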
Training and Evaluation Data
- Training Dataset: ~4.5 hours of specialized radiology recordings (562 audio files).
Training Procedure
Training Hyperparameters
- Learning Rate: 3e-5 (global)
  - Encoder (L0-L3): 8e-6
  - Decoder: 4e-5
- Optimizer: AdamW 8-bit (`bnb.optim.AdamW8bit`)
- Batch Size: 12 (train), 8 (eval)
- Max Steps: 1,700
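The separate encoder and decoder learning rates listed above are typically expressed as optimizer parameter groups. The following is a rough sketch of that grouping, not the author's script. The helper `build_param_groups`, the use of the global rate as a fallback for parameters outside both stacks, and the parameter names in the demo are all assumptions made for illustration.

```python
ENCODER_LR = 8e-6  # from the card: encoder layers L0-L3
DECODER_LR = 4e-5  # from the card: decoder
GLOBAL_LR = 3e-5   # from the card: global rate, used here as the fallback (assumption)


def build_param_groups(named_params):
    """Split (name, param) pairs into optimizer groups with per-group learning rates."""
    groups = {
        "encoder": {"params": [], "lr": ENCODER_LR},
        "decoder": {"params": [], "lr": DECODER_LR},
        "other": {"params": [], "lr": GLOBAL_LR},
    }
    for name, param in named_params:
        if ".encoder." in name:
            groups["encoder"]["params"].append(param)
        elif ".decoder." in name:
            groups["decoder"]["params"].append(param)
        else:
            groups["other"]["params"].append(param)
    # Keep only the groups that actually hold parameters.
    return [group for group in groups.values() if group["params"]]


# Demo with placeholder strings standing in for tensors (names are hypothetical).
demo = [
    ("model.encoder.layers.0.self_attn.q_proj.lora_A", "p0"),
    ("model.decoder.layers.5.self_attn.v_proj.lora_B", "p1"),
    ("proj_out.weight", "p2"),
]
param_groups = build_param_groups(demo)
for group in param_groups:
    print(group["lr"], group["params"])
```

In a real run the pairs would come from the trainable entries of `model.named_parameters()`, and the resulting list would be passed to the optimizer constructor in place of a flat parameter list.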
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER (%) |
|---|---|---|---|---|
| No log | 3.03 | 100 | 0.0778 | 133.333 |
| No log | 6.06 | 200 | 0.0579 | 14.2827 |
| No log | 9.09 | 300 | 0.0542 | 15.6211 |
| No log | 12.12 | 400 | 0.0514 | 7.42367 |
| 0.0761 | 15.15 | 500 | 0.0465 | 8.42744 |
| No log | 18.18 | 600 | 0.0450 | 6.41991 |
| No log | 21.21 | 700 | 0.0457 | 6.44082 |
| No log | 24.24 | 800 | 0.0458 | 6.37808 |
| No log | 27.27 | 900 | 0.0458 | 6.29444 |
| 0.0003 | 30.30 | 1000 | 0.0464 | 8.26014 |
| No log | 33.33 | 1100 | 0.0466 | 8.30197 |
| No log | 36.36 | 1200 | 0.0466 | 8.23923 |
| No log | 39.39 | 1300 | 0.0468 | 8.19741 |
| 0.0001 | 42.42 | 1400 | 0.0468 | 8.19741 |
Final Performance:
- Best WER: 6.29% (Step 900)
- Final WER: 8.20% (Step 1400)
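The model converges quickly on the validation set: validation loss falls from 0.0778 at step 100 to about 0.045 by step 600 and stays nearly flat afterwards, while validation WER reaches its minimum of 6.29% at step 900 and settles near 8.2% from step 1000 onward.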
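Because the lowest validation WER occurs before the last logged step, it can help to pick the checkpoint by metric rather than by position. The snippet below is a small sketch of that selection using the values from the training results table. The card does not state which step the published weights correspond to, so this only illustrates the comparison.

```python
# (step, validation WER %) pairs copied from the training results table.
RESULTS = [
    (100, 133.333), (200, 14.2827), (300, 15.6211), (400, 7.42367),
    (500, 8.42744), (600, 6.41991), (700, 6.44082), (800, 6.37808),
    (900, 6.29444), (1000, 8.26014), (1100, 8.30197), (1200, 8.23923),
    (1300, 8.19741), (1400, 8.19741),
]

# Best checkpoint: the row with the smallest WER.
best_step, best_wer = min(RESULTS, key=lambda row: row[1])
# Final checkpoint: the last logged row.
final_step, final_wer = RESULTS[-1]

print(f"best:  step {best_step}, WER {best_wer:.2f}%")
print(f"final: step {final_step}, WER {final_wer:.2f}%")
```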
Performance on the Test Set
The model was evaluated on the AfroRadVoice-FR test split (75 audio files, independent of training), using identical decoding settings (temperature = 0.0) across all models for a fair comparison.
| Rank | Model | WER (%) | CER (%) | Sentence Accuracy (%) |
|---|---|---|---|---|
| 1 | Whisper-AfroRad-FR (this model) | 20.93 | 16.80 | 34.67 |
| 2 | Med-Whisper-AfroRad-FR | 21.84 | 17.68 | 29.33 |
| 3 | whisper-small-rad-FR | 25.12 | 20.89 | 33.33 |
| 4 | nvidia/canary-1b-v2 | 33.96 | 11.10 | 1.33 |
| 5 | Qwen/Qwen3-ASR-0.6B | 45.40 | 17.55 | 0.00 |
| 6 | bofenghuang/whisper-small-cv11-french | 75.11 | 53.65 | 0.00 |
| 7 | openai/whisper-small (baseline) | 79.12 | 54.47 | 0.00 |
| 8 | openai/whisper-large-v3 | 120.41 | 84.02 | 0.00 |
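For readers unfamiliar with the three metrics, here is a dependency-free sketch of how they are commonly computed. It is an illustration under assumptions: the card does not describe its text normalization or whether scores are pooled over the corpus or averaged per utterance, and the function names are hypothetical. It also shows why WER can exceed 100%, as it does for openai/whisper-large-v3 in the table: inserted words count as errors but do not lengthen the reference.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion (reference item missing from hypothesis)
                current[j - 1] + 1,      # insertion (extra hypothesis item)
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate in percent for one reference/hypothesis pair."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent for one reference/hypothesis pair."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def sentence_accuracy(references, hypotheses):
    """Percentage of hypotheses that match their reference exactly."""
    exact = sum(r == h for r, h in zip(references, hypotheses))
    return 100.0 * exact / len(references)


# Four inserted words against a two-word reference give a WER of 200%.
print(wer("thorax normal", "le thorax est normal sans anomalie"))
```

A misspelled word counts as one full error in WER but only as a few character edits in CER, which is one way a model can rank differently on the two columns.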
Framework Versions
- Transformers 4.47.0+
- Adapters 1.0.0+
- PyTorch 2.6.0+
- Datasets 3.6.0
- Python 3.10+
Citation
If you use this model in your research, please cite:
@misc{whisper-afrorad-fr,
author = {StephaneBah},
title = {Whisper-AfroRad-FR: Medical Radiology ASR for Afro-French Context},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/StephaneBah/Whisper-AfroRad-FR}}
}