Whisper Small FR - Radiologie (AfroRad)

This model is a fine-tuned version of openai/whisper-small, adapted for medical radiology dictation in the Afro-French context, that is, French as spoken in French-speaking African regions.

Model Description

The model focuses on two main adaptations:

Acoustic Adaptation: Capturing the phonetic nuances of French-speaking African regions to improve recognition of local accents.
Medical Terminology: Stabilizing technical radiology terms (Spine, Shoulder, Thorax, Mammography, CT scans) in a dictation context.

It uses LoRA (Low-Rank Adaptation) via the adapters library, specifically targeting the first 4 layers of the Encoder (for acoustic/accent adaptation) and the full Decoder (for medical jargon and linguistic structure).
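The layer selection described above can be sketched without any training library. The snippet below is only an illustration: it assumes parameter names follow the Hugging Face Whisper layout (for example `model.encoder.layers.0.self_attn.q_proj.weight`) and that "full Decoder" means every decoder transformer block. The helper `is_adapted` is hypothetical and is not part of the adapters library.

```python
import re

# Assumption: Hugging Face Whisper naming, "<stack>.layers.<index>.<submodule>..."
LAYER_RE = re.compile(r"(encoder|decoder)\.layers\.(\d+)\.")
ADAPTED_ENCODER_LAYERS = 4  # "first 4 layers of the Encoder" per the card


def is_adapted(param_name: str) -> bool:
    """Return True if the parameter lives in a block the card says is adapted."""
    match = LAYER_RE.search(param_name)
    if match is None:
        # Conv front-end, embeddings, final layer norms: treated as not adapted
        # in this sketch (an assumption, the card does not say).
        return False
    stack, index = match.group(1), int(match.group(2))
    if stack == "encoder":
        return index < ADAPTED_ENCODER_LAYERS
    return True  # every decoder layer


# whisper-small has 12 encoder layers and 12 decoder layers.
names = [
    f"model.{stack}.layers.{i}.self_attn.q_proj.weight"
    for stack in ("encoder", "decoder")
    for i in range(12)
]
adapted = [name for name in names if is_adapted(name)]
print(len(adapted))  # 4 encoder layers + 12 decoder layers
```

Under these assumptions, 16 of the 24 transformer blocks receive adapter weights, while encoder layers 4 to 11 keep the pretrained weights unchanged.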

Training and Evaluation Data

Training Dataset: ~4.5 hours of specialized radiology recordings (562 audio files).

Training Procedure

Training Hyperparameters

Learning Rate: 3e-5 (global), with separate rates for the adapted modules:
- Encoder (layers 0-3): 8e-6
- Decoder: 4e-5
Optimizer: AdamW 8-bit (bnb.optim.AdamW8bit)
Batch Size: 12 (train), 8 (eval)
Max Steps: 1,700 (the results below are logged through step 1,400)
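One way to express the per-module learning rates is through optimizer parameter groups. The sketch below is not the author's training script: the helpers `group_learning_rate` and `build_param_groups` are hypothetical, the parameter names assume the Hugging Face Whisper layout, and applying the global rate to everything outside encoder layers 0-3 and the decoder is an assumption. The output is the list-of-dicts format (`params`, `lr`) that PyTorch-style optimizers, including `bnb.optim.AdamW8bit`, accept as their first argument.

```python
import re

GLOBAL_LR = 3e-5   # values taken from the hyperparameter list
ENCODER_LR = 8e-6  # encoder layers 0-3
DECODER_LR = 4e-5  # all decoder parameters

ENCODER_LAYER_RE = re.compile(r"encoder\.layers\.(\d+)\.")


def group_learning_rate(param_name: str) -> float:
    """Pick a learning rate from the parameter name (hypothetical helper)."""
    match = ENCODER_LAYER_RE.search(param_name)
    if match and int(match.group(1)) < 4:
        return ENCODER_LR
    if "decoder." in param_name:
        return DECODER_LR
    return GLOBAL_LR  # assumption: everything else uses the global rate


def build_param_groups(named_params):
    """Bucket (name, parameter) pairs by learning rate."""
    buckets = {}
    for name, param in named_params:
        buckets.setdefault(group_learning_rate(name), []).append(param)
    return [{"params": params, "lr": lr} for lr, params in buckets.items()]


# Placeholder strings stand in for tensors so the sketch runs without PyTorch.
demo = [
    ("model.encoder.layers.0.self_attn.q_proj.weight", "p0"),
    ("model.encoder.layers.7.fc1.weight", "p1"),
    ("model.decoder.layers.11.fc2.weight", "p2"),
]
groups = build_param_groups(demo)
for group in groups:
    print(group["lr"], group["params"])
```

In a real run, `named_params` would come from `model.named_parameters()`, filtered to the trainable (LoRA) parameters only.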

Training Results

Training Loss	Epoch	Step	Validation Loss	WER (%)
No log	3.03	100	0.0778	133.333
No log	6.06	200	0.0579	14.2827
No log	9.09	300	0.0542	15.6211
No log	12.12	400	0.0514	7.42367
0.0761	15.15	500	0.0465	8.42744
No log	18.18	600	0.0450	6.41991
No log	21.21	700	0.0457	6.44082
No log	24.24	800	0.0458	6.37808
No log	27.27	900	0.0458	6.29444
0.0003	30.30	1000	0.0464	8.26014
No log	33.33	1100	0.0466	8.30197
No log	36.36	1200	0.0466	8.23923
No log	39.39	1300	0.0468	8.19741
0.0001	42.42	1400	0.0468	8.19741
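The table can be read programmatically to locate the best checkpoints. The short sketch below copies the logged rows as tuples and picks the step with the lowest WER and the step with the lowest validation loss; the steps-per-epoch figure is only an estimate derived from the first row (step 100 at epoch 3.03).

```python
# (epoch, step, validation_loss, wer_percent) copied from the table above
LOG = [
    (3.03, 100, 0.0778, 133.333),
    (6.06, 200, 0.0579, 14.2827),
    (9.09, 300, 0.0542, 15.6211),
    (12.12, 400, 0.0514, 7.42367),
    (15.15, 500, 0.0465, 8.42744),
    (18.18, 600, 0.0450, 6.41991),
    (21.21, 700, 0.0457, 6.44082),
    (24.24, 800, 0.0458, 6.37808),
    (27.27, 900, 0.0458, 6.29444),
    (30.30, 1000, 0.0464, 8.26014),
    (33.33, 1100, 0.0466, 8.30197),
    (36.36, 1200, 0.0466, 8.23923),
    (39.39, 1300, 0.0468, 8.19741),
    (42.42, 1400, 0.0468, 8.19741),
]

best_by_wer = min(LOG, key=lambda row: row[3])
best_by_loss = min(LOG, key=lambda row: row[2])
steps_per_epoch = round(LOG[0][1] / LOG[0][0])  # estimate from the first row

print("lowest WER at step", best_by_wer[1])
print("lowest validation loss at step", best_by_loss[1])
print("approx. optimizer steps per epoch:", steps_per_epoch)
```

The two criteria disagree: validation loss bottoms out at step 600, while WER is lowest at step 900.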

Final Performance:

Best validation WER: 6.29% (Step 900)
Final validation WER: 8.20% (Step 1400)

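Validation WER drops from 133.33% at step 100 to 6.29% at step 900. Beyond that point the training loss keeps falling (0.0001 at step 1400) while validation WER settles near 8.2%, which points to mild overfitting in the later steps.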
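Some WER values in this card exceed 100% (133.333 at step 100, and 120.41 for openai/whisper-large-v3 on the test set). That is possible under the standard definition, WER = (substitutions + deletions + insertions) / reference words, because insertions are not bounded by the reference length. The sketch below assumes that standard definition and plain whitespace tokenization; the card does not state which scoring tool or text normalization was used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, using whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(wer("thorax normal", "thorax normal"))             # 0.0
print(wer("thorax normal", "le thorax est normal ici"))  # 150.0
```

In the second call the hypothesis contains three inserted words against a two-word reference, so the rate is 3 / 2 = 150%.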

Performance on the Test Set

The model was evaluated on the AfroRadVoice-FR test split (75 audio files, independent of training), using identical decoding settings (temperature = 0.0) across all models for a fair comparison.

Rank	Model	WER (%)	CER (%)	Sentence Accuracy (%)
1	Whisper-AfroRad-FR (this model)	20.93	16.80	34.67
2	Med-Whisper-AfroRad-FR	21.84	17.68	29.33
3	whisper-small-rad-FR	25.12	20.89	33.33
4	nvidia/canary-1b-v2	33.96	11.10	1.33
5	Qwen/Qwen3-ASR-0.6B	45.40	17.55	0.00
6	bofenghuang/whisper-small-cv11-french	75.11	53.65	0.00
7	openai/whisper-small (baseline)	79.12	54.47	0.00
8	openai/whisper-large-v3	120.41	84.02	0.00
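The three metrics in the table measure errors at different granularities, so they can diverge. The sketch below assumes the usual definitions: CER is the character-level edit distance divided by the reference length, and sentence accuracy is the share of transcripts that match their reference exactly. The example sentences are invented for illustration and are not taken from the test set.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def sentence_accuracy(references, hypotheses) -> float:
    """Percentage of hypotheses that match their reference exactly."""
    exact = sum(ref == hyp for ref, hyp in zip(references, hypotheses))
    return 100.0 * exact / len(references)


# Invented examples: the first hypothesis drops two accents.
refs = ["opacité du lobe supérieur droit", "pas de fracture"]
hyps = ["opacite du lobe superieur droit", "pas de fracture"]

char_errors = edit_distance(refs[0], hyps[0])
word_errors = edit_distance(refs[0].split(), hyps[0].split())
print("character errors:", char_errors, "of", len(refs[0]))
print("word errors:", word_errors, "of", len(refs[0].split()))
print("CER (%):", cer(refs[0], hyps[0]))
print("sentence accuracy (%):", sentence_accuracy(refs, hyps))
```

Two missing accents cost only 2 of 31 characters but 2 of 5 words, and they make the sentence count as wrong. This shows one way a system can combine a low CER with a higher WER and low sentence accuracy; it is not a diagnosis of any model in the table. With 75 test files, a sentence accuracy of 34.67% is consistent with 26 exact matches.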

Framework Versions

Transformers 4.47.0+
Adapters 1.0.0+
PyTorch 2.6.0+
Datasets 3.6.0
Python 3.10+

Citation

If you use this model in your research, please cite:

@misc{whisper-afrorad-fr,
  author = {StephaneBah},
  title = {Whisper-AfroRad-FR: Medical Radiology ASR for Afro-French Context},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/StephaneBah/Whisper-AfroRad-FR}}
}

Model size: 0.2B params (Safetensors, tensor type F32)
