metadata
license: apache-2.0
language:
  - fr
metrics:
  - wer
base_model:
  - openai/whisper-small
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
  - medical
  - radiology
  - african_french_accent
  - afro-french
  - generated_from_trainer
model-index:
  - name: Whisper Small FR - Radiologie (AfroRad)
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        metrics:
          - name: Word Error Rate (WER)
            type: wer
            value: null
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 11.0
          type: mozilla-foundation/common_voice_11_0
          config: fr
          split: test
          args: fr
        metrics:
          - name: WER (Greedy)
            type: wer
            value: null
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Multilingual LibriSpeech (MLS)
          type: facebook/multilingual_librispeech
          config: french
          split: test
          args: french
        metrics:
          - name: WER (Greedy)
            type: wer
            value: null
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: VoxPopuli
          type: facebook/voxpopuli
          config: fr
          split: test
          args: fr
        metrics:
          - name: WER (Greedy)
            type: wer
            value: null
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Fleurs
          type: google/fleurs
          config: fr_fr
          split: test
          args: fr_fr
        metrics:
          - name: WER (Greedy)
            type: wer
            value: null
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: African Accented French
          type: gigant/african_accented_french
          config: fr
          split: test
          args: fr
        metrics:
          - name: WER (Greedy)
            type: wer
            value: null
datasets:
  - StephaneBah/Africa_Radiology_FR

Whisper Small FR - Radiologie (AfroRad)

This model is a fine-tuned version of openai/whisper-small adapted for medical radiology dictation in the Afro-French context. It was specifically optimized for French-speaking African regions.

Model Description

The model focuses on two main adaptations:

  1. Acoustic Adaptation: Capturing the phonetic nuances of French-speaking African regions to improve recognition of local accents.
  2. Medical Terminology: Stabilizing technical radiology terms (Spine, Shoulder, Thorax, Mammography, CT scans) in a dictation context.

Fine-tuning uses LoRA (Low-Rank Adaptation) via the adapters library. The adapters are applied to the first 4 layers of the encoder (for acoustic/accent adaptation) and to the full decoder (for medical jargon and linguistic structure).
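The layer selection described above can be sketched as a simple name filter. This is a minimal illustration and not the training code: it assumes module names follow the Hugging Face Transformers Whisper layout (`model.encoder.layers.<i>.` and `model.decoder.layers.<i>.`), and the helper `is_lora_target` is hypothetical.

```python
import re

# Number of leading encoder layers that receive LoRA (L0-L3).
ADAPTED_ENCODER_LAYERS = 4

# Assumed naming scheme: "model.encoder.layers.<i>." / "model.decoder.layers.<i>."
_LAYER_RE = re.compile(r"\bmodel\.(encoder|decoder)\.layers\.(\d+)\.")


def is_lora_target(module_name: str) -> bool:
    """Return True if a module sits in the adapted part of the network."""
    match = _LAYER_RE.search(module_name)
    if match is None:
        return False  # modules outside the transformer blocks are left alone here
    stack, index = match.group(1), int(match.group(2))
    if stack == "decoder":
        return True  # every decoder layer is adapted
    return index < ADAPTED_ENCODER_LAYERS  # encoder: only the first four layers


# whisper-small has 12 encoder layers and 12 decoder layers.
names = [
    f"model.{stack}.layers.{i}.self_attn.q_proj"
    for stack in ("encoder", "decoder")
    for i in range(12)
]
selected = [name for name in names if is_lora_target(name)]
print(len(selected))  # 4 encoder modules + 12 decoder modules
```

Restricting the encoder to its earliest layers keeps the adaptation close to the acoustic features, while the decoder, which produces the text, is adapted throughout.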

Training and Evaluation Data

  • Training Dataset: ~4.5 hours of specialized radiology recordings (562 audio clips).

Training Procedure

Training Hyperparameters

  • Learning Rate: 3e-5 (Global)
    • Encoder (L0-L3): 8e-6
    • Decoder: 4e-5
  • Optimizer: AdamW 8-bit (bnb.optim.AdamW8bit)
  • Batch Size: 12 (train), 8 (eval)
  • Max Steps: 1,700
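The per-component learning rates listed above can be expressed as optimizer parameter groups. The sketch below is an assumption about how such a split might be wired up, not the author's script: the helpers `learning_rate_for` and `build_param_groups` are hypothetical, parameter names are assumed to follow the Transformers Whisper layout, and the "Global" rate is treated as the fallback for anything not matched.

```python
GLOBAL_LR = 3e-5   # fallback for parameters not matched below (assumed meaning of "Global")
ENCODER_LR = 8e-6  # encoder layers L0-L3
DECODER_LR = 4e-5  # decoder


def learning_rate_for(param_name: str) -> float:
    """Pick a learning rate from the parameter's dotted name."""
    if ".decoder." in param_name:
        return DECODER_LR
    for layer in range(4):
        # The trailing dot keeps layer 1 from matching layers 10 and 11.
        if f".encoder.layers.{layer}." in param_name:
            return ENCODER_LR
    return GLOBAL_LR


def build_param_groups(named_params):
    """Group (name, parameter) pairs into a list of {"params", "lr"} dicts."""
    by_lr = {}
    for name, param in named_params:
        by_lr.setdefault(learning_rate_for(name), []).append(param)
    return [{"params": params, "lr": lr} for lr, params in by_lr.items()]


# Placeholder strings stand in for real tensors in this illustration.
example = [
    ("model.encoder.layers.0.self_attn.q_proj.weight", "enc0"),
    ("model.encoder.layers.3.self_attn.v_proj.weight", "enc3"),
    ("model.decoder.layers.5.encoder_attn.k_proj.weight", "dec5"),
]
groups = build_param_groups(example)
for group in groups:
    print(group["lr"], group["params"])
```

A list of this shape is what PyTorch-style optimizers accept as per-parameter options, so with real tensors it could be handed to an optimizer such as `bnb.optim.AdamW8bit`.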

Training Results

| Training Loss | Epoch | Step | Validation Loss | WER (%) |
|---------------|-------|------|-----------------|---------|
| No log        | 3.03  | 100  | 0.0778          | 133.333 |
| No log        | 6.06  | 200  | 0.0579          | 14.2827 |
| No log        | 9.09  | 300  | 0.0542          | 15.6211 |
| No log        | 12.12 | 400  | 0.0514          | 7.42367 |
| 0.0761        | 15.15 | 500  | 0.0465          | 8.42744 |
| No log        | 18.18 | 600  | 0.0450          | 6.41991 |
| No log        | 21.21 | 700  | 0.0457          | 6.44082 |
| No log        | 24.24 | 800  | 0.0458          | 6.37808 |
| No log        | 27.27 | 900  | 0.0458          | 6.29444 |
| 0.0003        | 30.30 | 1000 | 0.0464          | 8.26014 |
| No log        | 33.33 | 1100 | 0.0466          | 8.30197 |
| No log        | 36.36 | 1200 | 0.0466          | 8.23923 |
| No log        | 39.39 | 1300 | 0.0468          | 8.19741 |
| 0.0001        | 42.42 | 1400 | 0.0468          | 8.19741 |
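The Epoch column is consistent with 33 optimizer steps per epoch (100 / 33 ≈ 3.03). The sketch below checks that reading against a few logged rows. The implied training-set size is an inference that holds only under stated assumptions (one device, no gradient accumulation, last partial batch kept); it is not a figure reported in this card.

```python
import math

STEPS_PER_EPOCH = 33   # inferred from the table, not stated in the card
TRAIN_BATCH_SIZE = 12  # from the hyperparameters


def epoch_at(step: int) -> float:
    """Fractional epoch reached after a given number of optimizer steps."""
    return step / STEPS_PER_EPOCH


# (step, epoch) pairs copied from the results table.
logged = {100: 3.03, 500: 15.15, 1000: 30.30, 1400: 42.42}
consistent = all(abs(epoch_at(step) - epoch) < 0.005 for step, epoch in logged.items())
print(consistent)

# Under the assumptions above, 33 steps per epoch at batch size 12 bounds
# the number of training examples seen per epoch.
min_examples = (STEPS_PER_EPOCH - 1) * TRAIN_BATCH_SIZE + 1
max_examples = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE
assert math.ceil(min_examples / TRAIN_BATCH_SIZE) == STEPS_PER_EPOCH
print(min_examples, max_examples)
```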

Final Performance:

  • Best WER: 6.29% (Step 900)
  • Final WER: 8.20% (Step 1400, the last logged evaluation)
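For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. Because insertions count as errors, it can exceed 100%, which is how a value such as 133.333 at step 100 is possible. The function below is a minimal sketch of the standard definition; it is not the evaluation code used for this model, and the text normalization applied during evaluation is not specified in the card.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Identical transcripts give 0%.
print(word_error_rate("pas de fracture visible", "pas de fracture visible"))
# Three inserted words against a two-word reference give 150%.
print(word_error_rate("le thorax", "le le thorax est normal"))
```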

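Validation loss drops from 0.0778 at step 100 to a minimum of 0.0450 at step 600 and stays within 0.002 of that value through step 1400. WER is lowest between steps 600 and 900 (6.29% to 6.44%) and settles near 8.2% from step 1000 onward, so step 900 gives the best result on this validation set. The metadata does not yet report WER on the external benchmarks (Common Voice 11.0, MLS, VoxPopuli, Fleurs, African Accented French), so generalization to unseen medical terminology and regional accent variations is not quantified here.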

Framework Versions

  • Transformers 4.47.0+
  • Adapters 1.0.0+
  • PyTorch 2.6.0+
  • Datasets 3.6.0
  • Python 3.10+

Citation

If you use this model in your research, please cite:

@misc{whisper-afrorad-fr,
  author = {StephaneBah},
  title = {Whisper-AfroRad-FR: Medical Radiology ASR for Afro-French Context},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/StephaneBah/Whisper-AfroRad-FR}}
}