license: apache-2.0
language:
  - fr
metrics:
  - wer
base_model:
  - openai/whisper-small
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
  - medical
  - radiology
  - african_french_accent
  - afro-french
  - generated_from_trainer
model-index:
  - name: Whisper Small FR - Radiologie (AfroRad)
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        metrics:
          - name: Word Error Rate (WER)
            type: wer
            value: null
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 11.0
          type: mozilla-foundation/common_voice_11_0
          config: fr
          split: test
          args: fr
        metrics:
          - name: WER (Greedy)
            type: wer
            value: null
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Multilingual LibriSpeech (MLS)
          type: facebook/multilingual_librispeech
          config: french
          split: test
          args: french
        metrics:
          - name: WER (Greedy)
            type: wer
            value: null
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: VoxPopuli
          type: facebook/voxpopuli
          config: fr
          split: test
          args: fr
        metrics:
          - name: WER (Greedy)
            type: wer
            value: null
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Fleurs
          type: google/fleurs
          config: fr_fr
          split: test
          args: fr_fr
        metrics:
          - name: WER (Greedy)
            type: wer
            value: null
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: African Accented French
          type: gigant/african_accented_french
          config: fr
          split: test
          args: fr
        metrics:
          - name: WER (Greedy)
            type: wer
            value: null
datasets:
  - StephaneBah/Africa_Radiology_FR

Whisper Small FR - Radiologie (AfroRad)

This model is a fine-tuned version of openai/whisper-small for French medical radiology dictation in the Afro-French context. It is optimized specifically for speakers from French-speaking African regions.

Model Description

The model focuses on two main adaptations:

- Acoustic adaptation: capturing the phonetic characteristics of French as spoken in French-speaking African regions, to improve recognition of local accents.
- Medical terminology: making the transcription of radiology terms more reliable in a dictation setting, across exam types such as spine, shoulder, thorax, mammography, and CT.

It uses LoRA (Low-Rank Adaptation) through the Adapters library, applied to the first four encoder layers (L0-L3) for acoustic and accent adaptation, and to the full decoder for medical terminology and linguistic structure.
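The layer-selection policy described above can be sketched in plain Python. This is a minimal illustration, not the card's actual Adapters configuration (which is not shown). It assumes Hugging Face's WhisperForConditionalGeneration module naming (`model.encoder.layers.{i}...`, `model.decoder.layers.{i}...`) and, as a hypothetical choice, targets only the attention projections; `wants_lora` and `whisper_attention_names` are illustrative helpers.

```python
import re

# Hypothetical choice: the card does not say which projections receive LoRA.
TARGET_PROJECTIONS = ("q_proj", "k_proj", "v_proj", "out_proj")
ENCODER_LORA_LAYERS = range(0, 4)  # first four encoder layers (L0-L3)

# Assumes Hugging Face Whisper module names such as
# "model.encoder.layers.3.self_attn.q_proj".
_LAYER_RE = re.compile(r"model\.(encoder|decoder)\.layers\.(\d+)\.(.+)$")


def wants_lora(module_name: str) -> bool:
    """Return True if this sketch's policy would attach LoRA to the module."""
    match = _LAYER_RE.match(module_name)
    if match is None:
        return False  # conv front-end, embeddings, proj_out, etc.
    stack, layer_idx, submodule = match.group(1), int(match.group(2)), match.group(3)
    if not submodule.endswith(TARGET_PROJECTIONS):
        return False  # feed-forward (fc1/fc2) and layer norms get no LoRA here
    if stack == "encoder":
        return layer_idx in ENCODER_LORA_LAYERS
    return True  # every decoder layer


def whisper_attention_names(n_encoder_layers: int = 12, n_decoder_layers: int = 12) -> list[str]:
    """Build attention-projection names for a whisper-small-sized model (12 + 12 layers)."""
    names = []
    for i in range(n_encoder_layers):
        names += [f"model.encoder.layers.{i}.self_attn.{p}" for p in TARGET_PROJECTIONS]
    for i in range(n_decoder_layers):
        for attn in ("self_attn", "encoder_attn"):
            names += [f"model.decoder.layers.{i}.{attn}.{p}" for p in TARGET_PROJECTIONS]
    return names


if __name__ == "__main__":
    names = whisper_attention_names()
    selected = [n for n in names if wants_lora(n)]
    print(f"{len(selected)} of {len(names)} attention projections get LoRA")
```

In a real model one would iterate over `model.named_modules()` instead of generated names. With whisper-small's 12 encoder and 12 decoder layers, this sketch selects 112 of the 144 attention projections: 16 in the encoder and 96 in the decoder (self-attention plus cross-attention).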

Training and Evaluation Data

Training dataset: StephaneBah/Africa_Radiology_FR, about 4.5 hours of specialized radiology dictation recordings (562 audio clips).

Training Procedure

Training Hyperparameters

- Learning rate: 3e-5 (global)
  - Encoder layers L0-L3: 8e-6
  - Decoder: 4e-5
- Optimizer: AdamW 8-bit (bnb.optim.AdamW8bit)
- Batch size: 12 (train), 8 (eval)
- Max steps: 1,700
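The per-module learning rates above map naturally onto optimizer parameter groups. The sketch below is a hedged illustration in plain Python: it assumes Hugging Face-style parameter names and assumes the 3e-5 global rate acts as the default for any trainable parameter outside the two named groups. `lr_for` and `build_param_groups` are hypothetical helpers, and the demo parameter names are made up.

```python
import re
from collections import defaultdict

GLOBAL_LR = 3e-5   # default rate (assumed to apply outside the groups below)
ENCODER_LR = 8e-6  # encoder layers L0-L3
DECODER_LR = 4e-5  # all decoder layers

_ENCODER_LAYER_RE = re.compile(r"(?:^|\.)encoder\.layers\.(\d+)\.")
_DECODER_RE = re.compile(r"(?:^|\.)decoder\.")


def lr_for(param_name: str) -> float:
    """Pick a learning rate from the parameter's position in the model."""
    enc = _ENCODER_LAYER_RE.search(param_name)
    if enc is not None and int(enc.group(1)) < 4:
        return ENCODER_LR
    if _DECODER_RE.search(param_name):
        return DECODER_LR
    return GLOBAL_LR


def build_param_groups(named_params):
    """Group (name, param) pairs into optimizer param groups, one per learning rate."""
    buckets = defaultdict(list)
    for name, param in named_params:
        buckets[lr_for(name)].append(param)
    return [{"params": params, "lr": lr} for lr, params in buckets.items()]


if __name__ == "__main__":
    # Illustrative names; strings stand in for real parameter tensors.
    demo = [
        ("model.encoder.layers.0.self_attn.q_proj.lora_A", "p_enc0"),
        ("model.encoder.layers.7.self_attn.q_proj.lora_A", "p_enc7"),
        ("model.decoder.layers.5.encoder_attn.v_proj.lora_B", "p_dec5"),
    ]
    for group in build_param_groups(demo):
        print(group["lr"], group["params"])
```

With PyTorch and bitsandbytes installed, the groups would be built from real parameters, for example `bnb.optim.AdamW8bit(build_param_groups((n, p) for n, p in model.named_parameters() if p.requires_grad), lr=GLOBAL_LR)`. PyTorch-style optimizers accept such dicts, and a group's own `lr` overrides the default.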

Training Results

| Training Loss | Epoch | Step | Validation Loss | WER (%) |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| No log        | 3.03  | 100  | 0.0778          | 133.333 |
| No log        | 6.06  | 200  | 0.0579          | 14.2827 |
| No log        | 9.09  | 300  | 0.0542          | 15.6211 |
| No log        | 12.12 | 400  | 0.0514          | 7.42367 |
| 0.0761        | 15.15 | 500  | 0.0465          | 8.42744 |
| No log        | 18.18 | 600  | 0.0450          | 6.41991 |
| No log        | 21.21 | 700  | 0.0457          | 6.44082 |
| No log        | 24.24 | 800  | 0.0458          | 6.37808 |
| No log        | 27.27 | 900  | 0.0458          | 6.29444 |
| 0.0003        | 30.30 | 1000 | 0.0464          | 8.26014 |
| No log        | 33.33 | 1100 | 0.0466          | 8.30197 |
| No log        | 36.36 | 1200 | 0.0466          | 8.23923 |
| No log        | 39.39 | 1300 | 0.0468          | 8.19741 |
| 0.0001        | 42.42 | 1400 | 0.0468          | 8.19741 |
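The WER (%) column is a word-level edit distance: substitutions, deletions and insertions divided by the number of reference words, times 100. Because insertions are counted, WER can exceed 100%, as at step 100. A minimal sketch in plain Python follows; it splits on whitespace only, whereas the text normalization behind the reported numbers is not stated in the card. Libraries such as jiwer implement the same definition. The example sentence is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical dictation snippet: one substitution plus three inserted repeats.
    ref = "opacité basale droite"
    hyp = "opacité basal droite droite droite droite"
    print(f"WER = {100 * word_error_rate(ref, hyp):.3f}%")  # prints WER = 133.333%
```

Four errors against three reference words give 4/3, i.e. 133.333%. This only shows how values above 100% arise; the log does not say what the errors at step 100 actually were.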

Final Performance:

- Best WER: 6.29% (step 900)
- Final WER: 8.20% (step 1,400, last logged evaluation)

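Validation WER falls quickly, from 133.33% at step 100 to 7.42% at step 400, and reaches its minimum of 6.29% at step 900, which suggests the model handles the medical terminology and accent variation in the validation split well. Beyond step 900, training loss approaches zero (0.0001 at step 1,400) while validation WER settles around 8.2%, a sign of some overfitting on the small training set. Results on the external benchmarks listed in the metadata (Common Voice, MLS, VoxPopuli, FLEURS, African Accented French) are not yet reported.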
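Picking the best checkpoint from the evaluation log is a minimum over the logged rows. The sketch below copies the table values into a list and shows that the lowest-WER checkpoint (step 900) differs from the lowest-validation-loss checkpoint (step 600); `EVAL_LOG` and `best_row` are illustrative names.

```python
# (step, validation_loss, wer_percent) rows copied from the training log above
EVAL_LOG = [
    (100, 0.0778, 133.333),
    (200, 0.0579, 14.2827),
    (300, 0.0542, 15.6211),
    (400, 0.0514, 7.42367),
    (500, 0.0465, 8.42744),
    (600, 0.0450, 6.41991),
    (700, 0.0457, 6.44082),
    (800, 0.0458, 6.37808),
    (900, 0.0458, 6.29444),
    (1000, 0.0464, 8.26014),
    (1100, 0.0466, 8.30197),
    (1200, 0.0466, 8.23923),
    (1300, 0.0468, 8.19741),
    (1400, 0.0468, 8.19741),
]

STEP, VAL_LOSS, WER = 0, 1, 2  # column indices


def best_row(log, column):
    """Return the row with the lowest value in the given column (lower is better)."""
    return min(log, key=lambda row: row[column])


if __name__ == "__main__":
    by_wer = best_row(EVAL_LOG, WER)
    by_loss = best_row(EVAL_LOG, VAL_LOSS)
    print(f"lowest WER: step {by_wer[STEP]} ({by_wer[WER]:.2f}%)")
    print(f"lowest validation loss: step {by_loss[STEP]} (loss {by_loss[VAL_LOSS]:.4f})")
```

In a Transformers training run this selection is usually delegated to the trainer by setting `load_best_model_at_end=True`, `metric_for_best_model="wer"` and `greater_is_better=False` in `Seq2SeqTrainingArguments`, which also requires the save and evaluation schedules to line up. The card does not say whether the uploaded weights come from step 900 or step 1,400.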

Framework Versions

- Transformers 4.47.0+
- Adapters 1.0.0+
- PyTorch 2.6.0+
- Datasets 3.6.0
- Python 3.10+

Citation

If you use this model in your research, please cite:

@misc{whisper-afrorad-fr,
  author = {StephaneBah},
  title = {Whisper-AfroRad-FR: Medical Radiology ASR for Afro-French Context},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/StephaneBah/Whisper-AfroRad-FR}}
}