FrWhisper (Whisper Large-v3 fine-tuned for French conversational speech)

FrWhisper is a fine-tune of openai/whisper-large-v3 for spoken French, optimised for conversational speech, including disfluencies (hesitations such as "euh", repetitions), interjections and spoken numbers. It is trained on material from the ESLO and LangAge corpora.

Versions

This repository keeps both releases as git tags:

  • v2 (this version, on main): trained on an improved subset of the ESLO/LangAge data.
  • v1: the earlier model, trained on a less refined subset of the ESLO/LangAge data. Load it with revision="v1":

from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="aihpi/FrWhisper")                   # v2 (latest, main)
asr_v1 = pipeline("automatic-speech-recognition", model="aihpi/FrWhisper", revision="v1")  # v1 (git tag)

Evaluation

WER and CER on a fixed 1000-sample held-out subset of the corrected test split (greedy decoding, French forced, text lowercased and stripped of punctuation before scoring):

Model                          | WER all (%) | WER ESLO (%) | WER LangAge (%) | CER all (%)
Whisper large-v3 (base)        |       50.81 |        48.11 |           54.64 |       35.44
FrWhisper v1 (pre-improvement) |       34.13 |        27.74 |           43.19 |       23.68
FrWhisper v2 (this model)      |       29.87 |        22.41 |           40.46 |       20.95

Overall, v2 improves on the base model by ~21 WER points and on v1 by ~4 points (29.87 vs 34.13). The largest gain over v1 is on ESLO (5.3 points, vs 2.7 on LangAge), the corpus most affected by the improved subset.
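To make the scoring protocol concrete, here is a minimal sketch of corpus-level WER and CER with the normalisation described above (lowercasing plus punctuation removal). It assumes that "punctuation" means any Unicode punctuation character and that it is deleted rather than replaced by a space; the authors' exact normaliser is not given on this card, so scores from this sketch may differ slightly from the table. The function names are hypothetical.

```python
import unicodedata


def normalise(text: str) -> str:
    """Lowercase, delete Unicode punctuation, and collapse whitespace (assumed scheme)."""
    lowered = text.lower()
    kept = "".join(ch for ch in lowered if not unicodedata.category(ch).startswith("P"))
    return " ".join(kept.split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (lists of words or strings of characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = curr
    return prev[-1]


def corpus_wer_cer(references, hypotheses):
    """Corpus-level WER and CER in percent: total edits divided by total reference length."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    word_edits = word_total = char_edits = char_total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_n, hyp_n = normalise(ref), normalise(hyp)
        word_edits += edit_distance(ref_n.split(), hyp_n.split())
        word_total += len(ref_n.split())
        # CER here counts spaces as characters (an assumption; conventions vary).
        char_edits += edit_distance(ref_n, hyp_n)
        char_total += len(ref_n)
    if word_total == 0:
        raise ValueError("references are empty after normalisation")
    return 100 * word_edits / word_total, 100 * char_edits / char_total
```

For example, corpus_wer_cer(["Euh, je suis allé au marché."], ["euh je suis allé marché"]) gives a WER of about 16.67 (one deleted word out of six). Errors are summed over the whole subset before dividing, so long utterances weigh more than short ones. Libraries such as jiwer provide equivalent wer and cer functions.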

Usage

from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="aihpi/FrWhisper")
result = asr("audio.wav")   # 16 kHz mono; French is forced by the generation config
print(result["text"])

For files longer than Whisper's 30-second input window, enable chunking: pipeline(..., chunk_length_s=30).
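With chunking enabled, the pipeline splits long audio into overlapping windows, transcribes each window, and stitches the transcripts back together using the overlapping regions. The sketch below only computes the window boundaries to show the idea. It assumes a stride of chunk_length_s / 6 on each side (the pipeline's default stride_length_s, as far as we know; check your transformers version) and is a simplified illustration, not the transformers implementation; chunk_windows is a hypothetical helper.

```python
def chunk_windows(num_samples, sampling_rate=16_000, chunk_length_s=30.0, stride_length_s=None):
    """Return (start, end) sample indices of overlapping windows covering the audio.

    Simplified illustration of chunked long-form decoding; hypothetical helper,
    not the transformers implementation.
    """
    if stride_length_s is None:
        # Assumed default: one sixth of the chunk length on each side.
        stride_length_s = chunk_length_s / 6
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride  # consecutive windows overlap by 2 * stride samples
    if step <= 0:
        raise ValueError("stride is too large for the chunk length")
    windows = []
    start = 0
    while True:
        end = min(start + chunk_len, num_samples)
        windows.append((start, end))
        if end >= num_samples:
            break
        start += step
    return windows
```

For 60 s of 16 kHz audio this yields three 30 s windows starting at 0 s, 20 s and 40 s, each overlapping the next by 10 s. In practice you do not compute these yourself: passing chunk_length_s=30 to pipeline(...) is enough.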

Training

  • Base model: openai/whisper-large-v3
  • Data: improved subset of ESLO + LangAge French speech (see the gated dataset aihpi/FrWhisper-dataset), 16 kHz mono, utterance-level segments.
  • Setup: bf16, learning rate 1.5e-5 with a linear schedule, 1000 warmup steps, effective batch size 64, max grad norm 0.5. Early stopping on eval WER (patience 3) kept the best checkpoint; French transcription is forced during generation.
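The card does not say which training framework was used. Assuming the Hugging Face Seq2SeqTrainer, the listed settings would map roughly onto the Seq2SeqTrainingArguments keywords in this sketch. The split of the effective batch size into 16 per device x 4 accumulation steps on one GPU, the output directory, and the evaluation/save interval are hypothetical.

```python
# Hypothetical mapping of the settings above onto keyword arguments for
# transformers' Seq2SeqTrainingArguments (assumes Seq2SeqTrainer was used).
training_kwargs = {
    "output_dir": "frwhisper-ft",        # hypothetical path
    "bf16": True,
    "learning_rate": 1.5e-5,
    "lr_scheduler_type": "linear",
    "warmup_steps": 1000,
    "per_device_train_batch_size": 16,   # assumed split: 16 x 4 accumulation x 1 GPU = 64
    "gradient_accumulation_steps": 4,
    "max_grad_norm": 0.5,
    "eval_strategy": "steps",            # called evaluation_strategy in older releases
    "eval_steps": 1000,                  # hypothetical interval
    "save_strategy": "steps",
    "save_steps": 1000,                  # hypothetical; must be a multiple of eval_steps
    "load_best_model_at_end": True,      # keep the best checkpoint
    "metric_for_best_model": "wer",      # assumes compute_metrics returns {"wer": ...}
    "greater_is_better": False,          # lower WER is better
    "predict_with_generate": True,       # decode with generate() so WER can be computed
}

# Would be passed as EarlyStoppingCallback(early_stopping_patience=EARLY_STOPPING_PATIENCE).
EARLY_STOPPING_PATIENCE = 3


def effective_batch_size(kwargs, num_devices=1):
    """Samples per optimiser step: per-device batch x accumulation steps x devices."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )
```

Under these assumptions you would build Seq2SeqTrainingArguments(**training_kwargs), add the early-stopping callback to the trainer, and force French decoding by setting model.generation_config.language = "french" and model.generation_config.task = "transcribe".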

Intended use and limitations

  • Intended use: non-commercial academic research on French ASR, especially conversational / disfluent speech.
  • Out of scope: commercial use (prohibited by the NonCommercial licence term).
  • Limitations: tuned for conversational French from sociolinguistic interviews; the eval split shares speakers with training, so reported WER is somewhat optimistic; performance on other domains/accents may vary.

Licence

Released under CC BY-NC-SA 4.0, subject to the terms of the underlying ESLO and LangAge corpora. ESLO material is Copyright (c) 2012 Université d'Orléans / LLL, freely available for non-commercial use under a Creative Commons licence.

Citation

@misc{frwhisper2025,
  title={FrWhisper: Whisper Large-v3 fine-tuned for French conversational speech},
  author={Hanno Müller and Annette Gerstenberg},
  year={2025},
  note={Fine-tuned on the LangAge and ESLO corpora}
}

Authors

Funding

The AI Service Centre Berlin Brandenburg is funded by the Federal Ministry of Research, Technology and Space under the funding code 01IS22092.

Contact

For questions about this model, please open an issue in the repository or contact kisz@hpi.de.
