FrWhisper (Whisper Large-v3 fine-tuned for French conversational speech)
FrWhisper is a fine-tune of openai/whisper-large-v3 for spoken French, optimised for conversational speech, including disfluencies (hesitations such as "euh", repetitions, interjections) and spoken numbers. It is trained on material from the ESLO and LangAge corpora.
Versions
This repository keeps both releases as git tags:
- v2 (this version, main): trained on an improved subset of ESLO/LangAge data.
- v1: the earlier model, trained on a less optimal subset of ESLO/LangAge data. Load it with revision="v1".
from transformers import pipeline
asr = pipeline("automatic-speech-recognition", model="aihpi/FrWhisper") # v2 (latest)
asr_v1 = pipeline("automatic-speech-recognition", model="aihpi/FrWhisper", revision="v1")
Evaluation
WER / CER on a fixed 1000-sample held-out subset of the corrected test split (greedy decoding, forced French, lowercase + punctuation-stripped normalisation):
| Model | WER (all) | WER ESLO | WER LangAge | CER (all) |
|---|---|---|---|---|
| Whisper large-v3 (base) | 50.81 | 48.11 | 54.64 | 35.44 |
| FrWhisper v1 (pre-improvement) | 34.13 | 27.74 | 43.19 | 23.68 |
| FrWhisper v2 (this model) | 29.87 | 22.41 | 40.46 | 20.95 |
v2 improves over the base model by about 21 WER points overall (50.81 to 29.87) and over v1 by about 4 points (34.13 to 29.87). The largest gain is on ESLO (the corpus most affected by the improved subset).
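To make the metric definition concrete, here is a minimal pure-Python sketch of WER and CER with lowercase and punctuation-stripped normalisation. It is not the evaluation script behind the table. The function names, the exact punctuation set, the choice to replace punctuation with a space (which splits elisions such as "l'ami"), and the corpus-level aggregation are all assumptions made for illustration.

```python
import string

# Assumption: ASCII punctuation plus a few marks common in French text.
# The normalisation actually used for the table may differ.
_PUNCT = string.punctuation + "«»…’"
_PUNCT_TABLE = str.maketrans({ch: " " for ch in _PUNCT})


def normalise(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(text.lower().translate(_PUNCT_TABLE).split())


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _error_rate(references, hypotheses, tokenise):
    """Total edits divided by total reference length, as a percentage."""
    errors = 0
    ref_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = tokenise(normalise(ref))
        hyp_units = tokenise(normalise(hyp))
        errors += edit_distance(ref_units, hyp_units)
        ref_len += len(ref_units)
    if ref_len == 0:
        raise ValueError("references are empty after normalisation")
    return 100.0 * errors / ref_len


def wer(references, hypotheses):
    """Word error rate in percent, aggregated over all utterances."""
    return _error_rate(references, hypotheses, str.split)


def cer(references, hypotheses):
    """Character error rate in percent (spaces count as characters)."""
    return _error_rate(references, hypotheses, lambda s: s)


if __name__ == "__main__":
    refs = ["Euh, oui, je pense que oui."]
    hyps = ["oui je pense que oui"]
    print(f"WER: {wer(refs, hyps):.2f}  CER: {cer(refs, hyps):.2f}")
```

In this example the dropped hesitation "euh" counts as one deletion, which is the kind of error a model tuned for disfluent speech aims to avoid.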
Usage
from transformers import pipeline
asr = pipeline("automatic-speech-recognition", model="aihpi/FrWhisper")
result = asr("audio.wav") # 16 kHz mono; French is forced by the generation config
print(result["text"])
For long files, enable chunking: pipeline(..., chunk_length_s=30).
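The usage comment above mentions 16 kHz mono input. As a hedged sketch, the standard-library `wave` module can report whether a PCM WAV file already matches that format. The helper name `check_wav_format` is hypothetical and not part of this repository, and the check only covers uncompressed PCM WAV files.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Report the sample rate and channel count of a PCM WAV file.

    Hypothetical helper for illustration; raises wave.Error if the
    file is not a WAV format the standard library can read.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return {
        "sample_rate": rate,
        "channels": channels,
        "ok": rate == expected_rate and channels == expected_channels,
    }
```

For instance, `check_wav_format("audio.wav")["ok"]` is true only when the file is 16 kHz mono. A file that fails the check can be resampled or downmixed with an audio tool of your choice before transcription.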
Training
- Base model: openai/whisper-large-v3
- Data: improved subset of ESLO + LangAge French speech (see the gated dataset aihpi/FrWhisper-dataset), 16 kHz mono, utterance-level segments.
- Setup: bf16, learning rate 1.5e-5 (linear), warmup 1000 steps, effective batch size 64, max grad norm 0.5. Early stopping on eval WER (patience 3) kept the best checkpoint; French transcription is forced during generation.
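The schedule and stopping rule in the setup line can be sketched in plain Python. This is an illustration, not the training code: it assumes "linear" means linear warmup to the peak rate followed by linear decay to zero, and `total_steps=10000` is a placeholder because the card does not state the total number of steps. The WER values in the demo are invented.

```python
def linear_warmup_decay_lr(step, peak_lr=1.5e-5, warmup_steps=1000,
                           total_steps=10000):
    """Learning rate at a given optimiser step.

    Assumption: ramp linearly from 0 to peak_lr over warmup_steps,
    then decay linearly to 0 at total_steps (placeholder value).
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


class EarlyStopping:
    """Stop after `patience` evaluations without a new best eval WER."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_wer = float("inf")
        self.best_step = None
        self.bad_evals = 0

    def update(self, step, eval_wer):
        """Record one evaluation; return True when training should stop."""
        if eval_wer < self.best_wer:
            # New best: remember the checkpoint and reset the counter.
            self.best_wer = eval_wer
            self.best_step = step
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=3)
    # Illustrative eval WER values only, not results from this model.
    for step, eval_wer in [(500, 31.0), (1000, 30.0), (1500, 30.4),
                           (2000, 30.2), (2500, 30.1)]:
        lr = linear_warmup_decay_lr(step)
        if stopper.update(step, eval_wer):
            print(f"stop at step {step}; best checkpoint: step {stopper.best_step}")
            break
```

With patience 3, training stops at the third consecutive evaluation that fails to beat the best WER, and the checkpoint from the best evaluation is the one kept.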
Intended use and limitations
- Intended use: non-commercial academic research on French ASR, especially conversational / disfluent speech.
- Out of scope: commercial use (prohibited by the NonCommercial licence term).
- Limitations: tuned for conversational French from sociolinguistic interviews; the eval split shares speakers with training, so reported WER is somewhat optimistic; performance on other domains/accents may vary.
Licence
Released under CC BY-NC-SA 4.0, subject to the terms of the underlying ESLO and LangAge corpora. ESLO material is Copyright (c) 2012 Université d'Orléans / LLL, freely available for non-commercial use under a Creative Commons licence.
Citation
@misc{frwhisper2025,
title={FrWhisper: Whisper Large-v3 fine-tuned for French conversational speech},
author={Hanno Müller and Annette Gerstenberg},
year={2025},
note={Fine-tuned on the LangAge and ESLO corpora}
}
Authors
Funding
The AI Service Centre Berlin Brandenburg is funded by the Federal Ministry of Research, Technology and Space under the funding code 01IS22092.
Contact
For questions about this model, please open an issue in the repository or contact kisz@hpi.de.