whisper-base-french

French-only fine-tune of openai/whisper-base, built to validate on-device TTS output inside VocaRead. 74 M parameters, all of them dedicated to French.

This is the PyTorch checkpoint. For the iOS-ready CoreML INT8 bundle see eborges78/whisper-base-fr-coreml-slim.

Why this exists

The multilingual openai/whisper-base scores ~25 % WER on Common Voice FR: fine, but not great for a TTS validator that needs to distinguish "vous voyez" from "vous voyiez". Dedicating all 74 M parameters to French (instead of spreading them across 99 languages) is intended to bring WER down to single digits (target ≤ 10 %, measurement still pending) while staying inside the same memory envelope.

Use this model on iOS devices with RAM ≥ 3.5 GB (iPhone 11 and newer). For older devices (iPad 6, iPhone 8, SE2), use the smaller whisper-tiny-french, which fits in 2 GB of total RAM.

Quick start

import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="eborges78/whisper-base-french",
    chunk_length_s=30,
    generate_kwargs={"language": "fr", "task": "transcribe"},
)

transcript = asr("path/to/french-audio.wav")
print(transcript["text"])

Performance

Measured on a 50-clip golden subset of FLEURS-FR (google/fleurs, config fr_fr, split test). WER computed with jiwer + Whisper-style normalization.

Model	Params	WER (FR)	Δ vs baseline / notes
`openai/whisper-tiny` (multilingual)	39 M	~50 %	for comparison
`eborges78/whisper-tiny-french`	39 M	~15-18 %	sister model
`openai/whisper-base` (multilingual)	74 M	~25 %	baseline
`eborges78/whisper-base-french`	74 M	TBD (target ≤ 10 %)	−15 pts expected
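
The scoring step described above can be sketched in plain Python. This is a minimal stand-in, not the evaluation script behind the table: `normalize` only approximates Whisper-style basic normalization, and `corpus_wer` re-implements the word-level edit distance that jiwer would normally provide. The function names and sample sentences are illustrative assumptions.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Approximate Whisper-style basic normalization (assumption, not the exact normalizer)."""
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace marks, symbols, and punctuation with spaces; keep letters and digits.
    text = "".join(" " if unicodedata.category(ch)[0] in "MSP" else ch for ch in text)
    return re.sub(r"\s+", " ", text).strip()


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    if words == 0:
        raise ValueError("no reference words to score against")
    return errors / words


if __name__ == "__main__":
    # Hypothetical reference/hypothesis pairs, for illustration only.
    pairs = [
        ("Vous voyez la mer.", "vous voyiez la mer"),
        ("Bonjour !", "bonjour"),
    ]
    print(f"WER: {corpus_wer(pairs):.1%}")
```

On a 50-clip subset, a single substituted word moves the score by a noticeable fraction of a point, which is worth keeping in mind when comparing rows that differ by only a few points.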

Training

Item	Value
Base model	`openai/whisper-base`
Fine-tune corpus	`facebook/multilingual_librispeech`, config `french`, split `train`
Training hours used	60 h (~18 000 clips, capped)
Epochs	5
Batch size	16 (gradient accumulation over 2 steps, effective batch 32)
Learning rate	6.0e-6 peak, linear schedule, 500 warmup steps
Hardware	1× RTX 3090 24 GB (Vast.ai)
Wall-clock	~9 h
Cost	~$1.10 GPU rental
Evaluation	FLEURS-FR test (capped at 300 clips) every 1000 steps

Training pipeline and full reproduction recipe: github.com/eborges78/whisper-fr-coreml-slim.
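
To make the numbers in the table concrete, here is a rough sketch of the step count and learning-rate schedule they imply. It assumes exactly 18 000 clips and a linear warmup followed by linear decay to zero; the real step count depends on the actual clip count and on how the trainer rounds partial batches, so treat the output as approximate. The helper `lr_at` is a hypothetical name, not part of the training code.

```python
import math

# Values taken from the training table; CLIPS is the approximate figure given there.
CLIPS = 18_000
EPOCHS = 5
BATCH_SIZE = 16
GRAD_ACCUM = 2
PEAK_LR = 6.0e-6
WARMUP_STEPS = 500
EVAL_EVERY = 1000

effective_batch = BATCH_SIZE * GRAD_ACCUM              # 32
steps_per_epoch = math.ceil(CLIPS / effective_batch)   # optimizer steps per epoch
total_steps = steps_per_epoch * EPOCHS


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at total_steps (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0, total_steps - step) / (total_steps - WARMUP_STEPS)


# Steps at which an in-training evaluation would fire.
eval_steps = list(range(EVAL_EVERY, total_steps + 1, EVAL_EVERY))

if __name__ == "__main__":
    print(f"effective batch: {effective_batch}")
    print(f"total optimizer steps: {total_steps}")
    print(f"evaluations at steps: {eval_steps}")
    for step in (0, 250, 500, total_steps):
        print(f"lr at step {step}: {lr_at(step):.2e}")
```

Under these assumptions the run is short enough that evaluating every 1000 steps yields only a couple of in-training checkpoints, and warmup covers a sizeable share of the schedule.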

Limitations

The same caveats as the tiny-french variant apply; read those before integrating. Specifically:

Single-language model: decoding is forced to FR via generate_kwargs. Code-switched input degrades.
MLS speaker distribution: metropolitan French audiobook reads. Regional accents (Quebec, West African) were not specifically evaluated.
Hallucination on silence: pair with a VAD upstream.
TTS-domain calibration: trained on clean audiobook reads. Far-field, noisy, or phone-quality audio will see higher WER; for those use cases, consider whisper-small-french or larger if you can spare the RAM.

The 74 M parameter capacity is enough that hallucinations become rare in practice, unlike the tiny variant, which still occasionally produces spurious text on very short clips.
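
The "pair with a VAD upstream" advice above can be illustrated with a deliberately crude energy gate. This is a sketch under stated assumptions, not a real voice-activity detector: the function name `has_speech_energy`, the 30 ms frame length, the RMS threshold, and the minimum frame count are all hypothetical values chosen for illustration, and a production setup would more likely rely on a trained VAD.

```python
import math
from typing import Sequence


def has_speech_energy(
    samples: Sequence[float],
    sample_rate: int = 16_000,     # Whisper models expect 16 kHz audio
    frame_ms: int = 30,            # hypothetical frame length
    rms_threshold: float = 0.01,   # hypothetical; tune on real recordings
    min_active_frames: int = 3,    # hypothetical
) -> bool:
    """Return True if enough frames exceed an RMS energy threshold.

    `samples` is mono audio scaled to [-1.0, 1.0].
    """
    frame_len = sample_rate * frame_ms // 1000
    active = 0
    # Walk over non-overlapping full frames; a trailing partial frame is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= rms_threshold:
            active += 1
            if active >= min_active_frames:
                return True
    return False


if __name__ == "__main__":
    silence = [0.0] * 16_000
    tone = [0.5 * math.sin(2 * math.pi * 220 * n / 16_000) for n in range(16_000)]
    print(has_speech_energy(silence))  # energy never crosses the threshold
    print(has_speech_energy(tone))     # sustained energy well above the threshold
```

A caller would skip transcription when the gate returns False instead of passing the clip to the pipeline from the quick start. An energy gate cannot tell speech from other loud sounds, so it only addresses the pure-silence case.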

License

MIT. This model is a derivative of openai/whisper-base (MIT) trained on Multilingual LibriSpeech (CC-BY-4.0). Both upstream licenses allow commercial use; this fine-tune adds no additional restrictions.

Citation

@misc{radford2022whisper,
  title  = {Robust Speech Recognition via Large-Scale Weak Supervision},
  author = {Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
  year   = {2022},
  eprint = {2212.04356},
}

@inproceedings{pratap2020mls,
  title     = {{MLS}: A Large-Scale Multilingual Dataset for Speech Research},
  author    = {Pratap, Vineel and Xu, Qiantong and Sriram, Anuroop and Synnaeve, Gabriel and Collobert, Ronan},
  booktitle = {Interspeech},
  year      = {2020},
}

Acknowledgments

Bofeng Huang for the whisper-medium-fr fine-tuning recipe that we scaled down to the base envelope.
Argmax for WhisperKit and the slim CoreML format that makes on-device inference cheap enough to ship.
