whisper-base-french
French-only fine-tune of openai/whisper-base, built to validate on-device TTS output inside VocaRead. 74 M parameters, all of them dedicated to French.
This is the PyTorch checkpoint. For the iOS-ready CoreML INT8 bundle see eborges78/whisper-base-fr-coreml-slim.
Why this exists
The multilingual openai/whisper-base scores ~25 % WER on Common Voice FR, which is fine but not great for a TTS validator that needs to distinguish "vous voyez" from "vous voyiez". Dedicating all 74 M parameters to French (instead of spreading them across 99 languages) is intended to bring WER down to single digits while staying inside the same memory envelope; the measured figure is still pending (see Performance).
Use this model on iOS devices with at least 3.5 GB of RAM (iPhone 11 and newer). For older devices (iPad 6, iPhone 8, SE2), use the smaller whisper-tiny-french, which fits in 2 GB of total RAM.
Quick start
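```python
from transformers import pipeline

# Build an ASR pipeline; long files are processed in 30-second chunks.
asr = pipeline(
    "automatic-speech-recognition",
    model="eborges78/whisper-base-french",
    chunk_length_s=30,
    # Force French transcription instead of relying on language detection.
    generate_kwargs={"language": "fr", "task": "transcribe"},
)

transcript = asr("path/to/french-audio.wav")
print(transcript["text"])
```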
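Once a transcript is available, a TTS validator can compare it with the text that was sent to the synthesizer. How VocaRead does this is not described here; the following is only a minimal sketch using the standard library, and the helper names `normalize` and `find_mismatches` are hypothetical.

```python
import difflib
import re
import unicodedata


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation, and split into words (accents are kept)."""
    text = unicodedata.normalize("NFC", text).lower()
    # Replace runs of anything that is not a word character or apostrophe.
    text = re.sub(r"[^\w']+", " ", text)
    return text.split()


def find_mismatches(expected: str, transcript: str) -> list[tuple[str, str]]:
    """Return (expected_words, heard_words) pairs where the two texts differ."""
    ref, hyp = normalize(expected), normalize(transcript)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            mismatches.append((" ".join(ref[i1:i2]), " ".join(hyp[j1:j2])))
    return mismatches


expected = "Vous voyiez la mer."
heard = "vous voyez la mer"  # stand-in for transcript["text"]
print(find_mismatches(expected, heard))
```

An empty list means the synthesized audio was heard as written; any pair in the list points at the words that need a second look.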
Performance
Measured on a 50-clip golden subset of FLEURS-FR (google/fleurs, config fr_fr, split test). WER computed with jiwer + Whisper-style normalization.
| Model | Params | WER (FR) | Δ vs baseline |
|---|---|---|---|
| openai/whisper-tiny (multilingual) | 39 M | ~50 % | for comparison |
| eborges78/whisper-tiny-french | 39 M | ~15-18 % | sister model |
| openai/whisper-base (multilingual) | 74 M | ~25 % | baseline |
| eborges78/whisper-base-french | 74 M | TBD (target ≤ 10 %) | −15 pts expected |
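For readers unfamiliar with the metric, WER is the word-level edit distance between reference and hypothesis divided by the number of reference words. The numbers above were computed with jiwer and Whisper-style normalization; the sketch below is a dependency-free stand-in with a much simpler normalizer (an assumption, not the normalizer used for the table), so its results may differ slightly from jiwer's.

```python
import re
import unicodedata


def normalize(text: str) -> list[str]:
    """Simplified normalization: lowercase, drop punctuation, split on spaces."""
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("Vous voyiez la mer.", "vous voyez la mer"))
```

For a multi-clip subset, errors and reference words are usually summed across all clips before dividing, rather than averaging per-clip scores.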
Training
| Item | Value |
|---|---|
| Base model | openai/whisper-base |
| Fine-tune corpus | facebook/multilingual_librispeech, config french, split train |
| Training hours used | 60 h (~18 000 clips, capped) |
| Epochs | 5 |
| Batch size | 16, with 2 gradient accumulation steps (effective 32) |
| Learning rate | 6.0e-6, linear, 500 warmup steps |
| Hardware | 1× RTX 3090 24 GB (Vast.ai) |
| Wall-clock | ~9 h |
| Cost | ~$1.10 GPU rental |
| Evaluation | FLEURS-FR test (capped 300 clips) every 1000 steps |
Training pipeline and full reproduction recipe: github.com/eborges78/whisper-fr-coreml-slim.
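The table values imply a rough step budget and learning-rate curve, which the sketch below works out. It assumes exactly 18 000 clips (the table says "~18 000"), a final partial batch that is kept, and a "linear" schedule that decays to zero after warmup; the actual training script may round differently.

```python
import math

NUM_CLIPS = 18_000       # "~18 000 clips" in the table, treated as exact here
PER_STEP_BATCH = 16
GRAD_ACCUM_STEPS = 2
EPOCHS = 5
PEAK_LR = 6.0e-6
WARMUP_STEPS = 500
EVAL_EVERY = 1000

effective_batch = PER_STEP_BATCH * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(NUM_CLIPS / effective_batch)
total_steps = steps_per_epoch * EPOCHS


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - WARMUP_STEPS)


# Optimizer steps at which a periodic evaluation would fire.
eval_steps = list(range(EVAL_EVERY, total_steps + 1, EVAL_EVERY))

print(effective_batch, steps_per_epoch, total_steps, eval_steps)
for step in (0, 250, 500, total_steps):
    print(step, learning_rate(step))
```

Under these assumptions the run is about 2 800 optimizer steps, so the 500 warmup steps cover a bit under a fifth of training and the 1000-step evaluation interval fires only twice before the end.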
Limitations
Same caveats as the tiny-french variant; read those before integrating. Specifically:
- Single-language model: forced to FR via generate_kwargs. Code-switched input degrades.
- MLS speaker distribution: metropolitan French audiobook reads. Regional accents (Quebec, West African) not specifically evaluated.
- Hallucination on silence: pair with a VAD upstream.
- TTS-domain calibration: trained on clean audiobook reads. Far-field / noisy / phone-quality audio will see higher WER; for those use cases consider whisper-small-french or larger if you can spare the RAM.
The 74 M parameter capacity is enough that hallucinations become rare in practice, unlike the tiny variant, which still occasionally produces spurious text on very short clips.
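The advice to pair the model with a VAD upstream can be illustrated with a crude energy gate that skips clips containing no signal. This is a sketch, not a real voice activity detector: it reacts to any sound, not only speech, and the function name `has_speech` and all thresholds are illustrative assumptions rather than values tuned for this model.

```python
import math


def has_speech(samples, sample_rate=16_000, frame_ms=30,
               rms_threshold=0.01, min_speech_frames=5):
    """Return True if enough frames exceed an RMS energy threshold.

    `samples` is a sequence of floats in [-1.0, 1.0]. The threshold and
    frame count are illustrative values, not tuned for this model.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    speech_frames = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= rms_threshold:
            speech_frames += 1
    return speech_frames >= min_speech_frames


# One second of digital silence versus one second of a 220 Hz tone.
silence = [0.0] * 16_000
tone = [0.3 * math.sin(2 * math.pi * 220 * n / 16_000) for n in range(16_000)]

for name, clip in (("silence", silence), ("tone", tone)):
    if has_speech(clip):
        print(name, "-> send to the ASR pipeline")
    else:
        print(name, "-> skip, nothing to transcribe")
```

In production a trained VAD is the better choice, since an energy gate lets through background noise that can still trigger hallucinated text.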
License
MIT. This model is a derivative of openai/whisper-base (MIT) trained on Multilingual LibriSpeech (CC-BY-4.0). Both upstream licenses allow commercial use; this fine-tune adds no additional restrictions.
Citation
```bibtex
@misc{radford2022whisper,
  title  = {Robust Speech Recognition via Large-Scale Weak Supervision},
  author = {Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
  year   = {2022},
  eprint = {2212.04356},
}

@inproceedings{pratap2020mls,
  title     = {{MLS}: A Large-Scale Multilingual Dataset for Speech Research},
  author    = {Pratap, Vineel and Xu, Qiantong and Sriram, Anuroop and Synnaeve, Gabriel and Collobert, Ronan},
  booktitle = {Interspeech},
  year      = {2020},
}
```
Acknowledgments
- Bofeng Huang for the whisper-medium-fr fine-tuning recipe that we scaled down to the base envelope.
- Argmax for WhisperKit and the slim CoreML format that makes on-device inference cheap enough to ship.