whisper-tiny-french
French-only fine-tune of openai/whisper-tiny, built to validate on-device TTS output inside VocaRead. 39 M parameters, all of them dedicated to French.
This is the PyTorch checkpoint. For the iOS-ready CoreML INT8 bundle see eborges78/whisper-tiny-fr-coreml-slim.
Why this exists
openai/whisper-tiny is the only Whisper size that fits in RAM on iPad 6 / iPhone 8 / SE2. But its 39 M parameters span 99 languages — French gets a fraction of that capacity, and the WER on Common Voice FR sits around 50 %.
A French-only fine-tune of the same architecture concentrates 100 % of the capacity on FR, with a design target of WER under 20 % (not yet reached; see Performance), all while staying inside the same memory envelope. It is a drop-in replacement for the multilingual tiny when the input language is known.
Quick start
import torch
from transformers import pipeline
# French-only ASR pipeline; long audio is processed in 30 s chunks.
asr = pipeline(
    "automatic-speech-recognition",
    model="eborges78/whisper-tiny-french",
    chunk_length_s=30,
    generate_kwargs={"language": "fr", "task": "transcribe"},
)
# Accepts a path to an audio file and returns a dict with a "text" key.
transcript = asr("path/to/french-audio.wav")
print(transcript["text"])
Performance
Measured on a 50-clip golden subset of FLEURS-FR (google/fleurs, config fr_fr, split test). WER computed with jiwer + Whisper-style normalization (lowercase, strip punctuation, collapse whitespace).
| Model | Params | WER (FR) | Δ vs baseline |
|---|---|---|---|
| openai/whisper-tiny (multilingual) | 39 M | ~50 % | baseline |
| eborges78/whisper-tiny-french (this model) | 39 M | 43 % (measured) | −7 pts |
| openai/whisper-base (multilingual) | 74 M | ~25 % | for comparison |
| eborges78/whisper-base-french (planned) | 74 M | ~8-10 % | sister model |
Honest assessment: the FLEURS WER of 43 % is above the original 20 % quality gate target documented in `configs/tiny-fr.yaml`. The model trained for 4 epochs on 30 h of MLS-FR, but FLEURS is a substantially harder out-of-distribution eval (news/short prompts with English loanwords like "springboks", "u.s. corps of engineers" vs MLS's 19th-century French audiobooks).
Internal eval on 300 MLS-FR test clips (in-distribution) lands at WER 32 %, closer to the target but still above it. The model is published under the `dev` branch revision rather than `main` to flag this gap.
For the target VocaRead use case (validating clean French TTS output of public-domain literature), the in-distribution performance is what matters, and the model improves noticeably over the multilingual tiny baseline (which hallucinates English phrases on FR audio).
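Because the card says the model is published under the `dev` revision, callers who want that exact snapshot can pin it when loading. The sketch below assumes the branch is literally named `dev` and relies on the `revision` argument of the Transformers `pipeline` helper; the names `build_pipeline_kwargs` and `load_asr` are hypothetical and not part of the model repo.

```python
MODEL_ID = "eborges78/whisper-tiny-french"


def build_pipeline_kwargs(revision="dev"):
    """Collect the arguments passed to transformers.pipeline in one place.

    `revision` can be a branch name, a tag, or a commit hash on the Hub.
    The default "dev" is taken from the model card text (assumption).
    """
    return {
        "task": "automatic-speech-recognition",
        "model": MODEL_ID,
        "revision": revision,
        "chunk_length_s": 30,
        "generate_kwargs": {"language": "fr", "task": "transcribe"},
    }


def load_asr(revision="dev"):
    """Build the ASR pipeline pinned to the given Hub revision."""
    # Imported lazily so the kwargs helper works without transformers installed.
    from transformers import pipeline

    return pipeline(**build_pipeline_kwargs(revision))
```

Calling `load_asr()` downloads the checkpoint on first use; `load_asr("main")` would fall back to the default branch if it carries weights, which the source does not state.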
Per-clip behaviour examples (from the 50-clip FLEURS golden set)
Best cases (clean French, no loanwords, WER 7-11 %):
REF: ainsi le crayon était un bon ami pour beaucoup de gens lorsqu'il est sorti
HYP: si le crayon était un bon ami pour beaucoup de gens lorsqu'il est sorti
WER: 0.07
Worst cases (English loanwords + proper nouns, WER > 80 %):
REF: pour les springboks ce fut la fin d'une série de cinq défaites
HYP: pour l'esprit de boxe se fût la fin d'une cerine de sang du défaite
WER: 0.75
The model maps unseen English words phonetically — expected since MLS is a 19th-century French literature corpus.
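The two per-clip WER values above can be reproduced by hand. The sketch below is a pure-Python word-level Levenshtein distance, not the author's jiwer pipeline; it assumes tokens are obtained by lowercasing and splitting on whitespace, leaving apostrophes and accents untouched, which is simpler than the normalization described under Performance.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # hypothesis word inserted
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Edit distance divided by the number of reference words."""
    # Assumption: whitespace tokenization after lowercasing, nothing else.
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)


best = wer(
    "ainsi le crayon était un bon ami pour beaucoup de gens lorsqu'il est sorti",
    "si le crayon était un bon ami pour beaucoup de gens lorsqu'il est sorti",
)
worst = wer(
    "pour les springboks ce fut la fin d'une série de cinq défaites",
    "pour l'esprit de boxe se fût la fin d'une cerine de sang du défaite",
)
print(f"{best:.2f} {worst:.2f}")  # prints "0.07 0.75"
```

With this tokenization the first pair has 1 error over 14 reference words and the second has 9 errors over 12 reference words, which matches the 0.07 and 0.75 reported above.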
Training
| Item | Value |
|---|---|
| Base model | openai/whisper-tiny |
| Fine-tune corpus | facebook/multilingual_librispeech, config french, split train |
| Training hours used | 30 h (~9 000 clips, capped) |
| Epochs | 4 |
| Steps | 1 128 |
| Batch size | 32 |
| Learning rate | 1.0e-5, linear, 500 warmup steps |
| Hardware | 1× RTX 3090 24 GB (Vast.ai) |
| Wall-clock | ~6 h total (4h29 dataset mel-mapping CPU, 1h26 training GPU) |
| Cost | ~€2 actual (single-instance, sub-optimal — see repo docs/adding-a-language.md for the CPU+GPU split that saves ~50 %) |
| In-training eval | MLS test (capped 300 clips) every 500 steps |
| Final training loss | 0.44 (started at 1.32) |
| In-training eval WER (MLS test) | 31.96 % |
| FLEURS-FR bench WER (50 clips) | 42.96 % |
Training pipeline and full reproduction recipe: github.com/eborges78/whisper-fr-coreml-slim.
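The step count and wall-clock figures in the table can be cross-checked from the other rows. The sketch below assumes no gradient accumulation, that the last partial batch of each epoch is kept, and that "linear" means linear warmup followed by linear decay to zero; none of these is confirmed by the source, and `linear_lr` is a hypothetical helper.

```python
import math

CLIPS = 9_000          # "~9 000 clips" in the table (approximate)
BATCH_SIZE = 32
EPOCHS = 4
BASE_LR = 1.0e-5
WARMUP_STEPS = 500

steps_per_epoch = math.ceil(CLIPS / BATCH_SIZE)  # ceil(281.25) = 282
total_steps = steps_per_epoch * EPOCHS           # 282 * 4 = 1128

# 4h29 of CPU preprocessing plus 1h26 of GPU training, in minutes.
wall_clock_min = (4 * 60 + 29) + (1 * 60 + 26)   # 355 min, i.e. 5h55


def linear_lr(step):
    """Assumed schedule: linear warmup to BASE_LR, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * (step / WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return BASE_LR * (remaining / (total_steps - WARMUP_STEPS))


print(total_steps, wall_clock_min)  # prints "1128 355"
```

Under these assumptions the 500 warmup steps cover roughly 44 % of the 1 128 total steps, so the learning rate is at or near its peak for only a short part of the run.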
Known training limitations
- 4 epochs probably insufficient. Training loss was still descending in epoch 4 (0.50 → 0.44). A retraining at 8-10 epochs would likely move WER lower.
- MLS-only corpus. Adding FLEURS train data, VoxPopuli FR, or a custom news corpus to the training mix would help bridge the FLEURS eval gap.
- No FR-specific augmentation. Current `spec_augment` is conservative (`time_mask_param: 30`, `freq_mask_param: 27`). More aggressive masking might help.
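To make the two masking parameters concrete, here is a minimal pure-Python sketch of SpecAugment-style masking. It assumes `time_mask_param` and `freq_mask_param` are maximum mask widths (in frames and mel bins respectively) and that one mask of each kind is applied per clip; the training repo's actual implementation may differ, and the function name and toy spectrogram are hypothetical.

```python
import random


def spec_augment(spec, time_mask_param=30, freq_mask_param=27, rng=random):
    """Zero one random band of mel bins and one random span of frames.

    `spec` is a list of mel-bin rows, each a list of per-frame values.
    A new spectrogram is returned; the input is left untouched.
    """
    n_mels, n_frames = len(spec), len(spec[0])
    out = [row[:] for row in spec]

    # Frequency mask: blank up to `freq_mask_param` consecutive mel bins.
    f_width = rng.randint(0, min(freq_mask_param, n_mels))
    f_start = rng.randint(0, n_mels - f_width)
    for m in range(f_start, f_start + f_width):
        out[m] = [0.0] * n_frames

    # Time mask: blank up to `time_mask_param` consecutive frames in every row.
    t_width = rng.randint(0, min(time_mask_param, n_frames))
    t_start = rng.randint(0, n_frames - t_width)
    for row in out:
        row[t_start:t_start + t_width] = [0.0] * t_width
    return out


# Toy input: 80 mel bins (as in whisper-tiny) by 100 frames, all ones.
toy = [[1.0] * 100 for _ in range(80)]
masked = spec_augment(toy, rng=random.Random(0))
```

Raising either parameter widens the largest possible mask, which is one way to read "more aggressive masking" in the bullet above.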
Limitations
This model is calibrated for one specific downstream task: validating read-aloud TTS output audio. It will work, but is sub-optimal, for:
- Far-field noisy speech: trained on clean audiobook reads, it will degrade on phone-call quality audio. Use whisper-base-french or whisper-small-french for noisier inputs.
- Code-switching: capacity is 100 % FR. Sentences mixing French and English will be transcribed entirely in French (the English chunks get phonetically mapped). For mixed-language input, stay on multilingual whisper-base or larger.
- Strong regional accents: MLS speakers are mostly metropolitan / continental French. Quebec or West African French may have higher WER. We did not specifically evaluate this.
- Hallucination at the edges: like all Whisper sizes, the model can hallucinate on silence-only inputs (it generates audiobook-style filler). Always pair with a VAD or duration check upstream.
- Single-language only: forced to FR via `generate_kwargs={"language": "fr"}`. Passing other languages will produce garbage; use the multilingual base if you don't know the language ahead of time.
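The "VAD or duration check" advice can be approximated with a very crude gate. The sketch below is not a real voice-activity detector: it only rejects clips that are too short or whose RMS energy is near zero. The thresholds (`min_seconds`, `min_rms`) and the function names are hypothetical and would need tuning on real audio; 16 kHz is the sample rate Whisper models expect.

```python
import math


def rms(samples):
    """Root-mean-square level of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def worth_transcribing(samples, sample_rate=16_000,
                       min_seconds=0.5, min_rms=0.01):
    """Return False for clips that are too short or essentially silent."""
    duration = len(samples) / sample_rate
    if duration < min_seconds:
        return False
    return rms(samples) >= min_rms


# One second of digital silence versus one second of a quiet 220 Hz tone.
silence = [0.0] * 16_000
tone = [0.1 * math.sin(2 * math.pi * 220 * n / 16_000) for n in range(16_000)]
print(worth_transcribing(silence), worth_transcribing(tone))  # False True
```

Clips that fail the gate can be skipped before they reach the ASR pipeline, which avoids the silence-only case described above; a proper VAD remains the more robust choice.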
License
MIT. This model is a derivative of openai/whisper-tiny (MIT) trained on Multilingual LibriSpeech (CC-BY-4.0). Both upstream licenses allow commercial use; this fine-tune adds no additional restrictions.
Citation
If you use this model in a paper or product, please cite the upstream Whisper paper and the MLS dataset:
@misc{radford2022whisper,
  title  = {Robust Speech Recognition via Large-Scale Weak Supervision},
  author = {Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
  year   = {2022},
  eprint = {2212.04356},
}
@inproceedings{pratap2020mls,
  title     = {{MLS}: A Large-Scale Multilingual Dataset for Speech Research},
  author    = {Pratap, Vineel and Xu, Qiantong and Sriram, Anuroop and Synnaeve, Gabriel and Collobert, Ronan},
  booktitle = {Interspeech},
  year      = {2020},
}
Acknowledgments
- Bofeng Huang for the `whisper-medium-fr` fine-tuning recipe, scaled down here to the `tiny` envelope.
- Argmax for WhisperKit: without their slim CoreML bundles + ANE optimization, this wouldn't fit on an iPad 6 at all.