---
license: mit
language:
- fr
metrics:
- wer
- cer
base_model:
- UsefulSensors/moonshine-tiny
pipeline_tag: automatic-speech-recognition
library_name: transformers
arxiv: https://arxiv.org/abs/2410.15608
datasets:
- facebook/multilingual_librispeech
tags:
- audio
- automatic-speech-recognition
- speech-to-text
- speech
- french
- moonshine
- asr
---

# Moonshine-Tiny-FR: French Speech Recognition Model
|
**Fine-tuned Moonshine ASR model for French**

This is a fine-tuned version of [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny), optimized for French speech recognition. For its size (27M parameters), it delivers strong French ASR accuracy: 21.8% WER on the Multilingual LibriSpeech (MLS) French test set, lower than the larger Whisper-tiny (see Comparison below).

**Links:**
- [[Original Moonshine Blog]](https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/)
- [[Original Paper]](https://arxiv.org/abs/2410.15608)
- [[Fine-Tuning Guide]](https://github.com/pierre-cheneau/finetune-moonshine-asr)
|
## Usage

### Installation
```bash
pip install --upgrade pip
pip install --upgrade transformers "datasets[audio]" torch torchaudio
```

### Basic Usage

```python
from transformers import MoonshineForConditionalGeneration, AutoProcessor
import torch
import torchaudio

# Load model and processor
model = MoonshineForConditionalGeneration.from_pretrained('Cornebidouil/moonshine-tiny-fr')
processor = AutoProcessor.from_pretrained('Cornebidouil/moonshine-tiny-fr')

# Load audio (shape: channels x samples) and resample to 16 kHz if needed
audio, sr = torchaudio.load("french_audio.wav")
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)
audio = audio.mean(dim=0).numpy()  # Downmix to mono by averaging all channels

# Prepare inputs
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate transcription
# Scale max_new_tokens with audio duration to avoid truncation (5 tokens per second is optimal for French)
audio_duration = len(audio) / 16000
max_new_tokens = int(audio_duration * 5)

generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
transcription = processor.decode(generated_ids[0], skip_special_tokens=True)
print(transcription)
```
|
### Advanced Usage

For production deployments, the [fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr) provides:
- **Live transcription** with Voice Activity Detection (VAD)
- **ONNX optimization** (20-30% faster)
- **Batch processing** scripts
- **A complete inference pipeline**

See its [`inference.py`](https://github.com/pierre-cheneau/finetune-moonshine-asr/blob/main/scripts/inference.py) script.
|
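The guide's `inference.py` handles live transcription with Voice Activity Detection, but its VAD method is not described in this card. To illustrate the idea, here is a minimal sketch of a simple energy-threshold VAD (a hypothetical stand-in, not the script's implementation) that splits 16 kHz mono audio into speech regions, each of which can then be transcribed on its own. The function name and the defaults for frame length, threshold, and minimum silence are assumptions to tune; neural VADs are far more robust to background noise.

```python
import numpy as np


def energy_vad_segments(
    audio: np.ndarray,
    sample_rate: int = 16000,
    frame_ms: float = 30.0,
    threshold_db: float = -35.0,
    min_silence_ms: float = 300.0,
) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of speech regions (hypothetical energy-based VAD).

    Frames whose RMS level in dBFS exceeds `threshold_db` count as speech;
    silent gaps no longer than `min_silence_ms` are bridged into one segment.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(audio) // frame_len
    if n_frames == 0:
        return []

    # Per-frame loudness in dBFS (epsilon avoids log10(0) on digital silence)
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames.astype(np.float64) ** 2, axis=1))
    is_speech = 20 * np.log10(rms + 1e-12) > threshold_db

    max_gap = int(min_silence_ms / frame_ms)  # silent frames tolerated inside a segment
    segments: list[tuple[int, int]] = []
    start = None
    silence_run = 0
    for i, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run > max_gap:
                # Close the segment just after its last speech frame
                end = i - silence_run + 1
                segments.append((start * frame_len, end * frame_len))
                start, silence_run = None, 0
    if start is not None:
        # Audio ended while inside a segment
        end = n_frames - silence_run
        segments.append((start * frame_len, end * frame_len))
    return segments
```

Each `(start, end)` pair indexes samples, so a region can be passed to the Basic Usage code as `processor(audio[start:end], sampling_rate=16000, return_tensors="pt")`.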
## Model Details

### Model Description

- **Base Model:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
- **Language:** French (fr)
- **Model Size:** 27M parameters
- **Fine-tuned on:** the French subset of Multilingual LibriSpeech (MLS), segmented specifically for the Moonshine model's requirements
- **Training Duration:** 8,000 steps
- **Optimizer:** Schedule-Free AdamW
- **License:** MIT

### Model Architecture

Moonshine is a compact sequence-to-sequence ASR model designed for efficient on-device inference:
- **Encoder:** Convolutional feature extraction + Transformer blocks
- **Decoder:** Autoregressive Transformer decoder
- **Parameters:** 27M (tiny variant)
- **Input:** 16 kHz mono audio
- **Output:** French text transcription
|
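Because the model expects 16 kHz mono input, the preprocessing from Basic Usage can also be done without torchaudio. A minimal sketch, assuming NumPy and SciPy are installed and that multichannel audio arrives as `(channels, samples)`; the helper name `to_moonshine_input` is hypothetical. It averages channels into mono and resamples with SciPy's polyphase `resample_poly`, which applies an anti-aliasing filter.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16000  # Moonshine expects 16 kHz mono input


def to_moonshine_input(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz (hypothetical helper).

    `audio` is assumed to be (channels, samples), or a 1-D array that is already mono.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=0)  # average all channels into one
    if sample_rate != TARGET_SR:
        # Reduce the rate ratio to integers, e.g. 44100 -> 16000 becomes up=160, down=441
        g = gcd(TARGET_SR, sample_rate)
        audio = resample_poly(audio, TARGET_SR // g, sample_rate // g)
    return audio.astype(np.float32)
```

The result can go straight to `processor(..., sampling_rate=16000, return_tensors="pt")`. Note that `soundfile.read` returns multichannel audio as `(samples, channels)`, so transpose it first.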
## Performance

### Evaluation Metrics

Evaluated on the Multilingual LibriSpeech (MLS) French test set:

| Metric | Score |
|--------|-------|
| **Word Error Rate (WER)** | 21.8% |
| **Character Error Rate (CER)** | ~10% |
| **Real-Time Factor (RTF)** | 0.11 (CPU) |
|
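WER and CER are edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length in words or characters. The text normalization behind the scores above is not stated, so this minimal sketch assumes a simple one (lowercasing, dropping punctuation other than apostrophes) and computes corpus-level rates by pooling edits over all utterances. The function names are hypothetical, and libraries such as `jiwer` offer tested implementations. The `(reference, hypothesis)` pairs could be collected by running the Basic Usage code over the MLS French test split.

```python
import re
from typing import Iterable, Sequence


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation except apostrophes, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # \w is Unicode-aware, so accented letters survive
    return " ".join(text.split())


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free on a match)
            )
        prev = curr
    return prev[-1]


def corpus_error_rates(pairs: Iterable[tuple[str, str]]) -> tuple[float, float]:
    """Corpus WER and CER: total edits divided by total reference words / characters."""
    word_edits = word_total = char_edits = char_total = 0
    for ref, hyp in pairs:
        ref, hyp = normalize(ref), normalize(hyp)
        word_edits += edit_distance(ref.split(), hyp.split())
        word_total += len(ref.split())
        char_edits += edit_distance(ref, hyp)  # characters, spaces included
        char_total += len(ref)
    return word_edits / word_total, char_edits / char_total
```
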
**Inference Speed:** about 9x faster than real time on CPU (1 / 0.11 ≈ 9), fast enough for live transcription.
|
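The real-time factor is processing time divided by audio duration, so an RTF of 0.11 corresponds to a speed-up of 1 / 0.11 ≈ 9.1x. The measurement setup (CPU model, thread count, clip lengths) is not specified here, so figures will vary by machine. A minimal timing sketch, with hypothetical helper names, that keeps the fastest of several runs to reduce noise from warm-up and background load:

```python
import time
from typing import Callable


def real_time_factor(transcribe: Callable[[], object], audio_seconds: float, runs: int = 3) -> float:
    """Return processing time / audio duration, using the fastest of `runs` timed calls."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        transcribe()  # e.g. a call to model.generate(...)
        best = min(best, time.perf_counter() - start)
    return best / audio_seconds


def speedup(rtf: float) -> float:
    """How many times faster than real time: the inverse of the RTF."""
    return 1.0 / rtf
```

With the Basic Usage variables, this would be `real_time_factor(lambda: model.generate(**inputs, max_new_tokens=max_new_tokens), audio_duration)`.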
### Comparison

| Model | Parameters | Language | WER (MLS-FR) |
|-------|------------|----------|--------------|
| Whisper-tiny | 39M | Multilingual | ~25% |
| **Moonshine-tiny-fr** | 27M | French | **21.8%** |
| Whisper-base | 74M | Multilingual | ~18% |

*Moonshine-tiny-fr achieves competitive performance with about 30% fewer parameters than Whisper-tiny. It is still a proof of concept: more work is needed to build a proper, robust training dataset.*
|
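The parameter savings quoted above follow directly from the sizes in the table:

$$
1 - \frac{27\,\text{M}}{39\,\text{M}} \approx 1 - 0.69 = 0.31
$$

that is, roughly 30% fewer parameters than Whisper-tiny.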
## Training Details / Fine-Tuning

Please refer to my [GitHub repository](https://github.com/pierre-cheneau/finetune-moonshine-asr) for the training procedure.
|
## Use Cases

### Primary Applications

✅ **French Speech Recognition**
- Real-time transcription
- Audio file transcription
- Voice commands
- Accessibility tools

✅ **Resource-Constrained Environments**
- On-device transcription (mobile, edge devices)
- Low-latency applications
- Offline transcription

✅ **Hogwarts Legacy SpellCaster**
- Ultra-lightweight, low-latency recognition of spoken spells
- [pierre-cheneau/HogwartsLegacy-SpellCaster](https://github.com/pierre-cheneau/HogwartsLegacy-SpellCaster)
|
## Limitations and Biases

### Known Limitations of This Tiny Model

- **Hallucination:** Like other seq2seq models, it may generate text that is not present in the audio
- **Repetition:** It may repeat phrases, especially with greedy decoding; beam search helps
- **Short Segments:** Accuracy may degrade on very short clips (<0.5 s)
- **Domain Specificity:** Trained primarily on audiobooks (read speech), so spontaneous or conversational speech may be transcribed less accurately
- **Accents:** Best performance on metropolitan French; regional accents may have higher WER
- **Background Noise:** Performance degrades with significant background noise
|
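The limitations above suggest beam search against repetition and flag clips under 0.5 s. A minimal sketch that bundles this into `generate()` arguments, assuming the `model`, `inputs`, and `audio_duration` from Basic Usage. `max_new_tokens`, `num_beams`, and `no_repeat_ngram_size` are standard transformers generation parameters, but the helper `decoding_kwargs` and its defaults (4 beams, n-gram blocking off unless requested) are hypothetical and need tuning.

```python
import warnings


def decoding_kwargs(
    audio_seconds: float,
    num_beams: int = 4,
    tokens_per_second: float = 5.0,
    no_repeat_ngram_size: int = 0,
) -> dict:
    """Build keyword arguments for `model.generate()` favoring beam search over greedy decoding.

    Defaults are assumptions to tune, not settings validated for this model.
    """
    if audio_seconds < 0.5:
        # Known weak spot of the model (see Limitations): warn rather than fail
        warnings.warn("Clips shorter than 0.5 s may be transcribed poorly.")
    kwargs = {
        # Token budget proportional to duration, as in Basic Usage, but never zero
        "max_new_tokens": max(1, int(audio_seconds * tokens_per_second)),
        "num_beams": num_beams,  # beam search instead of greedy decoding
    }
    if no_repeat_ngram_size > 0:
        # Forbid any token n-gram of this size from appearing twice
        kwargs["no_repeat_ngram_size"] = no_repeat_ngram_size
    return kwargs
```

Usage: `generated_ids = model.generate(**inputs, **decoding_kwargs(audio_duration))`. Beam search costs extra compute, so expect a higher RTF than with greedy decoding; `no_repeat_ngram_size` also blocks legitimate repeats, so enable it sparingly.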
## Model Card Author

**Pierre Chéneau (Cornebidouil)**

Geologist, developer, and maintainer of this fine-tuned French model.

**Links:**
- 🌐 [Personal Website](https://pcheneau.fr)
- 💼 [GitHub](https://github.com/pierre-cheneau)
- 📚 [Fine-tuning Guide](https://github.com/pierre-cheneau/finetune-moonshine-asr)
|
## Citations

### This Model

```bibtex
@misc{cheneau2026moonshine-tiny-fr,
  author = {Pierre Chéneau (Cornebidouil)},
  title = {Moonshine-Tiny-FR: Fine-tuned French Speech Recognition},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Cornebidouil/moonshine-tiny-fr}
}
```
|
### Fine-Tuning Guide

```bibtex
@misc{cheneau2026moonshine-finetune,
  author = {Pierre Chéneau (Cornebidouil)},
  title = {Moonshine ASR Fine-Tuning Guide},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/pierre-cheneau/finetune-moonshine-asr}
}
```
|
### Original Moonshine Model

```bibtex
@misc{jeffries2024moonshinespeechrecognitionlive,
  title={Moonshine: Speech Recognition for Live Transcription and Voice Commands},
  author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
  year={2024},
  eprint={2410.15608},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2410.15608},
}
```
|
### Multilingual LibriSpeech Dataset

```bibtex
@inproceedings{pratap2020mls,
  title={Multilingual LibriSpeech: A Corpus for Speech Recognition in Multiple Languages},
  author={Pratap, Vineel and Xu, Qiantong and Sriram, Anuroop and Synnaeve, Gabriel and Collobert, Ronan},
  booktitle={Interspeech},
  year={2020}
}
```
|
## Additional Resources

- **Fine-Tuning Guide:** [Complete tutorial](https://github.com/pierre-cheneau/finetune-moonshine-asr)
- **Original Moonshine:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
- **Dataset:** [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech)
- **Issues/Support:** [GitHub Issues](https://github.com/pierre-cheneau/finetune-moonshine-asr/issues)
|
## License

This model is released under the MIT License, consistent with the base Moonshine model.

```
MIT License

Copyright (c) 2026 Pierre Chéneau (Cornebidouil)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...
```
|
## Acknowledgments

- **Useful Sensors** for the original Moonshine architecture and pre-trained model
- **Meta AI** for the Multilingual LibriSpeech dataset
- **HuggingFace** for the transformers library and model hosting
- **Schedule-Free Learning** for the optimizer implementation

---

**Questions?** Open an issue on the [fine-tuning guide repository](https://github.com/pierre-cheneau/finetune-moonshine-asr) or check the documentation.

**Want to fine-tune for your language?** See the [complete fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).