---
license: mit
language:
- fr
metrics:
- wer
- cer
base_model:
- UsefulSensors/moonshine-tiny
pipeline_tag: automatic-speech-recognition
library_name: transformers
arxiv: https://arxiv.org/abs/2410.15608
datasets:
- facebook/multilingual_librispeech
tags:
- audio
- automatic-speech-recognition
- speech-to-text
- speech
- french
- moonshine
- asr
---
# Moonshine-Tiny-FR: French Speech Recognition Model
**Fine-tuned Moonshine ASR model for French language**
This is a fine-tuned version of [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny) specifically optimized for French speech recognition. The model achieves state-of-the-art performance for its size (27M parameters) on French ASR tasks.
**Links:**
- [[Original Moonshine Blog]](https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/)
- [[Original Paper]](https://arxiv.org/abs/2410.15608)
- [[Fine-Tuning Guide]](https://github.com/pierre-cheneau/finetune-moonshine-asr)
## Usage
### Installation
```bash
pip install --upgrade pip
pip install --upgrade transformers datasets[audio]
```
### Basic Usage
```python
from transformers import MoonshineForConditionalGeneration, AutoProcessor
import torch
import torchaudio
# Load model and processor
model = MoonshineForConditionalGeneration.from_pretrained('Cornebidouil/moonshine-tiny-fr')
processor = AutoProcessor.from_pretrained('Cornebidouil/moonshine-tiny-fr')
# Load and resample audio to 16kHz
audio, sr = torchaudio.load("french_audio.wav")
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)
audio = audio[0].numpy()  # take the first channel to get mono
# Prepare inputs
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
# Generate transcription
# Calculate max_new_tokens to avoid truncation (5 tokens per second is optimal for French)
audio_duration = len(audio) / 16000
max_new_tokens = int(audio_duration * 5)
generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
transcription = processor.decode(generated_ids[0], skip_special_tokens=True)
print(transcription)
```
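The snippet above transcribes a single file in one pass. For long recordings, one simple approach is to split the waveform into fixed-length chunks and transcribe each chunk separately, then concatenate the texts. A minimal sketch (the 30-second chunk length and the `chunk_audio` helper are illustrative choices, not part of the model's API):

```python
import numpy as np

def chunk_audio(audio: np.ndarray, sample_rate: int = 16000,
                chunk_seconds: float = 30.0) -> list[np.ndarray]:
    """Split a mono waveform into consecutive fixed-length chunks.

    The last chunk may be shorter than chunk_seconds.
    """
    chunk_len = int(chunk_seconds * sample_rate)
    return [audio[i:i + chunk_len] for i in range(0, len(audio), chunk_len)]

# Example: a 75-second dummy signal at 16 kHz splits into 30 s + 30 s + 15 s
dummy = np.zeros(75 * 16000, dtype=np.float32)
chunks = chunk_audio(dummy)
print([len(c) / 16000 for c in chunks])  # [30.0, 30.0, 15.0]
```

Each chunk can then be fed through the processor/model loop shown above; cutting at silences (rather than at fixed offsets) avoids splitting words in half.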
### Advanced Usage
For production deployments with:
- **Live transcription** with Voice Activity Detection
- **ONNX optimization** (20-30% faster)
- **Batch processing** scripts
- **Complete inference pipeline**
See the included [`inference.py`](https://github.com/pierre-cheneau/finetune-moonshine-asr/blob/main/scripts/inference.py) script in the [fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).
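The linked `inference.py` implements its own pipeline; to illustrate the idea behind voice activity detection, here is a minimal energy-threshold sketch. The `simple_energy_vad` helper and its threshold value are hypothetical and not taken from that script:

```python
import numpy as np

def simple_energy_vad(audio: np.ndarray, sample_rate: int = 16000,
                      frame_ms: int = 30, threshold: float = 0.01) -> np.ndarray:
    """Mark each frame as speech (True) when its RMS energy exceeds a threshold.

    Returns one boolean per non-overlapping frame of frame_ms milliseconds.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(audio) // frame_len
    frames = audio[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return rms > threshold

# Silence is rejected, a loud 440 Hz tone is accepted
silence = np.zeros(16000, dtype=np.float32)
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
print(simple_energy_vad(silence).any(), simple_energy_vad(tone).all())  # False True
```

A production setup would typically use a trained VAD model instead of a fixed energy threshold, which breaks down under background noise.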
## Model Details
### Model Description
- **Base Model:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
- **Language:** French (fr)
- **Model Size:** 27M parameters
- **Fine-tuned on:** Multilingual LibriSpeech (MLS) French, re-segmented to match the Moonshine model's input requirements
- **Training Duration:** 8,000 steps
- **Optimizer:** Schedule-free AdamW
- **License:** MIT
### Model Architecture
Moonshine is a compact sequence-to-sequence ASR model designed for efficient on-device inference:
- **Encoder:** Convolutional feature extraction + Transformer blocks
- **Decoder:** Autoregressive Transformer decoder
- **Parameters:** 27M (tiny variant)
- **Input:** 16kHz mono audio
- **Output:** French text transcription
## Performance
### Evaluation Metrics
Evaluated on Multilingual LibriSpeech (MLS) French test set:
| Metric | Score |
|--------|-------|
| **Word Error Rate (WER)** | 21.8% |
| **Character Error Rate (CER)** | ~10% |
| **Real-Time Factor (RTF)** | 0.11x (CPU) |
**Inference Speed:** ~9x faster than real-time on CPU, enabling live transcription.
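WER is the word-level edit distance (substitutions, insertions, and deletions) between hypothesis and reference, divided by the number of reference words; CER is the same computed over characters. A self-contained sketch of the word-level metric (not the evaluation code used to produce the table above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("le chat dort", "le chien dort"))  # 1 substitution / 3 words ≈ 0.333
```

Because errors can outnumber reference words, WER can exceed 100% on badly mistranscribed segments.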
### Comparison
| Model | Size | Language | WER (MLS-FR) |
|-------|------|----------|--------------|
| Whisper-tiny | 39M | Multilingual | ~25% |
| **Moonshine-tiny-fr** | 27M | French | **21.8%** |
| Whisper-base | 74M | Multilingual | ~18% |
*Moonshine-tiny-fr achieves competitive performance with 30% fewer parameters than Whisper-tiny. It remains a proof of concept; more work is needed to build a properly curated, robust training dataset.*
## Training Details / Fine-Tuning
Please refer to the [fine-tuning guide on GitHub](https://github.com/pierre-cheneau/finetune-moonshine-asr) for the full training procedure.
## Use Cases
### Primary Applications
✅ **French Speech Recognition**
- Real-time transcription
- Audio file transcription
- Voice commands
- Accessibility tools
✅ **Resource-Constrained Environments**
- On-device transcription (mobile, edge devices)
- Low-latency applications
- Offline transcription
✅ **Hogwarts Legacy SpellCaster**
- Ultra-lightweight and low latency spell speech recognition
- [HogwartsLegacy-SpellCaster](https://github.com/pierre-cheneau/HogwartsLegacy-SpellCaster)
## Limitations and Biases
### Known Limitations for this tiny model
- **Hallucination:** Like all seq2seq models, may generate text not present in audio
- **Repetition:** May repeat phrases, especially with greedy decoding (use beam search)
- **Short Segments:** Performance may degrade on very short audio clips (<0.5s)
- **Domain Specificity:** Trained primarily on audiobooks (read speech)
- **Accents:** Best performance on metropolitan French; regional accents may have higher WER
- **Background Noise:** Performance degrades with significant background noise
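One practical mitigation for the short-segment limitation is to right-pad very short clips with silence to a minimum duration before inference. This is an illustrative workaround with an assumed 0.5 s floor, not something the model itself requires:

```python
import numpy as np

def pad_to_min_duration(audio: np.ndarray, sample_rate: int = 16000,
                        min_seconds: float = 0.5) -> np.ndarray:
    """Right-pad a mono waveform with zeros so it lasts at least min_seconds."""
    min_len = int(min_seconds * sample_rate)
    if len(audio) >= min_len:
        return audio
    return np.pad(audio, (0, min_len - len(audio)))

short = np.ones(1600, dtype=np.float32)  # a 0.1 s clip
padded = pad_to_min_duration(short)
print(len(padded) / 16000)  # 0.5
```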
## Model Card Author
**Pierre Chéneau (Cornebidouil)**
Geologist, Developer and maintainer of this fine-tuned French model.
**Links:**
- 🌐 [Personal Website](https://pcheneau.fr)
- 💼 [GitHub](https://github.com/pierre-cheneau)
- 📚 [Fine-tuning Guide](https://github.com/pierre-cheneau/finetune-moonshine-asr)
## Citations
### This Model
```bibtex
@misc{cheneau2026moonshine-tiny-fr,
author = {Pierre Chéneau (Cornebidouil)},
title = {Moonshine-Tiny-FR: Fine-tuned French Speech Recognition},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/Cornebidouil/moonshine-tiny-fr}
}
```
### Fine-Tuning Guide
```bibtex
@misc{cheneau2026moonshine-finetune,
author = {Pierre Chéneau (Cornebidouil)},
title = {Moonshine ASR Fine-Tuning Guide},
year = {2026},
publisher = {GitHub},
url = {https://github.com/pierre-cheneau/finetune-moonshine-asr}
}
```
### Original Moonshine Model
```bibtex
@misc{jeffries2024moonshinespeechrecognitionlive,
title={Moonshine: Speech Recognition for Live Transcription and Voice Commands},
author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
year={2024},
eprint={2410.15608},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2410.15608},
}
```
### Multilingual LibriSpeech Dataset
```bibtex
@inproceedings{pratap2020mls,
title={MLS: A Large-Scale Multilingual Dataset for Speech Research},
author={Pratap, Vineel and Xu, Qiantong and Sriram, Anuroop and Synnaeve, Gabriel and Collobert, Ronan},
booktitle={Interspeech},
year={2020}
}
```
## Additional Resources
- **Fine-Tuning Guide:** [Complete tutorial](https://github.com/pierre-cheneau/finetune-moonshine-asr)
- **Original Moonshine:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
- **Dataset:** [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech)
- **Issues/Support:** [GitHub Issues](https://github.com/pierre-cheneau/finetune-moonshine-asr/issues)
## License
This model is released under the MIT License, consistent with the base Moonshine model.
```
MIT License
Copyright (c) 2026 Pierre Chéneau (Cornebidouil)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...
```
## Acknowledgments
- **Useful Sensors** for the original Moonshine architecture and pre-trained model
- **Meta AI** for the Multilingual LibriSpeech dataset
- **HuggingFace** for the transformers library and model hosting
- **Schedule-Free Learning** for the optimizer implementation
---
**Questions?** Open an issue on the [fine-tuning guide repository](https://github.com/pierre-cheneau/finetune-moonshine-asr) or check the documentation.
**Want to fine-tune for your language?** See the [complete fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr). |