---
license: mit
language:
- fr
metrics:
- wer
- cer
base_model:
- UsefulSensors/moonshine-tiny
pipeline_tag: automatic-speech-recognition
library_name: transformers
arxiv: https://arxiv.org/abs/2410.15608
datasets:
- facebook/multilingual_librispeech
tags:
- audio
- automatic-speech-recognition
- speech-to-text
- speech
- french
- moonshine
- asr
---
# Moonshine-Tiny-FR: French Speech Recognition Model
**Fine-tuned Moonshine ASR model for French language**
This is a fine-tuned version of [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny) specifically optimized for French speech recognition. The model achieves strong performance for its size (27M parameters) on French ASR tasks.
**Links:**
- [Original Moonshine Blog](https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/)
- [Original Paper](https://arxiv.org/abs/2410.15608)
- [Fine-Tuning Guide](https://github.com/pierre-cheneau/finetune-moonshine-asr)
## Usage
### Installation
```bash
pip install --upgrade pip
pip install --upgrade transformers datasets[audio]
```
### Basic Usage
```python
from transformers import MoonshineForConditionalGeneration, AutoProcessor
import torch
import torchaudio
# Load model and processor
model = MoonshineForConditionalGeneration.from_pretrained('Cornebidouil/moonshine-tiny-fr')
processor = AutoProcessor.from_pretrained('Cornebidouil/moonshine-tiny-fr')
# Load audio and resample to 16 kHz if needed
audio, sr = torchaudio.load("french_audio.wav")
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)
audio = audio[0].numpy()  # Keep the first channel (mono)
# Prepare inputs
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
# Generate transcription
# Calculate max_new_tokens to avoid truncation (5 tokens per second is optimal for French)
audio_duration = len(audio) / 16000
max_new_tokens = int(audio_duration * 5)
generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
transcription = processor.decode(generated_ids[0], skip_special_tokens=True)
print(transcription)
```
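The 5-tokens-per-second budget used above can be factored into a small helper (a sketch; the function name and the floor of one token are our additions, the rate comes from the snippet above):

```python
def max_tokens_for(num_samples: int, sample_rate: int = 16000,
                   tokens_per_second: float = 5.0) -> int:
    """Token budget for generation: ~5 tokens per second of French
    audio, with a floor of 1 so very short clips still decode."""
    duration = num_samples / sample_rate
    return max(1, int(duration * tokens_per_second))

# 10 s of 16 kHz audio -> a 50-token budget
print(max_tokens_for(160000))  # → 50
```

The result can be passed directly as `max_new_tokens` to `model.generate`.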
### Advanced Usage
For production deployments with:
- **Live transcription** with Voice Activity Detection
- **ONNX optimization** (20-30% faster)
- **Batch processing** scripts
- **Complete inference pipeline**
See the included [`inference.py`](https://github.com/pierre-cheneau/finetune-moonshine-asr/blob/main/scripts/inference.py) script in the [fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).
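For long recordings, one common approach before batch transcription (not part of this repo's scripts; the 30-second chunk length is an illustrative assumption, not a Moonshine requirement) is to split the waveform into fixed-size segments and transcribe each:

```python
import numpy as np

def chunk_audio(audio: np.ndarray, sample_rate: int = 16000,
                chunk_seconds: float = 30.0) -> list[np.ndarray]:
    """Split a mono waveform into consecutive chunks of at most
    chunk_seconds; the last chunk may be shorter."""
    step = int(sample_rate * chunk_seconds)
    return [audio[i:i + step] for i in range(0, len(audio), step)]

# 70 s of silence -> three chunks: 30 s, 30 s, 10 s
chunks = chunk_audio(np.zeros(70 * 16000))
print([len(c) / 16000 for c in chunks])  # → [30.0, 30.0, 10.0]
```

Each chunk then goes through the same `processor`/`model.generate` steps as in Basic Usage; cutting at silence with a VAD, as the inference script does, gives cleaner boundaries than fixed-size chunks.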
## Model Details
### Model Description
- **Base Model:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
- **Language:** French (fr)
- **Model Size:** 27M parameters
- **Fine-tuned on:** Multilingual LibriSpeech (MLS) French, re-segmented to match the input-length requirements of the Moonshine model
- **Training Duration:** 8,000 steps
- **Optimizer:** Schedule-free AdamW
- **License:** MIT
### Model Architecture
Moonshine is a compact sequence-to-sequence ASR model designed for efficient on-device inference:
- **Encoder:** Convolutional feature extraction + Transformer blocks
- **Decoder:** Autoregressive Transformer decoder
- **Parameters:** 27M (tiny variant)
- **Input:** 16kHz mono audio
- **Output:** French text transcription
## Performance
### Evaluation Metrics
Evaluated on Multilingual LibriSpeech (MLS) French test set:
| Metric | Score |
|--------|-------|
| **Word Error Rate (WER)** | 21.8% |
| **Character Error Rate (CER)** | ~10% |
| **Real-Time Factor (RTF)** | 0.11x (CPU) |
**Inference Speed:** ~9x faster than real-time on CPU, enabling live transcription.
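To sanity-check the WER figure on your own data, word error rate is word-level edit distance divided by reference length; a minimal pure-Python version (libraries such as `jiwer` compute the same, plus text normalization):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[j] = edit distance between the words seen so far and hyp[:j]
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(dp[j] + 1,          # deletion
                      dp[j - 1] + 1,      # insertion
                      prev + (r != h))    # substitution or match
            prev, dp[j] = dp[j], cur
    return dp[len(hyp)] / max(1, len(ref))

# One substituted word out of three -> WER of 1/3
print(wer("le chat dort", "le chien dort"))  # → 0.3333333333333333
```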
### Comparison
| Model | Size | Language | WER (MLS-FR) |
|-------|------|----------|--------------|
| Whisper-tiny | 39M | Multilingual | ~25% |
| **Moonshine-tiny-fr** | 27M | French | **21.8%** |
| Whisper-base | 74M | Multilingual | ~18% |
*Moonshine-tiny-fr achieves competitive performance with 30% fewer parameters than Whisper-tiny. It remains a proof of concept; more work is needed to build a larger, more robust training dataset.*
## Training Details / Fine-Tuning
Please refer to the [fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr) for the full training procedure.
## Use Cases
### Primary Applications
**French Speech Recognition**
- Real-time transcription
- Audio file transcription
- Voice commands
- Accessibility tools
**Resource-Constrained Environments**
- On-device transcription (mobile, edge devices)
- Low-latency applications
- Offline transcription
**Hogwarts Legacy SpellCaster**
- Ultra-lightweight and low latency spell speech recognition
- [HogwartsLegacy-SpellCaster](https://github.com/pierre-cheneau/HogwartsLegacy-SpellCaster)
## Limitations and Biases
### Known Limitations of This Tiny Model
- **Hallucination:** Like all seq2seq models, may generate text not present in audio
- **Repetition:** May repeat phrases, especially with greedy decoding (use beam search)
- **Short Segments:** Performance may degrade on very short audio clips (<0.5s)
- **Domain Specificity:** Trained primarily on audiobooks (read speech)
- **Accents:** Best performance on metropolitan French; regional accents may have higher WER
- **Background Noise:** Performance degrades with significant background noise
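The repetition failure mode can be flagged cheaply in post-processing; a small sketch (the n-gram size and repeat threshold are arbitrary choices of ours, not from this repo) that detects when the same word n-gram loops back-to-back:

```python
def has_repeated_ngram(text: str, n: int = 3, min_repeats: int = 3) -> bool:
    """True if some n-word phrase repeats consecutively at least
    min_repeats times, a typical sign of a greedy-decoding loop."""
    words = text.split()
    for start in range(len(words) - n * min_repeats + 1):
        gram = words[start:start + n]
        if all(words[start + k * n:start + (k + 1) * n] == gram
               for k in range(1, min_repeats)):
            return True
    return False

print(has_repeated_ngram("je veux je veux je veux partir", n=2))  # → True
print(has_repeated_ngram("le chat dort sur le tapis", n=2))       # → False
```

When a transcription is flagged, re-decoding with beam search (e.g. `num_beams=5` in `model.generate`) usually breaks the loop, as noted above.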
## Model Card Author
**Pierre Chéneau (Cornebidouil)**
Geologist, Developer and maintainer of this fine-tuned French model.
**Links:**
- 🌐 [Personal Website](https://pcheneau.fr)
- 💼 [GitHub](https://github.com/pierre-cheneau)
- 📚 [Fine-tuning Guide](https://github.com/pierre-cheneau/finetune-moonshine-asr)
## Citations
### This Model
```bibtex
@misc{cheneau2026moonshine-tiny-fr,
author = {Pierre Chéneau (Cornebidouil)},
title = {Moonshine-Tiny-FR: Fine-tuned French Speech Recognition},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/Cornebidouil/moonshine-tiny-fr}
}
```
### Fine tuning Guide
```bibtex
@misc{cheneau2026moonshine-finetune,
author = {Pierre Chéneau (Cornebidouil)},
title = {Moonshine ASR Fine-Tuning Guide},
year = {2026},
publisher = {GitHub},
url = {https://github.com/pierre-cheneau/finetune-moonshine-asr}
}
```
### Original Moonshine Model
```bibtex
@misc{jeffries2024moonshinespeechrecognitionlive,
title={Moonshine: Speech Recognition for Live Transcription and Voice Commands},
author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
year={2024},
eprint={2410.15608},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2410.15608},
}
```
### Multilingual LibriSpeech Dataset
```bibtex
@inproceedings{pratap2020mls,
title={MLS: A Large-Scale Multilingual Dataset for Speech Research},
author={Pratap, Vineel and Xu, Qiantong and Sriram, Anuroop and Synnaeve, Gabriel and Collobert, Ronan},
booktitle={Interspeech},
year={2020}
}
```
## Additional Resources
- **Fine-Tuning Guide:** [Complete tutorial](https://github.com/pierre-cheneau/finetune-moonshine-asr)
- **Original Moonshine:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
- **Dataset:** [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech)
- **Issues/Support:** [GitHub Issues](https://github.com/pierre-cheneau/finetune-moonshine-asr/issues)
## License
This model is released under the MIT License, consistent with the base Moonshine model.
```
MIT License
Copyright (c) 2026 Pierre Chéneau (Cornebidouil)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...
```
## Acknowledgments
- **Useful Sensors** for the original Moonshine architecture and pre-trained model
- **Meta AI** for the Multilingual LibriSpeech dataset
- **HuggingFace** for the transformers library and model hosting
- **Schedule-Free Learning** for the optimizer implementation
---
**Questions?** Open an issue on the [fine-tuning guide repository](https://github.com/pierre-cheneau/finetune-moonshine-asr) or check the documentation.
**Want to fine-tune for your language?** See the [complete fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).