	---
	license: mit
	language:
	- fr
	metrics:
	- wer
	- cer
	base_model:
	- UsefulSensors/moonshine-tiny
	pipeline_tag: automatic-speech-recognition
	library_name: transformers
	arxiv: https://arxiv.org/abs/2410.15608
	datasets:
	- facebook/multilingual_librispeech
	tags:
	- audio
	- automatic-speech-recognition
	- speech-to-text
	- speech
	- french
	- moonshine
	- asr
	---

	# Moonshine-Tiny-FR: French Speech Recognition Model

	Fine-tuned Moonshine ASR model for French language

	This is a fine-tuned version of [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny) optimized for French speech recognition. With 27M parameters, it reaches a 21.8% WER on the Multilingual LibriSpeech (MLS) French test set, which is competitive for a model of this size.

	Links:
	- [[Original Moonshine Blog]](https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/)
	- [[Original Paper]](https://arxiv.org/abs/2410.15608)
	- [[Fine-Tuning Guide]](https://github.com/pierre-cheneau/finetune-moonshine-asr)

	## Usage

	### Installation
	```bash
	pip install --upgrade pip
	pip install --upgrade transformers "datasets[audio]" torch torchaudio
	```

	### Basic Usage

	```python
	from transformers import MoonshineForConditionalGeneration, AutoProcessor
	import torch
	import torchaudio

	# Load model and processor
	model = MoonshineForConditionalGeneration.from_pretrained('Cornebidouil/moonshine-tiny-fr')
	processor = AutoProcessor.from_pretrained('Cornebidouil/moonshine-tiny-fr')

	# Load audio and resample to 16 kHz if needed
	audio, sr = torchaudio.load("french_audio.wav")
	if sr != 16000:
	    audio = torchaudio.functional.resample(audio, sr, 16000)
	audio = audio[0].numpy()  # Keep the first channel (mono input)

	# Prepare inputs
	inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

	# Generate transcription
	# Scale max_new_tokens with the audio duration (about 5 tokens per second works well for French)
	audio_duration = len(audio) / 16000
	max_new_tokens = max(1, int(audio_duration * 5))  # never request zero tokens

	generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
	transcription = processor.decode(generated_ids[0], skip_special_tokens=True)
	print(transcription)
	```
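The token-budget rule used in the snippet above can be factored into a small helper. This is a minimal sketch, assuming the 5 tokens-per-second figure quoted in this card; the function name `estimate_max_new_tokens` and the floor of one token are illustrative choices, not part of the transformers API.

```python
def estimate_max_new_tokens(num_samples: int,
                            sample_rate: int = 16000,
                            tokens_per_second: float = 5.0) -> int:
    """Return a generation budget proportional to the clip duration."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    duration_s = num_samples / sample_rate
    # Floor of 1 so that very short clips still get a non-zero budget (assumption).
    return max(1, int(duration_s * tokens_per_second))


# 10 seconds of 16 kHz audio
print(estimate_max_new_tokens(160000))  # 50
```

The result can be passed as `max_new_tokens` to `model.generate`, for example `estimate_max_new_tokens(len(audio))`.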

	### Advanced Usage

	For production deployments with:
	- Live transcription with Voice Activity Detection
	- ONNX optimization (20-30% faster)
	- Batch processing scripts
	- Complete inference pipeline

	See the included [`inference.py`](https://github.com/pierre-cheneau/finetune-moonshine-asr/blob/main/scripts/inference.py) script in the [fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).

	## Model Details

	### Model Description

	- Base Model: [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
	- Language: French (fr)
	- Model Size: 27M parameters
	- Fine-tuned on: Multilingual LibriSpeech (MLS) French dataset, re-segmented to meet the input requirements of the Moonshine model
	- Training Duration: 8,000 steps
	- Optimizer: Schedule-free AdamW
	- License: MIT

	### Model Architecture

	Moonshine is a compact sequence-to-sequence ASR model designed for efficient on-device inference:
	- Encoder: Convolutional feature extraction + Transformer blocks
	- Decoder: Autoregressive Transformer decoder
	- Parameters: 27M (tiny variant)
	- Input: 16kHz mono audio
	- Output: French text transcription
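Because the model expects 16 kHz mono input, it can help to check a file before transcribing it. The following is a minimal sketch using only Python's standard `wave` module, which reads uncompressed PCM WAV files; the helper name `check_wav_format` is hypothetical and not part of this repository.

```python
import wave


def check_wav_format(path, expected_rate=16000):
    """Return a list of problems; an empty list means the file is 16 kHz mono."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        if rate != expected_rate:
            problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
        if channels != 1:
            problems.append(f"{channels} channels found, expected mono")
    return problems


# Example (the file name is a placeholder):
# for problem in check_wav_format("french_audio.wav"):
#     print("Needs conversion:", problem)
```

Files that fail the check can be resampled and reduced to one channel as shown in the Basic Usage section.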

	## Performance

	### Evaluation Metrics

	Evaluated on Multilingual LibriSpeech (MLS) French test set:

	| Metric | Score |
	|--------|-------|
	| Word Error Rate (WER) | 21.8% |
	| Character Error Rate (CER) | ~10% |
	| Real-Time Factor (RTF) | 0.11x (CPU) |
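WER and CER are edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length in words or characters. The sketch below is a plain-Python illustration of that definition. It is not the evaluation script used for this card, and it applies no text normalization (case, punctuation), which this card does not specify.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Illustrative sentences, not taken from the test set
ref = "le chat dort sur le canapé"
hyp = "le chat dors sur canapé"
print(f"WER = {wer(ref, hyp):.1%}")  # one substitution + one deletion over 6 words: 33.3%
```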

	Inference Speed: ~9x faster than real-time on CPU, enabling live transcription.
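The two figures above are the same measurement expressed two ways: the real-time factor is processing time divided by audio duration, and the speed-up is its inverse. A minimal sketch of the arithmetic follows; the 1.1 s and 10 s timings are illustrative values chosen to reproduce an RTF of 0.11, not measurements from this card, and `measure_rtf` is a hypothetical helper.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent transcribing / duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe, audio_seconds):
    """Time a zero-argument transcription callable and return its RTF."""
    start = time.perf_counter()
    transcribe()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


rtf = real_time_factor(1.1, 10.0)  # illustrative timings
speedup = 1 / rtf
print(f"RTF = {rtf:.2f}, speed-up = {speedup:.1f}x")  # RTF = 0.11, speed-up = 9.1x
```

An RTF below 1.0 means the model keeps up with incoming audio, which is the condition live transcription needs.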

	### Comparison

	| Model | Size | Language | WER (MLS-FR) |
	|-------|------|----------|--------------|
	| Whisper-tiny | 39M | Multilingual | ~25% |
	| Moonshine-tiny-fr | 27M | French | 21.8% |
	| Whisper-base | 74M | Multilingual | ~18% |

	Moonshine-tiny-fr achieves competitive performance with about 30% fewer parameters than Whisper-tiny. It remains a proof of concept: more work is needed to build a proper, robust training dataset.
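The parameter saving follows directly from the sizes in the comparison table. A short sketch of the arithmetic, using the table's rounded parameter counts in millions (the function name `relative_reduction` is illustrative):

```python
# Parameter counts in millions, as listed in the comparison table
sizes_m = {"Whisper-tiny": 39, "Moonshine-tiny-fr": 27, "Whisper-base": 74}


def relative_reduction(small, large):
    """Fraction by which `small` is smaller than `large`."""
    return (large - small) / large


vs_tiny = relative_reduction(sizes_m["Moonshine-tiny-fr"], sizes_m["Whisper-tiny"])
vs_base = relative_reduction(sizes_m["Moonshine-tiny-fr"], sizes_m["Whisper-base"])
print(f"vs Whisper-tiny: {vs_tiny:.1%}")  # 30.8%
print(f"vs Whisper-base: {vs_base:.1%}")  # 63.5%
```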

	## Training Details / Fine-tuning

	Please refer to my [GitHub repo](https://github.com/pierre-cheneau/finetune-moonshine-asr) for the training procedure.

	## Use Cases

	### Primary Applications

	✅ French Speech Recognition
	- Real-time transcription
	- Audio file transcription
	- Voice commands
	- Accessibility tools

	✅ Resource-Constrained Environments
	- On-device transcription (mobile, edge devices)
	- Low-latency applications
	- Offline transcription

	✅ Hogwarts Legacy SpellCaster
	- Ultra-lightweight and low latency spell speech recognition
	- https://github.com/pierre-cheneau/HogwartsLegacy-SpellCaster

	## Limitations and Biases

	### Known Limitations for this tiny model

	- Hallucination: Like all seq2seq models, may generate text not present in audio
	- Repetition: May repeat phrases, especially with greedy decoding (use beam search)
	- Short Segments: Performance may degrade on very short audio clips (<0.5s)
	- Domain Specificity: Trained primarily on audiobooks (read speech)
	- Accents: Best performance on metropolitan French; regional accents may have higher WER
	- Background Noise: Performance degrades with significant background noise
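Two of the limitations above can be partly handled at inference time: repetition through decoding settings, and short segments through a duration check. The sketch below builds keyword arguments for `model.generate` using `num_beams` and `no_repeat_ngram_size`, both standard transformers generation parameters. The helper names and the specific values (4 beams, 3-gram blocking) are assumptions for illustration and have not been tuned or validated on this model.

```python
MIN_CLIP_SECONDS = 0.5  # threshold quoted in the limitations list


def is_short_clip(num_samples, sample_rate=16000):
    """True when the clip is shorter than the 0.5 s threshold."""
    return num_samples / sample_rate < MIN_CLIP_SECONDS


def build_generation_kwargs(num_samples, sample_rate=16000,
                            tokens_per_second=5.0, num_beams=4):
    """Keyword arguments for model.generate (values are untuned assumptions)."""
    duration_s = num_samples / sample_rate
    return {
        # Budget scaled to the clip length, as in Basic Usage
        "max_new_tokens": max(1, int(duration_s * tokens_per_second)),
        # Beam search instead of greedy decoding
        "num_beams": num_beams,
        # Forbid any 3-gram from appearing twice in the output
        "no_repeat_ngram_size": 3,
    }


# Usage with the objects from Basic Usage (not executed here):
# if is_short_clip(len(audio)):
#     print("Clip is under 0.5 s; transcription may be unreliable")
# generated_ids = model.generate(**inputs, **build_generation_kwargs(len(audio)))
```

Beam search increases compute per clip, so the latency figures reported for this model may not hold with these settings.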

	## Model Card Author

	Pierre Chéneau (Cornebidouil)

	Geologist, Developer and maintainer of this fine-tuned French model.

	Links:
	- 🌐 [Personal Website](https://pcheneau.fr)
	- 💼 [GitHub](https://github.com/pierre-cheneau)
	- 📚 [Fine-tuning Guide](https://github.com/pierre-cheneau/finetune-moonshine-asr)

	## Citations

	### This Model

	```bibtex
	@misc{cheneau2026moonshine-tiny-fr,
	author = {Pierre Chéneau (Cornebidouil)},
	title = {Moonshine-Tiny-FR: Fine-tuned French Speech Recognition},
	year = {2026},
	publisher = {HuggingFace},
	url = {https://huggingface.co/Cornebidouil/moonshine-tiny-fr}
	}
	```

	### Fine-tuning Guide

	```bibtex
	@misc{cheneau2026moonshine-finetune,
	author = {Pierre Chéneau (Cornebidouil)},
	title = {Moonshine ASR Fine-Tuning Guide},
	year = {2026},
	publisher = {GitHub},
	url = {https://github.com/pierre-cheneau/finetune-moonshine-asr}
	}
	```

	### Original Moonshine Model

	```bibtex
	@misc{jeffries2024moonshinespeechrecognitionlive,
	title={Moonshine: Speech Recognition for Live Transcription and Voice Commands},
	author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
	year={2024},
	eprint={2410.15608},
	archivePrefix={arXiv},
	primaryClass={cs.SD},
	url={https://arxiv.org/abs/2410.15608},
	}
	```

	### Multilingual LibriSpeech Dataset

	```bibtex
	@inproceedings{pratap2020mls,
	title={MLS: A Large-Scale Multilingual Dataset for Speech Research},
	author={Pratap, Vineel and Xu, Qiantong and Sriram, Anuroop and Synnaeve, Gabriel and Collobert, Ronan},
	booktitle={Interspeech},
	year={2020}
	}
	```

	## Additional Resources

	- Fine-Tuning Guide: [Complete tutorial](https://github.com/pierre-cheneau/finetune-moonshine-asr)
	- Original Moonshine: [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
	- Dataset: [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech)
	- Issues/Support: [GitHub Issues](https://github.com/pierre-cheneau/finetune-moonshine-asr/issues)

	## License

	This model is released under the MIT License, consistent with the base Moonshine model.

	```
	MIT License

	Copyright (c) 2026 Pierre Chéneau (Cornebidouil)

	Permission is hereby granted, free of charge, to any person obtaining a copy
	of this software and associated documentation files (the "Software"), to deal
	in the Software without restriction...
	```

	## Acknowledgments

	- Useful Sensors for the original Moonshine architecture and pre-trained model
	- Meta AI for the Multilingual LibriSpeech dataset
	- HuggingFace for the transformers library and model hosting
	- Schedule-Free Learning for the optimizer implementation

	---

	Questions? Open an issue on the [fine-tuning guide repository](https://github.com/pierre-cheneau/finetune-moonshine-asr) or check the documentation.

	Want to fine-tune for your language? See the [complete fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).