Moonshine-Tiny-FR: French Speech Recognition Model

Fine-tuned Moonshine ASR model for the French language

This is a fine-tuned version of UsefulSensors/moonshine-tiny specifically optimized for French speech recognition. The model achieves state-of-the-art performance for its size (27M parameters) on French ASR tasks.


Usage

Installation

pip install --upgrade pip
pip install --upgrade transformers datasets[audio]

Basic Usage

from transformers import MoonshineForConditionalGeneration, AutoProcessor
import torch
import torchaudio

# Load model and processor
model = MoonshineForConditionalGeneration.from_pretrained('Cornebidouil/moonshine-tiny-fr')
processor = AutoProcessor.from_pretrained('Cornebidouil/moonshine-tiny-fr')

# Load and resample audio to 16kHz
audio, sr = torchaudio.load("french_audio.wav")
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)
audio = audio[0].numpy()  # Take the first channel (mono)

# Prepare inputs
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate transcription
# Calculate max_new_tokens to avoid truncation (5 tokens per second is optimal for French)
audio_duration = len(audio) / 16000
max_new_tokens = int(audio_duration * 5)

generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
transcription = processor.decode(generated_ids[0], skip_special_tokens=True)
print(transcription)

Advanced Usage

For production deployments with:

  • Live transcription with Voice Activity Detection
  • ONNX optimization (20-30% faster)
  • Batch processing scripts
  • Complete inference pipeline

See the included inference.py script in the fine-tuning guide.
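Batch processing of long recordings generally requires splitting the audio before it is fed to the model. Below is a minimal sketch of that chunking step; the 10 s chunk length and 0.5 s overlap are illustrative assumptions, not values taken from inference.py:

```python
def chunk_audio(audio, sr=16000, chunk_s=10.0, overlap_s=0.5):
    """Split a mono sample sequence into overlapping fixed-length chunks."""
    chunk = int(chunk_s * sr)
    step = chunk - int(overlap_s * sr)
    return [audio[i:i + chunk] for i in range(0, max(len(audio) - 1, 1), step)]

# 25 s of audio at 16 kHz -> 3 chunks (two full 10 s chunks, one shorter tail)
chunks = chunk_audio([0.0] * (25 * 16000))
print(len(chunks), len(chunks[0]) / 16000, len(chunks[-1]) / 16000)  # 3 10.0 6.0
```

Each chunk can then go through the processor/model loop from the basic-usage snippet above; the small overlap reduces the chance of cutting a word exactly at a chunk boundary.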

Model Details

Model Description

  • Base Model: UsefulSensors/moonshine-tiny
  • Language: French (fr)
  • Model Size: 27M parameters
  • Fine-tuned on: Multilingual LibriSpeech (MLS) French dataset, re-segmented to meet the Moonshine model's input requirements
  • Training Duration: 8,000 steps
  • Optimizer: Schedule-free AdamW
  • License: MIT

Model Architecture

Moonshine is a compact sequence-to-sequence ASR model designed for efficient on-device inference:

  • Encoder: Convolutional feature extraction + Transformer blocks
  • Decoder: Autoregressive Transformer decoder
  • Parameters: 27M (tiny variant)
  • Input: 16kHz mono audio
  • Output: French text transcription

Performance

Evaluation Metrics

Evaluated on Multilingual LibriSpeech (MLS) French test set:

| Metric                     | Score       |
|----------------------------|-------------|
| Word Error Rate (WER)      | 21.8%       |
| Character Error Rate (CER) | ~10%        |
| Real-Time Factor (RTF)     | 0.11x (CPU) |

Inference Speed: ~9x faster than real-time on CPU, enabling live transcription.
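The RTF figure is simply processing time divided by audio duration. A quick sketch of the arithmetic (the 1.1 s timing below is an illustrative number consistent with the reported 0.11x, not a fresh measurement):

```python
def real_time_factor(processing_s, audio_s):
    """RTF < 1 means faster than real time; speedup is 1 / RTF."""
    return processing_s / audio_s

# e.g. transcribing 10 s of audio in 1.1 s of CPU time:
rtf = real_time_factor(1.1, 10.0)
print(f"RTF = {rtf:.2f}, speedup = {1 / rtf:.0f}x")  # RTF = 0.11, speedup = 9x
```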

Comparison

| Model             | Size | Language     | WER (MLS-FR) |
|-------------------|------|--------------|--------------|
| Whisper-tiny      | 39M  | Multilingual | ~25%         |
| Moonshine-tiny-fr | 27M  | French       | 21.8%        |
| Whisper-base      | 74M  | Multilingual | ~18%         |

Moonshine-tiny-fr achieves competitive performance with ~30% fewer parameters than Whisper-tiny. That said, this model is a proof of concept; more work is needed to build a proper, robust training dataset.

Training Details / Fine-tuning

Please refer to my GitHub repository for the full training procedure.

Use Cases

Primary Applications

French Speech Recognition

  • Real-time transcription
  • Audio file transcription
  • Voice commands
  • Accessibility tools

Resource-Constrained Environments

  • On-device transcription (mobile, edge devices)
  • Low-latency applications
  • Offline transcription

Hogwarts Legacy SpellCaster

Limitations and Biases

Known Limitations of this tiny model

  • Hallucination: Like all seq2seq models, may generate text not present in audio
  • Repetition: May repeat phrases, especially with greedy decoding (use beam search)
  • Short Segments: Performance may degrade on very short audio clips (<0.5s)
  • Domain Specificity: Trained primarily on audiobooks (read speech)
  • Accents: Best performance on metropolitan French; regional accents may have higher WER
  • Background Noise: Performance degrades with significant background noise
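To mitigate the repetition issue mentioned above, beam search and n-gram blocking can be passed to `model.generate` via standard Hugging Face generation options. A minimal sketch; these parameter values are illustrative defaults, not settings validated for this model:

```python
# Standard Hugging Face generation options; the values are illustrative.
generation_kwargs = {
    "num_beams": 4,             # beam search instead of greedy decoding
    "no_repeat_ngram_size": 3,  # block repeated 3-grams
    "repetition_penalty": 1.2,  # discourage re-emitting recent tokens
}
# Combined with the basic-usage snippet:
# generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
#                                **generation_kwargs)
print(sorted(generation_kwargs))
```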

Model Card Author

Pierre Chéneau (Cornebidouil)

Geologist, Developer and maintainer of this fine-tuned French model.


Citations

This Model

@misc{cheneau2026moonshine-tiny-fr,
  author = {Pierre Chéneau (Cornebidouil)},
  title = {Moonshine-Tiny-FR: Fine-tuned French Speech Recognition},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Cornebidouil/moonshine-tiny-fr}
}

Fine tuning Guide

@misc{cheneau2026moonshine-finetune,
  author = {Pierre Chéneau (Cornebidouil)},
  title = {Moonshine ASR Fine-Tuning Guide},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/pierre-cheneau/finetune-moonshine-asr}
}

Original Moonshine Model

@misc{jeffries2024moonshinespeechrecognitionlive,
      title={Moonshine: Speech Recognition for Live Transcription and Voice Commands},
      author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
      year={2024},
      eprint={2410.15608},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2410.15608},
}

Multilingual LibriSpeech Dataset

@inproceedings{pratap2020mls,
  title={MLS: A Large-Scale Multilingual Dataset for Speech Research},
  author={Pratap, Vineel and Xu, Qiantong and Sriram, Anuroop and Synnaeve, Gabriel and Collobert, Ronan},
  booktitle={Interspeech},
  year={2020}
}

License

This model is released under the MIT License, consistent with the base Moonshine model.

MIT License

Copyright (c) 2026 Pierre Chéneau (Cornebidouil)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...

Acknowledgments

  • Useful Sensors for the original Moonshine architecture and pre-trained model
  • Meta AI for the Multilingual LibriSpeech dataset
  • HuggingFace for the transformers library and model hosting
  • Schedule-Free Learning for the optimizer implementation

Questions? Open an issue on the fine-tuning guide repository or check the documentation.

Want to fine-tune for your language? See the complete fine-tuning guide.
