facebook/multilingual_librispeech
Fine-tuned UsefulSensors/moonshine-tiny for German automatic speech recognition.
```python
from transformers import pipeline

transcriber = pipeline("automatic-speech-recognition", model="dattazigzag/moonshine-tiny-de")
result = transcriber("german_audio.wav")
print(result["text"])
```
```python
from pathlib import Path

# Reuses the `transcriber` pipeline created above
audio_files = Path("./audio").glob("*.wav")
for audio in audio_files:
    result = transcriber(str(audio))
    print(f"{audio.name}: {result['text']}")
```
```python
import librosa  # assumption: any loader that yields a 16 kHz mono float array works
import torch
from transformers import AutoProcessor, MoonshineForConditionalGeneration

model = MoonshineForConditionalGeneration.from_pretrained("dattazigzag/moonshine-tiny-de")
processor = AutoProcessor.from_pretrained("dattazigzag/moonshine-tiny-de")
model.eval()

# Load the file as a 16 kHz mono float array (librosa resamples and downmixes on load)
audio_array, _ = librosa.load("german_audio.wav", sr=16000)

# Process audio (16kHz mono WAV)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=80)
text = processor.tokenizer.decode(generated_ids[0], skip_special_tokens=True)
```
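Since the model expects 16 kHz mono input, it can help to inspect a file's header before transcribing it. The following is a minimal sketch using only Python's standard `wave` module. The helper names `describe_wav` and `is_model_ready` are hypothetical and not part of this repository, and `wave` only reads PCM-encoded WAV files.

```python
import wave


def describe_wav(path):
    """Read basic header fields of a PCM WAV file (hypothetical helper)."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()
    return {"sample_rate": rate, "channels": channels, "duration_s": frames / rate}


def is_model_ready(info, expected_rate=16000):
    """True when the audio is mono and sampled at the expected rate."""
    return info["sample_rate"] == expected_rate and info["channels"] == 1
```

A file that fails this check would need resampling or downmixing before it is passed to the processor.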
This model was not trained from scratch. We fine-tuned the English-only moonshine-tiny model to understand German. The pre-trained model already knew audio feature extraction, attention patterns, and tokenization; we adapted it to German phonetics and vocabulary.
| Setting | Value |
|---|---|
| Optimizer | schedule-free AdamW |
| Learning rate | 3e-4 (constant after 300-step warmup) |
| Precision | bf16 |
| Batch size | 16 per device × 4 accumulation = 64 effective |
| Audio duration | 4–20 seconds |
| Gradient checkpointing | Disabled (broken with Moonshine in transformers 4.49) |
| Curriculum learning | Disabled (simple first run) |
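The numeric settings in the table can be restated as a few lines of plain Python. This is an illustrative sketch, not the training script: the linear shape of the warmup ramp, the single-device default, and the inclusive duration bounds are assumptions, and all function names are hypothetical.

```python
PER_DEVICE_BATCH = 16
GRAD_ACCUM_STEPS = 4
PEAK_LR = 3e-4
WARMUP_STEPS = 300
MIN_SECONDS, MAX_SECONDS = 4.0, 20.0


def effective_batch_size(num_devices=1):
    """Samples contributing to one optimizer update (single device assumed)."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices


def learning_rate(step):
    """Ramp to PEAK_LR over WARMUP_STEPS, then stay constant.

    The linear ramp is an assumption; the table only states a 300-step warmup.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    return PEAK_LR


def keep_clip(duration_s):
    """Keep clips inside the 4-20 second window (inclusive bounds assumed)."""
    return MIN_SECONDS <= duration_s <= MAX_SECONDS
```

With these values, `effective_batch_size()` gives the 64 listed in the table.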
| Step | Loss | WER |
|---|---|---|
| 500 | 2.37 | — |
| 1,000 | 2.04 | 46.5% |
| 5,000 | ~1.65 | ~39% |
| 10,000 | 1.61 | 36.7% |
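For readers unfamiliar with the WER column, word error rate is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. Below is a minimal sketch of that standard definition. It is not the evaluation code used for this model, and it applies no text normalization (casing, punctuation), which real evaluations usually do.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("das ist ein test", "das ist test")` counts one missing word out of four reference words. Because insertions are counted, WER can exceed 100%.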
.ort format for the native moonshine-voice CLI. ONNX conversion is a planned next step.

Trained using a fork of Pierre Chéneau's finetune-moonshine-asr with German-specific adaptations:
@misc{datta2026moonshine-tiny-de,
author = {Saurabh Datta},
title = {Moonshine-Tiny-DE: Fine-tuned German Speech Recognition},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/dattazigzag/moonshine-tiny-de}
}