Parakeet-TDT 0.6B v3 (MLX)

NVIDIA's state-of-the-art multilingual ASR model, optimized for Apple Silicon via MLX.

Maintained by Sonic Speech - Local-first speech recognition for macOS.

Model Details

Property                        Value
Parameters                      0.6B
Languages                       25 (see full list below)
Architecture                    FastConformer + TDT (Token-and-Duration Transducer)
Precision                       bfloat16
WER (LibriSpeech test-clean)    1.78%
RTFx (Apple Silicon)            78-100x real-time
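
For readers unfamiliar with the WER figure above, the following is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The function name `word_error_rate` is illustrative and not part of parakeet-mlx. Published benchmarks typically also normalize text (case, punctuation) and aggregate over the whole corpus, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A WER of 1.78% therefore corresponds to roughly 1.8 word errors per 100 reference words.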

Supported Languages

English, German, Spanish, French, Italian, Portuguese, Dutch, Polish, Russian, Ukrainian, Turkish, Arabic, Persian, Hindi, Tamil, Telugu, Chinese, Japanese, Korean, Vietnamese, Thai, Indonesian, Malay, Tagalog, Swahili

Performance

Benchmarked on LibriSpeech test-clean (500 samples, 63.5 minutes):

Metric                Value
WER                   1.78%
RTFx                  78-100x real-time
Memory (inference)    ~2.5 GB
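
To put the RTFx range in concrete terms, here is a small worked calculation. It assumes the usual definition of RTFx as audio duration divided by processing time; the helper name `processing_seconds` is illustrative only.

```python
def processing_seconds(audio_minutes: float, rtfx: float) -> float:
    """Wall-clock seconds needed to transcribe the audio at a given RTFx."""
    return audio_minutes * 60.0 / rtfx


if __name__ == "__main__":
    audio_minutes = 63.5  # size of the benchmark set quoted above
    for rtfx in (78, 100):
        print(f"RTFx {rtfx}: {processing_seconds(audio_minutes, rtfx):.1f} s")
    # RTFx 78: 48.8 s
    # RTFx 100: 38.1 s
```

Under that definition, the 63.5-minute benchmark set would take roughly 38 to 49 seconds to transcribe.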

Bug Fixes Included

This model distribution includes critical fixes from parakeet-mlx:

  • STFT Magnitude Fix (v0.5.0) - Corrects the L2 norm calculation, eliminating hallucinations on non-speech signals
  • Memory Leak Fix - Clears the MLX Metal cache properly during batch processing
  • Timestamp Discontinuity Fix - Produces accurate timestamps for long-form audio
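
As an illustration of the idea behind the memory leak fix, the sketch below shows the general pattern of releasing cached buffers at intervals while processing a batch of files. It is not the actual patch: `transcribe_many` and its parameters are hypothetical, and the transcription and cache-clearing functions are passed in as callables so the pattern stays independent of any particular MLX version.

```python
from typing import Callable, Iterable, List


def transcribe_many(
    transcribe: Callable[[str], str],
    paths: Iterable[str],
    clear_cache: Callable[[], None],
    every: int = 1,
) -> List[str]:
    """Transcribe each path, calling clear_cache after every `every` files."""
    if every < 1:
        raise ValueError("every must be at least 1")

    texts: List[str] = []
    for count, path in enumerate(paths, start=1):
        texts.append(transcribe(path))
        if count % every == 0:
            # Release cached buffers so memory use stays flat across the batch.
            clear_cache()
    return texts


if __name__ == "__main__":
    # Stand-in callables; a real run would pass the model's transcription
    # function and the framework's cache-clearing function instead.
    cleared = []
    print(transcribe_many(lambda p: f"text of {p}", ["a.wav", "b.wav"],
                          lambda: cleared.append(True)))
    print(len(cleared))
```

On Apple Silicon, the `clear_cache` argument would presumably be MLX's cache-clearing function (for example `mlx.core.clear_cache` in recent releases); treat that name as an assumption and check the MLX version you have installed.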

Usage

With parakeet-mlx

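Install the package from a shell:

    pip install parakeet-mlx

Then load the model and transcribe a file from Python:

    # The parakeet-mlx package is imported as parakeet_mlx
    from parakeet_mlx import from_pretrained

    # Download (on first use) and load the model weights
    model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v3")

    # Transcribe a local audio file and print the recognized text
    result = model.transcribe("audio.wav")
    print(result.text)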
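
Beyond the plain text, a transcription result can be turned into subtitles if it carries timing information. The sketch below formats timed segments as SRT. It assumes each segment exposes `text`, `start`, and `end` (in seconds); the `Segment` class is a hypothetical stand-in for illustration, and whether the result object offers such segments (for example through a `sentences` attribute) should be checked against the installed parakeet-mlx version.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical stand-in for a timed transcript segment."""
    text: str
    start: float  # seconds
    end: float    # seconds


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{time_range}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Segment("Hello there.", 0.0, 1.25), Segment("Second line.", 1.5, 3.0)]
    print(to_srt(demo))
```

Any object with the same three attributes can be passed to `to_srt`, so the stand-in class is only needed for the demonstration.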

With Sonic Speech App

This model is automatically downloaded when using Sonic Speech for macOS.

License

This model is released under CC-BY-4.0, following NVIDIA's original licensing.

Citation

@misc{nvidia2024parakeet,
  title={Parakeet: A Family of Automatic Speech Recognition Models},
  author={NVIDIA NeMo Team},
  year={2024},
  publisher={NVIDIA}
}
