Parakeet-TDT 0.6B v3 (MLX)

NVIDIA's state-of-the-art multilingual ASR model, optimized for Apple Silicon via MLX.

Maintained by Sonic Speech - Local-first speech recognition for macOS.

Model Details

Property	Value
Parameters	0.6B
Languages	25 (see full list below)
Architecture	FastConformer + TDT (Token-and-Duration Transducer)
Precision	bfloat16
WER (LibriSpeech test-clean)	1.78%
RTFx (Apple Silicon)	78-100x real-time

Supported Languages

Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Ukrainian

Performance

Benchmarked on LibriSpeech test-clean (500 samples, 63.5 minutes):

Metric	Value
WER	1.78%
RTFx	78-100x real-time
Memory (inference)	~2.5GB
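
As a rough illustration of how the two headline metrics are defined, the sketch below computes RTFx as audio duration divided by processing time, and WER as word-level edit distance divided by the number of reference words. The function names are illustrative and not part of parakeet-mlx, and the script that produced the figures above is not reproduced here; text normalization in particular is assumed to be simple lowercasing and whitespace splitting.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, the 63.5-minute test set (3810 seconds) at 78-100x real-time corresponds to roughly 38 to 49 seconds of processing.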

Bug Fixes Included

This model distribution includes critical fixes from parakeet-mlx:

STFT Magnitude Fix (v0.5.0) - Corrected the L2 norm calculation, which eliminates hallucinations on non-speech signals
Memory Leak Fix - Proper MLX Metal cache clearing for batch processing
Timestamp Discontinuity Fix - Accurate timestamps for long-form audio
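
The STFT magnitude fix concerns how a complex spectrum is reduced to real magnitudes. The exact change in parakeet-mlx is not reproduced here. As a minimal sketch, assuming the intended quantity is the L2 norm of each complex bin, sqrt(re^2 + im^2), a pure-Python version might look like the following. The helper names are hypothetical, and a naive DFT of one frame stands in for the library's STFT.

```python
import cmath
import math


def dft_frame(frame):
    """Naive one-sided DFT of a single real-valued frame (illustration only)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def l2_magnitude(spectrum):
    """Magnitude of each complex bin as the L2 norm of (real, imag)."""
    return [math.hypot(c.real, c.imag) for c in spectrum]


# A silent frame has zero magnitude in every bin.
silent = l2_magnitude(dft_frame([0.0, 0.0, 0.0, 0.0]))

# One cycle of a cosine across four samples concentrates energy in bin 1.
tone = l2_magnitude(dft_frame([1.0, 0.0, -1.0, 0.0]))
```

With a correct magnitude, silent input stays at zero energy in every bin instead of carrying spurious values into the feature extractor.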

Usage

With parakeet-mlx

pip install parakeet-mlx

# The package installs as parakeet-mlx but is imported as parakeet_mlx
from parakeet_mlx import from_pretrained

model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v3")
result = model.transcribe("audio.wav")
print(result.text)
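
For several files, one option is a small wrapper around the call above. This is a sketch rather than part of parakeet-mlx: `transcribe_files` is a hypothetical helper, and it takes the transcription callable as an argument so that it can be exercised without loading the model.

```python
from typing import Callable, Dict, Iterable


def transcribe_files(
    paths: Iterable[str],
    transcribe: Callable[[str], str],
) -> Dict[str, str]:
    """Run `transcribe` on each path and collect the text keyed by path.

    `transcribe` is any callable that maps an audio path to its transcript.
    """
    results: Dict[str, str] = {}
    for path in paths:
        results[path] = transcribe(path)
    return results
```

With the model loaded as shown earlier, a call might look like `transcribe_files(["a.wav", "b.wav"], lambda p: model.transcribe(p).text)`.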

With Sonic Speech App

This model is automatically downloaded when using Sonic Speech for macOS.

License

This model is released under CC-BY-4.0, following NVIDIA's original licensing.

Citation

@misc{nvidia2024parakeet,
  title={Parakeet: A Family of Automatic Speech Recognition Models},
  author={NVIDIA NeMo Team},
  year={2024},
  publisher={NVIDIA}
}
