Parakeet-TDT 0.6B v2 (MLX)

NVIDIA's state-of-the-art English-only automatic speech recognition (ASR) model, packaged to run on Apple Silicon via MLX.

Maintained by Sonic Speech, a local-first speech recognition app for macOS.

Model Details

Property	Value
Parameters	0.6B
Languages	English only
Architecture	FastConformer + TDT (Token-and-Duration Transducer)
Precision	bfloat16
WER (LibriSpeech test-clean, 500-sample subset)	1.67%
RTFx (Apple Silicon)	78-100x real-time
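TDT extends a standard transducer by predicting, alongside each token, a duration: how many encoder frames to advance. Greedy decoding can therefore jump over silence instead of visiting every frame. The toy sketch below illustrates that loop with a hypothetical scripted `toy_joint` standing in for the real joint network. It is not the NeMo or parakeet-mlx implementation, and the blank id, the duration values, and the `max_symbols_per_frame` guard are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

BLANK = 0  # hypothetical blank token id for this toy example

# Maps (frame index, last emitted token) -> (token, duration in frames).
# A real TDT joint network produces token logits and duration logits; this
# hypothetical stand-in returns the argmax of both heads directly.
Joint = Callable[[int, int], Tuple[int, int]]


def tdt_greedy_decode(joint: Joint, num_frames: int,
                      max_symbols_per_frame: int = 4) -> List[int]:
    """Toy greedy TDT decoding: each step picks a token AND how many frames to advance."""
    hyp: List[int] = []
    last = BLANK
    t = 0
    while t < num_frames:
        steps_here = 0
        while True:
            token, duration = joint(t, last)
            steps_here += 1
            if token != BLANK:
                hyp.append(token)
                last = token           # the prediction network only sees non-blank tokens
            elif duration == 0:
                duration = 1           # a blank must always move time forward
            if duration > 0 or steps_here >= max_symbols_per_frame:
                t += max(duration, 1)  # skip `duration` frames (at least one)
                break
            # duration == 0 after a real token: stay on frame t and emit again


    return hyp


# Hypothetical scripted joint outputs for a 10-frame utterance.
TOY_TABLE: Dict[Tuple[int, int], Tuple[int, int]] = {
    (0, BLANK): (BLANK, 2),  # silence: jump from frame 0 to frame 2
    (2, BLANK): (5, 0),      # emit token 5, stay on frame 2
    (2, 5): (7, 3),          # emit token 7, jump to frame 5
    (5, 7): (BLANK, 4),      # silence: jump to frame 9
    (9, 7): (3, 1),          # emit token 3, reach the end
}


def toy_joint(t: int, last: int) -> Tuple[int, int]:
    return TOY_TABLE[(t, last)]


print(tdt_greedy_decode(toy_joint, num_frames=10))  # [5, 7, 3]
```

In this toy run the decoder evaluates the joint only five times for ten frames, which is the kind of frame skipping that lets TDT models decode quickly.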

Performance

Benchmarked on a 500-utterance subset of LibriSpeech test-clean (63.5 minutes of audio):

Metric	Value
WER	1.67%
RTFx	78-100x real-time
Memory (inference)	~2.5 GB
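The card does not publish its benchmark harness, so the sketch below only shows the standard definitions of the two headline metrics. WER is the word-level edit distance divided by the number of reference words, summed over the corpus. RTFx is seconds of audio divided by seconds of processing. The function names are hypothetical, and the text normalization a real benchmark applies (casing, punctuation) is deliberately omitted.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp (Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distance from the empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion of r
                          curr[j - 1] + 1,         # insertion of h
                          prev[j - 1] + (r != h))  # substitution, or free match
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words over (reference, hypothesis) pairs."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        errors += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words")
    return errors / words


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    return audio_seconds / processing_seconds


audio_s = 63.5 * 60  # the card's benchmark set: 3810 s of audio
for speed in (78, 100):
    print(f"{speed}x -> {audio_s / speed:.1f} s of processing")
```

By this definition, transcribing the card's 63.5 minutes at 78-100x real-time corresponds to roughly 38 to 49 seconds of processing.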

Note: On English, v2 achieves a slightly lower WER than the multilingual v3, reflecting its English-only training focus.

Bug Fixes Included

This model distribution includes critical fixes from parakeet-mlx:

STFT Magnitude Fix (v0.5.0): corrects the L2-norm calculation, eliminating hallucinations on non-speech signals
Memory Leak Fix: properly clears the MLX Metal cache during batch processing
Timestamp Discontinuity Fix: keeps timestamps accurate across long-form audio
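The card does not describe how the timestamp fix works. As general background, long-form audio is commonly transcribed in overlapping chunks whose timestamps are relative to each chunk's start. They must be shifted by the chunk's offset and de-duplicated in the overlap, and getting either step wrong produces jumps in the timeline. The sketch below illustrates that stitching with a simple midpoint cut. `stitch_chunks`, the `(text, start, end)` word tuples, and the fixed chunk and overlap lengths are hypothetical assumptions, and parakeet-mlx's actual merge logic may differ.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # hypothetical (text, start_s, end_s), chunk-local times


def stitch_chunks(chunks: List[List[Word]], chunk_s: float, overlap_s: float) -> List[Word]:
    """Shift chunk-local timestamps to absolute time and drop overlap duplicates."""
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    step = chunk_s - overlap_s  # hop between consecutive chunk starts
    half = overlap_s / 2        # cut each overlap region at its midpoint
    merged: List[Word] = []
    for k, words in enumerate(chunks):
        offset = k * step  # absolute start time of chunk k
        # Chunk k owns words starting in [lo, hi); hi equals the next chunk's lo.
        lo = offset + half if k > 0 else 0.0
        hi = offset + step + half if k < len(chunks) - 1 else float("inf")
        for text, start, end in words:
            abs_start, abs_end = start + offset, end + offset
            if lo <= abs_start < hi:
                merged.append((text, abs_start, abs_end))
    return merged


# Two 10 s chunks with 2 s overlap: chunk 1 starts at 8 s, the cut falls at 9 s.
chunk0 = [("hello", 0.5, 1.0), ("world", 8.5, 9.2), ("again", 9.4, 9.9)]
chunk1 = [("world", 0.5, 1.2), ("again", 1.4, 1.9), ("bye", 5.0, 5.5)]
for word in stitch_chunks([chunk0, chunk1], chunk_s=10.0, overlap_s=2.0):
    print(word)
```

Each word in the overlap is kept exactly once, and every timestamp in the merged list is on the same absolute timeline.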

Usage

With parakeet-mlx

pip install parakeet-mlx

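from parakeet_mlx import from_pretrained

# Load the model by its Hugging Face repo id
model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v2")

# Transcribe a local audio file and print the full transcript
result = model.transcribe("audio.wav")
print(result.text)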
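The usage example prints only the full transcript. For subtitles, a small formatter can turn timed segments into SRT cues. The helper below is hypothetical and not part of parakeet-mlx. It assumes `(start, end, text)` tuples in seconds, which with parakeet-mlx could be built from `result.sentences`, assuming each sentence exposes `start`, `end`, and `text` attributes.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start_s, end_s, text). Assumption: with parakeet-mlx
# these could come from [(s.start, s.end, s.text) for s in result.sentences].
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render numbered SRT cues separated by blank lines."""
    cues = []
    for i, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(cues)


print(to_srt([(0.0, 1.5, " Hello world."), (1.5, 3.0, "Bye.")]))
```

Rounding to whole milliseconds before splitting into fields avoids floating-point artifacts such as a timestamp ending in `,999` instead of rolling over to the next second.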

With the Sonic Speech app

This model is automatically downloaded when using Sonic Speech for macOS.

License

This model is released under CC-BY-4.0, following NVIDIA's original licensing.

Citation

@misc{nvidia2024parakeet,
  title={Parakeet: A Family of Automatic Speech Recognition Models},
  author={NVIDIA NeMo Team},
  year={2024},
  publisher={NVIDIA}
}
