Parakeet-TDT 0.6B v2 (MLX)

NVIDIA's state-of-the-art English-only ASR model, optimized for Apple Silicon via MLX.

Maintained by Sonic Speech - Local-first speech recognition for macOS.

Model Details

| Property | Value |
| --- | --- |
| Parameters | 0.6B |
| Languages | English only |
| Architecture | FastConformer + TDT (Token-and-Duration Transducer) |
| Precision | bfloat16 |
| WER (LibriSpeech test-clean) | 1.67% |
| RTFx (Apple Silicon) | 78-100x real-time |

Performance

Benchmarked on LibriSpeech test-clean (500 samples, 63.5 minutes):

| Metric | Value |
| --- | --- |
| WER | 1.67% |
| RTFx | 78-100x real-time |
| Memory (inference) | ~2.5 GB |

Note: v2 achieves slightly better WER than v3 on English due to its English-only training focus.
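To make the two metrics concrete, here is a minimal sketch of how WER and RTFx are conventionally computed. It is not the benchmark script behind the numbers above: the function names are illustrative, and the text normalization (casing, punctuation) used for the reported WER is not stated here, so this version simply splits on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    return audio_seconds / processing_seconds


# What 78-100x real-time means for the 63.5-minute benchmark set
audio_seconds = 63.5 * 60
for speed in (78, 100):
    print(f"{speed}x real-time -> about {audio_seconds / speed:.1f} s of compute")
```

At the reported speeds, the full 63.5 minutes of audio would take roughly 38 to 49 seconds to transcribe.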

Bug Fixes Included

This model distribution includes critical fixes from parakeet-mlx:

  • STFT Magnitude Fix (v0.5.0) - Corrected the L2 norm calculation, eliminating hallucinations on non-speech signals
  • Memory Leak Fix - Proper MLX Metal cache clearing for batch processing
  • Timestamp Discontinuity Fix - Accurate timestamps for long-form audio
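As background for the STFT item, the sketch below shows what the magnitude of a complex spectrum bin as an L2 norm means: the square root of the squared real and imaginary parts. It does not reproduce the parakeet-mlx patch, whose details are not given here. The naive DFT and the helper names `dft_frame` and `magnitude` are assumptions made for illustration only.

```python
import cmath
import math


def dft_frame(frame):
    """Naive DFT of one real-valued frame (O(n^2); for illustration only)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def magnitude(bins):
    """Per-bin magnitude: the L2 norm of each (real, imag) pair."""
    return [math.hypot(z.real, z.imag) for z in bins]


silence = [0.0] * 8
tone = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]  # 2 cycles per frame

print(magnitude(dft_frame(silence)))                       # every bin is zero
print([round(m, 3) for m in magnitude(dft_frame(tone))])   # energy concentrated in bin 2
```

A silent frame yields zero magnitude in every bin, so a correctly computed magnitude spectrum gives the encoder nothing that resembles speech energy.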

Usage

With parakeet-mlx

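Install the package:

    pip install parakeet-mlx

Then load the model and transcribe a file:

    # The parakeet-mlx distribution is imported as parakeet_mlx
    from parakeet_mlx import from_pretrained

    # Downloads the weights on first use
    model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v2")

    # Transcribe a single audio file and print the recognized text
    result = model.transcribe("audio.wav")
    print(result.text)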
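For more than one file, the same calls can be wrapped in a small helper. This is a sketch rather than part of parakeet-mlx: `transcribe_files`, `run_example`, and the `recordings` directory are hypothetical names, and it assumes the package is importable as `parakeet_mlx`. It relies only on `from_pretrained`, `model.transcribe(path)`, and `result.text`, as in the example above.

```python
from pathlib import Path


def transcribe_files(model, paths):
    """Return {file name: transcript}, using only model.transcribe(path).text."""
    transcripts = {}
    for path in paths:
        result = model.transcribe(str(path))
        transcripts[Path(path).name] = result.text.strip()
    return transcripts


def run_example(audio_dir="recordings"):
    """Hypothetical entry point; needs Apple Silicon and parakeet-mlx installed."""
    from parakeet_mlx import from_pretrained  # imported lazily on purpose

    # Load the model once and reuse it for every file
    model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v2")
    wav_files = sorted(Path(audio_dir).glob("*.wav"))
    for name, text in transcribe_files(model, wav_files).items():
        print(f"{name}: {text}")
```

Call `run_example()` on a machine with the package installed. Results are keyed by file name, so files with the same name in different directories would overwrite each other.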

With Sonic Speech App

This model is automatically downloaded when using Sonic Speech for macOS.

License

This model is released under CC-BY-4.0, following NVIDIA's original licensing.

Citation

@misc{nvidia2024parakeet,
  title={Parakeet: A Family of Automatic Speech Recognition Models},
  author={NVIDIA NeMo Team},
  year={2024},
  publisher={NVIDIA}
}
