Parakeet-TDT 0.6B v2 (MLX)
NVIDIA's state-of-the-art English-only ASR model, optimized for Apple Silicon via MLX.
Maintained by Sonic Speech - Local-first speech recognition for macOS.
Model Details
| Property | Value |
|---|---|
| Parameters | 0.6B |
| Languages | English only |
| Architecture | FastConformer + TDT (Token-and-Duration Transducer) |
| Precision | bfloat16 |
| WER (LibriSpeech test-clean) | 1.67% |
| RTFx (Apple Silicon) | 78-100x real-time |
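For readers unfamiliar with the WER figure in the table above, the following is a minimal sketch of how word error rate is conventionally computed: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is illustrative and not part of parakeet-mlx, and published benchmarks usually apply additional text normalization (punctuation, numerals) that is only approximated here by lowercasing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Illustrative helper only; real benchmarks normalize text more carefully.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A WER of 1.67% therefore corresponds to roughly one word-level error for every 60 reference words.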
Performance
Benchmarked on LibriSpeech test-clean (500 samples, 63.5 minutes):
| Metric | Value |
|---|---|
| WER | 1.67% |
| RTFx | 78-100x real-time |
| Memory (inference) | ~2.5GB |
Note: v2 achieves slightly better WER than v3 on English due to its English-only training focus.
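To make the RTFx figure concrete, here is a small sketch that assumes RTFx means audio duration divided by processing time (the usual definition of inverse real-time factor). The helper names are illustrative. Under that assumption, the 63.5-minute benchmark set would take roughly 38 to 49 seconds to transcribe at 78-100x real-time.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def processing_time(audio_seconds: float, rtfx_value: float) -> float:
    """Estimated compute time for a given amount of audio at a given RTFx."""
    if rtfx_value <= 0:
        raise ValueError("rtfx_value must be positive")
    return audio_seconds / rtfx_value


# Benchmark set size quoted above: 63.5 minutes of audio.
benchmark_audio_seconds = 63.5 * 60

slowest = processing_time(benchmark_audio_seconds, 78)
fastest = processing_time(benchmark_audio_seconds, 100)
print(f"Estimated processing time: {fastest:.1f}s to {slowest:.1f}s")
```

To measure RTFx on your own hardware, time a transcription call (for example with `time.perf_counter()`) and pass the audio length and elapsed time to `rtfx`.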
Bug Fixes Included
This model distribution includes critical fixes from parakeet-mlx:
- STFT Magnitude Fix (v0.5.0) - Corrects the L2 norm calculation, eliminating hallucinations on non-speech signals
- Memory Leak Fix - Proper MLX Metal cache clearing for batch processing
- Timestamp Discontinuity Fix - Accurate timestamps for long-form audio
Usage
With parakeet-mlx
pip install parakeet-mlx
from parakeet_mlx import from_pretrained
model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v2")
result = model.transcribe("audio.wav")
print(result.text)
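Because this distribution emphasizes accurate timestamps, a common next step is exporting subtitles. The sketch below is hedged: it assumes the transcription result exposes sentence-level segments with `text`, `start`, and `end` (in seconds), which should be verified against the parakeet-mlx documentation. The `Segment` class is a hypothetical stand-in so the example runs without the model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical stand-in for a timestamped sentence from a transcription."""
    text: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as SRT cues, numbered from 1 and separated by blank lines."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(segment.start)} --> {srt_timestamp(segment.end)}\n"
            f"{segment.text.strip()}\n"
        )
    return "\n".join(cues)


# Example with made-up segments; real ones would come from the model output.
example = [Segment("Hello world.", 1.01, 2.04), Segment("Second sentence.", 2.5, 4.0)]
print(to_srt(example))
```

If the result object's field names differ, only the attribute accesses in `to_srt` need to change.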
With Sonic Speech App
This model is automatically downloaded when using Sonic Speech for macOS.
License
This model is released under CC-BY-4.0, following NVIDIA's original licensing.
Citation
@misc{nvidia2024parakeet,
title={Parakeet: A Family of Automatic Speech Recognition Models},
author={NVIDIA NeMo Team},
year={2024},
publisher={NVIDIA}
}
Links
- Sonic Speech - Local-first dictation for macOS
- parakeet-mlx - MLX implementation
- NVIDIA Parakeet - Original model