---
title: Sonic Speech
emoji: 🎤
colorFrom: purple
colorTo: blue
sdk: static
pinned: false
---

# Sonic Speech

Optimized speech models for Apple Silicon, powering [Sonic](https://github.com/flight505/sonic-workspace), a local-first voice AI system. All models run entirely on-device using [MLX](https://github.com/ml-explore/mlx). No cloud, no API keys, no data leaves your Mac.

## ASR: Parakeet TDT (NVIDIA, ported to MLX)

State-of-the-art speech recognition with encoder-only mixed-precision quantization.

| Model | Size | WER (LibriSpeech) | WER (TED-LIUM) | RTFx | Peak Memory |
|-------|------|-------------------|----------------|------|-------------|
| [parakeet-tdt-0.6b-v3](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3) | 1,254 MB | 0.82% | 15.1% | 73x | 3,002 MB |
| [parakeet-tdt-0.6b-v3-int8](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3-int8) | 755 MB | 0.82% | 15.1% | 95x | 1,268 MB |
| [parakeet-tdt-0.6b-v3-int4](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3-int4) | 489 MB | 0.82% | 15.5% | 98x | 1,003 MB |
| [parakeet-tdt-0.6b-v2](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2) | 1,222 MB | — | — | — | — |
| [parakeet-tdt-0.6b-v2-int8](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2-int8) | 736 MB | — | — | — | — |
| [parakeet-tdt-0.6b-v2-int4](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2-int4) | 470 MB | — | — | — | — |

RTFx is the inverse real-time factor: seconds of audio processed per second of compute, so higher is faster. A dash means no benchmark is reported.

**v3** supports 25 languages; **v2** is English-only. **INT8 is recommended**: no WER loss on either benchmark, about 40% smaller and 30% faster than the BF16 model.

## TTS: Kokoro 82M (MLX)

Fast text-to-speech with 32+ voices (American English, British English, Japanese, Chinese).

| Model | Size | Latency (short text) | Latency (medium text) | TTFC (streaming) | RTFx |
|-------|------|----------------------|-----------------------|------------------|------|
| [kokoro-82m-bf16](https://huggingface.co/sonic-speech/kokoro-82m-bf16) | ~170 MB | 47 ms | 224 ms | 126 ms | 41x |

TTFC is the time to the first audio chunk when streaming.

## Quantization Strategy

Only the Conformer encoder (~85% of parameters) is quantized; the decoder stays in BF16 to preserve token precision. Figures are relative to the BF16 `parakeet-tdt-0.6b-v3` model.

| Variant | Size | Speed (RTFx) | Peak Memory | WER Impact |
|---------|------|--------------|-------------|------------|
| INT8 | -40% | +30% | -58% | None |
| INT4 | -61% | +34% | -67% | +0.4 pp on TED-LIUM, none on LibriSpeech |

## Quick Start

```python
# ASR: load the recommended INT8 Parakeet model
from parakeet import from_pretrained

model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v3-int8")

# TTS: create a synthesizer that uses the "af_heart" voice
from sonic_tts import SonicTTS

tts = SonicTTS(voice="af_heart")
```

All benchmarks: Apple M3 Max (64 GB), macOS Sequoia, MLX 0.30.4.

Built by [flight505](https://huggingface.co/flight505).
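
As a sanity check, the relative figures in the Quantization Strategy table can be recomputed from the v3 rows of the ASR benchmark table. The sketch below assumes the BF16 `parakeet-tdt-0.6b-v3` model is the baseline and treats "speed" as the change in RTFx; `pct_change` and `summarize` are hypothetical helpers for illustration, not part of any Sonic package.

```python
# Hypothetical helpers: recompute the Quantization Strategy percentages
# from the parakeet-tdt-0.6b-v3 rows of the ASR benchmark table.

# Values copied from the ASR table: size (MB), RTFx, peak memory (MB)
BASELINE = {"size": 1254, "rtfx": 73, "memory": 3002}  # BF16 model
VARIANTS = {
    "INT8": {"size": 755, "rtfx": 95, "memory": 1268},
    "INT4": {"size": 489, "rtfx": 98, "memory": 1003},
}


def pct_change(new: float, old: float) -> int:
    """Relative change from old to new, rounded to whole percent."""
    return round((new - old) / old * 100)


def summarize() -> dict[str, dict[str, int]]:
    """Percent change of each metric for each quantized variant vs. BF16."""
    return {
        name: {
            metric: pct_change(values[metric], BASELINE[metric])
            for metric in ("size", "rtfx", "memory")
        }
        for name, values in VARIANTS.items()
    }


for name, deltas in summarize().items():
    print(
        f"{name}: size {deltas['size']:+d}%, "
        f"speed {deltas['rtfx']:+d}%, memory {deltas['memory']:+d}%"
    )
```

Running it prints `INT8: size -40%, speed +30%, memory -58%` and `INT4: size -61%, speed +34%, memory -67%`, which matches the table. Because RTFx counts audio seconds per compute second, a positive speed change means faster transcription.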