---
title: Sonic Speech
emoji: 🎀
colorFrom: purple
colorTo: blue
sdk: static
pinned: false
---
# Sonic Speech
Optimized speech models for Apple Silicon, powering [Sonic](https://github.com/flight505/sonic-workspace), a local-first voice AI
system. All models run entirely on-device using [MLX](https://github.com/ml-explore/mlx). No cloud, no API keys, no data leaves your
Mac.
## ASR: Parakeet TDT (NVIDIA, ported to MLX)
SOTA English speech recognition with encoder-only mixed-precision quantization.
| Model | Size | WER (LibriSpeech) | WER (TED-LIUM) | RTFx | Peak Memory |
|-------|------|-------------------|-----------------|------|-------------|
| [parakeet-tdt-0.6b-v3](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3) | 1,254 MB | 0.82% | 15.1% | 73x | 3,002 MB |
| [parakeet-tdt-0.6b-v3-int8](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3-int8) | 755 MB | 0.82% | 15.1% | 95x | 1,268 MB |
| [parakeet-tdt-0.6b-v3-int4](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3-int4) | 489 MB | 0.82% | 15.5% | 98x | 1,003 MB |
| [parakeet-tdt-0.6b-v2](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2) | 1,222 MB | - | - | - | - |
| [parakeet-tdt-0.6b-v2-int8](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2-int8) | 736 MB | - | - | - | - |
| [parakeet-tdt-0.6b-v2-int4](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2-int4) | 470 MB | - | - | - | - |
**v3** supports 25 languages. **v2** is English-only. **INT8 is recommended**: no measured WER loss on v3, 40% smaller, and 30% faster than BF16.
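
The INT8 figures quoted above follow from the v3 rows of the table. A minimal sketch of that arithmetic, with the table values copied in by hand (the dictionary layout and the `relative_change` helper are illustrative and not part of any Sonic package):

```python
# v3 benchmark rows copied from the table above: (size MB, RTFx, peak memory MB)
V3_ROWS = {
    "bf16": (1254, 73, 3002),
    "int8": (755, 95, 1268),
    "int4": (489, 98, 1003),
}


def relative_change(new: float, base: float) -> float:
    """Return the change of `new` relative to `base`, in percent."""
    return (new - base) / base * 100.0


base_size, base_rtfx, base_mem = V3_ROWS["bf16"]
for name in ("int8", "int4"):
    size, rtfx, mem = V3_ROWS[name]
    print(
        f"{name}: size {relative_change(size, base_size):+.0f}%, "
        f"speed {relative_change(rtfx, base_rtfx):+.0f}%, "
        f"memory {relative_change(mem, base_mem):+.0f}%"
    )
```

Rounded to whole percent, INT8 comes out at -40% size, +30% speed, and -58% peak memory relative to BF16; INT4 at -61%, +34%, and -67%.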
## TTS: Kokoro 82M (MLX)
Fast text-to-speech with 32+ voices (American, British, Japanese, Chinese).
| Model | Size | Short Text | Medium Text | TTFC (streaming) | RTFx |
|-------|------|------------|-------------|------------------|------|
| [kokoro-82m-bf16](https://huggingface.co/sonic-speech/kokoro-82m-bf16) | ~170 MB | 47 ms | 224 ms | 126 ms | 41x |
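
The RTFx columns are not defined in this README. The sketch below assumes the usual reading, RTFx = audio duration / compute time (seconds of audio handled per second of compute), and turns the published values into rough wall-clock estimates. The `processing_seconds` helper is hypothetical.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Estimate compute time for a clip, assuming RTFx = audio duration / compute time."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


# RTFx values taken from the ASR and TTS tables in this README
for label, rtfx in [
    ("parakeet v3 bf16", 73),
    ("parakeet v3 int8", 95),
    ("kokoro bf16", 41),
]:
    print(f"{label}: 60 s of audio in about {processing_seconds(60, rtfx):.2f} s")
```

Under that assumption, a one-minute clip takes well under a second to transcribe with either Parakeet variant and about a second and a half to synthesize with Kokoro, on the benchmark hardware.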
## Quantization Strategy
Only the Conformer encoder (~85% of params) is quantized; the decoder stays in BF16 for token precision.
| Variant | Size | Speed | Memory | WER Impact |
|---------|------|-------|--------|------------|
| INT8 | -40% | +30% | -58% | None |
| INT4 | -61% | +34% | -67% | +0.4 pp on TED-LIUM, none on LibriSpeech |
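
The size column can be roughly reproduced from the ~85% encoder share. The sketch below assumes group-wise affine quantization with a group size of 64 and one 16-bit scale plus one 16-bit bias per group. Those match the defaults of MLX's `mlx.nn.quantize`, but this README does not state which settings were actually used. It also treats file size as proportional to parameter bits. The function name and its defaults are illustrative.

```python
def estimated_size_mb(
    base_mb: float,
    bits: int,
    encoder_fraction: float = 0.85,  # "~85% of params" from the text above
    group_size: int = 64,            # assumed, not stated in this README
    overhead_bits: int = 32,         # assumed: 16-bit scale + 16-bit bias per group
    base_bits: int = 16,             # BF16 baseline
) -> float:
    """Estimate checkpoint size when only the encoder is quantized to `bits`."""
    bits_per_weight = bits + overhead_bits / group_size
    encoder = encoder_fraction * bits_per_weight / base_bits
    decoder = 1.0 - encoder_fraction  # decoder stays in BF16
    return base_mb * (encoder + decoder)


BASE_MB = 1254  # parakeet-tdt-0.6b-v3 in BF16
for bits, published in [(8, 755), (4, 489)]:
    est = estimated_size_mb(BASE_MB, bits)
    print(f"INT{bits}: estimated {est:.0f} MB, published {published} MB")
```

Under these assumptions the estimates land within a few MB of the published sizes. That is consistent with encoder-only quantization, but it does not confirm the exact settings.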
## Quick Start
```python
# ASR
from parakeet import from_pretrained
model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v3-int8")
# TTS
from sonic_tts import SonicTTS
tts = SonicTTS(voice="af_heart")
```
All benchmarks: Apple M3 Max 64 GB, macOS Sequoia, MLX 0.30.4. Built by https://huggingface.co/flight505.