Update README.md
README.md
CHANGED
@@ -1,10 +1,63 @@
----
-title:
-emoji:
-colorFrom:
+---
+title: Sonic Speech
+emoji: 🎤
+colorFrom: purple
 colorTo: blue
 sdk: static
 pinned: false
 ---

+# Sonic Speech
+
+Optimized speech models for Apple Silicon, powering [Sonic](https://github.com/flight505/sonic-workspace), a local-first voice AI
+system. All models run entirely on-device using [MLX](https://github.com/ml-explore/mlx). No cloud, no API keys, no data leaves your
+Mac.
+
+## ASR: Parakeet TDT (NVIDIA, ported to MLX)
+
+SOTA English speech recognition with encoder-only mixed-precision quantization.
+
+| Model | Size | WER (LibriSpeech) | WER (TED-LIUM) | RTFx | Peak Memory |
+|-------|------|-------------------|-----------------|------|-------------|
+| [parakeet-tdt-0.6b-v3](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3) | 1,254 MB | 0.82% | 15.1% | 73x | 3,002 MB |
+| [parakeet-tdt-0.6b-v3-int8](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3-int8) | 755 MB | 0.82% | 15.1% | 95x | 1,268
+MB |
+| [parakeet-tdt-0.6b-v3-int4](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3-int4) | 489 MB | 0.82% | 15.5% | 98x | 1,003
+MB |
+| [parakeet-tdt-0.6b-v2](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2) | 1,222 MB | - | - | - | - |
+| [parakeet-tdt-0.6b-v2-int8](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2-int8) | 736 MB | - | - | - | - |
+| [parakeet-tdt-0.6b-v2-int4](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2-int4) | 470 MB | - | - | - | - |
+
+**v3** supports 25 languages. **v2** is English-only. **INT8 recommended**: zero WER loss, 40% smaller, 30% faster.
+
+## TTS: Kokoro 82M (MLX)
+
+Fast text-to-speech with 32+ voices (American, British, Japanese, Chinese).
+
+| Model | Size | Short Text | Medium Text | TTFC (streaming) | RTFx |
+|-------|------|------------|-------------|------------------|------|
+| [kokoro-82m-bf16](https://huggingface.co/sonic-speech/kokoro-82m-bf16) | ~170 MB | 47 ms | 224 ms | 126 ms | 41x |
+
+## Quantization Strategy
+
+Only the Conformer encoder (~85% of params) is quantized; the decoder stays BF16 for token precision.
+
+| Variant | Size | Speed | Memory | WER Impact |
+|---------|------|-------|--------|------------|
+| INT8 | -40% | +30% | -58% | None |
+| INT4 | -61% | +34% | -67% | +0.4pp on real speech |
+
+## Quick Start
+
+```python
+# ASR
+from parakeet import from_pretrained
+model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v3-int8")
+
+# TTS
+from sonic_tts import SonicTTS
+tts = SonicTTS(voice="af_heart")
+
+All benchmarks: Apple M3 Max 64 GB, macOS Sequoia, MLX 0.30.4. Built by https://huggingface.co/flight505.
+
 Edit this `README.md` markdown file to author your organization card.
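
The relative figures in the Quantization Strategy table added by this change appear to follow from the v3 rows of the ASR benchmark table. The sketch below recomputes them under two assumptions that the README does not state: the baseline is the unquantized `parakeet-tdt-0.6b-v3` row, and percentages are rounded to the nearest integer. The helper names (`percent_change`, `summarize`) are illustrative only and are not part of any Sonic or MLX API.

```python
# Benchmark values copied from the v3 rows of the ASR table in the diff above.
# Assumption: the unquantized v3 model is the baseline for every comparison.
BASELINE = {"size_mb": 1254, "rtfx": 73, "peak_mb": 3002, "wer_tedlium": 15.1}
VARIANTS = {
    "INT8": {"size_mb": 755, "rtfx": 95, "peak_mb": 1268, "wer_tedlium": 15.1},
    "INT4": {"size_mb": 489, "rtfx": 98, "peak_mb": 1003, "wer_tedlium": 15.5},
}


def percent_change(new: float, old: float) -> int:
    """Relative change from old to new, in percent, rounded to an integer."""
    return round((new - old) / old * 100)


def summarize(row: dict) -> dict:
    """Compare one quantized variant against the unquantized baseline."""
    return {
        "size": percent_change(row["size_mb"], BASELINE["size_mb"]),
        "speed": percent_change(row["rtfx"], BASELINE["rtfx"]),
        "memory": percent_change(row["peak_mb"], BASELINE["peak_mb"]),
        # WER difference in percentage points (TED-LIUM column).
        "wer_pp": round(row["wer_tedlium"] - BASELINE["wer_tedlium"], 1),
    }


SUMMARY = {name: summarize(row) for name, row in VARIANTS.items()}

for name, stats in SUMMARY.items():
    print(
        f"{name}: size {stats['size']:+d}%, speed {stats['speed']:+d}%, "
        f"memory {stats['memory']:+d}%, WER {stats['wer_pp']:+.1f}pp"
    )
```

Under these assumptions the recomputed values agree with the table: INT8 comes out at -40% size, +30% speed, -58% memory with no TED-LIUM WER change, and INT4 at -61%, +34%, -67% with +0.4pp.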