Update README.md
README.md
CHANGED
@@ -7,13 +7,13 @@ sdk: static
 7   pinned: false
 8   ---
 9
10 -
11
12   Optimized speech models for Apple Silicon, powering [Sonic](https://github.com/flight505/sonic-workspace), a local-first voice AI
13   system. All models run entirely on-device using [MLX](https://github.com/ml-explore/mlx). No cloud, no API keys, no data leaves your
14   Mac.
15
16 -
17
18   SOTA English speech recognition with encoder-only mixed-precision quantization.
19
@@ -30,7 +30,7 @@ pinned: false
30
31   **v3** supports 25 languages. **v2** is English-only. **INT8 recommended**: zero WER loss, 40% smaller, 30% faster.
32
33 -
34
35   Fast text-to-speech with 32+ voices (American, British, Japanese, Chinese).
36
@@ -38,7 +38,7 @@ pinned: false
38   |-------|------|------------|-------------|------------------|------|
39   | [kokoro-82m-bf16](https://huggingface.co/sonic-speech/kokoro-82m-bf16) | ~170 MB | 47 ms | 224 ms | 126 ms | 41x |
40
41 -
42
43   Only the Conformer encoder (~85% of params) is quantized; the decoder stays BF16 for token precision.
44
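The encoder-only strategy described above implies a rough size estimate that can be checked by hand. The sketch below is not the author's measurement script; it only works through the arithmetic under stated assumptions: weights dominate file size, the baseline is BF16 (16 bits per weight), and quantization is group-wise with a group size of 64 and one 16-bit scale plus one 16-bit bias per group. The group size and per-group overhead are assumptions; the README does not state them. The helper names `effective_bits` and `size_reduction` are hypothetical.

```python
def effective_bits(bits: int, group_size: int = 64, overhead_bits: int = 32) -> float:
    """Bits stored per weight once the per-group scale and bias are included.

    Assumption: each group of `group_size` weights carries one 16-bit scale
    and one 16-bit bias (32 overhead bits in total).
    """
    return bits + overhead_bits / group_size


def size_reduction(quantized_fraction: float, bits: int,
                   baseline_bits: int = 16, group_size: int = 64) -> float:
    """Fraction by which total weight storage shrinks when only
    `quantized_fraction` of the parameters is quantized to `bits`."""
    per_weight = effective_bits(bits, group_size)
    return quantized_fraction * (1 - per_weight / baseline_bits)


# "~85% of params" is the encoder share quoted in the README text.
ENCODER_FRACTION = 0.85

for bits in (8, 4):
    print(f"INT{bits}: {size_reduction(ENCODER_FRACTION, bits):.1%} smaller")
```

Under these assumptions the estimate comes out at roughly 40% for INT8 and 61% for INT4, which is in line with the -40% and -61% entries in the INT8 and INT4 rows of the README's quantization table. The agreement is suggestive only; it does not confirm the actual quantization settings used.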
@@ -47,7 +47,7 @@ pinned: false
47   | INT8 | -40% | +30% | -58% | None |
48   | INT4 | -61% | +34% | -67% | +0.4pp on real speech |
49
50 -
51
52   ```python
53   # ASR
@@ -59,5 +59,4 @@ pinned: false
59   tts = SonicTTS(voice="af_heart")
60
61   All benchmarks: Apple M3 Max 64 GB, macOS Sequoia, MLX 0.30.4. Built by https://huggingface.co/flight505.
62 -
63 - Edit this `README.md` markdown file to author your organization card.
@@ -7,13 +7,13 @@ sdk: static
 7   pinned: false
 8   ---
 9
10 + # Sonic Speech
11
12   Optimized speech models for Apple Silicon, powering [Sonic](https://github.com/flight505/sonic-workspace), a local-first voice AI
13   system. All models run entirely on-device using [MLX](https://github.com/ml-explore/mlx). No cloud, no API keys, no data leaves your
14   Mac.
15
16 + ## ASR: Parakeet TDT (NVIDIA, ported to MLX)
17
18   SOTA English speech recognition with encoder-only mixed-precision quantization.
19
@@ -30,7 +30,7 @@ pinned: false
30
31   **v3** supports 25 languages. **v2** is English-only. **INT8 recommended**: zero WER loss, 40% smaller, 30% faster.
32
33 + ## TTS: Kokoro 82M (MLX)
34
35   Fast text-to-speech with 32+ voices (American, British, Japanese, Chinese).
36
@@ -38,7 +38,7 @@ pinned: false
38   |-------|------|------------|-------------|------------------|------|
39   | [kokoro-82m-bf16](https://huggingface.co/sonic-speech/kokoro-82m-bf16) | ~170 MB | 47 ms | 224 ms | 126 ms | 41x |
40
41 + ## Quantization Strategy
42
43   Only the Conformer encoder (~85% of params) is quantized; the decoder stays BF16 for token precision.
44
@@ -47,7 +47,7 @@ pinned: false
47   | INT8 | -40% | +30% | -58% | None |
48   | INT4 | -61% | +34% | -67% | +0.4pp on real speech |
49
50 + ## Quick Start
51
52   ```python
53   # ASR
@@ -59,5 +59,4 @@ pinned: false
59   tts = SonicTTS(voice="af_heart")
60
61   All benchmarks: Apple M3 Max 64 GB, macOS Sequoia, MLX 0.30.4. Built by https://huggingface.co/flight505.
62 + ```