Commit 5af3702 by flight505 · verified · 1 parent: 2511174

Update README.md

Files changed (1): README.md (+57 -4)
@@ -1,10 +1,63 @@
- ---
- title: README
- emoji: πŸ‘
- colorFrom: indigo
+ ---
+ title: Sonic Speech
+ emoji: 🎤
+ colorFrom: purple
  colorTo: blue
  sdk: static
  pinned: false
  ---

+ # Sonic Speech
+
+ Optimized speech models for Apple Silicon, powering [Sonic](https://github.com/flight505/sonic-workspace), a local-first voice AI
+ system. All models run entirely on-device using [MLX](https://github.com/ml-explore/mlx). No cloud, no API keys, no data leaves your
+ Mac.
+
+ ## ASR: Parakeet TDT (NVIDIA, ported to MLX)
+
+ State-of-the-art English speech recognition with encoder-only mixed-precision quantization.
+
+ | Model | Size | WER (LibriSpeech) | WER (TED-LIUM) | RTFx | Peak Memory |
+ |-------|------|-------------------|-----------------|------|-------------|
+ | [parakeet-tdt-0.6b-v3](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3) | 1,254 MB | 0.82% | 15.1% | 73x | 3,002 MB |
+ | [parakeet-tdt-0.6b-v3-int8](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3-int8) | 755 MB | 0.82% | 15.1% | 95x | 1,268 MB |
+ | [parakeet-tdt-0.6b-v3-int4](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3-int4) | 489 MB | 0.82% | 15.5% | 98x | 1,003 MB |
+ | [parakeet-tdt-0.6b-v2](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2) | 1,222 MB | - | - | - | - |
+ | [parakeet-tdt-0.6b-v2-int8](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2-int8) | 736 MB | - | - | - | - |
+ | [parakeet-tdt-0.6b-v2-int4](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2-int4) | 470 MB | - | - | - | - |
+
+ **v3** supports 25 languages. **v2** is English-only. **INT8 recommended**: no WER loss on either benchmark, 40% smaller, 30% faster.
+
+ ## TTS: Kokoro 82M (MLX)
+
+ Fast text-to-speech with 32+ voices (American, British, Japanese, Chinese).
+
+ | Model | Size | Short Text | Medium Text | TTFC (streaming) | RTFx |
+ |-------|------|------------|-------------|------------------|------|
+ | [kokoro-82m-bf16](https://huggingface.co/sonic-speech/kokoro-82m-bf16) | ~170 MB | 47 ms | 224 ms | 126 ms | 41x |
+
+ ## Quantization Strategy
+
+ Only the Conformer encoder (~85% of params) is quantized; the decoder stays in BF16 for token precision. Figures below are relative to the unquantized v3 model.
+
+ | Variant | Size | Speed | Memory | WER Impact |
+ |---------|------|-------|--------|------------|
+ | INT8 | -40% | +30% | -58% | None |
+ | INT4 | -61% | +34% | -67% | +0.4 pp on TED-LIUM |
+
+ ## Quick Start
+
+ ```python
+ # ASR: load the recommended INT8 Parakeet checkpoint
+ from parakeet import from_pretrained
+ model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v3-int8")
+
+ # TTS: create a synthesizer that uses the "af_heart" voice
+ from sonic_tts import SonicTTS
+ tts = SonicTTS(voice="af_heart")
+ ```
+
+ All benchmarks: Apple M3 Max 64 GB, macOS Sequoia, MLX 0.30.4. Built by [flight505](https://huggingface.co/flight505).
+
  Edit this `README.md` markdown file to author your organization card.
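
The percentages in the Quantization Strategy table can be cross-checked against the v3 rows of the ASR table. The following is a minimal Python sketch, taking the unquantized v3 checkpoint as the baseline and assuming RTFx means seconds of audio processed per second of compute (the README does not define it). The function names are illustrative and are not part of any Sonic package.

```python
# Figures copied from the v3 rows of the ASR table in the diff above.
V3 = {
    "bf16": {"size_mb": 1254, "rtfx": 73, "peak_mb": 3002, "wer_tedlium": 15.1},
    "int8": {"size_mb": 755, "rtfx": 95, "peak_mb": 1268, "wer_tedlium": 15.1},
    "int4": {"size_mb": 489, "rtfx": 98, "peak_mb": 1003, "wer_tedlium": 15.5},
}


def pct_change(new, base):
    """Relative change from base to new, rounded to a whole percent."""
    return round(100 * (new - base) / base)


def summarize(variant, base="bf16"):
    """Compare one quantized variant against the baseline checkpoint."""
    v, b = V3[variant], V3[base]
    return {
        "size_pct": pct_change(v["size_mb"], b["size_mb"]),
        "speed_pct": pct_change(v["rtfx"], b["rtfx"]),
        "memory_pct": pct_change(v["peak_mb"], b["peak_mb"]),
        # WER difference in percentage points, not relative percent.
        "wer_delta_pp": round(v["wer_tedlium"] - b["wer_tedlium"], 1),
    }


def seconds_to_transcribe(audio_seconds, rtfx):
    """Assumed definition: RTFx = audio duration / processing time."""
    return audio_seconds / rtfx


if __name__ == "__main__":
    for name in ("int8", "int4"):
        print(name, summarize(name))
    print(f"1 h of audio at 95x: {seconds_to_transcribe(3600, 95):.1f} s")
```

Under that reading of RTFx, one hour of audio at 95x would take roughly 38 seconds on the benchmark machine.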