	---
	title: Sonic Speech
	emoji: 🎤
	colorFrom: purple
	colorTo: blue
	sdk: static
	pinned: false
	---

	# Sonic Speech

	Optimized speech models for Apple Silicon, powering [Sonic](https://github.com/flight505/sonic-workspace) — a local-first voice AI
	system. All models run entirely on-device using [MLX](https://github.com/ml-explore/mlx). No cloud, no API keys, no data leaves your
	Mac.

	## ASR — Parakeet TDT (NVIDIA, ported to MLX)

	State-of-the-art English speech recognition, with mixed-precision quantization applied to the encoder only.

	\| Model \| Size \| WER (LibriSpeech) \| WER (TED-LIUM) \| RTFx \| Peak Memory \|
	\|-------\|------\|-------------------\|-----------------\|------\|-------------\|
	\| [parakeet-tdt-0.6b-v3](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3) \| 1,254 MB \| 0.82% \| 15.1% \| 73x \| 3,002 MB \|
	\| [parakeet-tdt-0.6b-v3-int8](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3-int8) \| 755 MB \| 0.82% \| 15.1% \| 95x \| 1,268 MB \|
	\| [parakeet-tdt-0.6b-v3-int4](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v3-int4) \| 489 MB \| 0.82% \| 15.5% \| 98x \| 1,003 MB \|
	\| [parakeet-tdt-0.6b-v2](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2) \| 1,222 MB \| — \| — \| — \| — \|
	\| [parakeet-tdt-0.6b-v2-int8](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2-int8) \| 736 MB \| — \| — \| — \| — \|
	\| [parakeet-tdt-0.6b-v2-int4](https://huggingface.co/sonic-speech/parakeet-tdt-0.6b-v2-int4) \| 470 MB \| — \| — \| — \| — \|

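	v3 supports 25 languages; v2 is English-only. INT8 is the recommended variant: compared with the unquantized v3 model, it shows no WER change on either benchmark while being about 40% smaller and about 30% faster.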
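The "40% smaller, 30% faster" figures can be reproduced from the v3 rows of the table. The sketch below does only that arithmetic; the dictionary keys and the `relative_change` helper are illustrative names, not part of any Sonic package.

```python
# Benchmark rows copied from the v3 table: (size in MB, RTFx, peak memory in MB).
# "base" is the unquantized parakeet-tdt-0.6b-v3 checkpoint.
V3_BENCHMARKS = {
    "base": (1254, 73, 3002),
    "int8": (755, 95, 1268),
    "int4": (489, 98, 1003),
}


def relative_change(variant: str, baseline: str = "base") -> dict:
    """Return the rounded percent change in size, speed, and peak memory."""
    size, rtfx, mem = V3_BENCHMARKS[variant]
    base_size, base_rtfx, base_mem = V3_BENCHMARKS[baseline]
    return {
        "size_pct": round(100 * (size - base_size) / base_size),
        "speed_pct": round(100 * (rtfx - base_rtfx) / base_rtfx),
        "memory_pct": round(100 * (mem - base_mem) / base_mem),
    }


for name in ("int8", "int4"):
    print(name, relative_change(name))
# int8 {'size_pct': -40, 'speed_pct': 30, 'memory_pct': -58}
# int4 {'size_pct': -61, 'speed_pct': 34, 'memory_pct': -67}
```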

	## TTS — Kokoro 82M (MLX)

	Fast text-to-speech with 32+ voices (American, British, Japanese, Chinese).

	\| Model \| Size \| Short Text \| Medium Text \| TTFC (streaming) \| RTFx \|
	\|-------\|------\|------------\|-------------\|------------------\|------\|
	\| [kokoro-82m-bf16](https://huggingface.co/sonic-speech/kokoro-82m-bf16) \| ~170 MB \| 47 ms \| 224 ms \| 126 ms \| 41x \|
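The README does not define RTFx. Assuming it follows the usual convention (seconds of audio handled per second of compute), the figures can be turned into rough wall-clock estimates. The helper name below is hypothetical, and the results are averages that ignore fixed start-up cost on short inputs.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Estimate compute time for a clip, assuming RTFx = audio duration / compute time."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


# Kokoro at 41x: synthesizing one minute of speech.
print(f"{processing_seconds(60, 41):.2f} s")    # 1.46 s

# Parakeet v3 INT8 at 95x: transcribing one hour of audio.
print(f"{processing_seconds(3600, 95):.1f} s")  # 37.9 s
```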

	## Quantization Strategy

	Only the Conformer encoder (~85% of params) is quantized — the decoder stays BF16 for token precision.

	\| Variant \| Size \| Speed \| Memory \| WER Impact \|
	\|---------\|------\|-------\|--------\|------------\|
	\| INT8 \| -40% \| +30% \| -58% \| None \|
	\| INT4 \| -61% \| +34% \| -67% \| +0.4 pp on TED-LIUM, none on LibriSpeech \|
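The README does not show how these checkpoints were produced. One way to express "quantize the encoder, leave the decoder alone" is a layer-selection predicate, sketched below. The module paths (`encoder.…`, `decoder.…`) and the stand-in classes are assumptions made for illustration, not the actual Parakeet layer names.

```python
def encoder_only(path: str, module: object) -> bool:
    """Select quantizable layers that sit under an (assumed) `encoder` submodule."""
    in_encoder = path == "encoder" or path.startswith("encoder.")
    return in_encoder and hasattr(module, "to_quantized")


# Stand-ins so the predicate can be exercised without MLX installed.
class FakeLinear:
    def to_quantized(self, group_size, bits):
        return self


class FakeNorm:
    pass


# Hypothetical module paths, for illustration only.
layers = {
    "encoder.layers.0.self_attn.linear_q": FakeLinear(),
    "encoder.layers.0.norm": FakeNorm(),
    "decoder.prediction.embed": FakeLinear(),
}
selected = [path for path, module in layers.items() if encoder_only(path, module)]
print(selected)  # ['encoder.layers.0.self_attn.linear_q']
```

In MLX, a predicate with this `(path, module)` shape can be passed as `class_predicate` to `mlx.nn.quantize`, for example `mlx.nn.quantize(model, group_size=64, bits=8, class_predicate=encoder_only)`. The group size of 64 is an assumption here, not a value taken from this README.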

	## Quick Start

	```python
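	# ASR: load the INT8 Parakeet v3 checkpoint by its Hugging Face repo ID
	from parakeet import from_pretrained
	model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v3-int8")

	# TTS: create a synthesizer that uses the "af_heart" voice
	from sonic_tts import SonicTTS
	tts = SonicTTS(voice="af_heart")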

	```

	All benchmarks: Apple M3 Max 64 GB, macOS Sequoia, MLX 0.30.4. Built by [flight505](https://huggingface.co/flight505).