---
license: openrail
language:
    - en
    - ko
    - es
    - pt
    - fr
pipeline_tag: text-to-speech
tags:
    - text-to-speech
    - speech-synthesis
    - tts
    - mlx
    - mlx-audio
library_name: mlx-audio
base_model: Supertone/supertonic-2
---

# Supertonic-2 (MLX)

**Supertonic-2-MLX** is a pure-MLX port of
[Supertone/supertonic-2](https://huggingface.co/Supertone/supertonic-2),
a lightning-fast on-device TTS system. It runs natively on Apple Silicon
through [`mlx-audio`](https://github.com/typomonster/mlx-audio) — no ONNX
Runtime, no Python inference server, just `mx.load` + Metal.

- **66M params**, 4 sub-models (duration predictor, text encoder,
  flow-matching vector estimator, Vocos-style vocoder).
- **5 languages**: English, Korean, Spanish, Portuguese, French.
- **10 preset voices**: `M1`–`M5` (male), `F1`–`F5` (female).
- **44.1 kHz** output, ~0.03 RTF on M4 Pro with 5 Euler steps.
- **float32 parity** with the upstream ONNX Runtime pipeline.
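
The "5 Euler steps" above refers to integrating the flow-matching ODE with a fixed number of Euler updates. Here is a minimal, framework-free sketch of that integration scheme. The names `euler_sample` and `straight_line_velocity` and the toy velocity field are illustrative assumptions, not the model's actual code; in the real pipeline the velocity would come from the vector estimator network.

```python
def euler_sample(x0, velocity, steps=5):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        # One Euler update: move along the predicted velocity for dt.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def straight_line_velocity(x0, target):
    """Toy stand-in for a learned vector field: constant velocity toward target."""
    v = [ti - xi for xi, ti in zip(x0, target)]
    return lambda x, t: v


noise = [0.0, 1.0]
target = [1.0, -1.0]
sample = euler_sample(noise, straight_line_velocity(noise, target), steps=5)
```

With a learned, non-constant velocity field, more steps generally track the ODE more closely at the cost of more network evaluations, which is the quality/speed trade-off behind the `steps` option.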

## Install

Supertonic support hasn't been upstreamed to `mlx-audio` yet — install the
fork [`typomonster/mlx-audio`](https://github.com/typomonster/mlx-audio):

```bash
pip install git+https://github.com/typomonster/mlx-audio.git
```

## Quick start

```python
from mlx_audio.tts import load

# Downloads this repo on first run and caches under ~/.cache/huggingface/.
model = load("typomonster/supertonic-2-mlx")

for r in model.generate("Hello world.", voice="M1", lang="en"):
    # r.audio is an mx.array at model.sample_rate (44100 Hz)
    print(r.samples, r.real_time_factor)
```

## Save to WAV

```python
import numpy as np
import soundfile as sf
from mlx_audio.tts import load

model = load("typomonster/supertonic-2-mlx")
# Long inputs may be yielded as several chunks; collect them all.
pieces = [np.asarray(r.audio) for r in
          model.generate("오늘 날씨가 정말 좋네요.", voice="F1", lang="ko")]
wav = np.concatenate(pieces)  # also handles a single chunk
sf.write("out.wav", wav, model.sample_rate)
```
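
If `soundfile` is not available, the standard-library `wave` module can write the same audio as 16-bit PCM. This is a sketch under two assumptions that the card does not state: the audio is mono, and the samples are floats in the range [-1.0, 1.0]. The helper name `write_wav_pcm16` is hypothetical.

```python
import struct
import wave


def write_wav_pcm16(target, samples, sample_rate=44100):
    """Write mono float samples as a 16-bit PCM WAV.

    `target` is a file path or a writable binary file object.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(target, "wb") as wf:
        wf.setnchannels(1)        # assumption: mono output
        wf.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
```

With the `wav` array from the snippet above, a call would look like `write_wav_pcm16("out.wav", wav.tolist(), model.sample_rate)`. The per-sample loop is slow for long clips, so treat it as a fallback.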

## Multi-language, multi-voice

```python
from mlx_audio.tts import load

model = load("typomonster/supertonic-2-mlx")

cases = [
    ("en", "M1", "The quick brown fox jumps over the lazy dog."),
    ("ko", "F1", "말듣쓰는 음성 비서입니다."),
    ("es", "F3", "Hola, ¿cómo estás hoy?"),
    ("pt", "M2", "Bom dia, tudo bem?"),
    ("fr", "F5", "Bonjour, comment ça va ?"),
]
for lang, voice, text in cases:
    for r in model.generate(text, voice=voice, lang=lang):
        print(lang, voice, r.samples, r.real_time_factor)
```

## Performance

Measured on **Apple M4 Pro** with 5 Euler steps, post-warmup:

| Input                                     | Audio  | Wall  | RTF    |
| ----------------------------------------- | ------ | ----- | ------ |
| `"Hello world."` (en, M1)                 | 1.46 s | 42 ms | 0.029× |
| `"오늘 아침 공원을 산책했어요."` (ko, F1) | 2.63 s | 47 ms | 0.018× |

RTF (real-time factor) is wall-clock time divided by audio duration. Lower is better; below 1× means faster than real time.
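
As a quick check, the RTF column can be reproduced from the other two columns. The helper name `real_time_factor` is illustrative; the numbers are taken from the table above.

```python
def real_time_factor(wall_seconds, audio_seconds):
    """Return wall-clock synthesis time divided by generated audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return wall_seconds / audio_seconds


# (wall time in seconds, audio duration in seconds) from the table rows
rtf_en = real_time_factor(0.042, 1.46)
rtf_ko = real_time_factor(0.047, 2.63)
print(f"{rtf_en:.3f} {rtf_ko:.3f}")  # prints "0.029 0.018"
```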

More audio samples generated with MLX:
<https://github.com/typomonster/mlx-audio/tree/main/docs/supertonic>

## Generation options

```python
model.generate(
    text,
    voice="M1",           # one of M1–M5, F1–F5
    lang="en",            # en | ko | es | pt | fr
    speed=1.05,           # >1 speaks faster (scales predicted duration)
    steps=5,              # Euler steps; more = higher quality, slower
    seed=0,               # deterministic given the same seed + input
    chunk_max_len=None,   # override default (ko=120 chars, others=300)
    silence_between_chunks=0.3,  # seconds between chunks in long texts
)
```
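
To make `chunk_max_len` and `silence_between_chunks` more concrete, here is one plausible way long text could be split and padded. This is a sketch only: the sentence-boundary heuristic and the names `split_into_chunks` and `silence_samples` are assumptions for illustration, and the actual chunking logic in `mlx-audio` may differ.

```python
import re


def split_into_chunks(text, max_len=300):
    """Greedily pack whole sentences into chunks of about max_len characters.

    A single sentence longer than max_len is kept whole rather than cut.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_len:
            chunks.append(current)   # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def silence_samples(seconds=0.3, sample_rate=44100):
    """Number of zero samples needed for a pause of the given length."""
    return int(round(seconds * sample_rate))
```

Under these assumptions, the default 0.3 s pause at 44.1 kHz corresponds to 13230 zero samples between consecutive chunks.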

## Files
- `config.json` — mlx-audio model config
- `{duration_predictor,text_encoder,vector_estimator,vocoder}.safetensors` — MLX weights
- `unicode_indexer.json`, `voice_styles/*.json` — runtime assets
- `tts.json` — upstream pipeline config (preserved for reference)

## References
- Upstream model: [Supertone/supertonic-2](https://huggingface.co/Supertone/supertonic-2)
- Upstream code: [supertone-inc/supertonic](https://github.com/supertone-inc/supertonic) · fork with MLX integration: [typomonster/supertonic](https://github.com/typomonster/supertonic)
- mlx-audio (with Supertonic support): [typomonster/mlx-audio](https://github.com/typomonster/mlx-audio) · upstream: [Blaizzy/mlx-audio](https://github.com/Blaizzy/mlx-audio)

## License
OpenRAIL-M (inherited from the upstream model). See `LICENSE` for the full
terms — redistribution must carry the use-based restrictions (Attachment A)
forward.