---
license: openrail
library_name: mlx
pipeline_tag: text-to-speech
base_model:
- Supertone/supertonic-3
tags:
- mlx
- apple-silicon
- text-to-speech
- on-device
- audio
language:
- multilingual
---

Part of the [Supertonic 3 MLX](https://huggingface.co/collections/mlx-community/supertonic-3-6a15767066e3067422a932d3) collection.

# Supertonic 3 (MLX)

Apple MLX graph-runtime conversion of [Supertone/supertonic-3](https://huggingface.co/Supertone/supertonic-3), a compact multilingual TTS model distributed by upstream as ONNX assets.

## TL;DR

| | |
|---|---|
| **Format** | JSON graph topology + NPZ initializers |
| **Runtime** | [`ailuntx/supertonic-mlx`](https://github.com/ailuntx/supertonic-mlx) |
| **Official code** | [`supertone-inc/supertonic`](https://github.com/supertone-inc/supertonic) |
| **Sample rate** | 44.1 kHz |
| **HF Space** | [`mlx-community/supertonic-3`](https://huggingface.co/spaces/mlx-community/supertonic-3) |
| **Hardware** | Apple Silicon recommended for local use; the HF Space runs on a Linux CPU fallback |

## Quick Start

```bash
hf download mlx-community/supertonic-3 --local-dir ./models/supertonic-3

git clone https://github.com/ailuntx/supertonic-mlx.git
cd supertonic-mlx
python -m venv .venv
.venv/bin/pip install mlx soundfile numpy

.venv/bin/python scripts/infer_mlx.py \
  --model ./models/supertonic-3 \
  --text "Supertonic 3 is running with MLX." \
  --lang en \
  --voice M1 \
  --total-step 8 \
  --output output.wav
```
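After the command finishes, it can be useful to confirm that `output.wav` has the 44.1 kHz sample rate listed in the TL;DR. The following is a minimal sketch using only the Python standard library; the helper name `wav_summary` is hypothetical and not part of the runtime. It assumes the script writes integer PCM, since the `wave` module does not read float-encoded WAV files; in that case `soundfile.info` from the already installed `soundfile` package is an alternative.

```python
import wave


def wav_summary(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return rate, wav.getnchannels(), frames / rate


# Usage (assumes output.wav exists in the current directory):
# rate, channels, seconds = wav_summary("output.wav")
# print(f"{rate} Hz, {channels} channel(s), {seconds:.2f} s")
```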

## Layout

```text
supertonic-3/
├── README.md
├── mlx_manifest.json
├── graphs/
├── weights/
└── voice_styles/
```
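To confirm that a download matches this layout before running inference, a small check along these lines may help. It is a sketch only: the entry names come from the tree above, while the function name `missing_entries` is hypothetical and not part of the runtime.

```python
from pathlib import Path

# Entries taken from the layout tree above; True marks a directory.
EXPECTED = {
    "README.md": False,
    "mlx_manifest.json": False,
    "graphs": True,
    "weights": True,
    "voice_styles": True,
}


def missing_entries(model_dir):
    """Return the expected entries that are absent or of the wrong kind."""
    root = Path(model_dir)
    missing = []
    for name, is_dir in EXPECTED.items():
        path = root / name
        ok = path.is_dir() if is_dir else path.is_file()
        if not ok:
            missing.append(name)
    return missing


# Usage (path matches the Quick Start download directory):
# gaps = missing_entries("./models/supertonic-3")
# print("layout OK" if not gaps else f"missing: {', '.join(gaps)}")
```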

## Conversion Notes

| Component | Source | MLX handling |
|---|---|---|
| ONNX graphs | `Supertone/supertonic-3` | graph topology exported to JSON |
| initializers | official ONNX assets | saved as NPZ arrays |
| runtime ops | Supertonic ONNX subset | implemented in `ailuntx/supertonic-mlx` with MLX arrays |
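
The table describes a graph runtime: topology stored as JSON, weights stored as named arrays, and one implementation per supported op. The sketch below illustrates that pattern on a toy graph. The JSON schema (`nodes`, `op`, `inputs`, `output`), the two ops, and the use of plain Python lists in place of NPZ and MLX arrays are all assumptions made for illustration; the actual schema and op set are defined by `ailuntx/supertonic-mlx`.

```python
import json

# Hypothetical topology, standing in for a JSON file under graphs/.
GRAPH_JSON = """
{
  "inputs": ["x"],
  "outputs": ["y"],
  "nodes": [
    {"op": "Mul", "inputs": ["x", "scale"], "output": "h"},
    {"op": "Add", "inputs": ["h", "bias"], "output": "y"}
  ]
}
"""

# Hypothetical initializers, standing in for arrays loaded from an NPZ file.
INITIALIZERS = {"scale": [2.0, 2.0, 2.0], "bias": [0.5, 0.5, 0.5]}

# Op table: one Python callable per supported op type (elementwise on lists).
OPS = {
    "Add": lambda a, b: [u + v for u, v in zip(a, b)],
    "Mul": lambda a, b: [u * v for u, v in zip(a, b)],
}


def run_graph(graph, initializers, feeds):
    """Evaluate nodes in listed order, assumed to be topologically sorted."""
    values = dict(initializers)
    values.update(feeds)
    for node in graph["nodes"]:
        op = OPS.get(node["op"])
        if op is None:
            raise NotImplementedError(f"unsupported op: {node['op']}")
        args = [values[name] for name in node["inputs"]]
        values[node["output"]] = op(*args)
    return {name: values[name] for name in graph["outputs"]}


graph = json.loads(GRAPH_JSON)
result = run_graph(graph, INITIALIZERS, {"x": [1.0, 2.0, 3.0]})
print(result["y"])  # [2.5, 4.5, 6.5]
```

Keeping the op table separate from the graph walker means that supporting another op only requires adding one entry, and an unsupported op fails loudly instead of producing wrong audio.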

## Validation

The MLX graph runtime has been checked against ONNX Runtime on the official assets: the maximum absolute error at each stage is around `1e-5`. The HF Space API also returns audio successfully and reports real wall-clock time in its status output.
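
A per-stage comparison of this kind can be sketched as follows. The stage names, the tolerance, and the sample values are made up for illustration and are not measurements from this model; the helpers `max_abs_error` and `compare_stages` are hypothetical and operate on flat Python sequences rather than MLX or NumPy arrays.

```python
def max_abs_error(reference, candidate):
    """Largest elementwise |reference - candidate| over two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs differ in length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def compare_stages(reference_outputs, candidate_outputs, tol=1e-4):
    """Map each stage name to (max abs error, within tolerance?)."""
    report = {}
    for stage, ref in reference_outputs.items():
        err = max_abs_error(ref, candidate_outputs[stage])
        report[stage] = (err, err <= tol)
    return report


# Made-up numbers for illustration only; not measurements from this model.
onnx_out = {"stage_a": [0.10, -0.20, 0.30], "stage_b": [1.0, 2.0]}
mlx_out = {"stage_a": [0.10001, -0.20, 0.30], "stage_b": [1.0, 2.0]}
for stage, (err, ok) in compare_stages(onnx_out, mlx_out).items():
    print(f"{stage}: max_abs_err={err:.2e} ok={ok}")
```

Reporting the error per stage rather than only on the final waveform makes it easier to see which graph introduces a numerical drift.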

## License

The model license follows the upstream Supertonic 3 model card (`openrail`).

## Citation

```bibtex
@misc{supertonic-mlx,
  title  = {supertonic-mlx: Apple MLX port of Supertonic 3},
  author = {ailuntx},
  year   = {2026},
  url    = {https://github.com/ailuntx/supertonic-mlx},
}

@article{kim2025supertonic,
  title   = {SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System},
  author  = {Kim, Hyeongju and Yang, Jinhyeok and Yu, Yechan and Ji, Seunghun and Morton, Jacob and Bous, Frederik and Byun, Joon and Lee, Juheon},
  journal = {arXiv preprint arXiv:2503.23108},
  year    = {2025},
  url     = {https://arxiv.org/abs/2503.23108},
}
```