---
license: openrail
library_name: mlx
pipeline_tag: text-to-speech
base_model:
- Supertone/supertonic-3
tags:
- mlx
- apple-silicon
- text-to-speech
- on-device
- audio
language:
- multilingual
---

Part of the [Supertonic 3 MLX](https://huggingface.co/collections/mlx-community/supertonic-3-6a15767066e3067422a932d3) collection.

# Supertonic 3 (MLX)

Apple MLX graph-runtime conversion of [Supertone/supertonic-3](https://huggingface.co/Supertone/supertonic-3), a compact multilingual TTS model distributed by upstream as ONNX assets.

## TL;DR

| | |
|---|---|
| **Format** | JSON graph topology + NPZ initializers |
| **Runtime** | [`ailuntx/supertonic-mlx`](https://github.com/ailuntx/supertonic-mlx) |
| **Official code** | [`supertone-inc/supertonic`](https://github.com/supertone-inc/supertonic) |
| **Sample rate** | 44.1 kHz |
| **HF Space** | [`mlx-community/supertonic-3`](https://huggingface.co/spaces/mlx-community/supertonic-3) |
| **Hardware** | Runs on HF Linux CPU fallback; Apple Silicon recommended locally |

## Quick Start

```bash
hf download mlx-community/supertonic-3 --local-dir ./models/supertonic-3

git clone https://github.com/ailuntx/supertonic-mlx.git
cd supertonic-mlx
python -m venv .venv
.venv/bin/pip install mlx soundfile numpy

.venv/bin/python scripts/infer_mlx.py \
  --model ./models/supertonic-3 \
  --text "Supertonic 3 is running with MLX." \
  --lang en \
  --voice M1 \
  --total-step 8 \
  --output output.wav
```
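
To sanity-check the generated file against the 44.1 kHz sample rate listed in the TL;DR, a small helper like the one below can read the WAV header. This is a minimal sketch that assumes the script writes a PCM WAV file; `describe_wav` is a hypothetical helper, not part of `supertonic-mlx`, and it uses only the Python standard library.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file.

    Hypothetical helper for illustration; assumes the file is PCM-encoded,
    which the standard-library wave module can read.
    """
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return rate, channels, frames / rate
```

Calling `describe_wav("output.wav")` after the Quick Start command should report a sample rate of 44100 if the output matches the table above.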

## Layout

```text
supertonic-3/
├── README.md
├── mlx_manifest.json
├── graphs/
├── weights/
└── voice_styles/
```
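
A quick way to confirm that a download is complete is to compare the local directory against the layout above. The sketch below checks only the top-level entries shown in the tree; `missing_entries` is a hypothetical helper, and the contents of each subdirectory are not inspected.

```python
from pathlib import Path

# Top-level entries taken from the layout tree in this README.
EXPECTED_FILES = ("README.md", "mlx_manifest.json")
EXPECTED_DIRS = ("graphs", "weights", "voice_styles")


def missing_entries(model_dir):
    """Return the expected top-level files and directories that are absent."""
    root = Path(model_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list from `missing_entries("./models/supertonic-3")` suggests the top-level layout is in place.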

## Conversion Notes

| Component | Source | MLX handling |
|---|---|---|
| ONNX graphs | `Supertone/supertonic-3` | graph topology exported to JSON |
| initializers | official ONNX assets | saved as NPZ arrays |
| runtime ops | Supertonic ONNX subset | implemented in `ailuntx/supertonic-mlx` with MLX arrays |
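
The split between topology and initializers can be pictured with a toy interpreter. Everything in this sketch is an assumption made for illustration: the node schema (`op`, `inputs`, `output`), the two ops, and the use of plain Python lists in place of NPZ arrays and MLX arrays. It does not reflect the actual file format in `graphs/` or the implementation in `ailuntx/supertonic-mlx`.

```python
import json

# Hypothetical topology; the real JSON schema may differ entirely.
GRAPH_JSON = """
{"nodes": [
   {"op": "Mul", "inputs": ["x", "scale"], "output": "scaled"},
   {"op": "Add", "inputs": ["scaled", "bias"], "output": "y"}
 ],
 "outputs": ["y"]}
"""

# Stand-in for initializers that would be loaded from an NPZ file.
INITIALIZERS = {"scale": [2.0, 2.0], "bias": [1.0, -1.0]}

# Elementwise ops on equal-length lists; a real runtime would map each
# supported ONNX op to an array operation instead.
OPS = {
    "Add": lambda a, b: [p + q for p, q in zip(a, b)],
    "Mul": lambda a, b: [p * q for p, q in zip(a, b)],
}


def run_graph(graph, initializers, feeds):
    """Evaluate nodes in listed order (assumed topologically sorted)."""
    values = dict(initializers)
    values.update(feeds)
    for node in graph["nodes"]:
        args = [values[name] for name in node["inputs"]]
        values[node["output"]] = OPS[node["op"]](*args)
    return [values[name] for name in graph["outputs"]]
```

For example, `run_graph(json.loads(GRAPH_JSON), INITIALIZERS, {"x": [1.0, 3.0]})` evaluates `x * scale + bias` node by node.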

## Validation

The MLX graph runtime has been checked against ONNX Runtime on the official assets; per-stage maximum absolute errors are around `1e-5`. The HF Space API has also returned audio successfully, with its status output reporting real wall-clock time.
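
For reference, the maximum absolute error between two outputs is the largest elementwise difference in magnitude. The sketch below computes it over flat sequences of floats in pure Python; `max_abs_error` is a hypothetical helper, and the actual comparison script and tensor shapes used for the check above are not shown here.

```python
def max_abs_error(reference, candidate):
    """Largest |reference[i] - candidate[i]| over two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)
```

Applying this to the flattened output of each stage from both runtimes would give one number per stage to compare against a tolerance.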

## License

Model license follows the upstream Supertonic 3 model card (`openrail`).

## Citation

```bibtex
@misc{supertonic-mlx,
  title  = {supertonic-mlx: Apple MLX port of Supertonic 3},
  author = {ailuntx},
  year   = {2026},
  url    = {https://github.com/ailuntx/supertonic-mlx},
}

@article{kim2025supertonic,
  title   = {SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System},
  author  = {Kim, Hyeongju and Yang, Jinhyeok and Yu, Yechan and Ji, Seunghun and Morton, Jacob and Bous, Frederik and Byun, Joon and Lee, Juheon},
  journal = {arXiv preprint arXiv:2503.23108},
  year    = {2025},
  url     = {https://arxiv.org/abs/2503.23108},
}
```