# mlx-community/supertonic-2

This model was converted to MLX format from [`Supertone/supertonic-2`](https://huggingface.co/Supertone/supertonic-2) using mlx-audio version **0.2.8**.

SuperTonic 2 is a high-quality text-to-speech model with voice style control.

## Use with mlx-audio

```bash
pip install -U mlx-audio
```

### CLI Example:

```bash
mlx_audio.tts.generate --model mlx-community/supertonic-2 --text "Hello, this is a test." --voice M1
```

### Python Example:

```python
from mlx_audio.tts.utils import load_model

# Load the converted MLX weights
model = load_model("mlx-community/supertonic-2")

# Iterate over the generated results and report each duration
for result in model.generate("Hello, this is a test.", voice="M1"):
    print(f"Generated {result.audio_duration} of audio")
```
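
To keep the generated audio, one option is to write it to a WAV file at the model's 44100 Hz sample rate. The sketch below uses only the Python standard library and assumes the audio is available as mono float samples in the range [-1.0, 1.0]. How those samples are obtained from a `result` object is not shown here, and `write_wav` and `sine_tone` are hypothetical helper names; a short sine tone stands in for model output.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 44100  # Hz, as listed under Model Details


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so the int16 conversion cannot overflow
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))


def sine_tone(freq_hz, seconds, sample_rate=SAMPLE_RATE):
    """Stand-in for model output: a sine tone at half amplitude."""
    n = int(seconds * sample_rate)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


# Assumption: real code would pass the model's samples here instead of a tone.
tone = sine_tone(440.0, 0.25)
wav_path = os.path.join(tempfile.gettempdir(), "supertonic_demo.wav")
write_wav(wav_path, tone)
```

The clipping step matters because synthesized audio can overshoot the nominal range slightly, and an unclipped value would not fit in a signed 16-bit sample.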
## Model Details

- **Architecture**: Text encoder + Duration predictor + Flow matching (vector field) + Vocoder
- **Sample rate**: 44100 Hz
- **Voices**: M1-M5, F1-F5 (10 built-in voice styles)
- **Latent dim**: 24 (chunked into 144-dimensional vectors, 144 = 24 × 6)
- **Flow matching steps**: 10 (configurable)
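
The latent and flow-matching figures above can be illustrated with a small pure-Python sketch. It rests on two assumptions that this card does not confirm: that chunking concatenates 6 consecutive 24-dimensional frames into one 144-dimensional vector (6 is inferred from 144 / 24), and that sampling uses a plain fixed-step Euler integrator. `chunk_frames`, `euler_sample`, and the toy vector field are hypothetical names for illustration, not part of mlx-audio.

```python
LATENT_DIM = 24                      # per-frame latent size (from this card)
CHUNKED_DIM = 144                    # size after chunking (from this card)
CHUNK = CHUNKED_DIM // LATENT_DIM    # 6 frames per chunk, inferred from 144 / 24


def chunk_frames(frames, chunk=CHUNK):
    """Concatenate every `chunk` consecutive frames into one longer vector."""
    if len(frames) % chunk != 0:
        raise ValueError("frame count must be a multiple of the chunk size")
    return [
        [x for frame in frames[i:i + chunk] for x in frame]
        for i in range(0, len(frames), chunk)
    ]


def euler_sample(x0, vector_field, steps=10):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = vector_field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# 12 frames of 24 values become 2 chunks of 144 values.
frames = [[float(i)] * LATENT_DIM for i in range(12)]
chunks = chunk_frames(frames)

# Toy vector field (an assumption, not the model's network): a constant
# velocity pointing from the start point to a fixed target.
start = [0.0, 0.0, 0.0, 0.0]
target = [1.0, -2.0, 0.5, 3.0]
toy_field = lambda x, t: [b - a for a, b in zip(start, target)]
sample = euler_sample(start, toy_field, steps=10)
```

With a constant velocity the Euler steps land on the target up to floating-point error. A learned vector field varies with `x` and `t`, so the step count trades accuracy against speed, which is presumably why it is configurable.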