Soprano 1.1 80M - MLX Format

LLM Backbone: Qwen3-based, 17 layers, 512 hidden size
Audio Decoder: Vocos-based with ConvNeXt blocks, 768-dim
Sample Rate: 32,000 Hz
Samples per Token: 2,048
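
The sample rate and samples-per-token figures together fix how much audio each generated token represents: 2,048 samples at 32,000 Hz is 64 ms, or 15.625 tokens per second of speech. A minimal sketch of that arithmetic follows, assuming every token decodes to exactly 2,048 samples. The function names are illustrative and not part of any library.

```python
import math

SAMPLE_RATE_HZ = 32_000      # from the spec above
SAMPLES_PER_TOKEN = 2_048    # from the spec above


def audio_duration_seconds(num_tokens: int) -> float:
    """Seconds of audio produced by num_tokens audio tokens."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens * SAMPLES_PER_TOKEN / SAMPLE_RATE_HZ


def tokens_for_duration(seconds: float) -> int:
    """Smallest whole number of tokens covering at least `seconds` of audio."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return math.ceil(seconds * SAMPLE_RATE_HZ / SAMPLES_PER_TOKEN)


if __name__ == "__main__":
    print(audio_duration_seconds(1))    # duration of a single token
    print(audio_duration_seconds(100))  # duration of 100 tokens
    print(tokens_for_duration(1.0))     # tokens needed for one second
```

This kind of estimate can help when sizing generation limits or output buffers, though the actual token count for a given text depends on the model.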

Ultra-fast text-to-speech model converted to MLX format for Apple Silicon.

Model Description

Soprano 1.1 is an improved version of the Soprano TTS model featuring:

File	Description
`model.safetensors`	LLM weights (converted to camelCase keys)
`decoder.safetensors`	Vocos decoder weights
`tokenizer.json`	Tokenizer vocabulary
`config.json`	Model configuration
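
The table notes that the LLM weight keys were converted to camelCase. The exact renaming rule is not stated here, so the following is only a sketch under one assumption: each dot-separated segment of a snake_case key is converted to camelCase independently. The helper names and the sample key are hypothetical.

```python
def snake_to_camel(segment: str) -> str:
    """Convert one snake_case segment (e.g. 'self_attn') to camelCase."""
    head, *rest = segment.split("_")
    # Capitalise the first character of every part after the first.
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def camel_case_key(key: str) -> str:
    """Apply snake_to_camel to every dot-separated segment of a weight key."""
    return ".".join(snake_to_camel(segment) for segment in key.split("."))


def rename_keys(weights: dict) -> dict:
    """Return a new dict with camelCase keys; values are passed through untouched."""
    return {camel_case_key(key): value for key, value in weights.items()}


if __name__ == "__main__":
    # Hypothetical key in the style of a PyTorch checkpoint.
    print(camel_case_key("model.layers.0.self_attn.q_proj.weight"))
```

Segments without underscores, such as layer indices or `weight`, pass through unchanged under this rule.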

This model is designed for use with VoiceKit on macOS. The weights have been converted from the original PyTorch format with:

License: Apache 2.0

Based on Soprano-80M by SWivid.
