Soprano 1.1 80M - MLX Format
Ultra-fast text-to-speech model converted to MLX format for Apple Silicon.
Model Description
Soprano 1.1 is an improved version of the Soprano TTS model featuring:
- 80M parameters - Compact yet powerful
- 32kHz audio output - High quality synthesis
- Real-time streaming - Ultra-low latency generation
- Improved decoder - 768-dim decoder (up from 512) for better audio quality
Architecture
- LLM Backbone: Qwen3-based, 17 layers, 512 hidden size
- Audio Decoder: Vocos-based with ConvNeXt blocks, 768-dim
- Sample Rate: 32,000 Hz
- Samples per Token: 2,048
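The sample rate and samples-per-token figures above imply how much audio each generated token represents. The sketch below works through that arithmetic. It assumes every token decodes to exactly 2,048 samples with no overlap or trimming, which the card does not state explicitly, and the function names are illustrative rather than part of any released API.

```python
# Figures taken from the Architecture list above.
SAMPLE_RATE_HZ = 32_000
SAMPLES_PER_TOKEN = 2_048


def seconds_per_token() -> float:
    """Audio duration covered by one token (assumes no overlap between tokens)."""
    return SAMPLES_PER_TOKEN / SAMPLE_RATE_HZ


def tokens_per_second() -> float:
    """Tokens the LLM must emit per second of audio to keep up with playback."""
    return SAMPLE_RATE_HZ / SAMPLES_PER_TOKEN


def audio_duration_seconds(num_tokens: int) -> float:
    """Total audio duration for a sequence of tokens, under the same assumption."""
    return num_tokens * SAMPLES_PER_TOKEN / SAMPLE_RATE_HZ


if __name__ == "__main__":
    print(f"{seconds_per_token() * 1000:.1f} ms of audio per token")
    print(f"{tokens_per_second():.3f} tokens per second of audio")
    print(f"{audio_duration_seconds(100):.2f} s of audio for 100 tokens")
```

Under that assumption each token covers 64 ms of audio, so real-time streaming needs roughly 15.6 tokens per second from the backbone.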
Files
| File | Description |
|---|---|
| model.safetensors | LLM weights (converted to camelCase keys) |
| decoder.safetensors | Vocos decoder weights |
| tokenizer.json | Tokenizer vocabulary |
| config.json | Model configuration |
Usage
This model is designed for use with VoiceKit on macOS. The weights have been converted from the original PyTorch format with:
- BFloat16 → Float32 conversion
- snake_case → camelCase key renaming for Swift compatibility
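The two conversion steps can be illustrated with a minimal, standard-library-only sketch. This is not the script used to produce these files: the helper names (`bf16_bits_to_float32`, `to_camel_key`, `rename_keys`) are hypothetical, the example key is made up for illustration, and the exact renaming rules applied to this checkpoint are an assumption. The BFloat16 step relies only on the fact that a bfloat16 value is the upper 16 bits of the corresponding float32.

```python
import re
import struct


def bf16_bits_to_float32(bits: int) -> float:
    """Widen a raw 16-bit bfloat16 pattern to float32 by zero-filling the low 16 bits."""
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("expected a 16-bit unsigned value")
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


def to_camel_key(key: str) -> str:
    """Convert each dot-separated segment of a weight key from snake_case to camelCase.

    Assumed rule: an underscore between an alphanumeric character and a
    lowercase letter is dropped and the letter is upper-cased.
    """
    def convert(segment: str) -> str:
        return re.sub(r"(?<=[A-Za-z0-9])_([a-z])", lambda m: m.group(1).upper(), segment)

    return ".".join(convert(segment) for segment in key.split("."))


def rename_keys(weights: dict) -> dict:
    """Return a new dict with camelCase keys, refusing to silently merge colliding keys."""
    renamed = {}
    for key, value in weights.items():
        new_key = to_camel_key(key)
        if new_key in renamed:
            raise KeyError(f"key collision after renaming: {new_key}")
        renamed[new_key] = value
    return renamed


if __name__ == "__main__":
    # Hypothetical key, shown only to illustrate the renaming.
    print(to_camel_key("model.layers.0.self_attn.q_proj.weight"))
    print(bf16_bits_to_float32(0x3F80))
```

Widening BFloat16 to Float32 this way is lossless, since every bfloat16 value is exactly representable in float32; the trade-off is that the stored weights take twice the space.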
License
Apache 2.0
Credits
Based on Soprano-80M by SWivid.