
Soprano 1.1 80M - MLX Format

Ultra-fast text-to-speech model converted to MLX format for Apple Silicon.

Model Description

Soprano 1.1 is an improved version of the Soprano TTS model featuring:

  • 80M parameters - Compact yet powerful
  • 32kHz audio output - High quality synthesis
  • Real-time streaming - Ultra-low latency generation
  • Improved decoder - 768-dim decoder (up from 512) for better audio quality

Architecture

  • LLM Backbone: Qwen3-based, 17 layers, 512 hidden size
  • Audio Decoder: Vocos-based with ConvNeXt blocks, 768-dim
  • Sample Rate: 32,000 Hz
  • Samples per Token: 2,048
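
The sample rate and samples-per-token figures above imply a fixed amount of audio per generated token: 2,048 / 32,000 = 0.064 s, or about 15.6 tokens per second of speech. The sketch below only works through that arithmetic; the function names are hypothetical and are not part of VoiceKit or the original Soprano code.

```python
import math

# Values taken from the Architecture list above.
SAMPLE_RATE_HZ = 32_000
SAMPLES_PER_TOKEN = 2_048


def seconds_per_token() -> float:
    """Duration of audio covered by one generated token."""
    return SAMPLES_PER_TOKEN / SAMPLE_RATE_HZ


def audio_duration_seconds(num_tokens: int) -> float:
    """Length of the decoded waveform for a given number of audio tokens."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens * SAMPLES_PER_TOKEN / SAMPLE_RATE_HZ


def tokens_for_duration(seconds: float) -> int:
    """Smallest number of tokens whose audio covers at least `seconds`."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return math.ceil(seconds * SAMPLE_RATE_HZ / SAMPLES_PER_TOKEN)


if __name__ == "__main__":
    print(f"{seconds_per_token() * 1000:.0f} ms of audio per token")
    print(f"100 tokens -> {audio_duration_seconds(100):.2f} s of audio")
    print(f"1 s of audio needs {tokens_for_duration(1.0)} tokens")
```

This kind of estimate is mainly useful for sizing audio buffers or bounding generation length before decoding.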

Files

| File | Description |
| --- | --- |
| model.safetensors | LLM weights (converted to camelCase keys) |
| decoder.safetensors | Vocos decoder weights |
| tokenizer.json | Tokenizer vocabulary |
| config.json | Model configuration |

Usage

This model is designed for use with VoiceKit on macOS. The weights have been converted from the original PyTorch format with:

  • BFloat16 → Float32 conversion
  • snake_case → camelCase key renaming for Swift compatibility
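
The two conversion steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the script actually used to produce these weights: the renaming rule (uppercase the character after each underscore), the function names, and the example key are all hypothetical. The dtype step relies only on the fact that a bfloat16 value is the upper 16 bits of the corresponding float32, so widening it is lossless.

```python
import re
import struct


def snake_to_camel(key: str) -> str:
    """Rename a snake_case weight key to camelCase.

    Assumed rule: drop each underscore and uppercase the character after it.
    Dots separating path components are left untouched.
    """
    return re.sub(r"_([a-z0-9])", lambda m: m.group(1).upper(), key)


def rename_keys(weights: dict) -> dict:
    """Return a new dict with camelCase keys, refusing to merge two keys."""
    renamed = {}
    for key, value in weights.items():
        new_key = snake_to_camel(key)
        if new_key in renamed:
            raise ValueError(f"key collision after renaming: {new_key}")
        renamed[new_key] = value
    return renamed


def bf16_bits_to_float32(bits: int) -> float:
    """Widen one bfloat16 value, given as its 16-bit pattern, to float32.

    bfloat16 keeps the sign, exponent, and top 7 mantissa bits of float32,
    so the conversion pads the low 16 bits with zeros.
    """
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("bits must fit in 16 bits")
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]


if __name__ == "__main__":
    # Illustrative key only; the real checkpoint's key names may differ.
    print(snake_to_camel("model.layers.0.self_attn.q_proj.weight"))
    print(bf16_bits_to_float32(0x3F80))  # bit pattern of 1.0 in bfloat16
```

In practice the same two steps would be applied to whole tensors with a tensor library rather than value by value; the per-value version here is only meant to make the bit-level relationship explicit.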

License

Apache 2.0

Credits

Based on Soprano-80M by SWivid.
