uZipVoice ONNX Models (Distilled)

ONNX models for uZipVoice, a Unity implementation of ZipVoice. ZipVoice is a lightweight zero-shot text-to-speech system based on flow matching.

Model Description

These ONNX models are exported from ZipVoice-Distill (the distilled version of ZipVoice) for use with the Unity AI Inference Engine (Sentis).

  • Zero-shot TTS: Generate speech in any voice using just a few seconds of reference audio
  • Fast Generation: Distilled model with embedded CFG enables high-quality synthesis in 4-8 steps
  • Lightweight: 123M parameters total
  • Distilled: Classifier-Free Guidance (CFG) is embedded in the model, so no double inference is required
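
To illustrate what "no double inference" means, here is a minimal Python sketch that counts forward passes per sampling run. It is a toy, not the uZipVoice code: `toy_velocity`, `cfg_step`, `distilled_step`, and `count_forward_passes` are hypothetical names, the velocity function is a stand-in for the decoder, and the guidance formula shown is one common CFG formulation that may differ from the one ZipVoice uses.

```python
# Toy illustration only: toy_velocity is a stand-in, not the ZipVoice decoder.
calls = {"n": 0}

def toy_velocity(x, t, conditioned):
    """Hypothetical stand-in for one forward pass of a flow-matching decoder."""
    calls["n"] += 1
    target = 1.0 if conditioned else 0.0
    return target - x

def cfg_step(x, t, dt, guidance_scale):
    """One Euler step with explicit CFG: two forward passes per step."""
    v_cond = toy_velocity(x, t, conditioned=True)
    v_uncond = toy_velocity(x, t, conditioned=False)
    # One common CFG formulation (an assumption, not taken from ZipVoice).
    v = v_uncond + guidance_scale * (v_cond - v_uncond)
    return x + dt * v

def distilled_step(x, t, dt):
    """One Euler step when guidance is embedded: a single forward pass."""
    return x + dt * toy_velocity(x, t, conditioned=True)

def count_forward_passes(step_fn, num_steps, **kwargs):
    """Run num_steps Euler steps and return how many forward passes were made."""
    calls["n"] = 0
    x, dt = 0.0, 1.0 / num_steps
    for k in range(num_steps):
        x = step_fn(x, k * dt, dt, **kwargs)
    return calls["n"]

explicit = count_forward_passes(cfg_step, 8, guidance_scale=2.0)
embedded = count_forward_passes(distilled_step, 8)
print(explicit, embedded)
```

With 8 steps, the explicit-CFG loop makes twice as many forward passes as the loop with embedded guidance, which is where the speed-up comes from.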

Files

File                 Description                          Size
text_encoder.onnx    Text to condition vector             ~17 MB
fm_decoder.onnx      Flow Matching decoder (distilled)    ~456 MB
vocos_opset15.onnx   Vocoder (mel to waveform)            ~52 MB
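
Since all three files are needed to go from text to a waveform, it can help to verify a download before loading anything. The following is a small sketch using only the Python standard library. The file names come from the list above; the helper name `missing_model_files` and the `models` directory are hypothetical.

```python
from pathlib import Path

# File names as listed in this model card.
EXPECTED_FILES = ("text_encoder.onnx", "fm_decoder.onnx", "vocos_opset15.onnx")

def missing_model_files(model_dir):
    """Return the expected ONNX files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

if __name__ == "__main__":
    # "models" is a placeholder directory name; adjust to your download location.
    missing = missing_model_files("models")
    print("missing:", missing if missing else "none")
```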

Recommended Settings

  • Steps: 4-8 (distilled model, default: 8)
  • Sample Rate: 24 kHz
  • Mel Dimensions: 100
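
To show how these settings fit together, here is a minimal Python sketch of fixed-step Euler sampling over a 100-dimensional mel frame, plus the sample count implied by a 24 kHz output. It makes several assumptions: the uniform time schedule and the Euler solver are illustrative choices rather than confirmed details of uZipVoice, and `toy_velocity`, `euler_sample`, and `num_audio_samples` are hypothetical names, with `toy_velocity` standing in for `fm_decoder.onnx`.

```python
MEL_DIM = 100        # mel dimensions from the settings above
SAMPLE_RATE = 24000  # 24 kHz output

def toy_velocity(x, t, target):
    """Hypothetical stand-in for the decoder: straight-line flow toward target."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]

def euler_sample(x0, target, num_steps=8):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with uniform Euler steps."""
    x, dt = list(x0), 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so 1 - t is never zero
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def num_audio_samples(seconds):
    """Number of waveform samples for a clip of the given length at 24 kHz."""
    return int(round(seconds * SAMPLE_RATE))

frame = euler_sample([0.0] * MEL_DIM, [1.0] * MEL_DIM, num_steps=8)
print(len(frame), num_audio_samples(2.5))
```

In this toy the step count only changes how many times the velocity function is evaluated. With the real decoder, fewer steps trade some quality for speed, which is why the 4-8 range is given.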

Model tree for ayousanz/uZipVoice-onnx

Base model: k2-fsa/ZipVoice (this model is listed as a quantized derivative)