uZipVoice ONNX Models (Distilled)

ONNX models for uZipVoice, a Unity implementation of ZipVoice. ZipVoice is a lightweight zero-shot text-to-speech system based on flow matching.

Model Description

These ONNX models are exported from ZipVoice-Distill (the distilled version of ZipVoice) for use with the Unity AI Inference Engine (Sentis).

- Zero-shot TTS: generates speech in any voice from just a few seconds of reference audio
- Fast generation: the distilled model produces high-quality speech in 4 to 8 sampling steps
- Lightweight: 123M parameters in total
- Distilled: classifier-free guidance (CFG) is embedded in the model, so each step needs one forward pass instead of two
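
To illustrate why embedded CFG matters for speed, here is a minimal Python sketch that counts forward passes in a fixed-step Euler sampler. It is not the uZipVoice or ZipVoice sampler: `ToyModel`, `cfg_velocity`, `distilled_velocity`, and `count_calls` are hypothetical names, the toy "network" just returns a constant velocity, and the guidance formula shown is only one common form of CFG. The actual time schedule and guidance formulation used by ZipVoice may differ.

```python
def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


class ToyModel:
    """Hypothetical stand-in for the decoder network; counts forward passes."""

    def __init__(self, target):
        self.target = list(target)
        self.calls = 0

    def __call__(self, x, t, use_condition=True):
        self.calls += 1
        # Toy behaviour: the conditioned pass points at the target,
        # the unconditioned pass predicts zero velocity.
        if use_condition:
            return list(self.target)
        return [0.0] * len(x)


def cfg_velocity(model, scale):
    """Explicit CFG: two forward passes per step, combined outside the model."""
    def velocity(x, t):
        v_cond = model(x, t, use_condition=True)
        v_uncond = model(x, t, use_condition=False)
        return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]
    return velocity


def distilled_velocity(model):
    """Embedded CFG (assumed): guidance is baked in, so one pass per step."""
    def velocity(x, t):
        return model(x, t)
    return velocity


def count_calls(make_velocity, num_steps):
    """Run the sampler once and report how many forward passes it used."""
    model = ToyModel(target=[1.0, -2.0])
    euler_sample(make_velocity(model), [0.0, 0.0], num_steps)
    return model.calls
```

With 4 steps, `count_calls(distilled_velocity, 4)` uses 4 forward passes, while `count_calls(lambda m: cfg_velocity(m, 1.0), 4)` uses 8. The same halving of decoder calls is what "no double inference" refers to.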

| File | Description | Size |
|---|---|---|
| `text_encoder.onnx` | Text to condition vector | ~17 MB |
| `fm_decoder.onnx` | Flow Matching decoder (distilled) | ~456 MB |
| `vocos_opset15.onnx` | Vocoder (mel to waveform) | ~52 MB |
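
Since all three files are needed for synthesis, it can help to confirm a download is complete before importing the models into a project. The following is a small Python sketch of such a check. The function name `missing_model_files` and the idea of a single flat model directory are assumptions for illustration, not part of uZipVoice.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = ("text_encoder.onnx", "fm_decoder.onnx", "vocos_opset15.onnx")


def missing_model_files(model_dir):
    """Return the expected model files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means all three models are present in the directory.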
