uZipVoice ONNX Models (Distilled)
ONNX models for uZipVoice, a Unity implementation of ZipVoice, a lightweight zero-shot text-to-speech system based on Flow Matching.
Model Description
These ONNX models are exported from ZipVoice-Distill (the distilled version of ZipVoice) for use with the Unity AI Inference Engine (Sentis).
- Zero-shot TTS: Generate speech in any voice using just a few seconds of reference audio
- Fast Generation: The distilled model produces high-quality speech in 4-8 sampling steps
- Lightweight: 123M parameters total
- Distilled: Classifier-Free Guidance (CFG) is embedded in the model, so each step needs one forward pass instead of two
Files
| File | Description | Size |
|---|---|---|
| text_encoder.onnx | Text to condition vector | ~17MB |
| fm_decoder.onnx | Flow Matching decoder (distilled) | ~456MB |
| vocos_opset15.onnx | Vocoder (mel to waveform) | ~52MB |
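
As a rough sanity check, the file sizes above can be converted into an approximate parameter count. This is a back-of-the-envelope sketch only: it assumes the weights are stored as 32-bit floats (4 bytes per parameter), treats 1MB as 10^6 bytes, and ignores graph metadata. None of these assumptions is stated by this model card, and `approx_params_millions` is a hypothetical helper.

```python
# File sizes in MB, taken from the table above (approximate values).
sizes_mb = {
    "text_encoder.onnx": 17,
    "fm_decoder.onnx": 456,
    "vocos_opset15.onnx": 52,
}

# Assumption: fp32 weights, i.e. 4 bytes per parameter.
BYTES_PER_PARAM = 4


def approx_params_millions(size_mb, bytes_per_param=BYTES_PER_PARAM):
    """Estimate parameters (in millions) from a file size in MB (10^6 bytes)."""
    return size_mb / bytes_per_param


total_mb = sum(sizes_mb.values())
for name, size in sizes_mb.items():
    print(f"{name}: ~{approx_params_millions(size):.1f}M parameters")
print(f"total: {total_mb}MB, ~{approx_params_millions(total_mb):.1f}M parameters")
```

Under these assumptions the estimate lands in the same range as the stated 123M parameters, which suggests the files are not quantized.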
Recommended Settings
- Steps: 4-8 (distilled model, default: 8)
- Sample Rate: 24kHz
- Mel Dimensions: 100
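
To illustrate what the step count controls, here is a minimal sketch of fixed-step Euler sampling for a flow-matching model, written in Python with NumPy. It assumes a plain Euler solver on a uniform time grid from t=0 to t=1; the solver and time schedule actually used by uZipVoice may differ. `euler_sample` and `toy_velocity` are hypothetical names, and the toy velocity field stands in for a call to fm_decoder.onnx.

```python
import numpy as np


def euler_sample(velocity_fn, x0, num_steps=8):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = np.array(x0, dtype=np.float32)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        # One decoder evaluation per step; fewer steps means faster synthesis.
        x = x + dt * velocity_fn(x, t)
    return x


# Toy setup: 50 mel frames with 100 mel dimensions (frame count is arbitrary).
rng = np.random.default_rng(0)
num_frames, mel_dim = 50, 100
noise = rng.standard_normal((num_frames, mel_dim)).astype(np.float32)
target = np.zeros((num_frames, mel_dim), dtype=np.float32)


def toy_velocity(x, t):
    # Stand-in for the decoder: a constant straight-line field from noise to target.
    return target - noise


mel = euler_sample(toy_velocity, noise, num_steps=8)
print(mel.shape)
```

In a real pipeline the resulting mel spectrogram would be passed to the vocoder to produce a 24kHz waveform. With a learned velocity field the result depends on the step count, which is why 4-8 steps is given as the recommended range.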
Base Model
- Repository: ayousanz/uZipVoice-onnx
- Base model: k2-fsa/ZipVoice