---
license: apache-2.0
tags:
- text-to-speech
- tts
- onnx
- voice-cloning
- browser
- webassembly
- webgpu
language:
- en
- de
- zh
- ja
- fr
- es
- multilingual
library_name: onnxruntime
base_model: k2-fsa/OmniVoice
---

# VocoLoco: OmniVoice ONNX Models

ONNX exports of [k2-fsa/OmniVoice](https://github.com/k2-fsa/OmniVoice) for browser-based text-to-speech inference via ONNX Runtime Web.

## Models

| File | Size | Description |
|------|------|-------------|
| `omnivoice-main-split.onnx` + `_data_00`-`_04` | 2.3 GB | Main TTS model (FP32, sharded) |
| `omnivoice-main-int8.onnx` | 586 MB | Main TTS model (INT8 quantized, for mobile/low-memory) |
| `omnivoice-decoder.onnx` | 83 MB | Audio token decoder (tokens to waveform) |
| `omnivoice-encoder-fixed.onnx` | 624 MB | Audio encoder for voice cloning |
| `tokenizer.json` | 11 MB | Qwen2 BPE text tokenizer |

## Usage

These models are designed to run in the browser via [VocoLoco](https://github.com/YOUR_USERNAME/vocoloco), a fully client-side TTS application. No server is required.

## Architecture

- **Backbone**: Qwen3-0.6B (28 transformer layers)
- **Audio codec**: HiggsAudioV2 (8 codebooks, 24 kHz output)
- **Generation**: Iterative masked diffusion (configurable, 8-32 steps)
- **Voice cloning**: Zero-shot via reference audio encoding
- **Voice design**: Text-based control (gender, pitch, accent)

## License

Apache 2.0, the same license as the original OmniVoice model.

## Attribution

Based on [OmniVoice](https://github.com/k2-fsa/OmniVoice) by Xiaomi Corp (k2-fsa).
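
## Example: estimating the download size

The files in the Models table combine differently depending on the device and on whether voice cloning is used. The sketch below is not taken from VocoLoco: `planDownload`, `LoadPlan`, and the `lowMemory` / `voiceCloning` flags are hypothetical names chosen for illustration. It assumes that the encoder is only needed for voice cloning, as its table description suggests, and it approximates 2.3 GB as 2300 MB.

```typescript
// Sizes in MB, copied from the Models table (2.3 GB approximated as 2300 MB).
// The "omnivoice-main-split.onnx" entry stands for the graph file plus its data shards.
const FILE_SIZES_MB = {
  "omnivoice-main-split.onnx": 2300,
  "omnivoice-main-int8.onnx": 586,
  "omnivoice-decoder.onnx": 83,
  "omnivoice-encoder-fixed.onnx": 624,
  "tokenizer.json": 11,
} as const;

type ModelFile = keyof typeof FILE_SIZES_MB;

interface LoadPlan {
  files: ModelFile[];
  totalMB: number;
}

// Hypothetical helper: choose which files to fetch and sum their sizes.
function planDownload(lowMemory: boolean, voiceCloning: boolean): LoadPlan {
  const files: ModelFile[] = [
    // INT8 main model for mobile/low-memory devices, FP32 otherwise.
    lowMemory ? "omnivoice-main-int8.onnx" : "omnivoice-main-split.onnx",
    "omnivoice-decoder.onnx",
    "tokenizer.json",
  ];
  // Assumption: the encoder is only required when cloning a voice.
  if (voiceCloning) {
    files.push("omnivoice-encoder-fixed.onnx");
  }
  const totalMB = files.reduce((sum: number, f) => sum + FILE_SIZES_MB[f], 0);
  return { files, totalMB };
}
```

Under these assumptions, `planDownload(true, false)` comes to about 680 MB, while `planDownload(false, true)` comes to roughly 3 GB, which is why the INT8 variant is the one recommended for mobile and low-memory devices.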