OpenVoice ONNX Models (v2)
This repository contains highly optimized ONNX models for OpenVoice v2 (Voice Cloning and Tone Color Conversion), specifically prepared for high-performance deployment in C#, C++, Python, or Rust environments using ONNX Runtime.
When used correctly, these models support zero-allocation inference, which makes them well suited to high-load server environments.
Repository Contents
- tone_extract.onnx: Extracts a 256-dimensional voice fingerprint (tone embedding) from an audio spectrogram.
- tone_color.onnx: Transfers voice characteristics (latent space blending) by re-coloring audio from a source tone embedding to a destination tone embedding.
- tone_config.json: Hyperparameters and structural configuration of the models.
Technical Specifications & Tensor Shapes
If you are writing your own custom inference engine, use the following I/O specifications:
1. Tone Extractor (tone_extract.onnx)
Input:
- input: Float [1, frames, 513]. Linear magnitude spectrogram of the input audio (sample rate: 22050 Hz, hop length: 256, window length: 1024). The 513 frequency bins correspond to an FFT size of 1024 (1024 / 2 + 1).
Output:
- tone_embedding: Float [1, 256]. The extracted voice fingerprint.
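The extractor input can be produced along the lines of the Python sketch below. It assumes NumPy and mirrors our reading of the reference OpenVoice preprocessing: reflect padding of (1024 - 256) / 2 samples on each side, a periodic Hann window, no extra centering, and magnitude computed as sqrt(re^2 + im^2 + 1e-6). None of these details are stated in this README, so verify them against tone_config.json or the upstream code before relying on the output. The helper names linear_spectrogram and extract_tone are hypothetical, and session stands for any object with an ONNX Runtime-style run(output_names, input_feed) method.

```python
import numpy as np

# STFT settings from the specification; 513 bins = N_FFT // 2 + 1.
SAMPLE_RATE = 22050
N_FFT = 1024
HOP_LENGTH = 256
WIN_LENGTH = 1024


def linear_spectrogram(samples) -> np.ndarray:
    """Mono float PCM at 22050 Hz -> float32 magnitude spectrogram [1, frames, 513].

    Assumed preprocessing (verify against the reference implementation):
    reflect-pad by (N_FFT - HOP_LENGTH) // 2 on both sides, periodic Hann
    window, no extra centering, magnitude = sqrt(re^2 + im^2 + 1e-6).
    """
    samples = np.asarray(samples, dtype=np.float32).reshape(-1)
    pad = (N_FFT - HOP_LENGTH) // 2
    padded = np.pad(samples, (pad, pad), mode="reflect")
    if padded.size < N_FFT:
        raise ValueError("audio is too short for a single STFT frame")
    n = np.arange(WIN_LENGTH)
    window = (0.5 - 0.5 * np.cos(2.0 * np.pi * n / WIN_LENGTH)).astype(np.float32)
    frames = 1 + (padded.size - N_FFT) // HOP_LENGTH
    # Row i holds samples [i * HOP_LENGTH, i * HOP_LENGTH + N_FFT).
    idx = np.arange(N_FFT)[None, :] + HOP_LENGTH * np.arange(frames)[:, None]
    spec = np.fft.rfft(padded[idx] * window, n=N_FFT, axis=-1)
    mag = np.sqrt(spec.real ** 2 + spec.imag ** 2 + 1e-6)
    return mag[np.newaxis].astype(np.float32)


def extract_tone(session, spec) -> np.ndarray:
    """Run tone_extract.onnx on a [1, frames, 513] spectrogram; return [1, 256]."""
    spec = np.ascontiguousarray(spec, dtype=np.float32)
    if spec.ndim != 3 or spec.shape[0] != 1 or spec.shape[2] != N_FFT // 2 + 1:
        raise ValueError(f"expected [1, frames, 513], got {spec.shape}")
    # Request the single named output; run() returns a list of arrays.
    (embedding,) = session.run(["tone_embedding"], {"input": spec})
    return np.asarray(embedding, dtype=np.float32).reshape(1, 256)
```

In practice, session would typically be onnxruntime.InferenceSession("tone_extract.onnx", providers=["CPUExecutionProvider"]). Passing the session in, rather than creating it inside the helper, lets a server load the model once and keeps the helpers testable without the runtime installed.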
2. Tone Color Converter (tone_color.onnx)
Inputs:
- audio: Float [1, 513, frames]. Linear magnitude spectrogram of the generated base audio. Note that the axes are swapped relative to the extractor input.
- audio_length: Int64 [1]. Number of frames in the spectrogram.
- src_tone: Float [1, 256, 1]. Tone embedding of the source audio (e.g., the voice of the base TTS model).
- dest_tone: Float [1, 256, 1]. Tone embedding of the target (cloned) voice.
- tau: Float [1]. Temperature parameter (default: 1.0f). Lowering it (e.g., to 0.8f) can smooth out high-frequency artifacts when the target recording is noisy.
Embeddings produced by the extractor have shape [1, 256] and must be reshaped to [1, 256, 1] before they are passed as src_tone or dest_tone.
Output:
- converted_audio: Float [length]. The final cloned audio waveform as a 1D array of float PCM samples (sample rate: 22050 Hz).
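The converter inputs and output can be wired up roughly as follows. This sketch assumes NumPy plus Python's standard wave module, starts from an extractor-layout spectrogram [1, frames, 513], and assumes the base audio is analyzed with the same STFT settings as the extractor input, which the specification implies but does not state. build_converter_inputs, convert, and to_wav_bytes are hypothetical helper names, and clipping the output to [-1, 1] before 16-bit encoding is an assumption about the waveform's range.

```python
import io
import wave

import numpy as np

SAMPLE_RATE = 22050


def build_converter_inputs(spec, src_tone, dest_tone, tau=1.0) -> dict:
    """Feed dict for tone_color.onnx.

    spec: [1, frames, 513] (extractor layout); src_tone / dest_tone: [1, 256] or [1, 256, 1].
    """
    spec = np.asarray(spec, dtype=np.float32)
    return {
        # Swap axes: [1, frames, 513] -> [1, 513, frames].
        "audio": np.ascontiguousarray(spec.transpose(0, 2, 1)),
        "audio_length": np.array([spec.shape[1]], dtype=np.int64),
        # Add the trailing axis the converter expects: [1, 256] -> [1, 256, 1].
        "src_tone": np.asarray(src_tone, dtype=np.float32).reshape(1, 256, 1),
        "dest_tone": np.asarray(dest_tone, dtype=np.float32).reshape(1, 256, 1),
        "tau": np.array([tau], dtype=np.float32),
    }


def convert(session, feed: dict) -> np.ndarray:
    """Run tone_color.onnx and return the waveform as a 1D float32 array."""
    (audio,) = session.run(["converted_audio"], feed)
    return np.asarray(audio, dtype=np.float32).reshape(-1)


def to_wav_bytes(samples, sample_rate=SAMPLE_RATE) -> bytes:
    """Encode float PCM as 16-bit mono WAV, clipping to [-1, 1] (assumed range)."""
    clipped = np.clip(np.asarray(samples, dtype=np.float32), -1.0, 1.0)
    pcm = (clipped * 32767.0).astype("<i2")  # little-endian int16, truncated toward zero
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return buf.getvalue()
```

A typical call chain would be convert(color_session, build_converter_inputs(base_spec, src_emb, dest_emb, tau=0.8)), followed by writing the result of to_wav_bytes to a file opened in binary mode.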
Use Case Example
This repository is designed to be plug-and-play. You can automatically fetch these models at application startup. Example base URL for raw downloads:
https://huggingface.co/Hinotsuba/OpenVoice-ONNX-v2/resolve/main/{filename}
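A minimal startup fetch using only the Python standard library might look like this. The helper names, the cache-directory layout, and the download-to-temporary-file-then-rename approach are illustrative choices, not part of the repository; only the URL template and the three file names come from this README.

```python
import os
import urllib.request

BASE_URL = "https://huggingface.co/Hinotsuba/OpenVoice-ONNX-v2/resolve/main/{filename}"
MODEL_FILES = ("tone_extract.onnx", "tone_color.onnx", "tone_config.json")


def model_url(filename: str) -> str:
    """Fill the raw-download URL template for one repository file."""
    return BASE_URL.format(filename=filename)


def ensure_models(cache_dir: str) -> dict:
    """Download any missing model files into cache_dir; return name -> local path."""
    os.makedirs(cache_dir, exist_ok=True)
    paths = {}
    for name in MODEL_FILES:
        path = os.path.join(cache_dir, name)
        if not os.path.exists(path):
            tmp = path + ".part"
            # Download under a temporary name first, so an interrupted fetch
            # never leaves a truncated file that looks complete.
            urllib.request.urlretrieve(model_url(name), tmp)
            os.replace(tmp, path)
        paths[name] = path
    return paths
```

Calling ensure_models("models") on every start only downloads files that are missing, so later starts reuse the local copies.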
License
These models are released under the MIT License, following the official license update of the OpenVoice v2 framework, and are free for commercial use.