OpenVoice ONNX Models (v2)

This repository contains highly optimized ONNX models for OpenVoice v2 (Voice Cloning and Tone Color Conversion), specifically prepared for high-performance deployment in C#, C++, Python, or Rust environments using ONNX Runtime.

These models can run without per-call allocations on the caller's side when input and output buffers are preallocated and reused between inference calls, which makes them suitable for high-load server environments.

📦 Repository Contents

tone_extract.onnx — Model for extracting a 256-dimensional voice fingerprint (tone embedding) from an audio spectrogram.
tone_color.onnx — Model for transferring the voice characteristics (Latent Space Blending) from a source embedding to a destination embedding.
tone_config.json — Hyperparameters and structural configuration of the models.

🛠️ Technical Specifications & Tensor Shapes

If you are writing your own custom inference engine, use the following I/O specifications:

1. Tone Extractor (`tone_extract.onnx`)

Input:

input: Float [1, frames, 513] — Linear magnitude spectrogram of the source audio. (Hop length: 256, Win length: 1024, Sample rate: 22050).

Output:

tone_embedding: Float [1, 256] — The extracted voice fingerprint.
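The extractor input can be produced with a short-time Fourier transform. The following is a minimal NumPy sketch, not the reference preprocessing. It assumes an FFT size of 1024 (which yields the 513 bins above), a periodic Hann window, and reflect padding of (1024 - 256) / 2 samples per side. The function name `linear_spectrogram` is hypothetical. If the embeddings look off, compare the padding and window against the original OpenVoice preprocessing.

```python
import numpy as np

SAMPLE_RATE = 22050
HOP_LENGTH = 256
WIN_LENGTH = 1024
N_FFT = 1024  # assumption: FFT size equals the window length, giving 1024 // 2 + 1 = 513 bins


def linear_spectrogram(wave: np.ndarray) -> np.ndarray:
    """Return a float32 magnitude spectrogram shaped [1, frames, 513]."""
    wave = np.asarray(wave, dtype=np.float32)
    # Assumption: reflect-pad both ends so frames stay aligned with the hop grid.
    pad = (N_FFT - HOP_LENGTH) // 2
    padded = np.pad(wave, (pad, pad), mode="reflect")
    # Assumption: periodic Hann window.
    n = np.arange(WIN_LENGTH)
    window = (0.5 - 0.5 * np.cos(2.0 * np.pi * n / WIN_LENGTH)).astype(np.float32)
    frames = 1 + (len(padded) - N_FFT) // HOP_LENGTH
    # Row i of idx selects the samples of frame i, which starts at i * HOP_LENGTH.
    idx = np.arange(N_FFT)[None, :] + HOP_LENGTH * np.arange(frames)[:, None]
    spec = np.abs(np.fft.rfft(padded[idx] * window, axis=1))  # [frames, 513]
    return spec[None, :, :].astype(np.float32)
```

The input audio is expected to be mono float samples already resampled to 22050 Hz.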

2. Tone Color Converter (`tone_color.onnx`)

Inputs:

audio: Float [1, 513, frames] — Linear magnitude spectrogram of the generated base audio (Note: axes are swapped compared to the extractor).
audio_length: Int64 [1] — Number of frames in the spectrogram.
src_tone: Float [1, 256, 1] — Tone embedding of the source audio (e.g., base TTS model voice).
dest_tone: Float [1, 256, 1] — Tone embedding of the target cloned voice.
tau: Float [1] — Temperature parameter (default is 1.0f). Lowering this value (e.g., 0.8f) can smooth out high-frequency artifacts if the target recording was noisy.

Output:

converted_audio: Float [length] — The final cloned audio waveform as a 1D array of PCM float samples (Sample rate: 22050).
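Because the converter expects a different axis order and rank-3 embeddings, it helps to build its inputs in one place. The sketch below assumes the spectrogram and embeddings arrive in the extractor's layout ([1, frames, 513] and [1, 256]). The helper name `converter_inputs` is hypothetical, and the dictionary keys are the input names listed above.

```python
import numpy as np


def converter_inputs(spec, src_tone, dest_tone, tau=1.0):
    """Build the input dict for tone_color.onnx from extractor-style arrays.

    spec:      [1, frames, 513] spectrogram (extractor layout)
    src_tone:  [1, 256] embedding of the base voice
    dest_tone: [1, 256] embedding of the target voice
    """
    spec = np.asarray(spec, dtype=np.float32)
    # Swap the last two axes: [1, frames, 513] -> [1, 513, frames].
    audio = np.ascontiguousarray(spec.transpose(0, 2, 1))
    return {
        "audio": audio,
        "audio_length": np.array([audio.shape[2]], dtype=np.int64),
        # Add the trailing axis: [1, 256] -> [1, 256, 1].
        "src_tone": np.asarray(src_tone, dtype=np.float32).reshape(1, 256, 1),
        "dest_tone": np.asarray(dest_tone, dtype=np.float32).reshape(1, 256, 1),
        "tau": np.array([tau], dtype=np.float32),
    }
```

With ONNX Runtime's Python API, the resulting dictionary can be passed as the feed to `InferenceSession.run(None, feeds)`, and the first returned array should be the converted waveform.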

🚀 Use Case Example

This repository is designed to be plug-and-play. You can automatically fetch these models at application startup. Example base URL for raw downloads: https://huggingface.co/Hinotsuba/OpenVoice-ONNX-v2/resolve/main/{filename}
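A startup fetch along these lines could look like the following sketch. It uses only the Python standard library. The helper names `model_url` and `ensure_models` and the `models` cache directory are hypothetical, and error handling and retries are left out.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = "https://huggingface.co/Hinotsuba/OpenVoice-ONNX-v2/resolve/main/{filename}"
MODEL_FILES = ("tone_extract.onnx", "tone_color.onnx", "tone_config.json")


def model_url(filename: str) -> str:
    """Fill the filename into the raw-download URL template."""
    return BASE_URL.format(filename=filename)


def ensure_models(cache_dir: str = "models") -> list:
    """Download any missing file into cache_dir and return the local paths."""
    target = Path(cache_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in MODEL_FILES:
        path = target / name
        if not path.exists():  # skip files that are already cached
            urlretrieve(model_url(name), str(path))
        paths.append(path)
    return paths
```

Calling `ensure_models()` once before creating the inference sessions keeps later startups offline, since files already in the cache directory are not downloaded again.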

⚖️ License

These models are released under the MIT License, following the official license update of the OpenVoice v2 framework. Free for commercial use.
