OpenVoice ONNX Models (v2)

This repository contains highly optimized ONNX models for OpenVoice v2 (Voice Cloning and Tone Color Conversion), specifically prepared for high-performance deployment in C#, C++, Python, or Rust environments using ONNX Runtime.

When used correctly (for example, with pre-allocated input and output buffers reused across requests), these models can run on a Zero-Allocation principle, which makes them well suited to high-load server environments.

📦 Repository Contents

  • tone_extract.onnx - Model for extracting a 256-dimensional voice fingerprint (tone embedding) from an audio spectrogram.
  • tone_color.onnx - Model that re-voices an audio spectrogram by transferring its voice characteristics from a source embedding to a destination embedding (latent space blending), returning a waveform.
  • tone_config.json - Hyperparameters and structural configuration of the models.

πŸ› οΈ Technical Specifications & Tensor Shapes

If you are writing your own inference code, use the following I/O specifications:

1. Tone Extractor (tone_extract.onnx)

Input:

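  • input: Float [1, frames, 513] - Linear magnitude spectrogram of the audio whose voice you want to fingerprint (for example, the base TTS output or a recording of the target voice). 513 frequency bins (1024 / 2 + 1); hop length: 256, window length: 1024, sample rate: 22050 Hz.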

Output:

  • tone_embedding: Float [1, 256] - The extracted voice fingerprint.
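The extractor takes a precomputed spectrogram rather than raw audio. The dependency-free Python sketch below shows one way to produce the [frames, 513] layout from mono 22050 Hz samples. Several framing details are assumptions not stated in this card: a periodic Hann window, 384 samples of reflect padding on each side ((1024 - 256) / 2), no additional centering, and magnitudes computed as sqrt(re² + im² + 1e-6). Check them against tone_config.json and your reference pipeline before relying on the output. The helper names are hypothetical.

```python
import math

N_FFT = 1024                  # 513 bins = N_FFT // 2 + 1
HOP = 256                     # hop length from the spec above
PAD = (N_FFT - HOP) // 2      # 384 samples per side (assumed padding scheme)


def reflect_pad(x, pad):
    """Mirror `pad` samples at each edge without repeating the edge sample."""
    if len(x) <= pad:
        raise ValueError("signal is too short for reflect padding")
    left = x[pad:0:-1]            # x[pad], ..., x[1]
    right = x[-2:-pad - 2:-1]     # x[-2], ..., x[-pad - 1]
    return left + x + right


def hann_periodic(n):
    """Periodic Hann window of length n (assumed window type)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def linear_spectrogram(samples):
    """Return a [frames][513] linear-magnitude spectrogram of mono samples.

    Uses a naive DFT so the example has no dependencies; it is slow.
    """
    padded = reflect_pad(list(samples), PAD)
    window = hann_periodic(N_FFT)
    cos_t = [math.cos(2.0 * math.pi * i / N_FFT) for i in range(N_FFT)]
    sin_t = [math.sin(2.0 * math.pi * i / N_FFT) for i in range(N_FFT)]
    n_frames = (len(padded) - N_FFT) // HOP + 1
    spec = []
    for f in range(n_frames):
        start = f * HOP
        seg = [padded[start + n] * window[n] for n in range(N_FFT)]
        row = []
        for k in range(N_FFT // 2 + 1):
            re = im = 0.0
            for n, v in enumerate(seg):
                idx = (k * n) % N_FFT   # table index for angle 2*pi*k*n/N_FFT
                re += v * cos_t[idx]
                im -= v * sin_t[idx]
            row.append(math.sqrt(re * re + im * im + 1e-6))
        spec.append(row)
    return spec
```

Wrap the result as `[spec]` to add the batch axis expected by tone_extract.onnx ([1, frames, 513]). Under these assumptions the frame count works out to `len(samples) // 256`. The naive DFT only keeps the example self-contained; an FFT routine such as `numpy.fft.rfft` computes the same bins far faster.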

2. Tone Color Converter (tone_color.onnx)

Inputs:

  • audio: Float [1, 513, frames] - Linear magnitude spectrogram of the generated base audio. Note: the frame and frequency axes are swapped relative to the extractor input ([1, frames, 513]).
  • audio_length: Int64 [1] - Number of frames in the spectrogram.
  • src_tone: Float [1, 256, 1] - Tone embedding of the source audio (e.g., the base TTS model's voice), i.e. the extractor's [1, 256] output reshaped to [1, 256, 1].
  • dest_tone: Float [1, 256, 1] - Tone embedding of the target (cloned) voice, in the same layout.
  • tau: Float [1] - Temperature parameter (default: 1.0f). Lowering this value (e.g., to 0.8f) can smooth out high-frequency artifacts if the target recording was noisy.
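Because the converter uses a different layout from the extractor, the extractor's outputs must be rearranged before conversion. This minimal sketch uses plain nested Python lists to stay dependency-free: it swaps the spectrogram axes, reshapes the [1, 256] embeddings to [1, 256, 1], and assembles a feed dictionary keyed by the input names listed above. The helper names (transpose_spectrogram, embedding_to_column, build_converter_feeds) are hypothetical.

```python
def transpose_spectrogram(spec_btf):
    """[1, frames, 513] -> [1, 513, frames] (swap frame and frequency axes)."""
    (frames_by_bins,) = spec_btf  # unpacking enforces batch size 1
    return [[list(col) for col in zip(*frames_by_bins)]]


def embedding_to_column(emb):
    """[1, 256] -> [1, 256, 1], the layout used by src_tone / dest_tone."""
    (values,) = emb
    if len(values) != 256:
        raise ValueError(f"expected 256 values, got {len(values)}")
    return [[[v] for v in values]]


def build_converter_feeds(spec_btf, src_emb, dest_emb, tau=1.0):
    """Assemble inputs for tone_color.onnx from extractor-layout data."""
    frames = len(spec_btf[0])
    return {
        "audio": transpose_spectrogram(spec_btf),
        "audio_length": [frames],          # must be Int64 when passed to the model
        "src_tone": embedding_to_column(src_emb),
        "dest_tone": embedding_to_column(dest_emb),
        "tau": [float(tau)],
    }
```

In a real deployment you would build the same feeds as NumPy arrays, e.g. `np.ascontiguousarray(spec.transpose(0, 2, 1), dtype=np.float32)`, `emb.reshape(1, 256, 1).astype(np.float32)` and `np.array([frames], dtype=np.int64)`, and pass the dictionary to `onnxruntime.InferenceSession("tone_color.onnx", providers=["CPUExecutionProvider"]).run(None, feeds)`.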

Output:

  • converted_audio: Float [length] - The final cloned audio waveform as a 1D array of float PCM samples (sample rate: 22050 Hz).
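The converter returns raw float samples, so you still need to encode them to play or store the result. This sketch uses only the Python standard library to write a mono 16-bit WAV at 22050 Hz. It assumes the samples are nominally in [-1, 1] and clips anything outside that range; the card does not state the output range, so verify it on real output. The function name is hypothetical.

```python
import io
import struct
import wave

SAMPLE_RATE = 22050  # output sample rate from the spec above


def float_pcm_to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode float samples (assumed roughly in [-1, 1]) as mono 16-bit WAV bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    frames = b"".join(struct.pack("<h", int(round(s * 32767))) for s in clipped)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buf.getvalue()
```

Write the returned bytes to a `.wav` file, or stream them directly from a server response.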

🚀 Use Case Example

This repository is designed to be plug-and-play: you can fetch the models automatically at application startup. Raw files can be downloaded from the following base URL, where {filename} is one of the files listed under Repository Contents: https://huggingface.co/Hinotsuba/OpenVoice-ONNX-v2/resolve/main/{filename}
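A minimal startup-download sketch using only the Python standard library: it builds each file's URL from the base URL above, downloads only the files missing from a local cache directory, and writes to a temporary `.part` file first so that an interrupted download is never mistaken for a complete model. The cache directory name and the helper names are hypothetical.

```python
import os
import urllib.request

BASE_URL = "https://huggingface.co/Hinotsuba/OpenVoice-ONNX-v2/resolve/main/{filename}"
MODEL_FILES = ("tone_extract.onnx", "tone_color.onnx", "tone_config.json")


def model_url(filename):
    """Return the raw-download URL for one of the repository files."""
    if filename not in MODEL_FILES:
        raise ValueError(f"unknown model file: {filename}")
    return BASE_URL.format(filename=filename)


def ensure_models(cache_dir="models"):
    """Download any missing model files into cache_dir; return their local paths."""
    os.makedirs(cache_dir, exist_ok=True)
    paths = {}
    for name in MODEL_FILES:
        path = os.path.join(cache_dir, name)
        if not os.path.exists(path):
            tmp = path + ".part"
            urllib.request.urlretrieve(model_url(name), tmp)
            os.replace(tmp, path)  # expose only fully downloaded files
        paths[name] = path
    return paths
```

Call `ensure_models()` once at startup, then create your inference sessions from the returned paths.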

βš–οΈ License

These models are released under the MIT License, following the official license update of the OpenVoice v2 framework, and are free for commercial use.
