---
license: apache-2.0
base_model:
- lj1995/VoiceConversionWebUI
- facebook/hubert-base-ls960
pipeline_tag: feature-extraction
library_name: fairseq
tags:
- rvc
- audio
---

# Hubert Base ONNX Model for Voice Conversion

This is the **ONNX-exported version of the HuBERT Base model**, fine-tuned for voice conversion and compatible with modern inference pipelines. It enables fast and efficient audio processing in ONNX Runtime environments.

It builds upon the following models:

- [lj1995/VoiceConversionWebUI](https://huggingface.co/lj1995/VoiceConversionWebUI)
- [facebook/hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960)

---

## Features

- Converts raw audio into high-quality embeddings for voice conversion tasks.
- Fully ONNX-compatible for optimized inference on CPUs and GPUs.
- Lightweight and easy to integrate into custom voice processing pipelines.
- No extra dependencies needed: just **numpy** and **onnxruntime**.

## ONNX Model Report

**Model:** `hubert_base.onnx`

**Producer:** PyTorch 2.0.0

**IR Version:** 8

**Opsets:** ai.onnx:18

**Parameters:** 94,370,816
|
|
|
|
|
--- |

### 🟦 Inputs

- **source** | type: `float32` | shape: [batch_size, sequence_length]
  - *32-bit float PCM waveform, 16,000 Hz sample rate, mono*
- **padding_mask** | type: `bool` | shape: [batch_size, sequence_length]
  - Usually an all-`False` array with the same shape as the waveform (meaning no samples are padding): `padding_mask = np.zeros(waveform.shape, dtype=np.bool_)`

### 🟩 Outputs

- **features** | type: `float32` | shape: [batch_size, sequence_length, 768]
  - One 768-dimensional embedding per output frame. The convolutional front end downsamples the waveform, so the output sequence is much shorter than the input.
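The inputs above can be prepared with plain NumPy. A minimal sketch, assuming the audio has already been loaded as mono 16 kHz float32 (here a random one-second signal stands in for real audio):

```python
import numpy as np

# Stand-in for a real recording: one second of mono audio at 16 kHz.
# In practice, load and resample real audio first (e.g. with soundfile or librosa).
waveform = np.random.uniform(-1.0, 1.0, size=16000).astype(np.float32)

# Add a batch dimension -> (batch_size, sequence_length).
source = waveform[np.newaxis, :]

# Nothing is padded in a single-item batch, so the mask is all False.
padding_mask = np.zeros(source.shape, dtype=np.bool_)

print(source.shape, source.dtype)              # (1, 16000) float32
print(padding_mask.shape, padding_mask.dtype)  # (1, 16000) bool
```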

---

## Usage

```python
from typing import Optional

import numpy as np
import onnxruntime as ort


class OnnxHubert:
    """
    Load and run the ONNX model exported from HuBERT.

    Attributes:
        session (ort.InferenceSession): The ONNX Runtime session.
        input_name (str): The name of the first input node.
        output_name (str): The name of the output node.

    Methods:
        extract_features(source, padding_mask): Run the ONNX model and
            extract features from a batch of waveforms.
    """

    def __init__(self, model_path: str, thread_num: Optional[int] = None):
        """
        Initialize the OnnxHubert object.

        Parameters:
            model_path (str): The path to the ONNX model file.
            thread_num (int, optional): The number of intra-op threads to use
                for inference. Defaults to None (let ONNX Runtime decide).
        """
        sess_options = ort.SessionOptions()
        if thread_num is not None:
            sess_options.intra_op_num_threads = thread_num

        self.session = ort.InferenceSession(model_path, sess_options=sess_options)
        self.input_name = self.session.get_inputs()[0].name
        self.output_name = self.session.get_outputs()[0].name

    def extract_features(
        self,
        source: np.ndarray,
        padding_mask: np.ndarray,
    ) -> np.ndarray:
        """
        Extract features from a batch of waveforms using the ONNX model.

        Parameters:
            source: float32 ndarray of shape (batch_size, sequence_length).
            padding_mask: bool ndarray of shape (batch_size, sequence_length).

        Returns:
            float32 ndarray of shape (batch_size, frames, 768) with the
            extracted features.
        """
        result = self.session.run(None, {
            "source": source,
            "padding_mask": padding_mask,
        })
        return result[0]
```

## Installation

You can install the required libraries with:

```bash
pip install onnxruntime numpy
```