MidFord327/Hubert-Base-ONNX

+---
+license: apache-2.0
+base_model:
+- lj1995/VoiceConversionWebUI
+- facebook/hubert-base-ls960
+pipeline_tag: audio-classification
+library_name: fairseq
+tags:
+- rvc
+- audio
+---
+# HuBERT Base ONNX Model for Voice Conversion
+This is the **ONNX-exported version of the HuBERT Base model**, fine-tuned for voice conversion. It runs under ONNX Runtime, so inference needs neither PyTorch nor fairseq.
+It builds upon the following models:
+- [lj1995/VoiceConversionWebUI](https://huggingface.co/lj1995/VoiceConversionWebUI)
+- [facebook/hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960)
+---
+## Features
+- Converts 16 kHz waveforms into 768-dimensional embeddings for voice conversion tasks.
+- Fully ONNX-compatible for optimized inference on CPUs and GPUs.
+- Lightweight and easy to integrate in custom voice processing pipelines.
+- Requires only **numpy** and **onnxruntime**.
+## ONNX Model Report
+- **Model:** `hubert_base.onnx`
+- **Producer:** pytorch 2.0.0
+- **IR Version:** 8
+- **Opsets:** ai.onnx:18
+- **Parameters:** 94,370,816
+---
+### 🟦 Inputs
+- **source** | type: `float32` | shape: [batch_size, sequence_length]
+  - *Mono waveform as 32-bit float PCM, sampled at 16,000 Hz*
+- **padding_mask** | type: `bool` | shape: [batch_size, sequence_length]
+  - Usually an all-`False` array with the same shape as the waveform: `padding_mask = np.zeros(waveform.shape, dtype=np.bool_)`
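For a batch of clips with different lengths, one option is to zero-pad every clip to the longest one and flag the padded samples in `padding_mask`. The sketch below assumes the fairseq convention that `True` marks padding, which is consistent with the all-`False` mask used for unpadded input. `make_batch` is a hypothetical helper, not part of this repository.

```python
import numpy as np


def make_batch(waveforms):
    """Zero-pad 1-D waveforms to a common length.

    Returns (source, padding_mask), both of shape (batch_size, max_length).
    Assumption: True in padding_mask marks padded samples.
    """
    if not waveforms:
        raise ValueError("waveforms must contain at least one clip")
    max_len = max(len(w) for w in waveforms)
    source = np.zeros((len(waveforms), max_len), dtype=np.float32)
    # Start with everything marked as padding, then clear the real samples.
    padding_mask = np.ones((len(waveforms), max_len), dtype=np.bool_)
    for i, w in enumerate(waveforms):
        w = np.asarray(w, dtype=np.float32)
        source[i, : len(w)] = w
        padding_mask[i, : len(w)] = False
    return source, padding_mask
```

For a single clip, or clips of equal length, the returned mask is all `False` and matches the `np.zeros(...)` expression above.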
+### 🟩 Outputs
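+- **features** | type: `float32` | shape: [batch_size, sequence_length, 768]
+  - The time axis here counts feature frames, not input samples. HuBERT Base downsamples the waveform by a factor of about 320, so expect roughly 50 frames per second of 16 kHz audio.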
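The time axis of `features` is much shorter than the input waveform, and its length can be estimated ahead of time. The sketch below assumes the standard HuBERT Base convolutional front end (seven unpadded layers with kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2), which this export is expected to inherit from `facebook/hubert-base-ls960`. `num_output_frames` is a hypothetical helper; its result is worth checking against the actual model output.

```python
# (kernel_size, stride) for each conv layer.
# Assumption: the standard HuBERT Base feature extractor.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_output_frames(num_samples: int) -> int:
    """Estimate how many feature frames a waveform of num_samples yields."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        if length < kernel:
            # Input too short to produce any frame at this layer.
            return 0
        length = (length - kernel) // stride + 1
    return length
```

Under these assumptions, one second of 16 kHz audio (16,000 samples) gives 49 frames, and at least 400 samples are needed to get a single frame.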
+---
+## Usage
+```python
+from typing import Optional
+
+import numpy as np
+import onnxruntime as ort
+
+
+class OnnxHubert:
+    """
+    Load and run the ONNX export of the HuBERT Base model.
+
+    Attributes:
+        session (ort.InferenceSession): The ONNX Runtime session.
+        input_name (str): The name of the first input node.
+        output_name (str): The name of the output node.
+
+    Methods:
+        extract_features(source, padding_mask): Run the model on a batch of waveforms.
+    """
+
+    def __init__(self, model_path: str, thread_num: Optional[int] = None):
+        """
+        Initialize the OnnxHubert object.
+
+        Parameters:
+            model_path (str): The path to the ONNX model file.
+            thread_num (int, optional): Number of intra-op threads for inference.
+                Defaults to None, which keeps the ONNX Runtime default.
+        """
+        options = ort.SessionOptions()
+        if thread_num is not None:
+            # Limit the threads used to parallelize work inside each operator.
+            options.intra_op_num_threads = thread_num
+        self.session = ort.InferenceSession(model_path, sess_options=options)
+        self.input_name = self.session.get_inputs()[0].name
+        self.output_name = self.session.get_outputs()[0].name
+
+    def extract_features(
+        self,
+        source: np.ndarray,
+        padding_mask: np.ndarray
+    ) -> np.ndarray:
+        """
+        Extract features from a batch using the ONNX model.
+
+        Inputs:
+            source: ndarray of shape (batch_size, sequence_length) float32
+            padding_mask: ndarray of shape (batch_size, sequence_length) bool
+        Returns:
+            ndarray of shape (batch_size, num_frames, 768) with the extracted features
+        """
+        # ONNX Runtime rejects mismatched dtypes, so cast defensively.
+        result = self.session.run([self.output_name], {
+            "source": source.astype(np.float32, copy=False),
+            "padding_mask": padding_mask.astype(np.bool_, copy=False)
+        })
+        return result[0]
+```
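To run the class on a single mono waveform, a small wrapper can add the batch dimension and an all-`False` mask. `extract_single` below is a hypothetical helper, not part of this repository; it is a sketch that works with any object exposing `extract_features(source, padding_mask)` as defined above, and the model path in the trailing comment is an assumption.

```python
import numpy as np


def extract_single(model, waveform):
    """Run feature extraction on one 1-D waveform.

    Returns an array of shape (num_frames, 768) for the HuBERT Base export.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim != 1:
        raise ValueError("expected a 1-D mono waveform")
    source = waveform[np.newaxis, :]  # shape (1, sequence_length)
    # No padding for a single clip, so the mask is all False.
    padding_mask = np.zeros(source.shape, dtype=np.bool_)
    features = model.extract_features(source, padding_mask)
    return features[0]  # drop the batch dimension


# Example (assumes hubert_base.onnx is in the working directory):
# model = OnnxHubert("hubert_base.onnx")
# features = extract_single(model, np.zeros(16000, dtype=np.float32))
```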
+## Installation
+You can install the required libraries with:
+```bash
+pip install onnxruntime numpy
+```