---
license: apache-2.0
base_model:
- lj1995/VoiceConversionWebUI
- facebook/hubert-base-ls960
pipeline_tag: feature-extraction
library_name: fairseq
tags:
- rvc
- audio
---

# Hubert Base ONNX Model for Voice Conversion

This is the **ONNX-exported version of the HuBERT Base model**, fine-tuned for voice conversion and compatible with modern inference pipelines. It enables fast and efficient audio processing in ONNX Runtime environments.

It builds upon the following models:

- [lj1995/VoiceConversionWebUI](https://huggingface.co/lj1995/VoiceConversionWebUI)
- [facebook/hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960)

---

## Features

- Converts raw audio into high-quality embeddings for voice conversion tasks.
- Fully ONNX-compatible for optimized inference on CPUs and GPUs.
- Lightweight and easy to integrate into custom voice processing pipelines.
- No extra dependencies needed: just **numpy** and **onnxruntime**.

## ONNX Model Report

**Model:** `hubert_base.onnx`

**Producer:** PyTorch 2.0.0

**IR Version:** 8

**Opsets:** ai.onnx:18

**Parameters:** 94,370,816
|
|
|
|
|
--- |

### 🟦 Inputs

- **source** | type: `float32` | shape: [batch_size, sequence_length]
  - *32-bit float PCM waveform, 16,000 Hz sample rate, mono*
- **padding_mask** | type: `bool` | shape: [batch_size, sequence_length]
  - Usually an all-`False` array with the same shape as the waveform (meaning no samples are padding): `padding_mask = np.zeros(waveform.shape, dtype=np.bool_)`

### 🟩 Outputs

- **features** | type: `float32` | shape: [batch_size, sequence_length, 768]
  - One 768-dimensional embedding per output frame. The convolutional front end downsamples the waveform, so the output sequence is much shorter than the input.
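The inputs above can be prepared with plain NumPy. A minimal sketch, assuming the audio has already been loaded as mono 16 kHz float32 (here a random one-second signal stands in for real audio):

```python
import numpy as np

# Stand-in for a real recording: one second of mono audio at 16 kHz.
# In practice, load and resample real audio first (e.g. with soundfile or librosa).
waveform = np.random.uniform(-1.0, 1.0, size=16000).astype(np.float32)

# Add a batch dimension -> (batch_size, sequence_length).
source = waveform[np.newaxis, :]

# Nothing is padded in a single-item batch, so the mask is all False.
padding_mask = np.zeros(source.shape, dtype=np.bool_)

print(source.shape, source.dtype)              # (1, 16000) float32
print(padding_mask.shape, padding_mask.dtype)  # (1, 16000) bool
```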

---

## Usage

```python
from typing import Optional

import numpy as np
import onnxruntime as ort


class OnnxHubert:
    """
    Load and run the ONNX model exported from HuBERT.

    Attributes:
        session (ort.InferenceSession): The ONNX Runtime session.
        input_name (str): The name of the first input node.
        output_name (str): The name of the output node.

    Methods:
        extract_features(source, padding_mask): Run the ONNX model and
            extract features from a batch of waveforms.
    """

    def __init__(self, model_path: str, thread_num: Optional[int] = None):
        """
        Initialize the OnnxHubert object.

        Parameters:
            model_path (str): The path to the ONNX model file.
            thread_num (int, optional): The number of intra-op threads to use
                for inference. Defaults to None (let ONNX Runtime decide).
        """
        sess_options = ort.SessionOptions()
        if thread_num is not None:
            sess_options.intra_op_num_threads = thread_num

        self.session = ort.InferenceSession(model_path, sess_options=sess_options)
        self.input_name = self.session.get_inputs()[0].name
        self.output_name = self.session.get_outputs()[0].name

    def extract_features(
        self,
        source: np.ndarray,
        padding_mask: np.ndarray,
    ) -> np.ndarray:
        """
        Extract features from a batch of waveforms using the ONNX model.

        Parameters:
            source: float32 ndarray of shape (batch_size, sequence_length).
            padding_mask: bool ndarray of shape (batch_size, sequence_length).

        Returns:
            float32 ndarray of shape (batch_size, frames, 768) with the
            extracted features.
        """
        result = self.session.run(None, {
            "source": source,
            "padding_mask": padding_mask,
        })
        return result[0]
```

## Installation

You can install the required libraries with:

```bash
pip install onnxruntime numpy
```