license: cc-by-4.0
datasets:
  - openslr/librispeech_asr
language:
  - en
pipeline_tag: audio-to-audio

SSLZip

Usage

import onnxruntime as ort
from transformers import HubertModel
import torch

# Load the upstream HuBERT model.
upstream = HubertModel.from_pretrained("facebook/hubert-base-ls960")
upstream.eval()

# Load the autoencoder model.
postprocessor = ort.InferenceSession("sslzip_256.onnx")
node_name = postprocessor.get_inputs()[0].name

# Prepare an input waveform of shape (batch, samples), sampled at 16 kHz.
# Random noise stands in for real audio here.
x = torch.randn(1, 16000)

# Extract the latent representation for downstream tasks.
with torch.inference_mode():
    h = upstream(x, output_hidden_states=True).hidden_states[-1]
    z = postprocessor.run(None, {node_name: h.cpu().numpy()})[0]

# Use z as you like.
print(z.shape)
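
To anticipate the time dimension printed above, the following is a minimal sketch that computes how many frames the HuBERT convolutional feature encoder produces for a given number of samples. It assumes the default `HubertConfig` layout (`conv_kernel` = 10, 3, 3, 3, 3, 2, 2 and `conv_stride` = 5, 2, 2, 2, 2, 2, 2) and an unpadded input; `CONV_LAYERS` and `num_frames` are hypothetical names introduced for illustration and are not part of SSLZip or transformers.

```python
# (kernel, stride) of each convolutional layer in the HuBERT feature encoder.
# Assumption: these match the default HubertConfig conv_kernel / conv_stride.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_frames(num_samples: int) -> int:
    """Return the number of feature frames for an unpadded waveform."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        if length < kernel:
            raise ValueError("waveform is too short for the feature encoder")
        # Output length of a 1-D convolution without padding or dilation.
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    for seconds in (1, 2, 5):
        print(f"{seconds} s at 16 kHz: {num_frames(seconds * 16000)} frames")
```

Under these assumptions the 1-second input from the usage snippet yields 49 frames (roughly one frame per 20 ms), so `h` would have shape `(1, 49, 768)` for the base model. The last dimension of `z` is presumably 256, going by the model name, but that is not confirmed here.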

License

The pretrained model was developed using the LibriSpeech corpus and is distributed under the same license (CC BY 4.0).
Please include credit to Nagoya Institute of Technology and Techno-Speech, Inc. when using this model.

Citation

@InProceedings{yoshimura2025sslzip,
  author = {Takenori Yoshimura and Shinji Takaki and Kazuhiro Nakamura and Keiichiro Oura and Takato Fujimoto and Kei Hashimoto and Yoshihiko Nankaku and Keiichi Tokuda},
  title = {{SSLZip}: Simple autoencoding for enhancing self-supervised speech representations in speech generation},
  booktitle = {13th ISCA Speech Synthesis Workshop (SSW 2025)},
  pages = {xxx--xxx},
  year = {2025},
}