niobures
/

SSLZip

Model card Files Files and versions

SSLZip / models /SSLZip-256 /README.md

niobures's picture

SSLZip (code, models, paper)

a12256d verified about 1 month ago

|

history blame contribute delete

1.55 kB

	---
	license: cc-by-4.0
	datasets:
	- openslr/librispeech_asr
	language:
	- en
	pipeline_tag: audio-to-audio
	---

	# SSLZip

	## Usage

	```py
	import onnxruntime as ort
	from transformers import HubertModel
	import torch

	# Load the upstream HuBERT model.
	upstream = HubertModel.from_pretrained("facebook/hubert-base-ls960")
	upstream.eval()

	# Load the autoencoder model.
	postprocessor = ort.InferenceSession("sslzip_256.onnx")
	node_name = postprocessor.get_inputs()[0].name

	# Prepare an input waveform (assuming 16kHz audio).
	x = torch.randn(1, 16000)

	# Extract the latent representation for downstream tasks.
	with torch.inference_mode():
	h = upstream(x, output_hidden_states=True).hidden_states[-1]
	z = postprocessor.run(None, {node_name: h.cpu().numpy()})[0]

	# Use z as you like.
	print(z.shape)
	```

	## License

	The pretrained model was developed using the LibriSpeech corpus and is distributed under the same license (CC BY 4.0).
	Please include credit to Nagoya Institue of Technology and Techno-Speech, Inc. when using this model.

	## Citation

	```bibtex
	@InProceedings{yoshimura2025sslzip,
	author = {Takenori Yoshimura and Shinji Takaki and Kazuhiro Nakamura and Keiichiro Oura and Takato Fujimoto and Kei Hashimoto and Yoshihiko Nankaku and Keiichi Tokuda},
	title = {{SSLZip}: Simple autoencoding for enhancing self-supervised speech representations in speech generation},
	booktitle = {13th ISCA Speech Synthesis Workshop (SSW 2025)},
	pages = {xxx--xxx},
	year = {2025},
	}
	```