Robust Quantizer for HuBERT Base (Layer 9)

This model checkpoint contains a Robust Quantizer trained on top of the 9th layer of the hubert-base-ls960 model. It was developed as part of a reproduction and evaluation study on creating robust discrete speech units, originally proposed in Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling (Gat et al., 2023).

Model Details

This quantizer was trained to provide discrete pseudo-labels that are resilient to various acoustic perturbations. By applying data augmentations during quantizer training, the resulting discrete units, and by extension the downstream acoustic models trained on them, become more robust to noise and varying acoustic conditions.

Training Procedure

The model was trained for 10 epochs using the iterative training/pseudo-labeling procedure described in the original paper.
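The iterative procedure alternates between deriving pseudo-labels from clean features and fitting a quantizer so that augmented views of the same frames map to the same labels. A minimal sketch of one such round is below; the cluster count, feature dimension, the linear quantizer, and the noise stand-in for real augmentations are all illustrative assumptions, not details from this checkpoint.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_units, feat_dim = 8, 16  # illustrative sizes, not the real configuration


class Quantizer(nn.Module):
    # Hypothetical quantizer: a linear classifier over discrete units.
    def __init__(self, feat_dim, n_units):
        super().__init__()
        self.proj = nn.Linear(feat_dim, n_units)

    def forward(self, feats):      # feats: (frames, feat_dim)
        return self.proj(feats)    # logits over discrete units


def kmeans_labels(feats, n_units, iters=10):
    # Plain k-means to produce pseudo-labels from clean features.
    centroids = feats[torch.randperm(len(feats))[:n_units]].clone()
    for _ in range(iters):
        labels = torch.cdist(feats, centroids).argmin(dim=1)
        for k in range(n_units):
            if (labels == k).any():
                centroids[k] = feats[labels == k].mean(dim=0)
    return labels


clean = torch.randn(256, feat_dim)                  # stand-in for layer-9 features
augmented = clean + 0.1 * torch.randn_like(clean)   # stand-in for augmented views

labels = kmeans_labels(clean, n_units)              # step 1: pseudo-labels from clean audio
quantizer = Quantizer(feat_dim, n_units)
opt = torch.optim.Adam(quantizer.parameters(), lr=1e-2)
losses = []
for _ in range(100):                                # step 2: fit quantizer on augmented views
    opt.zero_grad()
    loss = nn.functional.cross_entropy(quantizer(augmented), labels)
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

In the full procedure these two steps are repeated, with the trained quantizer producing the pseudo-labels for the next round.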

Data Augmentations Applied:

  • Time Stretching
  • Pitch Shifting
  • Reverberation
  • Additive Noise
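The augmentations above can be sketched in plain torch as rough stand-ins; a real pipeline would typically use a DSP library such as torchaudio, and pitch shifting in particular needs a proper phase-vocoder implementation, so it is omitted here. All function names and parameter values below are illustrative.

```python
import torch

torch.manual_seed(0)


def add_noise(wav, snr_db=10.0):
    # Mix in white noise scaled to the requested signal-to-noise ratio.
    noise = torch.randn_like(wav)
    snr = 10 ** (snr_db / 10)
    scale = wav.pow(2).mean().sqrt() / (noise.pow(2).mean().sqrt() * snr ** 0.5)
    return wav + scale * noise


def time_stretch(wav, rate=1.1):
    # Naive resampling stand-in: changes duration (and, unlike a real
    # time-stretch, also the pitch) via linear interpolation.
    n_out = int(len(wav) / rate)
    idx = torch.linspace(0, len(wav) - 1, n_out)
    lo = idx.floor().long()
    hi = (lo + 1).clamp(max=len(wav) - 1)
    frac = idx - lo.float()
    return wav[lo] * (1 - frac) + wav[hi] * frac


def reverberate(wav, decay=0.3, ir_len=128):
    # Convolve with a synthetic exponentially decaying impulse response.
    ir = decay ** torch.arange(ir_len, dtype=wav.dtype)
    out = torch.nn.functional.conv1d(
        wav.view(1, 1, -1), ir.flip(0).view(1, 1, -1), padding=ir_len - 1
    )
    return out.view(-1)[: len(wav)]


wav = torch.randn(16000)  # one second of fake audio at 16 kHz
noisy = add_noise(wav)
stretched = time_stretch(wav, rate=1.1)
reverbed = reverberate(wav)
```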

Intended Use

This checkpoint is intended for extracting sequences of discrete units (pseudo-labels/tokens) from raw audio waveforms.

# Usage sketch; the exact interface of the quantizer object depends on the
# checkpoint format, so adapt the final step to the loaded object
import torch
from transformers import HubertModel

hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960")
hubert.eval()

# Load this quantizer
quantizer = torch.load("path_to_downloaded_checkpoint.pt", map_location="cpu")

# waveform: float tensor of shape (1, num_samples), sampled at 16 kHz
with torch.no_grad():
    outputs = hubert(waveform, output_hidden_states=True)
    layer9 = outputs.hidden_states[9]  # hidden states after the 9th transformer layer

# Apply the quantizer to the layer-9 features to get discrete units
# units = quantizer(layer9)

