
---
license: mit
language:
  - en
datasets:
  - librispeech_asr
metrics:
  - abx
  - wer
  - ued
pipeline_tag: audio-classification
tags:
  - speech
  - discrete-units
  - quantization
  - hubert
  - clustering
base_model:
  - facebook/hubert-base-ls960
---

Robust Quantizer from HuBERT Base (Layer 6)

This model checkpoint contains a Robust Quantizer trained on top of the 6th layer of the hubert-base-ls960 model. It was developed as part of a study that reproduces and evaluates robust discrete speech units, an approach originally proposed in Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling (Gat et al., 2023).

Model Details

This quantizer was trained to produce discrete pseudo-labels that are resilient to various acoustic perturbations. Applying data augmentations while training the quantizer makes the resulting discrete units, and by extension the downstream acoustic models trained on them, more robust to noise and varying acoustic conditions.
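Robustness of the units is tracked with the `ued` (unit edit distance) metric listed in the card metadata. The sketch below assumes UED is the Levenshtein distance between the deduplicated unit sequences of a clean utterance and of its augmented version, normalized by the length of the deduplicated clean sequence; check Gat et al. (2023) for the exact definition. The function names are illustrative and not part of any released code.

```python
from typing import List, Sequence


def deduplicate(units: Sequence[int]) -> List[int]:
    """Collapse runs of repeated unit ids, e.g. [5, 5, 7, 7, 7, 5] -> [5, 7, 5]."""
    out: List[int] = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


def levenshtein(a: Sequence[int], b: Sequence[int]) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,              # delete x
                curr[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),   # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


def unit_edit_distance(clean_units: Sequence[int], augmented_units: Sequence[int]) -> float:
    """Assumed UED: edit distance of deduplicated sequences / clean length."""
    clean = deduplicate(clean_units)
    augmented = deduplicate(augmented_units)
    if not clean:
        raise ValueError("clean unit sequence is empty")
    return levenshtein(clean, augmented) / len(clean)


# Hypothetical unit sequences for a clean and an augmented version of one utterance.
print(unit_edit_distance([12, 12, 40, 40, 7, 7, 7, 93], [12, 40, 40, 8, 8, 93]))  # 0.25
```

A lower value means the quantizer assigns nearly the same units regardless of the perturbation, which is the property the augmentation-based training targets.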

Base Model: facebook/hubert-base-ls960
Layer: 6
Vocabulary Size (Clusters): 100, 200, 500
Algorithm: K-Means
Dataset: LibriSpeech (train-clean-100)
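With K-Means, each frame-level feature vector taken from layer 6 (768-dimensional for HuBERT Base) is mapped to the index of its nearest centroid, giving one discrete unit per frame from a vocabulary of 100, 200, or 500 units. The NumPy sketch below shows only that nearest-centroid assignment step. It assumes you already have the centroids as a (K, 768) array; that is an assumption for illustration, not the documented layout of the released checkpoint, and random arrays stand in for real features and centroids.

```python
import numpy as np


def quantize(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map each frame in features (T, D) to the id of its nearest centroid in (K, D)."""
    # Squared Euclidean distances via ||x||^2 - 2 x.c + ||c||^2, shape (T, K).
    dists = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    # One unit id per frame: the index of the closest centroid.
    return dists.argmin(axis=1)


rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 768))   # stand-in for 50 frames of layer-6 features
cents = rng.standard_normal((500, 768))  # stand-in for a 500-unit codebook
units = quantize(feats, cents)
print(units.shape)  # (50,)
```

Expanding the squared distance avoids materializing a (T, K, D) difference tensor, which keeps memory proportional to T x K.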

Usage

Download the Model

```python
from huggingface_hub import hf_hub_download

REPO_ID = "iliasslasri/robust_speech_quantizer"

# Quantizer checkpoint for the 500-unit vocabulary
# (force_download=True re-downloads even if a cached copy exists)
model_path = hf_hub_download(repo_id=REPO_ID,
                             filename="500_vocab_size/round_1/E1_best.pt",
                             force_download=True)
# Matching configuration file
config_path = hf_hub_download(repo_id=REPO_ID,
                              filename="500_vocab_size/config.yaml",
                              force_download=True)
```

Relevant Links

Original Paper: Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling (Gat et al., 2023)
Project Repository: GitHub