Upload folder using huggingface_hub

- E0_03-11-2026.pt +3 -0
- README.md +63 -0
E0_03-11-2026.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:096c7e89c2fbb326b0348031477dc8e03a3ea5b4cbab9c8a2c13e058b8b2f17c
size 4704270
README.md ADDED
@@ -0,0 +1,63 @@
---
license: mit
language:
- en
datasets:
- librispeech_asr
metrics:
- abx
- wer
- ued
pipeline_tag: audio-quantization
tags:
- speech
- discrete-units
- quantization
- hubert
- clustering
---

# Robust Quantizer for HuBERT Base (Layer 9)

This model checkpoint contains a **Robust Quantizer** trained on top of the 9th layer of the `hubert-base-ls960` model. It was developed as part of a reproduction and evaluation study on robust discrete speech units, as originally proposed in *Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling* (Gat et al., 2023).

## Model Details

This quantizer was trained to provide discrete pseudo-labels that are resilient to various acoustic perturbations. By applying data augmentations during quantizer training, the resulting discrete units (and, by extension, downstream acoustic models trained on them) become more robust to noise and varying acoustic conditions.

- **Base Model:** [facebook/hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960)
- **Layer:** 9
- **Vocabulary Size (Clusters):** 500
- **Algorithm:** K-Means
- **Dataset:** [LibriSpeech](https://huggingface.co/datasets/librispeech_asr) (`train-clean-100`)
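Since the quantizer is a 500-cluster K-Means codebook over layer-9 features, training it amounts to fitting centroids and then assigning each frame to its nearest one. A minimal sketch with scikit-learn and random stand-in features (in practice the features come from HuBERT; variable names here are illustrative, not the project's):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Stand-in for HuBERT layer-9 features: (num_frames, 768)
rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 768)).astype(np.float32)

# Fit a 500-cluster codebook, matching this checkpoint's vocabulary size
kmeans = MiniBatchKMeans(n_clusters=500, random_state=0, n_init=3)
kmeans.fit(features)

# Quantization: nearest-centroid id per frame
units = kmeans.predict(features[:10])
```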

## Training Procedure

The model was trained for 10 epochs using the iterative training/pseudo-labeling procedure described in the original paper.

**Data Augmentations Applied:**
- Time Stretching
- Pitch Shifting
- Reverberation
- Additive Noise
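Of these, additive noise is the simplest to illustrate: noise is scaled to a target signal-to-noise ratio before being mixed into the clean waveform. A minimal NumPy sketch (the function name and SNR handling are ours, not necessarily the project's exact implementation):

```python
import numpy as np

def add_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `clean` at the requested signal-to-noise ratio (dB)."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale so that clean_power / scaled_noise_power == 10 ** (snr_db / 10)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise
```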

## Intended Use

This checkpoint is intended for extracting sequences of discrete units (pseudo-labels/tokens) from raw audio waveforms.

```python
# Usage sketch: the exact quantizer interface depends on the project code.
import torch
from transformers import HubertModel, Wav2Vec2FeatureExtractor

hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")

# Load this quantizer checkpoint
quantizer = torch.load("path_to_downloaded_checkpoint.pt", map_location="cpu")

# waveform: 1-D float array sampled at 16 kHz
inputs = feature_extractor(waveform, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    hidden_states = hubert(**inputs, output_hidden_states=True).hidden_states
layer9 = hidden_states[9]  # (batch, frames, 768); index 0 is the embedding output

# ... Apply quantizer to `layer9` to get discrete units
```
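The `ued` metric in the front matter is the unit edit distance: the Levenshtein distance between the unit sequences obtained from a clean utterance and a perturbed copy of it, normalized by the reference length. A minimal sketch (the helper name is ours):

```python
def unit_edit_distance(ref, hyp):
    """Levenshtein distance between unit sequences, normalized by len(ref)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance between ref[:i] and hyp[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                              # deletion
                        dp[j - 1] + 1,                          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))      # substitution
            prev = cur
    return dp[n] / max(m, 1)
```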

## Relevant Links

- Original Paper: [Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling (Gat et al., 2023)](https://aclanthology.org/2023.iwslt-1.46/)
- Project Repository: [iliasslasri/snlp_project](https://github.com/iliasslasri/snlp_project)