Upload folder using huggingface_hub

- E0_03-11-2026.pt +3 -0
- README.md +63 -0
E0_03-11-2026.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:096c7e89c2fbb326b0348031477dc8e03a3ea5b4cbab9c8a2c13e058b8b2f17c
size 4704270
README.md ADDED
@@ -0,0 +1,63 @@
---
license: mit
language:
- en
datasets:
- librispeech_asr
metrics:
- abx
- wer
- ued
pipeline_tag: audio-quantization
tags:
- speech
- discrete-units
- quantization
- hubert
- clustering
---

# Robust Quantizer for HuBERT Base (Layer 9)

This model checkpoint contains a **Robust Quantizer** trained on top of the 9th layer of the `hubert-base-ls960` model. It was developed as part of a reproduction and evaluation study on robust discrete speech units, as originally proposed in *Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling* (Gat et al., 2023).

## Model Details

This quantizer was trained to provide discrete pseudo-labels that are resilient to various acoustic perturbations. By applying data augmentations during quantizer training, the resulting discrete units (and, by extension, downstream acoustic models trained on them) become more robust to noise and varying acoustic conditions.

- **Base Model:** [facebook/hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960)
- **Layer:** 9
- **Vocabulary Size (Clusters):** 500
- **Algorithm:** K-Means
- **Dataset:** [LibriSpeech](https://huggingface.co/datasets/librispeech_asr) (`train-clean-100`)
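Since the quantizer is a 500-cluster K-Means codebook over layer-9 features, training it amounts to fitting centroids and then assigning each frame to its nearest one. A minimal sketch with scikit-learn and random stand-in features (in practice the features come from HuBERT; variable names here are illustrative, not the project's):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Stand-in for HuBERT layer-9 features: (num_frames, 768)
rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 768)).astype(np.float32)

# Fit a 500-cluster codebook, matching this checkpoint's vocabulary size
kmeans = MiniBatchKMeans(n_clusters=500, random_state=0, n_init=3)
kmeans.fit(features)

# Quantization: nearest-centroid id per frame
units = kmeans.predict(features[:10])
```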

## Training Procedure

The model was trained for 10 epochs using the iterative training/pseudo-labeling procedure described in the original paper.

**Data Augmentations Applied:**
- Time Stretching
- Pitch Shifting
- Reverberation
- Additive Noise
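Of these, additive noise is the simplest to illustrate: noise is scaled to a target signal-to-noise ratio before being mixed into the clean waveform. A minimal NumPy sketch (the function name and SNR handling are ours, not necessarily the project's exact implementation):

```python
import numpy as np

def add_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `clean` at the requested signal-to-noise ratio (dB)."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale so that clean_power / scaled_noise_power == 10 ** (snr_db / 10)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise
```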

## Intended Use

This checkpoint is intended for extracting sequences of discrete units (pseudo-labels/tokens) from raw audio waveforms.

```python
# Usage sketch: the exact quantizer interface depends on the project code.
import torch
from transformers import HubertModel, Wav2Vec2FeatureExtractor

hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")

# Load this quantizer checkpoint
quantizer = torch.load("path_to_downloaded_checkpoint.pt", map_location="cpu")

# waveform: 1-D float array sampled at 16 kHz
inputs = feature_extractor(waveform, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    hidden_states = hubert(**inputs, output_hidden_states=True).hidden_states
layer9 = hidden_states[9]  # (batch, frames, 768); index 0 is the embedding output

# ... Apply quantizer to `layer9` to get discrete units
```
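The `ued` metric in the front matter is the unit edit distance: the Levenshtein distance between the unit sequences obtained from a clean utterance and a perturbed copy of it, normalized by the reference length. A minimal sketch (the helper name is ours):

```python
def unit_edit_distance(ref, hyp):
    """Levenshtein distance between unit sequences, normalized by len(ref)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance between ref[:i] and hyp[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                              # deletion
                        dp[j - 1] + 1,                          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))      # substitution
            prev = cur
    return dp[n] / max(m, 1)
```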

## Relevant Links

- Original Paper: [Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling (Gat et al., 2023)](https://aclanthology.org/2023.iwslt-1.46/)
- Project Repository: [iliasslasri/snlp_project](https://github.com/iliasslasri/snlp_project)