iliasslasri committed (verified) · Commit ea546b2 · 1 Parent(s): b5ab1c2

Upload folder using huggingface_hub

Browse files
Files changed (2):
  1. E0_03-11-2026.pt +3 -0
  2. README.md +63 -0
E0_03-11-2026.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:096c7e89c2fbb326b0348031477dc8e03a3ea5b4cbab9c8a2c13e058b8b2f17c
size 4704270
README.md ADDED
@@ -0,0 +1,63 @@
---
license: mit
language:
- en
datasets:
- librispeech_asr
metrics:
- abx
- wer
- ued
pipeline_tag: audio-quantization
tags:
- speech
- discrete-units
- quantization
- hubert
- clustering
---

# Robust Quantizer for HuBERT Base (Layer 9)

This model checkpoint contains a **Robust Quantizer** trained on top of the 9th layer of the `hubert-base-ls960` model. It was developed as part of a reproduction and evaluation study on creating robust discrete speech units, as originally proposed in *Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling* (Gat et al., 2023).

## Model Details

This quantizer was trained to provide discrete pseudo-labels that are resilient to a range of acoustic perturbations. Because data augmentations are applied while training the quantizer, the resulting discrete units, and by extension the downstream acoustic models trained on them, become more robust to noise and varying acoustic conditions.

- **Base Model:** [facebook/hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960)
- **Layer:** 9
- **Vocabulary Size (Clusters):** 500
- **Algorithm:** K-Means
- **Dataset:** [LibriSpeech](https://huggingface.co/datasets/librispeech_asr) (`train-clean-100`)
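At inference time, a K-Means quantizer like this one maps each layer-9 feature frame to the index of its nearest centroid. A minimal sketch of that lookup, where random arrays stand in for the trained centroids and for real HuBERT features:

```python
import numpy as np

def quantize(features, centroids):
    """Assign each frame to its nearest centroid (squared Euclidean distance)."""
    # (frames, 1, dim) - (1, n_clusters, dim) -> (frames, n_clusters)
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
centroids = rng.standard_normal((500, 768))  # stand-in for the 500 trained centroids
feats = rng.standard_normal((40, 768))       # stand-in for layer-9 HuBERT features
units = quantize(feats, centroids)           # one discrete unit per frame
```

Each element of `units` is an integer in `[0, 500)`, i.e. the vocabulary index of that frame.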

## Training Procedure

The model was trained for 10 epochs using the iterative training/pseudo-labeling procedure described in the original paper.

**Data Augmentations Applied:**
- Time Stretching
- Pitch Shifting
- Reverberation
- Additive Noise
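The iterative procedure can be illustrated with a toy, self-contained loop. This is a sketch of the idea, not the authors' implementation: random vectors stand in for HuBERT features, and additive noise stands in for the augmentations listed above. Each round derives pseudo-labels from clean views and re-fits the centroids on augmented views carrying those labels, pulling perturbed features toward the same discrete unit:

```python
import numpy as np

rng = np.random.default_rng(0)

def assign(feats, centroids):
    # Nearest-centroid assignment (squared Euclidean distance)
    d = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

feats = rng.standard_normal((200, 16))                       # stand-in for clean features
centroids = feats[rng.choice(200, 8, replace=False)].copy()  # 8 toy clusters

for epoch in range(3):
    labels = assign(feats, centroids)                     # pseudo-labels from clean views
    aug = feats + 0.1 * rng.standard_normal(feats.shape)  # "augmented" views
    for k in range(8):
        if (labels == k).any():
            # Re-fit centroid k on the augmented features that share
            # clean pseudo-label k
            centroids[k] = aug[labels == k].mean(axis=0)
```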

## Intended Use

This checkpoint is intended to be used to extract a sequence of discrete units (pseudo-labels/tokens) from raw audio waveforms.

```python
# Sketch of intended usage; the exact checkpoint format may differ,
# see the project repository for the loading details.
import torch
from transformers import HubertModel

hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960")
# Load this quantizer (e.g. the trained K-Means centroids)
quantizer = torch.load("path_to_downloaded_checkpoint.pt", map_location="cpu")

# Pass 16 kHz audio through HuBERT and take the layer-9 hidden states
# waveform: float tensor of shape (batch, samples)
with torch.no_grad():
    hidden = hubert(waveform, output_hidden_states=True).hidden_states[9]

# Apply the quantizer to `hidden` to get one discrete unit per frame
```
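In generative spoken language modeling pipelines, runs of identical consecutive units are commonly collapsed before further modeling; this is a common post-processing step, not necessarily part of this checkpoint. A minimal sketch with illustrative unit values:

```python
import itertools

# Hypothetical unit sequence; real units come from the quantizer above
units = [12, 12, 12, 87, 87, 5, 12, 12]
deduped = [u for u, _ in itertools.groupby(units)]
print(deduped)  # [12, 87, 5, 12]
```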

## Relevant Links

- Original Paper: [Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling (Gat et al., 2023)](https://aclanthology.org/2023.iwslt-1.46/)
- Project Repository: [iliasslasri/snlp_project](https://github.com/iliasslasri/snlp_project)