Iliass Lasri committed on
Commit 4d02b99 · 1 Parent(s): e32d9e2

updated README.md

  1. README.md +26 -28
README.md CHANGED
@@ -5,54 +5,52 @@ language:
  datasets:
  - librispeech_asr
  metrics:
- - abx
- - wer
  - ued
  pipeline_tag: automatic-speech-recognition
  tags:
  - speech
  - discrete-units
  - quantization
  - hubert
- - clustering
  base_model:
  - facebook/hubert-base-ls960
  ---

- # Robust Quantizer from HuBERT Base (Layer 6)

- This model checkpoint contains a **Robust Quantizer** trained on top of the 6th layer of the `hubert-base-ls960` model. It was developed as part of a reproduction and evaluation study on creating robust discrete speech units, originally proposed in *Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling (Gat et al., 2023)*.

- ## Model Details

- This quantizer was trained to provide discrete pseudo-labels that are resilient to various acoustic perturbations. By applying data augmentations during the quantization process, the resulting discrete units, and by extension downstream acoustic models, become more robust to noise and varying acoustic conditions.

- - **Base Model:** [facebook/hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960)
- - **Layer:** 6
- - **Vocabulary Size (Clusters):** 100, 200, 500
- - **Algorithm:** K-Means
- - **Dataset:** [LibriSpeech](https://huggingface.co/datasets/librispeech_asr) (`train-clean-100`)

- ## Usage
-
- ### Download the Model

  ```python
  from huggingface_hub import hf_hub_download

- model_path = hf_hub_download(repo_id="iliasslasri/robust_speech_quantizer",
-                              filename="500_vocab_size/round_1/E1_best.pt",
-                              force_download=True)
- config_path = hf_hub_download(repo_id="iliasslasri/robust_speech_quantizer",
-                               filename="500_vocab_size/config.yaml",
-                               force_download=True)
  ```

- ## Augmentation Examples
-
- Here are examples of the data augmentations applied to the audio during the training of the quantizer:

- | Augmentation | Audio Example |
  |---|---|
  | Clean | <audio controls src="https://huggingface.co/iliasslasri/robust_speech_quantizer/resolve/main/augmentations/00_clean.wav"></audio> |
  | Time Stretch | <audio controls src="https://huggingface.co/iliasslasri/robust_speech_quantizer/resolve/main/augmentations/01_time_stretch.wav"></audio> |
@@ -70,6 +68,6 @@ Here are examples of the data augmentations applied to the audio during the trai
  | Duck Audio | <audio controls src="https://huggingface.co/iliasslasri/robust_speech_quantizer/resolve/main/augmentations/13_duck_audio.wav"></audio> |
  | Up-Down Resample | <audio controls src="https://huggingface.co/iliasslasri/robust_speech_quantizer/resolve/main/augmentations/14_updownresample.wav"></audio> |

- ## Relevant Links
- - Original Paper: [Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling (Gat et al., 2023)](https://aclanthology.org/2023.iwslt-1.46/)
- - Project Repository: [github](https://github.com/iliasslasri/snlp_project)
 
  datasets:
  - librispeech_asr
  metrics:
  - ued
+ - abx
  pipeline_tag: automatic-speech-recognition
  tags:
  - speech
  - discrete-units
  - quantization
  - hubert
+ - dinosr
+ - spidr
  base_model:
  - facebook/hubert-base-ls960
  ---

+ # Robust Speech Quantizer (HuBERT / DinoSR / SpidR)

+ **[GitHub Repository](https://github.com/iliasslasri/snlp_project)**

+ MLP-based robust speech quantizers trained with CTC loss and iterative pseudo-labeling on augmented audio, following [Gat et al., IWSLT 2023](https://aclanthology.org/2023.iwslt-1.46/). Evaluated at vocabulary sizes K ∈ {100, 200, 500}.

+ ## Encoders

+ | Encoder | Checkpoint | Layer | Pre-training data |
+ |---|---|---|---|
+ | [HuBERT Base](https://huggingface.co/facebook/hubert-base-ls960) | `hubert-base-ls960` | 6 | LibriSpeech 960h |
+ | [DinoSR](https://arxiv.org/abs/2305.04582) | original + SpidR-reproduced | 5 | LibriSpeech 960h |
+ | [SpidR](https://arxiv.org/abs/2512.20308) | `spidr-base` | 6 | LibriSpeech 960h |

+ ## Quick Start

  ```python
  from huggingface_hub import hf_hub_download

+ model_path = hf_hub_download(
+     repo_id="iliasslasri/robust_speech_quantizer",
+     filename="500_vocab_size/round_1/E1_best.pt"
+ )
+ config_path = hf_hub_download(
+     repo_id="iliasslasri/robust_speech_quantizer",
+     filename="500_vocab_size/config.yaml"
+ )
  ```
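
The snippet above hardcodes the K = 500 paths. As a minimal sketch, the helper below builds the same two filenames for any of the listed vocabulary sizes. `quantizer_files` and `VOCAB_SIZES` are hypothetical names, and the `100_vocab_size/` and `200_vocab_size/` layouts are only assumed to mirror the `500_vocab_size/` one shown in the README; check the repository file listing before relying on them.

```python
REPO_ID = "iliasslasri/robust_speech_quantizer"
VOCAB_SIZES = (100, 200, 500)  # K values listed in the README


def quantizer_files(vocab_size: int) -> dict:
    """Return repo-relative checkpoint and config paths for one vocabulary size.

    Only the K = 500 paths are confirmed by the README; the other sizes are
    assumed (not verified) to follow the same directory pattern.
    """
    if vocab_size not in VOCAB_SIZES:
        raise ValueError(f"vocab_size must be one of {VOCAB_SIZES}, got {vocab_size}")
    prefix = f"{vocab_size}_vocab_size"
    return {
        "checkpoint": f"{prefix}/round_1/E1_best.pt",
        "config": f"{prefix}/config.yaml",
    }


# Possible usage with the call from the Quick Start snippet:
#   from huggingface_hub import hf_hub_download
#   files = quantizer_files(500)
#   model_path = hf_hub_download(repo_id=REPO_ID, filename=files["checkpoint"])
#   config_path = hf_hub_download(repo_id=REPO_ID, filename=files["config"])
```

Keeping the path logic separate from the download call makes it easy to fail early on an unsupported K before any network request is made.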

+ ## Augmentations

+ | Augmentation | Audio |
  |---|---|
  | Clean | <audio controls src="https://huggingface.co/iliasslasri/robust_speech_quantizer/resolve/main/augmentations/00_clean.wav"></audio> |
  | Time Stretch | <audio controls src="https://huggingface.co/iliasslasri/robust_speech_quantizer/resolve/main/augmentations/01_time_stretch.wav"></audio> |
@@ -70,6 +68,6 @@
  | Duck Audio | <audio controls src="https://huggingface.co/iliasslasri/robust_speech_quantizer/resolve/main/augmentations/13_duck_audio.wav"></audio> |
  | Up-Down Resample | <audio controls src="https://huggingface.co/iliasslasri/robust_speech_quantizer/resolve/main/augmentations/14_updownresample.wav"></audio> |
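
The `ued` metric in the front matter relates to the samples above: it measures how much the discrete units change when the audio is augmented. The following is a minimal sketch, assuming UED is the Levenshtein distance between the unit sequence of a clean utterance and that of its augmented copy, divided by the number of clean frames. This definition and the function names are assumptions, not taken from this README or the project code.

```python
from typing import Sequence


def levenshtein(a: Sequence[int], b: Sequence[int]) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def unit_edit_distance(clean_units: Sequence[int], augmented_units: Sequence[int]) -> float:
    """Edit distance between the two unit sequences, normalized by clean length."""
    if len(clean_units) == 0:
        raise ValueError("clean_units must not be empty")
    return levenshtein(clean_units, augmented_units) / len(clean_units)


# Hypothetical unit IDs for one utterance, before and after an augmentation.
clean = [5, 5, 7, 7]
augmented = [5, 7, 7, 7]
score = unit_edit_distance(clean, augmented)  # one substitution over four frames
```

Under this definition a lower score means the quantizer assigns more similar units to clean and augmented audio, with 0 meaning identical sequences.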

+ ## Links
+ - Paper: [Gat et al., IWSLT 2023](https://aclanthology.org/2023.iwslt-1.46/)
+ - Code: [GitHub](https://github.com/iliasslasri/snlp_project)