Iliass Lasri committed on
Commit · 4d02b99
1 Parent(s): e32d9e2
updated README.md
README.md CHANGED
@@ -5,54 +5,52 @@ language:
 datasets:
 - librispeech_asr
 metrics:
-- abx
-- wer
 - ued
+- abx
 pipeline_tag: automatic-speech-recognition
 tags:
 - speech
 - discrete-units
 - quantization
 - hubert
--
+- dinosr
+- spidr
 base_model:
 - facebook/hubert-base-ls960
 ---
 
-# Robust Quantizer
+# Robust Speech Quantizer (HuBERT / DinoSR / SpidR)
 
-
+**[GitHub Repository](https://github.com/iliasslasri/snlp_project)**
 
-
+MLP-based robust speech quantizers trained with CTC loss and iterative pseudo-labeling on augmented audio, following [Algayres et al., Interspeech 2023](https://aclanthology.org/2023.iwslt-1.46/). Evaluated on K ∈ {100, 200, 500} vocabulary sizes.
 
-
+## Encoders
 
-
---
-
-
-
+| Encoder | Checkpoint | Layer | Pre-training data |
+|---|---|---|---|
+| [HuBERT Base](https://huggingface.co/facebook/hubert-base-ls960) | `hubert-base-ls960` | 6 | LibriSpeech 960h |
+| [DinoSR](https://arxiv.org/abs/2305.04582) | original + SpidR-reproduced | 5 | LibriSpeech 960h |
+| [SpidR](https://arxiv.org/abs/2512.20308) | `spidr-base` | 6 | LibriSpeech 960h |
 
-##
-
-### Download the Model
+## Quick Start
 
 ```python
 from huggingface_hub import hf_hub_download
 
-model_path = hf_hub_download(
-
-
-
-
-
+model_path = hf_hub_download(
+    repo_id="iliasslasri/robust_speech_quantizer",
+    filename="500_vocab_size/round_1/E1_best.pt"
+)
+config_path = hf_hub_download(
+    repo_id="iliasslasri/robust_speech_quantizer",
+    filename="500_vocab_size/config.yaml"
+)
 ```
 
-##
-
-Here are examples of the data augmentations applied to the audio during the training of the quantizer:
+## Augmentations
 
-| Augmentation | Audio
+| Augmentation | Audio |
 |---|---|
 | Clean | <audio controls src="https://huggingface.co/iliasslasri/robust_speech_quantizer/resolve/main/augmentations/00_clean.wav"></audio> |
 | Time Stretch | <audio controls src="https://huggingface.co/iliasslasri/robust_speech_quantizer/resolve/main/augmentations/01_time_stretch.wav"></audio> |
@@ -70,6 +68,6 @@ Here are examples of the data augmentations applied to the audio during the trai
 | Duck Audio | <audio controls src="https://huggingface.co/iliasslasri/robust_speech_quantizer/resolve/main/augmentations/13_duck_audio.wav"></audio> |
 | Up-Down Resample | <audio controls src="https://huggingface.co/iliasslasri/robust_speech_quantizer/resolve/main/augmentations/14_updownresample.wav"></audio> |
 
-##
---
---
+## Links
+- Paper: [Algayres et al., Interspeech 2023](https://aclanthology.org/2023.iwslt-1.46/)
+- Code: [GitHub](https://github.com/iliasslasri/snlp_project)
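
The augmentation sample links in the README table all share one URL shape: the repository id, then `resolve/main`, then the file path inside the repo. A minimal sketch of rebuilding those links is shown here. `resolve_url` is a hypothetical helper introduced only for illustration, and the file list contains just the four samples visible in this diff; the rows elided between the two hunks are not filled in.

```python
from urllib.parse import quote

# Repository id and file names are copied from the URLs shown in the diff.
REPO_ID = "iliasslasri/robust_speech_quantizer"
VISIBLE_SAMPLES = [
    "00_clean.wav",
    "01_time_stretch.wav",
    "13_duck_audio.wav",
    "14_updownresample.wav",
]


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Hypothetical helper: build a direct-download URL in the same shape
    as the links used in the README's augmentation table."""
    # Percent-encode the path but keep "/" so sub-folders stay intact.
    path = quote(filename, safe="/")
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{path}"


urls = [resolve_url(REPO_ID, f"augmentations/{name}") for name in VISIBLE_SAMPLES]

if __name__ == "__main__":
    for url in urls:
        print(url)
```

The `huggingface_hub` package already imported in the Quick Start snippet provides `hf_hub_url` for the same purpose, which is likely the better choice in real code; the sketch above only makes the URL pattern explicit.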