How to use MIT-SLS/USAD-Base with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("feature-extraction", model="MIT-SLS/USAD-Base", trust_remote_code=True)

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("MIT-SLS/USAD-Base", trust_remote_code=True, dtype="auto")
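Feature extraction of this kind yields one vector per frame. For clip-level tasks, a common next step is to average those frame vectors into a single embedding. The sketch below illustrates that mean-pooling step only: it is not taken from the USAD code, the helper name `mean_pool` is hypothetical, and plain Python lists stand in for whatever tensor type the model actually returns.

```python
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length frame vectors into one clip-level vector.

    `frames` is assumed to be a non-empty sequence of per-frame feature
    vectors, e.g. model output converted to nested lists (an assumption,
    not the documented USAD output format).
    """
    if not frames:
        raise ValueError("need at least one frame to pool")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same dimension")
    n = len(frames)
    # Average each feature dimension independently across time.
    return [sum(f[i] for f in frames) / n for i in range(dim)]


if __name__ == "__main__":
    # Two toy frames with two features each.
    print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))
```

Mean pooling discards temporal order, so it suits clip-level labels but not frame-level tasks such as speech recognition.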
Update README.md
README.md
CHANGED
@@ -25,17 +25,19 @@ Trained on 126k hours of mixed data, USAD delivers competitive performance acros
 
 [**Read Full Paper**](https://arxiv.org/abs/2506.18843)
 
+[**GitHub**](https://github.com/vectominist/usad)
+
 ---
 
 ## Models
 
 USAD models are all transformer encoders operating at **50Hz frame rate**. The teacher models are **WavLM Base+** and **ATST Frame**.
 
-| Model
-| ----------
-| USAD Small
-| USAD Base
-| USAD Large
+| Model | Parameters | Dim | Layer |
+| :-------------------------------------------------------- | ---------: | ---: | ----: |
+| [USAD Small](https://huggingface.co/MIT-SLS/USAD-Small) | 24M | 384 | 12 |
+| [USAD Base](https://huggingface.co/MIT-SLS/USAD-Base) | 94M | 768 | 12 |
+| [USAD Large]((https://huggingface.co/MIT-SLS/USAD-Small)) | 330M | 1024 | 24 |
 
 ---
 
@@ -44,7 +46,7 @@ USAD models are all transformer encoders operating at **50Hz frame rate**. The t
 
 **Installation**
 ```
-pip install -U transformers
+pip install -U torch torchaudio transformers
 ```
 
 **Load Model and Extract Features**
@@ -77,10 +79,10 @@ See [usad_model.py](https://huggingface.co/MIT-SLS/USAD-Base/blob/main/usad_mode
 ## Citation
 
 ```bibtex
-@
+@inproceedings{chang2025usad,
 title={{USAD}: Universal Speech and Audio Representation via Distillation},
 author={Chang, Heng-Jui and Bhati, Saurabhchand and Glass, James and Liu, Alexander H.},
-
+booktitle={IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
 year={2025}
 }
 ```
@@ -89,4 +91,4 @@ See [usad_model.py](https://huggingface.co/MIT-SLS/USAD-Base/blob/main/usad_mode
 
 ## Acknowledgement
 
-Our implementation is based on the awesome [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq), [cwx-worst-one/EAT](https://github.com/cwx-worst-one/EAT), and [sooftware/conformer](https://github.com/sooftware/conformer) repositories.
+Our implementation is based on the awesome [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq), [cwx-worst-one/EAT](https://github.com/cwx-worst-one/EAT), and [sooftware/conformer](https://github.com/sooftware/conformer) repositories.
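As a rough sanity check on the numbers in the Models table, the sketch below estimates the shape of the frame-level features for a clip of a given length. The 50 Hz frame rate and the hidden sizes come from the README. The 16 kHz input sample rate and the helper names are assumptions made for illustration, and the exact frame count of the real models may differ by a frame or so depending on padding.

```python
# Hidden sizes copied from the "Dim" column of the README model table.
USAD_DIMS = {"small": 384, "base": 768, "large": 1024}

# Frame rate stated in the README.
FRAME_RATE_HZ = 50


def frame_hop_ms() -> float:
    """Time between consecutive output frames, in milliseconds."""
    return 1000 / FRAME_RATE_HZ


def approx_feature_shape(num_samples: int,
                         sample_rate: int = 16_000,  # assumed, not stated in the README
                         variant: str = "base") -> tuple:
    """Approximate (frames, dim) for one mono waveform of `num_samples` samples."""
    # Integer arithmetic avoids floating-point rounding on the frame count.
    frames = num_samples * FRAME_RATE_HZ // sample_rate
    return frames, USAD_DIMS[variant]


if __name__ == "__main__":
    # A 10-second clip at the assumed 16 kHz sample rate.
    print(frame_hop_ms())                 # 20.0
    print(approx_feature_shape(160_000))  # (500, 768)
```

At 50 Hz each frame covers a 20 ms hop, so ten seconds of audio gives roughly 500 frames whichever variant is used. Only the feature dimension changes between Small, Base, and Large.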