vectominist committed on
Commit
5d6eb67
·
verified ·
1 Parent(s): e763a34

Update README.md

Files changed (1)
  1. README.md +11 -9
README.md CHANGED
@@ -25,17 +25,19 @@ Trained on 126k hours of mixed data, USAD delivers competitive performance acros
25
 
26
  [👀 **Read Full Paper**](https://arxiv.org/abs/2506.18843)
27
 
 
 
28
  ---
29
 
30
  ## 🗂️ Models
31
 
32
  USAD models are all transformer encoders operating at **50Hz frame rate**. The teacher models are **WavLM Base+** and **ATST Frame**.
33
 
34
- | Model | Parameters | Dim | Layer | Checkpoint |
35
- | ---------- | ---------- | ---- | ----- | ------------------------------------------------- |
36
- | USAD Small | 24M | 384 | 12 | [link](https://huggingface.co/MIT-SLS/USAD-Small) |
37
- | USAD Base | 94M | 768 | 12 | [link](https://huggingface.co/MIT-SLS/USAD-Base) |
38
- | USAD Large | 330M | 1024 | 24 | [link](https://huggingface.co/MIT-SLS/USAD-Large) |
39
 
40
  ---
41
 
@@ -44,7 +46,7 @@ USAD models are all transformer encoders operating at **50Hz frame rate**. The t
44
 
45
  **Installation**
46
  ```
47
- pip install -U transformers
48
  ```
49
 
50
  **Load Model and Extract Features**
@@ -77,10 +79,10 @@ See [usad_model.py](https://huggingface.co/MIT-SLS/USAD-Base/blob/main/usad_mode
77
  ## 📖 Citation
78
 
79
  ```bibtex
80
- @article{chang2025usad,
81
  title={{USAD}: Universal Speech and Audio Representation via Distillation},
82
  author={Chang, Heng-Jui and Bhati, Saurabhchand and Glass, James and Liu, Alexander H.},
83
- journal={arXiv preprint arXiv:2506.18843},
84
  year={2025}
85
  }
86
  ```
@@ -89,4 +91,4 @@ See [usad_model.py](https://huggingface.co/MIT-SLS/USAD-Base/blob/main/usad_mode
89
 
90
  ## 🙏 Acknowledgement
91
 
92
- Our implementation is based on the awesome [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq), [cwx-worst-one/EAT](https://github.com/cwx-worst-one/EAT), and [sooftware/conformer](https://github.com/sooftware/conformer) repositories.
 
25
 
26
  [👀 **Read Full Paper**](https://arxiv.org/abs/2506.18843)
27
 
28
+ [🛠️ **GitHub**](https://github.com/vectominist/usad)
29
+
30
  ---
31
 
32
  ## 🗂️ Models
33
 
34
  USAD models are all transformer encoders operating at **50Hz frame rate**. The teacher models are **WavLM Base+** and **ATST Frame**.
35
 
36
+ | Model | Parameters | Dim | Layer |
37
+ | :-------------------------------------------------------- | ---------: | ---: | ----: |
38
+ | [USAD Small](https://huggingface.co/MIT-SLS/USAD-Small) | 24M | 384 | 12 |
39
+ | [USAD Base](https://huggingface.co/MIT-SLS/USAD-Base) | 94M | 768 | 12 |
40
+ | [USAD Large](https://huggingface.co/MIT-SLS/USAD-Large) | 330M | 1024 | 24 |
41
 
42
  ---
43
 
 
46
 
47
  **Installation**
48
  ```
49
+ pip install -U torch torchaudio transformers
50
  ```
51
 
52
  **Load Model and Extract Features**
 
79
  ## 📖 Citation
80
 
81
  ```bibtex
82
+ @inproceedings{chang2025usad,
83
  title={{USAD}: Universal Speech and Audio Representation via Distillation},
84
  author={Chang, Heng-Jui and Bhati, Saurabhchand and Glass, James and Liu, Alexander H.},
85
+ booktitle={IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
86
  year={2025}
87
  }
88
  ```
 
91
 
92
  ## 🙏 Acknowledgement
93
 
94
+ Our implementation is based on the awesome [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq), [cwx-worst-one/EAT](https://github.com/cwx-worst-one/EAT), and [sooftware/conformer](https://github.com/sooftware/conformer) repositories.
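
As a rough illustration of the **50Hz frame rate** stated in the Models section of the README, the sketch below estimates how many feature frames an encoder at that rate would produce for a clip. The 16 kHz sample rate and both helper names are assumptions made for illustration and are not taken from the README. The real models may differ by a frame or so depending on how the front end pads and strides.

```python
# Hypothetical helpers; neither name comes from the USAD code base.
FRAME_RATE_HZ = 50        # stated in the README
SAMPLE_RATE_HZ = 16_000   # assumption: a common input rate for speech models


def samples_per_frame(sample_rate: int = SAMPLE_RATE_HZ,
                      frame_rate: int = FRAME_RATE_HZ) -> int:
    """Number of input samples covered by one output frame hop."""
    return sample_rate // frame_rate


def approx_num_frames(num_samples: int,
                      sample_rate: int = SAMPLE_RATE_HZ,
                      frame_rate: int = FRAME_RATE_HZ) -> int:
    """Approximate frame count for a waveform of `num_samples` samples."""
    return num_samples * frame_rate // sample_rate


if __name__ == "__main__":
    ten_seconds = 10 * SAMPLE_RATE_HZ
    print(samples_per_frame())             # samples per frame hop
    print(approx_num_frames(ten_seconds))  # estimated frames for 10 s of audio
```

Under these assumptions each frame spans 20 ms (320 samples), so a 10-second clip maps to roughly 500 frames.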