lglg666's picture
Upload folder using huggingface_hub
6766eda verified
## Pre-training Representations for Speaker Verification
### Pre-trained models
| Model | Fix pre-train | Vox1-O | Vox1-E | Vox1-H |
| ------------------------------------------------------------ | ------------- | --------- | --------- | -------- |
| [ECAPA-TDNN](https://drive.google.com/file/d/1kWmLyTGkBExTdxtwmrXoP4DhWz_7ZAv3/view?usp=sharing) | - | 1.080 | 1.200 | 2.127 |
| [HuBERT large](https://drive.google.com/file/d/1cQAPIzg8DJASZyAYdaBN0gRa8piPQTMo/view?usp=sharing) | Yes | 0.888 | 0.912 | 1.853 |
| [Wav2Vec2.0 (XLSR)](https://drive.google.com/file/d/1FiGokGtF2d7rkD9OpqLiQxKSqppTSXbl/view?usp=sharing) | Yes | 0.915 | 0.945 | 1.895 |
| [**UniSpeech-SAT large**](https://drive.google.com/file/d/1W6KRt5Ci2T7_xPVdlE3JGdQG2KTrZ750/view?usp=sharing) | Yes | 0.771 | 0.781 | 1.669 |
| [HuBERT large](https://drive.google.com/file/d/1nit9Z6RyM8Sdb3n8ccaglOQVNnqsjnui/view?usp=sharing) | No | 0.585 | 0.654 | 1.342 |
| [Wav2Vec2.0 (XLSR)](https://drive.google.com/file/d/1TgKro9pp197TCgIF__IlE_rMVQOk50Eb/view?usp=sharing) | No | 0.564 | 0.605 | 1.23 |
| [**UniSpeech-SAT large**](https://drive.google.com/file/d/10o6NHZsPXJn2k8n57e8Z_FkKh3V4TC3g/view?usp=sharing) | No | **0.564** | **0.561** | **1.23** |
### How to use?
#### Environment Setup
1. `pip install -r requirements.txt`
2. Install fairseq code
- For HuBERT_Large and Wav2Vec2.0 (XLSR), we should install the official [fairseq](https://github.com/pytorch/fairseq).
- For UniSpeech-SAT large, we should install the [Unispeech-SAT](https://github.com/microsoft/UniSpeech/tree/main/UniSpeech-SAT) fairseq code.
#### Example
Take `unispeech_sat ` and `ecapa_tdnn` for example:
1. First, you should download the pre-trained model in the above table to `checkpoint_path`.
2. Then, run the following codes:
- The wav files are sampled from [voxceleb1](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html).
```bash
python verification.py --model_name unispeech_sat --wav1 vox1_data/David_Faustino/hn8GyCJIfLM_0000012.wav --wav2 vox1_data/Josh_Gad/HXUqYaOwrxA_0000015.wav --checkpoint $checkpoint_path
# output: The similarity score between two audios is 0.0317 (-1.0, 1.0).
python verification.py --model_name unispeech_sat --wav1 vox1_data/David_Faustino/hn8GyCJIfLM_0000012.wav --wav2 vox1_data/David_Faustino/xTOk1Jz-F_g_0000015.wav --checkpoint --checkpoint $checkpoint_path
# output: The similarity score between two audios is 0.5389 (-1.0, 1.0).
python verification.py --model_name ecapa_tdnn --wav1 vox1_data/David_Faustino/hn8GyCJIfLM_0000012.wav --wav2 vox1_data/Josh_Gad/HXUqYaOwrxA_0000015.wav --checkpoint $checkpoint_path
# output: The similarity score between two audios is 0.2053 (-1.0, 1.0).
python verification.py --model_name ecapa_tdnn --wav1 vox1_data/David_Faustino/hn8GyCJIfLM_0000012.wav --wav2 vox1_data/David_Faustino/xTOk1Jz-F_g_0000015.wav --checkpoint --checkpoint $checkpoint_path
# output: he similarity score between two audios is 0.5302 (-1.0, 1.0).
```