Pre-training Representations for Speaker Verification
Pre-trained models
| Model | Fix pre-train | Vox1-O | Vox1-E | Vox1-H |
|---|---|---|---|---|
| ECAPA-TDNN | - | 1.080 | 1.200 | 2.127 |
| HuBERT large | Yes | 0.888 | 0.912 | 1.853 |
| Wav2Vec2.0 (XLSR) | Yes | 0.915 | 0.945 | 1.895 |
| UniSpeech-SAT large | Yes | 0.771 | 0.781 | 1.669 |
| HuBERT large | No | 0.585 | 0.654 | 1.342 |
| Wav2Vec2.0 (XLSR) | No | 0.564 | 0.605 | 1.23 |
| UniSpeech-SAT large | No | 0.564 | 0.561 | 1.23 |
How to use?
Environment Setup
pip install -r requirements.txt
- Install the fairseq code:
  - For HuBERT large and Wav2Vec2.0 (XLSR), install the official fairseq.
  - For UniSpeech-SAT large, install the UniSpeech-SAT fairseq code.
Example
Take unispeech_sat and ecapa_tdnn for example:
- First, download the pre-trained model from the table above to checkpoint_path.
- Then, run the following commands.
- The wav files are sampled from VoxCeleb1.
python verification.py --model_name unispeech_sat --wav1 vox1_data/David_Faustino/hn8GyCJIfLM_0000012.wav --wav2 vox1_data/Josh_Gad/HXUqYaOwrxA_0000015.wav --checkpoint $checkpoint_path
# output: The similarity score between two audios is 0.0317 (-1.0, 1.0).
python verification.py --model_name unispeech_sat --wav1 vox1_data/David_Faustino/hn8GyCJIfLM_0000012.wav --wav2 vox1_data/David_Faustino/xTOk1Jz-F_g_0000015.wav --checkpoint $checkpoint_path
# output: The similarity score between two audios is 0.5389 (-1.0, 1.0).
python verification.py --model_name ecapa_tdnn --wav1 vox1_data/David_Faustino/hn8GyCJIfLM_0000012.wav --wav2 vox1_data/Josh_Gad/HXUqYaOwrxA_0000015.wav --checkpoint $checkpoint_path
# output: The similarity score between two audios is 0.2053 (-1.0, 1.0).
python verification.py --model_name ecapa_tdnn --wav1 vox1_data/David_Faustino/hn8GyCJIfLM_0000012.wav --wav2 vox1_data/David_Faustino/xTOk1Jz-F_g_0000015.wav --checkpoint $checkpoint_path
# output: The similarity score between two audios is 0.5302 (-1.0, 1.0).
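
The (-1.0, 1.0) range printed above is consistent with a cosine similarity between two speaker embeddings, although this README does not say how verification.py computes the score. The sketch below assumes that interpretation. It shows a plain cosine similarity, a parser for the printed output line, and an accept/reject decision. The names `cosine_similarity`, `parse_score`, `same_speaker`, and the 0.4 threshold are all hypothetical and are not part of this repository.

```python
import math
import re

# Matches the line printed in the examples above, e.g.
# "The similarity score between two audios is 0.5389 (-1.0, 1.0)."
SCORE_RE = re.compile(r"similarity score between two audios is (-?\d+(?:\.\d+)?)")


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors, in [-1.0, 1.0].

    Assumption: the script's score is a cosine similarity of embeddings.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def parse_score(line):
    """Extract the similarity score from one line of printed output."""
    match = SCORE_RE.search(line)
    if match is None:
        raise ValueError("no similarity score found in: %r" % line)
    return float(match.group(1))


def same_speaker(score, threshold=0.4):
    """Accept the pair as the same speaker if the score reaches the threshold.

    The default threshold is a hypothetical value for illustration only.
    """
    return score >= threshold


if __name__ == "__main__":
    line = "The similarity score between two audios is 0.5389 (-1.0, 1.0)."
    score = parse_score(line)
    print(score, same_speaker(score))
```

With the illustrative threshold, the two same-speaker scores shown above (0.5389 and 0.5302) would be accepted and the two different-speaker scores (0.0317 and 0.2053) rejected. In practice the threshold would need to be tuned on held-out trial pairs for each model.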