	## Pre-training Representations for Speaker Verification

	### Pre-trained models

	| Model | Fix pre-train | Vox1-O | Vox1-E | Vox1-H |
	| ------------------------------------------------------------ | ------------- | --------- | --------- | -------- |
	| [ECAPA-TDNN](https://drive.google.com/file/d/1kWmLyTGkBExTdxtwmrXoP4DhWz_7ZAv3/view?usp=sharing) | - | 1.080 | 1.200 | 2.127 |
	| [HuBERT large](https://drive.google.com/file/d/1cQAPIzg8DJASZyAYdaBN0gRa8piPQTMo/view?usp=sharing) | Yes | 0.888 | 0.912 | 1.853 |
	| [Wav2Vec2.0 (XLSR)](https://drive.google.com/file/d/1FiGokGtF2d7rkD9OpqLiQxKSqppTSXbl/view?usp=sharing) | Yes | 0.915 | 0.945 | 1.895 |
	| [UniSpeech-SAT large](https://drive.google.com/file/d/1W6KRt5Ci2T7_xPVdlE3JGdQG2KTrZ750/view?usp=sharing) | Yes | 0.771 | 0.781 | 1.669 |
	| [HuBERT large](https://drive.google.com/file/d/1nit9Z6RyM8Sdb3n8ccaglOQVNnqsjnui/view?usp=sharing) | No | 0.585 | 0.654 | 1.342 |
	| [Wav2Vec2.0 (XLSR)](https://drive.google.com/file/d/1TgKro9pp197TCgIF__IlE_rMVQOk50Eb/view?usp=sharing) | No | 0.564 | 0.605 | 1.23 |
	| [UniSpeech-SAT large](https://drive.google.com/file/d/10o6NHZsPXJn2k8n57e8Z_FkKh3V4TC3g/view?usp=sharing) | No | 0.564 | 0.561 | 1.23 |
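
The table does not name its metric. Assuming the Vox1-O, Vox1-E and Vox1-H columns report equal error rate (EER, in percent) on the three VoxCeleb1 trial lists, the sketch below shows one simple way such a number could be computed from trial scores. The function name `equal_error_rate` and the toy scores are hypothetical and are not taken from this repository; the quadratic threshold sweep is chosen for readability, not speed.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping every observed score as a threshold.

    scores: similarity score of each trial (higher means "more likely same speaker").
    labels: 1 for a same-speaker (target) trial, 0 for a different-speaker trial.
    Returns a fraction in [0, 1]; multiply by 100 for a percentage.
    """
    targets = sum(labels)
    nontargets = len(labels) - targets
    if targets == 0 or nontargets == 0:
        raise ValueError("need at least one target and one non-target trial")

    best_gap = float("inf")
    eer = None
    for threshold in sorted(set(scores)):
        # A trial is accepted when its score reaches the threshold.
        false_accepts = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
        false_rejects = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
        far = false_accepts / nontargets
        frr = false_rejects / targets
        gap = abs(far - frr)
        # Keep the operating point where both error rates are closest.
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy trials (made up for illustration): two target and two non-target pairs.
    toy_scores = [0.9, 0.4, 0.6, 0.1]
    toy_labels = [1, 1, 0, 0]
    print(f"EER: {100 * equal_error_rate(toy_scores, toy_labels):.1f}%")
```

Lower values are better under this reading of the table: a smaller EER means fewer false accepts and false rejects at the balanced operating point.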

	### How to use?

	#### Environment Setup

	1. `pip install -r requirements.txt`
	2. Install the fairseq code that matches the model you want to use:
	   - For HuBERT large and Wav2Vec2.0 (XLSR), install the official [fairseq](https://github.com/pytorch/fairseq).
	   - For UniSpeech-SAT large, install the [UniSpeech-SAT](https://github.com/microsoft/UniSpeech/tree/main/UniSpeech-SAT) fairseq code.
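
After the setup steps, a quick check like the one below can confirm that a fairseq package is importable before running anything heavier. This is only a sketch: the helper name `missing_modules` is hypothetical, and the check assumes the installed code exposes a top-level module named `fairseq`. It cannot tell the official fairseq apart from the UniSpeech-SAT variant.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "fairseq" is the module name assumed here; adjust the list as needed.
    missing = missing_modules(["fairseq"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```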

	#### Example

	Take `unispeech_sat` and `ecapa_tdnn` as examples:

	1. Download the pre-trained model from the table above and save it to `checkpoint_path`.
	2. Run the following commands. The wav files are sampled from [VoxCeleb1](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html).

	```bash
	python verification.py --model_name unispeech_sat --wav1 vox1_data/David_Faustino/hn8GyCJIfLM_0000012.wav --wav2 vox1_data/Josh_Gad/HXUqYaOwrxA_0000015.wav --checkpoint $checkpoint_path
	# output: The similarity score between two audios is 0.0317 (-1.0, 1.0).

	python verification.py --model_name unispeech_sat --wav1 vox1_data/David_Faustino/hn8GyCJIfLM_0000012.wav --wav2 vox1_data/David_Faustino/xTOk1Jz-F_g_0000015.wav --checkpoint $checkpoint_path
	# output: The similarity score between two audios is 0.5389 (-1.0, 1.0).

	python verification.py --model_name ecapa_tdnn --wav1 vox1_data/David_Faustino/hn8GyCJIfLM_0000012.wav --wav2 vox1_data/Josh_Gad/HXUqYaOwrxA_0000015.wav --checkpoint $checkpoint_path
	# output: The similarity score between two audios is 0.2053 (-1.0, 1.0).

	python verification.py --model_name ecapa_tdnn --wav1 vox1_data/David_Faustino/hn8GyCJIfLM_0000012.wav --wav2 vox1_data/David_Faustino/xTOk1Jz-F_g_0000015.wav --checkpoint $checkpoint_path
	# output: The similarity score between two audios is 0.5302 (-1.0, 1.0).
	```
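
The reported range of (-1.0, 1.0) is consistent with a cosine similarity between two speaker embeddings, although this README does not say how `verification.py` computes the score. As a minimal sketch under that assumption, the code below computes a cosine similarity and applies a decision threshold. The names `cosine_similarity` and `same_speaker` and the threshold of 0.3 are hypothetical; a real threshold would have to be tuned on held-out trials.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors, in [-1.0, 1.0]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(score, threshold):
    """Accept the pair as the same speaker when the score reaches the threshold."""
    return score >= threshold


if __name__ == "__main__":
    # Scores copied from the unispeech_sat example outputs above.
    different_pair_score = 0.0317  # David_Faustino vs. Josh_Gad
    same_pair_score = 0.5389       # David_Faustino vs. David_Faustino
    threshold = 0.3                # hypothetical value, for illustration only
    print(same_speaker(different_pair_score, threshold))
    print(same_speaker(same_pair_score, threshold))
```

With the example scores above, any threshold between the two values separates the different-speaker pair from the same-speaker pair.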