Fine‑Tuning WeSpeaker with CA‑MHFA WavLM‑Base‑Plus on VoxCeleb 2

This guide explains how to replace a few core WeSpeaker components with improved SSL back‑end code, plug in the CA‑MHFA_WavLM‑Base‑Plus_VoxCeleb2 checkpoint, and reproduce the reported speaker‑verification scores.

1 Prerequisites

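Install the WeSpeaker toolkit before you begin. In the commands below, <PATH_TO_ORIG> stands for the root of that original WeSpeaker checkout.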

2 Patch WeSpeaker

In the current directory you will find an updated wespeaker folder containing new and modified code. Replace the corresponding files in your original repository:

Frontend:
- Replace wespeaker/frontend/s3prl.py with the version provided here.

SSL back‑end:
- Create a new folder wespeaker/models/ssl_backend in the original repo.
- Copy all contents from wespeaker/models/ssl_backend (this directory) into the newly created folder.

Speaker model wrapper:
- Overwrite wespeaker/models/speaker_model.py with the patched file.

# From the root of this (patched) repo
cp -f wespeaker/frontend/s3prl.py <PATH_TO_ORIG>/wespeaker/frontend/
mkdir -p <PATH_TO_ORIG>/wespeaker/models/ssl_backend
cp -r wespeaker/models/ssl_backend/* <PATH_TO_ORIG>/wespeaker/models/ssl_backend/
cp -f wespeaker/models/speaker_model.py <PATH_TO_ORIG>/wespeaker/models/
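
To confirm that the copy step took effect, the patched paths can be compared against the original checkout. The following is a minimal sketch, not part of WeSpeaker: verify_patch is a hypothetical helper name, and it assumes the three paths listed above are the only ones that were changed.

```shell
#!/usr/bin/env bash
# verify_patch SRC DST  (hypothetical helper, not part of WeSpeaker)
#   SRC: root of the patched repo
#   DST: root of the original WeSpeaker checkout (<PATH_TO_ORIG>)
# Returns 0 only if every patched path is identical in SRC and DST.
verify_patch() {
  local src="$1" dst="$2" status=0 path
  for path in \
      wespeaker/frontend/s3prl.py \
      wespeaker/models/ssl_backend \
      wespeaker/models/speaker_model.py; do
    # diff -r compares plain files directly and directories recursively;
    # -q reports only whether they differ.
    if diff -rq "$src/$path" "$dst/$path" >/dev/null 2>&1; then
      echo "OK       $path"
    else
      echo "MISMATCH $path"
      status=1
    fi
  done
  return "$status"
}
```

Called as verify_patch . <PATH_TO_ORIG> from the root of the patched repo, it prints one line per path. Run it right after copying: once Python has created __pycache__ directories inside ssl_backend, the directory comparison will report a mismatch even though the sources are identical.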

3 Add the Pre‑trained Checkpoint

Copy the directory CA-MHFA_WavLM-Base-Plus_VoxCeleb2 into WeSpeaker’s VoxCeleb v2 experiment folder:

cp -r CA-MHFA_WavLM-Base-Plus_VoxCeleb2 \
      <PATH_TO_ORIG>/examples/voxceleb/v2/exp/

The resulting tree should look like:

examples/voxceleb/v2
 └── exp
     └── CA-MHFA_WavLM-Base-Plus_VoxCeleb2
         ├── models
         │   └── avg_model.pt
         └── config.yaml
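
A quick way to check the layout before running the recipe is to test for the two files shown in the tree. This is a small sketch; check_checkpoint is a hypothetical helper name, and the file list is taken only from the tree above.

```shell
#!/usr/bin/env bash
# check_checkpoint DIR  (hypothetical helper, not part of WeSpeaker)
#   DIR: the copied experiment directory, e.g.
#        exp/CA-MHFA_WavLM-Base-Plus_VoxCeleb2
# Returns 0 only if both files from the expected tree are present.
check_checkpoint() {
  local dir="$1" status=0 f
  for f in models/avg_model.pt config.yaml; do
    if [ -f "$dir/$f" ]; then
      echo "found   $dir/$f"
    else
      echo "missing $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

From examples/voxceleb/v2 it would be called as check_checkpoint exp/CA-MHFA_WavLM-Base-Plus_VoxCeleb2. A "missing" line usually means the directory was copied one level too deep or too shallow.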

4 Edit `run.sh`

Inside examples/voxceleb/v2/run.sh make two quick edits:

# Where to write logs, checkpoints, and scores
exp_dir=exp/CA-MHFA_WavLM-Base-Plus_VoxCeleb2

# Which stage to start from (we only need scoring)
stage=4

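Why stage 4? Stages 0‑3 perform data preparation and training. The supplied checkpoint already contains the trained, averaged model (models/avg_model.pt), so training can be skipped; the evaluation data and trial lists produced by the data‑preparation stages must still be present on disk. If your copy of run.sh also defines a stop_stage variable, make sure it is not lower than the last scoring stage you want to run.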
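
The two edits can also be applied from the command line instead of a text editor. The sketch below assumes that run.sh assigns each variable on its own line starting with exp_dir= or stage=; set_var is a hypothetical helper, so check the result with grep before running the recipe.

```shell
#!/usr/bin/env bash
# set_var FILE NAME VALUE  (hypothetical helper, not part of WeSpeaker)
# Rewrites the first line of FILE that starts with "NAME=" to "NAME=VALUE".
# Lines such as "stop_stage=..." are left alone when NAME is "stage",
# because only a match at the very start of the line counts.
set_var() {
  local file="$1" name="$2" value="$3" tmp
  if ! grep -q "^${name}=" "$file"; then
    echo "no '${name}=' line in $file" >&2
    return 1
  fi
  tmp="$(mktemp)" || return 1
  awk -v n="$name" -v v="$value" '
    !seen && index($0, n "=") == 1 { print n "=" v; seen = 1; next }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file mode
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Usage, from examples/voxceleb/v2:
#   set_var run.sh exp_dir exp/CA-MHFA_WavLM-Base-Plus_VoxCeleb2
#   set_var run.sh stage 4
#   grep -n -E '^(exp_dir|stage)=' run.sh
```

Writing through a temporary file avoids sed -i, whose syntax differs between GNU and BSD systems.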

5 Run the Recipe

cd examples/voxceleb/v2
bash run.sh

The script scores the cleaned VoxCeleb 1 trial lists (Vox1‑O, Vox1‑E, and Vox1‑H) and prints results similar to those in the next section. EER values are percentages.

6 Expected Results

---- cali_vox2_dev_asnorm300_vox1_O_cleaned.kaldi.score -----
EER   = 0.627
minDCF (p_target:0.01  c_miss:1  c_fa:1) = 0.095

---- cali_vox2_dev_asnorm300_vox1_E_cleaned.kaldi.score -----
EER   = 0.674
minDCF (p_target:0.01  c_miss:1  c_fa:1) = 0.069

---- cali_vox2_dev_asnorm300_vox1_H_cleaned.kaldi.score -----
EER   = 1.314
minDCF (p_target:0.01  c_miss:1  c_fa:1) = 0.125
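
To compare your own run against these numbers, the result blocks can be condensed to one line per trial list. The sketch below assumes the output has been saved to a text file in exactly the format shown above; summarize_results and the file name results.txt are placeholders, not WeSpeaker names.

```shell
#!/usr/bin/env bash
# summarize_results FILE  (hypothetical helper, not part of WeSpeaker)
# Prints "<score file> <EER> <minDCF>" for every result block in FILE.
summarize_results() {
  awk '
    /^----/   { name = $2 }            # header: ---- <score file> -----
    /^EER/    { eer = $NF }            # last field is the EER value
    /^minDCF/ { print name, eer, $NF } # last field is the minDCF value
  ' "$1"
}

# Usage:
#   summarize_results results.txt
```

Small differences in the last digit between runs are plausible, so compare the summaries by eye rather than expecting a byte-identical match.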
