Can't match expected performance on VoxCeleb test set
When evaluating this model on VoxCeleb-O, I get an EER of 4.9% at a threshold of around 0.87 (the threshold is very close to the one reported here). This EER seems high: according to the WavLM paper it should be 0.84%. What EER do you get using this model?
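For anyone comparing numbers, here is a minimal sketch of one common way to compute EER and its threshold from trial scores. It assumes each trial already has a similarity score (for example, cosine similarity between two speaker embeddings) and a label of 1 for same-speaker and 0 for different-speaker pairs. The function name `compute_eer` and the toy data are hypothetical, and this is not necessarily the scoring procedure used in the WavLM paper.

```python
import numpy as np

def compute_eer(scores, labels):
    """Return (eer, threshold) for verification trials.

    scores: similarity per trial (higher = more likely same speaker)
    labels: 1 for same-speaker trials, 0 for different-speaker trials
    Simplification: tied scores are not treated specially.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Sort trials from highest to lowest score.
    order = np.argsort(-scores, kind="mergesort")
    scores, labels = scores[order], labels[order]

    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    # Position k corresponds to accepting the top k+1 trials,
    # i.e. using scores[k] as the acceptance threshold.
    true_accepts = np.cumsum(labels)
    false_accepts = np.cumsum(1 - labels)
    far = false_accepts / n_nontarget      # false acceptance rate
    frr = 1.0 - true_accepts / n_target    # false rejection rate

    # EER is taken where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(scores[idx])

# Toy trial list (made-up numbers, for illustration only).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
eer, threshold = compute_eer(toy_scores, toy_labels)
print(f"EER = {eer:.3f} at threshold {threshold:.2f}")
```

Differences in the trial list, in audio preprocessing, or in score normalization can all shift the result, so it helps to state which of these were used when reporting an EER.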
One possible clue: when I pass the input through the model, I get the following warning:
torch\nn\functional.py:5962: UserWarning: Support for mismatched key_padding_mask and attn_mask is deprecated. Use same type for both instead.
I have not been able to figure out what's causing this, though. I have also tried setting up the UniSpeech Git repository to compare against the model checkpoints released there, but I have not been able to set up the environment successfully.
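One general way to find where a warning like this originates is to turn it into an exception with Python's standard `warnings` module and read the resulting traceback. The sketch below is only an illustration: `run_with_warnings_as_errors` is a hypothetical helper, and `fake_forward` is a stand-in for the real model call, which is not reproduced here.

```python
import traceback
import warnings

def run_with_warnings_as_errors(fn, message_regex):
    """Call fn(); if a matching UserWarning fires, return its traceback text.

    message_regex is matched against the start of the warning message.
    Returns None when no matching warning is raised.
    """
    with warnings.catch_warnings():
        # Escalate only the matching warning to an exception.
        warnings.filterwarnings(
            "error", message=message_regex, category=UserWarning
        )
        try:
            fn()
        except UserWarning:
            return traceback.format_exc()
    return None

def fake_forward():
    # Stand-in for the real forward pass; it only emits a warning
    # with the same leading text as the one quoted in the post.
    warnings.warn(
        "Support for mismatched key_padding_mask and attn_mask is deprecated.",
        UserWarning,
    )

tb = run_with_warnings_as_errors(fake_forward, "Support for mismatched")
print(tb)
```

With the real model, `fn` would wrap the forward call, and the traceback would show which module passes the two masks with different types. The exception aborts that call, so this is meant for debugging only.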