Add comprehensive model card for w2v-BERT 2.0 Speaker Verification model
This PR adds a comprehensive model card for the w2v-BERT 2.0 based Speaker Verification model, as described in the paper [Enhancing Speaker Verification with w2v-BERT 2.0 and Knowledge Distillation guided Structured Pruning](https://huggingface.co/papers/2510.04213).
The updates include:
- Adding the `pipeline_tag: audio-classification` for better discoverability of speaker verification models on the Hub.
- Specifying the `license: mit`.
- Including an additional `speaker-verification` tag.
- Linking to the academic paper and the official GitHub repository.
- Incorporating the paper's abstract for a quick overview.
- Adding key diagrams and performance tables directly from the GitHub README.
- Providing a BibTeX citation for the paper.
- No official Python usage snippet is reproduced, since the GitHub README only provides shell commands and refers to scripts rather than a readily copy-pasteable inference snippet for a Hugging Face library; any code in the card is clearly marked as an illustrative sketch rather than the repository's own pipeline.
Please review and merge if these improvements align with expectations.
---
pipeline_tag: audio-classification
license: mit
tags:
- speaker-verification
---

# Enhancing Speaker Verification with w2v-BERT 2.0 and Knowledge Distillation guided Structured Pruning

This repository contains the models and code presented in the paper [Enhancing Speaker Verification with w2v-BERT 2.0 and Knowledge Distillation guided Structured Pruning](https://huggingface.co/papers/2510.04213).

The official GitHub repository can be found at: [https://github.com/ZXHY-82/w2v-BERT-2.0_SV](https://github.com/ZXHY-82/w2v-BERT-2.0_SV)

## Abstract
Large-scale self-supervised Pre-Trained Models (PTMs) have shown significant improvements in the speaker verification (SV) task by providing rich feature representations. In this paper, we utilize w2v-BERT 2.0, a model with approximately 600 million parameters trained on 450 million hours of unlabeled data across 143 languages, for the SV task. The MFA structure with Layer Adapter is employed to process the multi-layer feature outputs from the PTM and extract speaker embeddings. Additionally, we incorporate LoRA for efficient fine-tuning. Our model achieves state-of-the-art results with 0.12% and 0.55% EER on the Vox1-O and Vox1-H test sets, respectively. Furthermore, we apply knowledge distillation guided structured pruning, reducing the model size by 80% while achieving only a 0.04% EER degradation.
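
The abstract mentions LoRA-based parameter-efficient fine-tuning; the actual training recipe lives in the GitHub repository. Purely as an illustrative sketch of that idea with the `peft` library (the `target_modules` names below are assumptions about the `transformers` implementation, not values from the paper), a setup along these lines could be used:

```python
# Hypothetical sketch only -- not the paper's fine-tuning configuration.
from peft import LoraConfig, get_peft_model
from transformers import Wav2Vec2BertModel

backbone = Wav2Vec2BertModel.from_pretrained("facebook/w2v-bert-2.0")

# target_modules are assumed attention-projection names; adjust them if they
# differ in your transformers version.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["linear_q", "linear_v"],
)
model = get_peft_model(backbone, lora_cfg)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
```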

## Framework


## Performance
The model demonstrates state-of-the-art results on speaker verification tasks.

### Speaker Verification Results
| Vox1-O (EER) | Vox1-E (EER) | Vox1-H (EER) | LMFT |
| :----------- | :----------- | :----------- | :--- |
| 0.23% | 0.38% | 0.81% | × |
| 0.14% | 0.31% | 0.73% | √ |
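
EER is the operating point at which the false-acceptance and false-rejection rates are equal. As a generic, illustrative computation only (the `scores` and `labels` below are made-up placeholders, not trial data from the paper), an EER like the ones in the table can be obtained from verification scores as follows:

```python
# Illustrative EER computation from toy verification trial scores.
import numpy as np
from sklearn.metrics import roc_curve

scores = np.array([0.82, 0.91, 0.12, 0.45, 0.40, 0.08])  # e.g. cosine similarities per trial
labels = np.array([1, 1, 0, 0, 1, 0])                    # 1 = same speaker, 0 = different

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
eer = fpr[np.nanargmin(np.abs(fnr - fpr))]  # point where FPR and FNR are (approximately) equal
print(f"EER = {eer:.2%}")
```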

### Pruning Results
The paper also presents knowledge distillation guided structured pruning:


## How to use
For detailed instructions on preparation, training, pruning, and testing, please refer to the [GitHub repository](https://github.com/ZXHY-82/w2v-BERT-2.0_SV). It provides shell commands for each stage, including a `get_embd_w2v.py` script for the test stage.
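
As a hedged, illustrative example only (not a snippet from the repository), the sketch below shows how the public `facebook/w2v-bert-2.0` backbone can be loaded with `transformers` to obtain the multi-layer frame-level features that the paper's MFA structure with Layer Adapter aggregates. It does not produce speaker embeddings; the speaker-embedding head and the pruned models are only available through the scripts in the GitHub repository.

```python
# Illustrative sketch of loading the w2v-BERT 2.0 backbone; NOT the repository's
# get_embd_w2v.py pipeline and not a speaker-embedding extractor on its own.
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2BertModel

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/w2v-bert-2.0")
model = Wav2Vec2BertModel.from_pretrained("facebook/w2v-bert-2.0")
model.eval()

# 16 kHz mono waveform; two seconds of silence as a stand-in for real audio
waveform = torch.zeros(32000).numpy()

inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One (batch, frames, hidden_size) tensor per encoder layer -- the multi-layer
# features that a speaker-embedding head would aggregate.
multi_layer_features = outputs.hidden_states
print(len(multi_layer_features), multi_layer_features[-1].shape)
```

In a full verification pipeline, pairs of speaker embeddings would then be compared (typically by cosine similarity) to produce the trial scores from which metrics such as EER are computed.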

## Citation
If you find this work helpful or inspiring, please cite the paper:
```bibtex
@article{Hu2025EnhancingSV,
  title={Enhancing Speaker Verification with w2v-BERT 2.0 and Knowledge Distillation guided Structured Pruning},
  author={Zixuan Hu and Yi Hu and Pengcheng Wei and Hongting Bai and Li Yan},
  journal={arXiv:2510.04213},
  year={2025}
}
```