Upload folder using huggingface_hub
- config.yaml +21 -0
- embedding/README.md +20 -0
- embedding/pytorch_model.bin +3 -0
- plda/README.md +3 -0
- plda/plda.npz +3 -0
- plda/xvec_transform.npz +3 -0
- segmentation/pytorch_model.bin +3 -0
config.yaml
ADDED
@@ -0,0 +1,21 @@
+dependencies:
+  pyannote.audio: 4.0.0
+
+pipeline:
+  name: pyannote.audio.pipelines.SpeakerDiarization
+  params:
+    clustering: VBxClustering
+    segmentation: $model/segmentation
+    segmentation_batch_size: 32
+    embedding: $model/embedding
+    embedding_batch_size: 32
+    embedding_exclude_overlap: true
+    plda: $model/plda
+
+params:
+  clustering:
+    threshold: 0.6
+    Fa: 0.07
+    Fb: 0.8
+  segmentation:
+    min_duration_off: 0.0
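The `$model/...` values in config.yaml look like placeholders for paths inside this repository. As a minimal sketch of how such placeholders could be expanded, the snippet below substitutes a local directory for `$model`. The helper `resolve_model_paths` and the directory `/models/diarization` are hypothetical, and the assumption that `$model` means the repository root is not confirmed by this commit; pyannote.audio's own loading logic may differ.

```python
from pathlib import PurePosixPath
from typing import Any

# Assumption: "$model" stands for the root directory of the downloaded repository.
PLACEHOLDER = "$model"


def resolve_model_paths(value: Any, model_dir: PurePosixPath) -> Any:
    """Recursively replace a leading "$model" in string values with model_dir."""
    if isinstance(value, dict):
        return {key: resolve_model_paths(item, model_dir) for key, item in value.items()}
    if isinstance(value, str) and (
        value == PLACEHOLDER or value.startswith(PLACEHOLDER + "/")
    ):
        relative = value[len(PLACEHOLDER):].lstrip("/")
        return str(model_dir / relative)
    # Non-string values and strings without the placeholder pass through unchanged.
    return value


# The pipeline parameters from config.yaml, written out as a plain dict.
pipeline_params = {
    "clustering": "VBxClustering",
    "segmentation": "$model/segmentation",
    "segmentation_batch_size": 32,
    "embedding": "$model/embedding",
    "embedding_batch_size": 32,
    "embedding_exclude_overlap": True,
    "plda": "$model/plda",
}

# Hypothetical local checkout location, used only for illustration.
resolved = resolve_model_paths(pipeline_params, PurePosixPath("/models/diarization"))
print(resolved["segmentation"])
print(resolved["plda"])
```

Only the three path-like entries change; batch sizes, the boolean flag, and the clustering name are returned as they were.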
embedding/README.md
ADDED
@@ -0,0 +1,20 @@
+Copied from https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM
+
+## License
+
+According to [this page](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md):
+
+> The pretrained model in WeNet follows the license of it's corresponding dataset. For example, the pretrained model on VoxCeleb follows Creative Commons Attribution 4.0 International License., since it is used as license of the VoxCeleb dataset, see https://mm.kaist.ac.kr/datasets/voxceleb/.
+
+## Citation
+
+```bibtex
+@inproceedings{Wang2023,
+  title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
+  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
+  booktitle={ICASSP 2023, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
+  pages={1--5},
+  year={2023},
+  organization={IEEE}
+}
+```
embedding/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6f10ff60898a1d185fa22e1d11e0bfa8a92efec811f11bca48cb8cafebefd929
+size 26646242
plda/README.md
ADDED
@@ -0,0 +1,3 @@
+PLDA model trained by [BUT Speech@FIT](https://speech.fit.vut.cz/) group.
+
+Thanks to [Jiangyu Han](https://github.com/jyhan03) and [Petr Pálka](https://github.com/Selesnyan) for the integration of VBx in pyannote.audio.
plda/plda.npz
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9b77bcd840692710dd3496f62ecfeed8d8e5f002fd991b785079b244eab7d255
+size 133852
plda/xvec_transform.npz
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:325f1ce8e48f7e55e9c8aa47e05d2766b7c48c4b25b8de8dd751e7a4cc5fbe8f
+size 134376
segmentation/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7ad24338d844fb95985486eb1a464e32d229f6d7a03c9abe60f978bacf3f816e
+size 5906507
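The binary files in this commit are stored as Git LFS pointers: three lines giving the spec version, a SHA-256 object ID, and the size in bytes. The sketch below shows one way a downloaded file could be checked against such a pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of Git LFS or huggingface_hub; the sample pointer text is the one shown for plda/plda.npz.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the "key value" lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if data has the size and SHA-256 digest recorded in the pointer."""
    if pointer["algorithm"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algorithm']}")
    # Compare the cheap size check first, then the digest.
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer contents for plda/plda.npz, copied from the diff.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:9b77bcd840692710dd3496f62ecfeed8d8e5f002fd991b785079b244eab7d255\n"
    "size 133852\n"
)
print(pointer["size"], pointer["digest"][:8])
```

To check a real download, read the file in binary mode and pass its bytes to `matches_pointer`; for the larger model weights, hashing in chunks would avoid holding the whole file in memory.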