# Pyannote-audio : Speaker Diarization
## Input
Audio file (.wav format).
```
Example
input: data/sample.wav
```
(Wav file from https://github.com/pyannote/pyannote-audio/tree/develop/pyannote/audio/sample)
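
Before running the sample, the input can be sanity-checked with Python's standard `wave` module. The snippet below first synthesizes a short 16 kHz mono PCM file so it is self-contained; the file name `tone.wav` is made up for illustration:

```python
import struct
import wave

# Write 10 ms of silence as a 16 kHz mono 16-bit PCM wav so the check
# below has something to read (file name is arbitrary).
with wave.open("tone.wav", "wb") as f:
    f.setnchannels(1)      # mono
    f.setsampwidth(2)      # 16-bit samples
    f.setframerate(16000)  # 16 kHz
    f.writeframes(struct.pack("<160h", *([0] * 160)))

# Inspect the properties a .wav input exposes.
with wave.open("tone.wav", "rb") as f:
    rate, channels, frames = f.getframerate(), f.getnchannels(), f.getnframes()

print(rate, channels, frames / rate)  # 16000 1 0.01
```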
## Output
Who spoke when (diarization segments with speaker labels).

```
[ 00:00:06.714 --> 00:00:07.003] A speaker91
[ 00:00:07.003 --> 00:00:07.173] B speaker90
[ 00:00:07.580 --> 00:00:08.310] C speaker91
[ 00:00:08.310 --> 00:00:09.923] D speaker90
[ 00:00:09.923 --> 00:00:10.976] E speaker91
[ 00:00:10.466 --> 00:00:14.745] F speaker90
[ 00:00:14.303 --> 00:00:17.886] G speaker91
[ 00:00:18.022 --> 00:00:21.502] H speaker90
[ 00:00:18.157 --> 00:00:18.446] I speaker91
[ 00:00:21.774 --> 00:00:28.531] J speaker91
[ 00:00:27.886 --> 00:00:29.991] K speaker90
```
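Each output line can be parsed back into numeric segments. A minimal sketch; the regular expression below is written for the exact format shown above and is our assumption, not part of the sample's API:

```python
import re

# Matches "[ HH:MM:SS.mmm --> HH:MM:SS.mmm] <track> <speaker>".
LINE_RE = re.compile(
    r"\[\s*(\d+):(\d+):(\d+\.\d+)\s*-->\s*(\d+):(\d+):(\d+\.\d+)\]\s*\S+\s+(\S+)"
)

def parse_line(line):
    """Return (start_seconds, end_seconds, speaker_label) for one output line."""
    h1, m1, s1, h2, m2, s2, speaker = LINE_RE.match(line.strip()).groups()
    start = int(h1) * 3600 + int(m1) * 60 + float(s1)
    end = int(h2) * 3600 + int(m2) * 60 + float(s2)
    return start, end, speaker

print(parse_line("[ 00:00:06.714 --> 00:00:07.003] A speaker91"))
# -> (6.714, 7.003, 'speaker91')
```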
## Requirements
This model requires additional modules.
```bash
$ pip3 install -r requirements.txt
```
## Usage
Automatically downloads the onnx and prototxt files on the first run.
It is necessary to be connected to the Internet while downloading.
For the sample
```bash
$ python pyannote-audio.py -i ./data/sample.wav
```
For the sample with plot
```bash
$ python pyannote-audio.py -i ./data/sample.wav --plt
```
For the sample with verification
```bash
$ python pyannote-audio.py -i ./data/sample.wav -g ./data/sample.rttm
```
If you want to specify the audio file, put the file path after the `-i` or `--input` option.
```bash
$ python pyannote-audio.py -i FILE_PATH
```
If you want to specify the ground truth file, put the file path after the `-g` or `--input_ground` option.
```bash
$ python pyannote-audio.py -g FILE_PATH
```
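The ground truth file uses the NIST RTTM format, in which each `SPEAKER` line carries an onset and a duration in seconds. A minimal reader might look like this (the field layout follows the RTTM specification; the helper name is ours):

```python
# RTTM SPEAKER lines have the layout:
# SPEAKER <file-id> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
def read_rttm_line(line):
    """Parse one RTTM SPEAKER line into a segment dict."""
    fields = line.split()
    onset, duration = float(fields[3]), float(fields[4])
    return {
        "file": fields[1],
        "start": onset,
        "end": onset + duration,
        "speaker": fields[7],
    }

seg = read_rttm_line("SPEAKER sample 1 6.714 0.289 <NA> <NA> speaker91 <NA> <NA>")
print(seg["speaker"])  # speaker91
```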
If you want to specify the output file, put the file path after the `-o` or `--output` option.
```bash
$ python pyannote-audio.py -o FILE_PATH
```
If you want to specify the output ground truth file, put the file path after the `--output_ground` option.
```bash
$ python pyannote-audio.py --output_ground FILE_PATH
```
If you know the number of speakers, put the number after the `--num_speaker` option.
```bash
$ python pyannote-audio.py --num_speaker 2
```
If you know the maximum number of speakers, put the number after the `--max_speaker` option.
```bash
$ python pyannote-audio.py --max_speaker 4
```
If you know the minimum number of speakers, put the number after the `--min_speaker` option.
```bash
$ python pyannote-audio.py --min_speaker 2
```
By giving the `-e` or `--error` option, you can get the diarization error rate.
```bash
$ python pyannote-audio.py -g ./data/sample.rttm --error
```
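For reference, DER is defined as (missed speech + false alarm + speaker confusion) divided by total reference speech. The frame-based sketch below illustrates the metric on toy segments; it skips the optimal speaker mapping and collar handling that real scoring tools apply, so treat it as an approximation only, not the evaluation code used by `pyannote-audio.py`:

```python
def to_frames(segments, n_frames, step=0.01):
    """Rasterize (start, end, speaker) segments onto 10 ms frames."""
    labels = [None] * n_frames
    for start, end, speaker in segments:
        for i in range(int(round(start / step)), min(int(round(end / step)), n_frames)):
            labels[i] = speaker
    return labels

def der(reference, hypothesis, duration, step=0.01):
    """DER = (missed + false alarm + confusion) / total reference speech."""
    n = int(round(duration / step))
    ref = to_frames(reference, n, step)
    hyp = to_frames(hypothesis, n, step)
    missed = sum(1 for r, h in zip(ref, hyp) if r and not h)
    false_alarm = sum(1 for r, h in zip(ref, hyp) if h and not r)
    confusion = sum(1 for r, h in zip(ref, hyp) if r and h and r != h)
    total = sum(1 for r in ref if r)
    return (missed + false_alarm + confusion) / total

ref = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hyp = [(0.0, 5.0, "A"), (5.0, 9.0, "B")]
print(der(ref, hyp, 10.0))  # last second of B is missed -> 0.1
```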
By giving the `--plt` option, you can visualize the results.
```bash
$ python pyannote-audio.py --plt
```
By giving the `--use_onnx` option, you can run inference with ONNX Runtime.
```bash
$ python pyannote-audio.py --use_onnx
```
By giving the `--embed` option, you can get the speaker embedding vectors for the input file.
```bash
$ python pyannote-audio.py --embed
```
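Embeddings extracted this way can be compared with cosine similarity, a common speaker-verification score (closer to 1 means more likely the same speaker). A pure-Python sketch with made-up two-dimensional vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```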
## Reference
- [Pyannote-audio](https://github.com/pyannote/pyannote-audio)
- [Hugging Face - pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
- [Hugging Face - hbredin/wespeaker-voxceleb-resnet34-LM](https://huggingface.co/hbredin/wespeaker-voxceleb-resnet34-LM/tree/main)
- [KaldiFeat](https://github.com/yuyq96/kaldifeat)
## Framework
PyTorch
## Model Format
ONNX opset=14,17
## Netron
- [segmentation.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/pyannote-audio/segmentation.onnx.prototxt)
- [speaker-embedding.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/pyannote-audio/speaker-embedding.onnx.prototxt)