# Pyannote-audio : Speaker Diarization

## Input

Audio file (.wav format).
```
Example
input: data/demo.wav
```
(Wav file from https://github.com/pyannote/pyannote-audio/tree/develop/pyannote/audio/sample)

## Output

The output indicates who spoke and when.
![Output](output.png)

```
[ 00:00:06.714 -->  00:00:07.003] A speaker91
[ 00:00:07.003 -->  00:00:07.173] B speaker90
[ 00:00:07.580 -->  00:00:08.310] C speaker91
[ 00:00:08.310 -->  00:00:09.923] D speaker90
[ 00:00:09.923 -->  00:00:10.976] E speaker91
[ 00:00:10.466 -->  00:00:14.745] F speaker90
[ 00:00:14.303 -->  00:00:17.886] G speaker91
[ 00:00:18.022 -->  00:00:21.502] H speaker90
[ 00:00:18.157 -->  00:00:18.446] I speaker91
[ 00:00:21.774 -->  00:00:28.531] J speaker91
[ 00:00:27.886 -->  00:00:29.991] K speaker90
```
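Each printed line can be parsed programmatically. A small sketch (the regex below is an assumption based on the output format shown above, not part of the script itself):

```python
import re

# Matches lines like "[ 00:00:06.714 -->  00:00:07.003] A speaker91"
# and extracts (start_seconds, end_seconds, speaker_label).
LINE_RE = re.compile(
    r"\[\s*(\d+):(\d+):(\d+\.\d+)\s*-->\s*(\d+):(\d+):(\d+\.\d+)\]\s*\S+\s+(\S+)"
)

def parse_segment(line):
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    h1, m1, s1, h2, m2, s2, speaker = m.groups()
    start = int(h1) * 3600 + int(m1) * 60 + float(s1)
    end = int(h2) * 3600 + int(m2) * 60 + float(s2)
    return start, end, speaker

print(parse_segment("[ 00:00:06.714 -->  00:00:07.003] A speaker91"))
# → (6.714, 7.003, 'speaker91')
```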

## Requirements

This model requires additional modules.
```bash
$ pip3 install -r requirements.txt
```

## Usage

Automatically downloads the onnx and prototxt files on the first run.
It is necessary to be connected to the Internet while downloading.

For the sample
```bash
$ python pyannote-audio.py -i ./data/sample.wav
```

For the sample with plot
```bash
$ python pyannote-audio.py -i ./data/sample.wav --plt
```

For the sample with verification
```bash
$ python pyannote-audio.py -i ./data/sample.wav -g ./data/sample.rttm
```
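Ground-truth files use the NIST RTTM format, in which each `SPEAKER` line carries a file id, channel, start time, duration, and speaker label. A minimal reader sketch (field positions follow the standard RTTM layout; verify them against your actual `.rttm` file):

```python
# Read SPEAKER entries from RTTM lines into (start, end, speaker) tuples.
# Standard RTTM fields: TYPE FILE CHAN START DUR <NA> <NA> SPEAKER <NA> <NA>
def read_rttm(lines):
    segments = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip non-speaker entries and blank lines
        start = float(fields[3])
        duration = float(fields[4])
        segments.append((start, start + duration, fields[7]))
    return segments

example = ["SPEAKER demo 1 6.714 0.289 <NA> <NA> speaker91 <NA> <NA>"]
print(read_rttm(example))
```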

If you want to specify the input audio file, put the file path after the `-i` or `--input` option.

```bash
$ python pyannote-audio.py -i FILE_PATH
```

If you want to specify the ground truth, put the file path after the `-g` or `--input_ground` option.

```bash
$ python pyannote-audio.py -g FILE_PATH
```

If you want to specify the output file, put the file path after the `-o` or `--output` option.

```bash
$ python pyannote-audio.py -o FILE_PATH
```

If you want to specify the output ground truth file, put the file path after the `-og` or `--output_ground` option.

```bash
$ python pyannote-audio.py -og FILE_PATH
```

If you know the number of speakers, put the number after the `--num` or `--num_speaker` option.
```bash
$ python pyannote-audio.py --num 2
```

If you know the maximum number of speakers, put the number after the `--max` or `--max_speaker` option.
```bash
$ python pyannote-audio.py --max 4
```

If you know the minimum number of speakers, put the number after the `--min` or `--min_speaker` option.
```bash
$ python pyannote-audio.py --min 2
```

By giving the `-e` or `--error` option, you can get the diarization error rate.
```bash
$ python pyannote-audio.py -e
```
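The diarization error rate is the fraction of reference speech time that is missed, falsely detected, or attributed to the wrong speaker. A minimal sketch of the idea, assuming non-overlapping segments and hypothesis labels already mapped onto reference labels (real scoring also finds the optimal speaker mapping and applies a forgiveness collar):

```python
# Segments are (start, end, speaker) tuples.
def speaker_at(segments, t):
    for start, end, speaker in segments:
        if start <= t < end:
            return speaker
    return None  # silence

def der(reference, hypothesis, step=0.01):
    # Sample the timeline at midpoints of small steps and count
    # how much reference speech time is scored incorrectly.
    t_end = max(end for _, end, _ in reference + hypothesis)
    ref_time = err_time = 0.0
    for i in range(int(round(t_end / step))):
        t = (i + 0.5) * step
        r = speaker_at(reference, t)
        h = speaker_at(hypothesis, t)
        if r is not None:
            ref_time += step
        if r != h:  # miss, false alarm, or speaker confusion
            err_time += step
    return err_time / ref_time

# Second half of the file is attributed to the wrong speaker, so DER is 0.5.
print(round(der([(0.0, 10.0, "A")], [(0.0, 5.0, "A"), (5.0, 10.0, "B")]), 3))
```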

By giving the `--plt` option, you can visualize the results.
```bash
$ python pyannote-audio.py --plt
```

By giving the `--use_onnx` option, you can use ONNX Runtime for inference.
```bash
$ python pyannote-audio.py --use_onnx
```

By giving the `--embed` option, you can get the embedding vectors for the input file.
```bash
$ python pyannote-audio.py --embed
```
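Such embeddings can be compared with cosine similarity to decide whether two segments come from the same speaker. A sketch (the 256-dimensional vectors below are placeholders; check the actual shape the script produces):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors: 1.0 means
    # identical direction, 0.0 means orthogonal (unrelated).
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
e1 = rng.normal(size=256)  # stand-in for a real speaker embedding
print(round(cosine_similarity(e1, e1), 6))  # identical vectors → 1.0
```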

## Reference

- [Pyannote-audio](https://github.com/pyannote/pyannote-audio)
- [Hugging Face - pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
- [Hugging Face - hbredin/wespeaker-voxceleb-resnet34-LM](https://huggingface.co/hbredin/wespeaker-voxceleb-resnet34-LM/tree/main)
- [KaldiFeat](https://github.com/yuyq96/kaldifeat)

## Framework

PyTorch

## Model Format

ONNX opset=14,17

## Netron

- [segmentation.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/pyannote-audio/segmentation.onnx.prototxt)
- [speaker-embedding.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/pyannote-audio/speaker-embedding.onnx.prototxt)