tezuesh committed · Commit 4b027ff · verified · 1 Parent(s): 6d1a048

Delete README.md

  1. README.md +0 -132
README.md DELETED
@@ -1,132 +0,0 @@
- ---
- tags:
- - pyannote
- - audio
- - voice
- - speech
- - speaker
- - speaker-segmentation
- - voice-activity-detection
- - overlapped-speech-detection
- - resegmentation
- datasets:
- - ami
- - dihard
- - voxconverse
- license: mit
- inference: false
- ---
-
- # pyannote.audio // speaker segmentation
-
- ![Example](example.png)
-
- Model from *[End-to-end speaker segmentation for overlap-aware resegmentation](http://arxiv.org/abs/2104.04045)*,
- by Hervé Bredin and Antoine Laurent.
-
- Relies on pyannote.audio 2.0 currently in development: see [installation instructions](https://github.com/pyannote/pyannote-audio/tree/develop#installation).
-
- ## Support
-
- For commercial enquiries and scientific consulting, please contact [me](mailto:herve@niderb.fr).
- For [technical questions](https://github.com/pyannote/pyannote-audio/discussions) and [bug reports](https://github.com/pyannote/pyannote-audio/issues), please check [pyannote.audio](https://github.com/pyannote/pyannote-audio) Github repository.
-
- ## Usage
-
- ### Voice activity detection
-
- ```python
- from pyannote.audio.pipelines import VoiceActivityDetection
- pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
- HYPER_PARAMETERS = {
-   # onset/offset activation thresholds
-   "onset": 0.5, "offset": 0.5,
-   # remove speech regions shorter than that many seconds.
-   "min_duration_on": 0.0,
-   # fill non-speech regions shorter than that many seconds.
-   "min_duration_off": 0.0
- }
- pipeline.instantiate(HYPER_PARAMETERS)
- vad = pipeline("audio.wav")
- # `vad` is a pyannote.core.Annotation instance containing speech regions
- ```
-
- ### Overlapped speech detection
-
- ```python
- from pyannote.audio.pipelines import OverlappedSpeechDetection
- pipeline = OverlappedSpeechDetection(segmentation="pyannote/segmentation")
- pipeline.instantiate(HYPER_PARAMETERS)
- osd = pipeline("audio.wav")
- # `osd` is a pyannote.core.Annotation instance containing overlapped speech regions
- ```
-
- ### Resegmentation
-
- ```python
- from pyannote.audio.pipelines import Resegmentation
- pipeline = Resegmentation(segmentation="pyannote/segmentation",
-                           diarization="baseline")
- pipeline.instantiate(HYPER_PARAMETERS)
- resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
- # where `baseline` should be provided as a pyannote.core.Annotation instance
- ```
-
- ### Raw scores
-
- ```python
- from pyannote.audio import Inference
- inference = Inference("pyannote/segmentation")
- segmentation = inference("audio.wav")
- # `segmentation` is a pyannote.core.SlidingWindowFeature
- # instance containing raw segmentation scores like the
- # one pictured above (output)
- ```
-
- ## Reproducible research
-
- In order to reproduce the results of the paper ["End-to-end speaker segmentation for overlap-aware resegmentation
- "](https://arxiv.org/abs/2104.04045), use the following hyper-parameters:
-
- Voice activity detection | `onset` | `offset` | `min_duration_on` | `min_duration_off`
- ----------------|---------|----------|-------------------|-------------------
- AMI Mix-Headset | 0.684 | 0.577 | 0.181 | 0.037
- DIHARD3 | 0.767 | 0.377 | 0.136 | 0.067
- VoxConverse | 0.767 | 0.713 | 0.182 | 0.501
-
- Overlapped speech detection | `onset` | `offset` | `min_duration_on` | `min_duration_off`
- ----------------|---------|----------|-------------------|-------------------
- AMI Mix-Headset | 0.448 | 0.362 | 0.116 | 0.187
- DIHARD3 | 0.430 | 0.320 | 0.091 | 0.144
- VoxConverse | 0.587 | 0.426 | 0.337 | 0.112
-
- Resegmentation of VBx | `onset` | `offset` | `min_duration_on` | `min_duration_off`
- ----------------|---------|----------|-------------------|-------------------
- AMI Mix-Headset | 0.542 | 0.527 | 0.044 | 0.705
- DIHARD3 | 0.592 | 0.489 | 0.163 | 0.182
- VoxConverse | 0.537 | 0.724 | 0.410 | 0.563
-
- Expected outputs (and VBx baseline) are also provided in the `/reproducible_research` sub-directories.
-
- ## Citation
-
- ```bibtex
- @inproceedings{Bredin2021,
- Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
- Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
- Booktitle = {Proc. Interspeech 2021},
- Address = {Brno, Czech Republic},
- Month = {August},
- Year = {2021},
- ```
-
- ```bibtex
- @inproceedings{Bredin2020,
- Title = {{pyannote.audio: neural building blocks for speaker diarization}},
- Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
- Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
- Address = {Barcelona, Spain},
- Month = {May},
- Year = {2020},
- }
- ```
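
For readers of this diff who no longer have the README at hand, the four hyper-parameters it documented (`onset`, `offset`, `min_duration_on`, `min_duration_off`) can be illustrated with a small, dependency-free sketch. This is not pyannote.audio's implementation: the function name `binarize`, the `frame_duration` argument, and the order of the two post-processing steps are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Region = Tuple[float, float]  # (start, end) in seconds


def binarize(
    scores: Sequence[float],
    frame_duration: float,          # hypothetical: seconds covered by one score
    onset: float = 0.5,
    offset: float = 0.5,
    min_duration_on: float = 0.0,
    min_duration_off: float = 0.0,
) -> List[Region]:
    """Turn per-frame activation scores into active regions (illustrative sketch)."""
    regions: List[Region] = []
    active = False
    start = 0.0

    # Hysteresis thresholding: switch on above `onset`, off below `offset`.
    for i, score in enumerate(scores):
        t = i * frame_duration
        if not active and score > onset:
            active, start = True, t
        elif active and score < offset:
            regions.append((start, t))
            active = False
    if active:
        regions.append((start, len(scores) * frame_duration))

    # Fill inactive gaps shorter than `min_duration_off` (assumed to run first).
    merged: List[Region] = []
    for seg in regions:
        if merged and seg[0] - merged[-1][1] < min_duration_off:
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)

    # Remove active regions shorter than `min_duration_on`.
    return [(s, e) for s, e in merged if e - s >= min_duration_on]


scores = [0.1, 0.8, 0.6, 0.4, 0.1, 0.9, 0.2, 0.1]
print(binarize(scores, frame_duration=0.5, onset=0.7, offset=0.3))
```

Raising `min_duration_off` merges the two detected regions into one, while raising `min_duration_on` drops the short second region. The per-dataset values in the deleted tables tune this trade-off.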