---
tags:
- pyannote
- audio
- voice
- speech
- speaker
- speaker-segmentation
- voice-activity-detection
- overlapped-speech-detection
- resegmentation
datasets:
- ami
- dihard
- voxconverse
license: mit
inference: false
---

# pyannote.audio // speaker segmentation

Model from *[End-to-end speaker segmentation for overlap-aware resegmentation](http://arxiv.org/abs/2104.04045)*,
by Hervé Bredin and Antoine Laurent.

Relies on pyannote.audio 2.0, currently in development: see [installation instructions](https://github.com/pyannote/pyannote-audio/tree/develop#installation).

## Support

For commercial enquiries and scientific consulting, please contact [me](mailto:herve@niderb.fr).
For [technical questions](https://github.com/pyannote/pyannote-audio/discussions) and [bug reports](https://github.com/pyannote/pyannote-audio/issues), please check the [pyannote.audio](https://github.com/pyannote/pyannote-audio) GitHub repository.

## Usage

### Voice activity detection

```python
from pyannote.audio.pipelines import VoiceActivityDetection
pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
HYPER_PARAMETERS = {
  # onset/offset activation thresholds
  "onset": 0.5, "offset": 0.5,
  # remove speech regions shorter than that many seconds
  "min_duration_on": 0.0,
  # fill non-speech regions shorter than that many seconds
  "min_duration_off": 0.0
}
pipeline.instantiate(HYPER_PARAMETERS)
vad = pipeline("audio.wav")
# `vad` is a pyannote.core.Annotation instance containing speech regions
```
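
The four hyper-parameters implement a simple hysteresis: a region starts when the score rises above `onset` and ends when it falls below `offset`; regions shorter than `min_duration_on` are then dropped and non-speech gaps shorter than `min_duration_off` are filled. A simplified pure-Python sketch of that logic (not the library's actual implementation, which operates on `SlidingWindowFeature` scores):

```python
# Hysteresis binarization sketch: turns frame-level speech scores
# into (start, end) regions, mimicking the four hyper-parameters above.
def binarize(scores, frame_dur, onset=0.5, offset=0.5,
             min_duration_on=0.0, min_duration_off=0.0):
    regions, active, start = [], False, 0.0
    for i, score in enumerate(scores):
        t = i * frame_dur
        if not active and score > onset:      # region starts above `onset`
            active, start = True, t
        elif active and score < offset:       # region ends below `offset`
            regions.append((start, t))
            active = False
    if active:
        regions.append((start, len(scores) * frame_dur))
    # fill non-speech gaps shorter than `min_duration_off`
    filled = []
    for region in regions:
        if filled and region[0] - filled[-1][1] < min_duration_off:
            filled[-1] = (filled[-1][0], region[1])
        else:
            filled.append(region)
    # drop speech regions shorter than `min_duration_on`
    return [r for r in filled if r[1] - r[0] >= min_duration_on]

scores = [0.1, 0.2, 0.8, 0.9, 0.9, 0.3, 0.2, 0.7, 0.8, 0.1]
regions = binarize(scores, frame_dur=0.1)  # two speech regions
```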

### Overlapped speech detection

```python
from pyannote.audio.pipelines import OverlappedSpeechDetection
pipeline = OverlappedSpeechDetection(segmentation="pyannote/segmentation")
pipeline.instantiate(HYPER_PARAMETERS)
osd = pipeline("audio.wav")
# `osd` is a pyannote.core.Annotation instance containing overlapped speech regions
```

### Resegmentation

```python
from pyannote.audio.pipelines import Resegmentation
pipeline = Resegmentation(segmentation="pyannote/segmentation",
                          diarization="baseline")
pipeline.instantiate(HYPER_PARAMETERS)
resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
# where `baseline` should be provided as a pyannote.core.Annotation instance
```

### Raw scores

```python
from pyannote.audio import Inference
inference = Inference("pyannote/segmentation")
segmentation = inference("audio.wav")
# `segmentation` is a pyannote.core.SlidingWindowFeature
# instance containing raw segmentation scores
```
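
`Inference` slides a fixed-duration window over the file, so per-window scores overlap in time. A hedged sketch of how overlapping window scores can be averaged into a single frame-level track (window length and step below are illustrative values, not the model's actual configuration):

```python
import numpy as np

def aggregate(window_scores, step_frames):
    """Overlap-average per-window frame scores into one continuous track.

    window_scores: array of shape (num_windows, frames_per_window)
    step_frames:   hop between consecutive windows, in frames
    """
    num_windows, frames = window_scores.shape
    total = (num_windows - 1) * step_frames + frames
    acc = np.zeros(total)
    count = np.zeros(total)
    for w in range(num_windows):
        start = w * step_frames
        acc[start:start + frames] += window_scores[w]   # sum overlapping scores
        count[start:start + frames] += 1                # how many windows cover each frame
    return acc / count

# two 4-frame windows hopping by 2 frames; middle frames are covered twice
chunks = np.array([[1.0, 1.0, 1.0, 1.0],
                   [3.0, 3.0, 3.0, 3.0]])
track = aggregate(chunks, step_frames=2)  # → [1. 1. 2. 2. 3. 3.]
```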

## Reproducible research

In order to reproduce the results of the paper ["End-to-end speaker segmentation for overlap-aware resegmentation"](https://arxiv.org/abs/2104.04045), use the following hyper-parameters:

Voice activity detection | `onset` | `offset` | `min_duration_on` | `min_duration_off`
----------------|---------|----------|-------------------|-------------------
AMI Mix-Headset | 0.684 | 0.577 | 0.181 | 0.037
DIHARD3 | 0.767 | 0.377 | 0.136 | 0.067
VoxConverse | 0.767 | 0.713 | 0.182 | 0.501

Overlapped speech detection | `onset` | `offset` | `min_duration_on` | `min_duration_off`
----------------|---------|----------|-------------------|-------------------
AMI Mix-Headset | 0.448 | 0.362 | 0.116 | 0.187
DIHARD3 | 0.430 | 0.320 | 0.091 | 0.144
VoxConverse | 0.587 | 0.426 | 0.337 | 0.112

Resegmentation of VBx | `onset` | `offset` | `min_duration_on` | `min_duration_off`
----------------|---------|----------|-------------------|-------------------
AMI Mix-Headset | 0.542 | 0.527 | 0.044 | 0.705
DIHARD3 | 0.592 | 0.489 | 0.163 | 0.182
VoxConverse | 0.537 | 0.724 | 0.410 | 0.563

Expected outputs (and VBx baseline) are also provided in the `/reproducible_research` sub-directories.
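
For instance, the voice activity detection rows above translate directly into the `HYPER_PARAMETERS` dictionary expected by `pipeline.instantiate` (the benchmark keys below are just for illustration):

```python
# Paper hyper-parameters from the voice activity detection table, keyed by benchmark
VAD_PARAMS = {
    "AMI Mix-Headset": {"onset": 0.684, "offset": 0.577,
                        "min_duration_on": 0.181, "min_duration_off": 0.037},
    "DIHARD3":         {"onset": 0.767, "offset": 0.377,
                        "min_duration_on": 0.136, "min_duration_off": 0.067},
    "VoxConverse":     {"onset": 0.767, "offset": 0.713,
                        "min_duration_on": 0.182, "min_duration_off": 0.501},
}

# then, as in the Usage section:
# pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
# pipeline.instantiate(VAD_PARAMS["AMI Mix-Headset"])
```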

## Citation

```bibtex
@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}
```

```bibtex
@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}
```