Hervé Bredin
committed on
Commit
·
2dbbe55
1
Parent(s):
52200fc
doc: update README
README.md
CHANGED
|
@@ -31,44 +31,24 @@ Relies on pyannote.audio 2.0 currently in development: see [installation instruc
|
|
| 31 |
For commercial enquiries and scientific consulting, please contact [me](mailto:herve@niderb.fr).
|
| 32 |
For [technical questions](https://github.com/pyannote/pyannote-audio/discussions) and [bug reports](https://github.com/pyannote/pyannote-audio/issues), please check [pyannote.audio](https://github.com/pyannote/pyannote-audio) Github repository.
|
| 33 |
|
| 34 |
-
##
|
| 35 |
|
| 36 |
-
|
| 37 |
-
from pyannote.audio import Inference
|
| 38 |
-
inference = Inference("pyannote/segmentation")
|
| 39 |
-
segmentation = inference("audio.wav")
|
| 40 |
-
# `segmentation` is a pyannote.core.SlidingWindowFeature
|
| 41 |
-
# instance containing raw segmentation scores like the
|
| 42 |
-
# one pictured above (output)
|
| 43 |
|
| 44 |
-
|
| 45 |
-
|
|
|
|
| 46 |
HYPER_PARAMETERS = {
|
| 47 |
# onset/offset activation thresholds
|
| 48 |
"onset": 0.5, "offset": 0.5,
|
| 49 |
-
# remove
|
| 50 |
"min_duration_on": 0.0,
|
| 51 |
-
# fill
|
| 52 |
"min_duration_off": 0.0
|
| 53 |
}
|
| 54 |
-
|
| 55 |
-
pipeline.instantiate(HYPER_PARAMETERS)
|
| 56 |
-
segmentation = pipeline("audio.wav")
|
| 57 |
-
# `segmentation` now is a pyannote.core.Annotation
|
| 58 |
-
# instance containing a hard binary segmentation
|
| 59 |
-
# like the one pictured above (reference)
|
| 60 |
-
```
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
## Advanced usage
|
| 64 |
-
|
| 65 |
-
### Voice activity detection
|
| 66 |
-
|
| 67 |
-
```python
|
| 68 |
-
from pyannote.audio.pipelines import VoiceActivityDetection
|
| 69 |
-
pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
|
| 70 |
pipeline.instantiate(HYPER_PARAMETERS)
|
| 71 |
vad = pipeline("audio.wav")
|
|
|
|
| 72 |
```
|
| 73 |
|
| 74 |
### Overlapped speech detection
|
|
@@ -78,6 +58,7 @@ from pyannote.audio.pipelines import OverlappedSpeechDetection
|
|
| 78 |
pipeline = OverlappedSpeechDetection(segmentation="pyannote/segmentation")
|
| 79 |
pipeline.instantiate(HYPER_PARAMETERS)
|
| 80 |
osd = pipeline("audio.wav")
|
|
|
|
| 81 |
```
|
| 82 |
|
| 83 |
### Resegmentation
|
|
@@ -91,6 +72,17 @@ resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
|
|
| 91 |
# where `baseline` should be provided as a pyannote.core.Annotation instance
|
| 92 |
```
|
| 93 |
|
| 94 |
## Reproducible research
|
| 95 |
|
| 96 |
In order to reproduce the results of the paper ["End-to-end speaker segmentation for overlap-aware resegmentation
|
|
@@ -118,6 +110,16 @@ Expected outputs (and VBx baseline) are also provided in the `/reproducible_rese
|
|
| 118 |
|
| 119 |
## Citation
|
| 120 |
|
| 121 |
```bibtex
|
| 122 |
@inproceedings{Bredin2020,
|
| 123 |
Title = {{pyannote.audio: neural building blocks for speaker diarization}},
|
|
|
|
| 31 |
For commercial enquiries and scientific consulting, please contact [me](mailto:herve@niderb.fr).
|
| 32 |
For [technical questions](https://github.com/pyannote/pyannote-audio/discussions) and [bug reports](https://github.com/pyannote/pyannote-audio/issues), please check [pyannote.audio](https://github.com/pyannote/pyannote-audio) Github repository.
|
| 33 |
|
| 34 |
+
## Usage
|
| 35 |
|
| 36 |
+
### Voice activity detection
|
| 37 |
|
| 38 |
+
```python
|
| 39 |
+
from pyannote.audio.pipelines import VoiceActivityDetection
|
| 40 |
+
pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
|
| 41 |
HYPER_PARAMETERS = {
|
| 42 |
# onset/offset activation thresholds
|
| 43 |
"onset": 0.5, "offset": 0.5,
|
| 44 |
+
# remove speech regions shorter than that many seconds.
|
| 45 |
"min_duration_on": 0.0,
|
| 46 |
+
# fill non-speech regions shorter than that many seconds.
|
| 47 |
"min_duration_off": 0.0
|
| 48 |
}
|
| 49 |
pipeline.instantiate(HYPER_PARAMETERS)
|
| 50 |
vad = pipeline("audio.wav")
|
| 51 |
+
# `vad` is a pyannote.core.Annotation instance containing speech regions
|
| 52 |
```
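To make the four hyper-parameters above more concrete, here is a minimal, library-free sketch of how frame-level scores could be turned into speech regions. It is not pyannote.audio's implementation: the `binarize` function, the `frame_step` argument, and the order of the two post-processing steps are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Region = Tuple[float, float]  # (start, end) in seconds


def binarize(
    scores: Sequence[float],
    frame_step: float,  # hypothetical: duration of one frame, in seconds
    onset: float = 0.5,
    offset: float = 0.5,
    min_duration_on: float = 0.0,
    min_duration_off: float = 0.0,
) -> List[Region]:
    """Illustrative hysteresis thresholding of frame-level scores."""
    regions: List[Region] = []
    active = False
    start = 0.0
    for i, score in enumerate(scores):
        t = i * frame_step
        if not active and score > onset:
            # score rises above `onset`: a region starts
            active, start = True, t
        elif active and score < offset:
            # score falls below `offset`: the region ends
            active = False
            regions.append((start, t))
    if active:
        # close a region that is still open at the end of the file
        regions.append((start, len(scores) * frame_step))

    # fill non-speech gaps shorter than `min_duration_off` seconds
    merged: List[Region] = []
    for seg_start, seg_end in regions:
        if merged and seg_start - merged[-1][1] < min_duration_off:
            merged[-1] = (merged[-1][0], seg_end)
        else:
            merged.append((seg_start, seg_end))

    # remove speech regions shorter than `min_duration_on` seconds
    return [(s, e) for s, e in merged if e - s >= min_duration_on]


scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.9, 0.1]
regions = binarize(scores, frame_step=0.5)
filled = binarize(scores, frame_step=0.5, min_duration_off=1.0)
pruned = binarize(scores, frame_step=0.5, min_duration_on=1.5)
```

With both durations left at `0.0`, as in `HYPER_PARAMETERS`, no gap is filled and no region is dropped, so the result depends only on the `onset` and `offset` thresholds.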
|
| 53 |
|
| 54 |
### Overlapped speech detection
|
|
|
|
| 58 |
pipeline = OverlappedSpeechDetection(segmentation="pyannote/segmentation")
|
| 59 |
pipeline.instantiate(HYPER_PARAMETERS)
|
| 60 |
osd = pipeline("audio.wav")
|
| 61 |
+
# `osd` is a pyannote.core.Annotation instance containing overlapped speech regions
|
| 62 |
```
|
| 63 |
|
| 64 |
### Resegmentation
|
|
|
|
| 72 |
# where `baseline` should be provided as a pyannote.core.Annotation instance
|
| 73 |
```
|
| 74 |
|
| 75 |
+
### Raw scores
|
| 76 |
+
|
| 77 |
+
```python
|
| 78 |
+
from pyannote.audio import Inference
|
| 79 |
+
inference = Inference("pyannote/segmentation")
|
| 80 |
+
segmentation = inference("audio.wav")
|
| 81 |
+
# `segmentation` is a pyannote.core.SlidingWindowFeature
|
| 82 |
+
# instance containing raw segmentation scores like the
|
| 83 |
+
# one pictured above (output)
|
| 84 |
+
```
|
| 85 |
+
|
| 86 |
## Reproducible research
|
| 87 |
|
| 88 |
In order to reproduce the results of the paper ["End-to-end speaker segmentation for overlap-aware resegmentation
|
|
|
|
| 110 |
|
| 111 |
## Citation
|
| 112 |
|
| 113 |
+
```bibtex
|
| 114 |
+
@inproceedings{Bredin2021,
|
| 115 |
+
Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
|
| 116 |
+
Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
|
| 117 |
+
Booktitle = {Proc. Interspeech 2021},
|
| 118 |
+
Address = {Brno, Czech Republic},
|
| 119 |
+
Month = {August},
|
| 120 |
+
Year = {2021},
|
| 121 |
+
```
|
| 122 |
+
|
| 123 |
```bibtex
|
| 124 |
@inproceedings{Bredin2020,
|
| 125 |
Title = {{pyannote.audio: neural building blocks for speaker diarization}},
|