Hervé Bredin committed
Commit df706b9 · 1 parent: f47dcce

fix: fix README

README.md CHANGED
@@ -17,7 +17,7 @@ license: mit
 inference: false
 ---
 
-#
 
 This model relies on `pyannote.audio` 2.0 (which is still in development):
 
@@ -29,7 +29,7 @@ $ pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip
 
 ```python
 >>> from pyannote.audio import Inference
->>> inference = Inference("pyannote/
 >>> segmentation = inference("audio.wav")
 ```
 
@@ -40,25 +40,30 @@ $ pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip
 ```python
 >>> from pyannote.audio.pipelines import VoiceActivityDetection
 >>> HYPER_PARAMETERS = {"onset": 0.5, "offset": 0.5, "min_duration_on": 0.0, "min_duration_off": 0.0}
->>> pipeline = VoiceActivityDetection(segmentation="pyannote/
 >>> vad = pipeline("audio.wav")
 ```
 
 Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
 AMI Mix-Headset | TODO | TODO | TODO | TODO
 DIHARD3 | TODO | TODO | TODO | TODO
 VoxConverse | TODO | TODO | TODO | TODO
 
-
 ### Overlapped speech detection
 
 ```python
 >>> from pyannote.audio.pipelines import OverlappedSpeechDetection
->>> pipeline = OverlappedSpeechDetection(segmentation="pyannote/
 >>> osd = pipeline("audio.wav")
 ```
 
 Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
 AMI Mix-Headset | TODO | TODO | TODO | TODO
@@ -70,9 +75,12 @@ VoxConverse | TODO | TODO | TODO | TODO
 
 ```python
 >>> from pyannote.audio.pipelines import Segmentation
->>> pipeline = Segmentation(segmentation="pyannote/
 >>> seg = pipeline("audio.wav")
 ```
 
 Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
@@ -84,11 +92,22 @@ VoxConverse | TODO | TODO | TODO | TODO
 
 ```python
 >>> from pyannote.audio.pipelines import Resegmentation
->>> pipeline = Resegmentation(segmentation="pyannote/
-
->>>
 ```
 
 Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
 AMI Mix-Headset | TODO | TODO | TODO | TODO
@@ -97,7 +116,6 @@ VoxConverse | TODO | TODO | TODO | TODO
 
 ## Citations
 
-
 ```bibtex
 @inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
 inference: false
 ---
 
+# pyannote.audio // speaker segmentation
 
 This model relies on `pyannote.audio` 2.0 (which is still in development):
 
 
 ```python
 >>> from pyannote.audio import Inference
+>>> inference = Inference("pyannote/segmentation")
 >>> segmentation = inference("audio.wav")
 ```
 
 ```python
 >>> from pyannote.audio.pipelines import VoiceActivityDetection
 >>> HYPER_PARAMETERS = {"onset": 0.5, "offset": 0.5, "min_duration_on": 0.0, "min_duration_off": 0.0}
+>>> pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
+>>> pipeline.instantiate(HYPER_PARAMETERS)
 >>> vad = pipeline("audio.wav")
 ```
 
+In order to reproduce results of the paper, one should use the following hyper-parameter values:
+
 Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
 AMI Mix-Headset | TODO | TODO | TODO | TODO
 DIHARD3 | TODO | TODO | TODO | TODO
 VoxConverse | TODO | TODO | TODO | TODO
 
 ### Overlapped speech detection
 
 ```python
 >>> from pyannote.audio.pipelines import OverlappedSpeechDetection
+>>> pipeline = OverlappedSpeechDetection(segmentation="pyannote/segmentation")
+>>> pipeline.instantiate(HYPER_PARAMETERS)
 >>> osd = pipeline("audio.wav")
 ```
 
+In order to reproduce results of the paper, one should use the following hyper-parameter values:
+
 Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
 AMI Mix-Headset | TODO | TODO | TODO | TODO
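As background for the hyper-parameters tuned above: `onset` and `offset` act as the two thresholds of a hysteresis binarization of the frame-level segmentation scores, while `min_duration_off` fills short gaps and `min_duration_on` drops short active regions. The sketch below only illustrates these semantics under that assumption; it is not `pyannote.audio`'s actual implementation, and the function name and fixed frame duration are made up here:

```python
# Illustrative only: what the four hyper-parameters control, on a list of
# per-frame scores sampled every `frame_duration` seconds.
def binarize(scores, frame_duration, onset=0.5, offset=0.5,
             min_duration_on=0.0, min_duration_off=0.0):
    """Turn per-frame scores into (start, end) regions, in seconds."""
    regions, start, active = [], None, False
    for i, score in enumerate(scores):
        t = i * frame_duration
        if not active and score > onset:      # hysteresis: open a region
            start, active = t, True
        elif active and score < offset:       # hysteresis: close it
            regions.append((start, t))
            active = False
    if active:
        regions.append((start, len(scores) * frame_duration))

    # min_duration_off: merge regions separated by a gap shorter than this
    filled = []
    for region in regions:
        if filled and region[0] - filled[-1][1] < min_duration_off:
            filled[-1] = (filled[-1][0], region[1])
        else:
            filled.append(region)

    # min_duration_on: drop regions shorter than this
    return [(s, e) for (s, e) in filled if e - s >= min_duration_on]
```

For instance, with one-second frames, `binarize([0.1, 0.9, 0.9, 0.2, 0.9, 0.1], 1.0)` yields two regions, while setting `min_duration_off=2.0` merges them into one.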
 
 ```python
 >>> from pyannote.audio.pipelines import Segmentation
+>>> pipeline = Segmentation(segmentation="pyannote/segmentation")
+>>> pipeline.instantiate(HYPER_PARAMETERS)
 >>> seg = pipeline("audio.wav")
 ```
+In order to reproduce results of the paper, one should use the following hyper-parameter values:
+
 
 Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
 
 ```python
 >>> from pyannote.audio.pipelines import Resegmentation
+>>> pipeline = Resegmentation(segmentation="pyannote/segmentation",
+...                           diarization="baseline")
+>>> pipeline.instantiate(HYPER_PARAMETERS)
+```
+
+VBx RTTM files are also provided in this repository for convenience:
+
+```python
+>>> from pyannote.database.util import load_rttm
+>>> vbx = load_rttm("/path/to/vbx.rttm")
+>>> resegmented_vbx = pipeline({"audio": "DH_EVAL_000.wav",
+...                             "baseline": vbx["DH_EVAL_000"]})
 ```
 
+In order to reproduce (VBx) results of the paper, one should use the following hyper-parameter values:
+
 Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
 AMI Mix-Headset | TODO | TODO | TODO | TODO
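Since the resegmentation example consumes baselines from RTTM files, it may help to see what that format holds: each `SPEAKER` line is a space-separated record carrying the file id, channel, turn start, duration, and speaker label. The reader below is a hypothetical sketch for illustration only; `load_rttm` from `pyannote.database` is the supported way to load these files:

```python
from collections import defaultdict

# Illustrative RTTM reader (hypothetical helper, not part of pyannote).
# Each speaker turn is one space-separated line:
#   SPEAKER <file-id> <channel> <start> <duration> <NA> <NA> <speaker> <NA> <NA>
def read_rttm(lines):
    """Group (start, end, speaker) turns by file id."""
    turns = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) >= 8 and fields[0] == "SPEAKER":
            start, duration = float(fields[3]), float(fields[4])
            turns[fields[1]].append((start, start + duration, fields[7]))
    return dict(turns)
```

The returned mapping plays the same role as the `vbx` dictionary above: one entry per file id, each holding that file's speaker turns.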
 
 ## Citations
 
 ```bibtex
 @inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},