Using this open-source model in production?
Consider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options.

# 🎹 "Powerset" speaker segmentation

This model ingests 10 seconds of mono audio sampled at 16kHz and outputs speaker diarization as a `(num_frames, num_classes)` matrix, where the 7 classes are non-speech, speaker #1, speaker #2, speaker #3, speakers #1 and #2, speakers #1 and #3, and speakers #2 and #3.

```python
import torch

# waveform (first row)
batch_size = 1
duration, sample_rate, num_channels = 10, 16000, 1
waveform = torch.randn(batch_size, num_channels, duration * sample_rate)

# powerset multi-class encoding (second row)
# (see below for how `model` is instantiated)
powerset_encoding = model(waveform)

# multi-label encoding (third row)
from pyannote.audio.utils.powerset import Powerset
max_speakers_per_chunk, max_speakers_per_frame = 3, 2
to_multilabel = Powerset(
    max_speakers_per_chunk,
    max_speakers_per_frame).to_multilabel
multilabel_encoding = to_multilabel(powerset_encoding)
```
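To see where the 7 classes come from, the powerset mapping can be sketched in plain Python: every subset of at most `max_speakers_per_frame` speakers (out of `max_speakers_per_chunk`) becomes one class. This is an illustrative re-implementation under that assumption, not pyannote's actual `Powerset` code, and the real class ordering may differ.

```python
from itertools import combinations

# Illustrative re-implementation of the powerset <-> multi-label mapping.
# The real logic lives in pyannote.audio.utils.powerset.Powerset; the class
# ordering used here (by subset size, then lexicographic) is an assumption.
max_speakers_per_chunk, max_speakers_per_frame = 3, 2

# one class per subset of at most `max_speakers_per_frame` speakers
classes = [
    subset
    for k in range(max_speakers_per_frame + 1)
    for subset in combinations(range(max_speakers_per_chunk), k)
]

def to_multilabel(class_index):
    """Turn a powerset class index into a per-speaker 0/1 vector."""
    active = set(classes[class_index])
    return [int(speaker in active) for speaker in range(max_speakers_per_chunk)]
```

With `max_speakers_per_chunk=3` and `max_speakers_per_frame=2`, this yields 1 + 3 + 3 = 7 classes: the empty set (non-speech), three single speakers, and three speaker pairs.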

```python
# instantiate the model
from pyannote.audio import Model
model = Model.from_pretrained(
    "pyannote/segmentation-3.0",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")
```
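Once the model is loaded and applied to a chunk, each frame's scores can be decoded by taking the argmax over the powerset classes. Here is a framework-free sketch with made-up scores, assuming a class order of non-speech first, then single speakers, then speaker pairs (check pyannote's `Powerset` utility for the actual ordering):

```python
# Made-up per-frame scores over the 7 powerset classes (rows = frames).
# Assumed class order: non-speech, spk1, spk2, spk3, spk1+2, spk1+3, spk2+3.
frame_scores = [
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01],  # looks like non-speech
    [0.10, 0.70, 0.05, 0.05, 0.04, 0.03, 0.03],  # looks like speaker #1
    [0.05, 0.10, 0.10, 0.05, 0.60, 0.05, 0.05],  # looks like speakers #1 and #2
]

def argmax(row):
    # index of the highest score within one frame
    return max(range(len(row)), key=row.__getitem__)

# one predicted powerset class per frame
predicted_classes = [argmax(row) for row in frame_scores]
```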

### Speaker diarization

This model cannot perform speaker diarization of full recordings on its own: it only processes 10-second chunks.

See the [pyannote/speaker-diarization-3.0](https://hf.co/pyannote/speaker-diarization-3.0) pipeline, which uses an additional speaker embedding model to perform speaker diarization of full recordings.
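To illustrate why a pipeline is needed, full-recording processing boils down to sliding a 10-second window over the audio and stitching per-chunk outputs back together. Below is a minimal sketch of the window bookkeeping, with a hypothetical 1-second step; it is not pyannote's actual implementation, which must also reconcile speaker identities across chunks, hence the additional embedding model.

```python
def sliding_windows(total_duration, window=10.0, step=1.0):
    """Start times of fixed-size analysis windows over a recording.

    Hypothetical helper for illustration; window and step values other
    than the 10-second chunk size are assumptions, not pyannote defaults.
    """
    starts = []
    t = 0.0
    while t + window <= total_duration:
        starts.append(t)
        t += step
    return starts

# a 30-second file is covered by 21 overlapping 10-second chunks
chunks = sliding_windows(30.0)
```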