Using this open-source model in production?
Consider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options.

# 🎹 "Powerset" speaker segmentation

This model ingests 10 seconds of mono audio sampled at 16kHz and outputs speaker diarization as a `(num_frames, num_classes)` matrix, where the 7 classes are non-speech, speaker #1, speaker #2, speaker #3, speakers #1 and #2, speakers #1 and #3, and speakers #2 and #3.

```python
import torch

# waveform (first row)
batch_size = 1
duration, sample_rate, num_channels = 10, 16000, 1
waveform = torch.randn(batch_size, num_channels, duration * sample_rate)

# powerset multi-class encoding (second row)
# (see below for how `model` is instantiated)
powerset_encoding = model(waveform)

# multi-label encoding (third row)
from pyannote.audio.utils.powerset import Powerset
max_speakers_per_chunk, max_speakers_per_frame = 3, 2
to_multilabel = Powerset(
    max_speakers_per_chunk,
    max_speakers_per_frame).to_multilabel
multilabel_encoding = to_multilabel(powerset_encoding)
```
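To see where the 7 classes come from, the powerset mapping can be sketched in plain Python: every subset of at most `max_speakers_per_frame` speakers (out of `max_speakers_per_chunk`) becomes one class. This is an illustrative re-implementation under that assumption, not pyannote's actual `Powerset` code, and the real class ordering may differ.

```python
from itertools import combinations

# Illustrative re-implementation of the powerset <-> multi-label mapping.
# The real logic lives in pyannote.audio.utils.powerset.Powerset; the class
# ordering used here (by subset size, then lexicographic) is an assumption.
max_speakers_per_chunk, max_speakers_per_frame = 3, 2

# one class per subset of at most `max_speakers_per_frame` speakers
classes = [
    subset
    for k in range(max_speakers_per_frame + 1)
    for subset in combinations(range(max_speakers_per_chunk), k)
]

def to_multilabel(class_index):
    """Turn a powerset class index into a per-speaker 0/1 vector."""
    active = set(classes[class_index])
    return [int(speaker in active) for speaker in range(max_speakers_per_chunk)]
```

With `max_speakers_per_chunk=3` and `max_speakers_per_frame=2`, this yields 1 + 3 + 3 = 7 classes: the empty set (non-speech), three single speakers, and three speaker pairs.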

```python
# instantiate the model
from pyannote.audio import Model
model = Model.from_pretrained(
    "pyannote/segmentation-3.0",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")
```
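Once the model is loaded and applied to a chunk, each frame's scores can be decoded by taking the argmax over the powerset classes. Here is a framework-free sketch with made-up scores, assuming a class order of non-speech first, then single speakers, then speaker pairs (check pyannote's `Powerset` utility for the actual ordering):

```python
# Made-up per-frame scores over the 7 powerset classes (rows = frames).
# Assumed class order: non-speech, spk1, spk2, spk3, spk1+2, spk1+3, spk2+3.
frame_scores = [
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01],  # looks like non-speech
    [0.10, 0.70, 0.05, 0.05, 0.04, 0.03, 0.03],  # looks like speaker #1
    [0.05, 0.10, 0.10, 0.05, 0.60, 0.05, 0.05],  # looks like speakers #1 and #2
]

def argmax(row):
    # index of the highest score within one frame
    return max(range(len(row)), key=row.__getitem__)

# one predicted powerset class per frame
predicted_classes = [argmax(row) for row in frame_scores]
```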

### Speaker diarization

This model cannot perform speaker diarization of full recordings on its own: it only processes 10-second chunks.

See the [pyannote/speaker-diarization-3.0](https://hf.co/pyannote/speaker-diarization-3.0) pipeline, which uses an additional speaker embedding model to perform speaker diarization of full recordings.
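To illustrate why a pipeline is needed, full-recording processing boils down to sliding a 10-second window over the audio and stitching per-chunk outputs back together. Below is a minimal sketch of the window bookkeeping, with a hypothetical 1-second step; it is not pyannote's actual implementation, which must also reconcile speaker identities across chunks, hence the additional embedding model.

```python
def sliding_windows(total_duration, window=10.0, step=1.0):
    """Start times of fixed-size analysis windows over a recording.

    Hypothetical helper for illustration; window and step values other
    than the 10-second chunk size are assumptions, not pyannote defaults.
    """
    starts = []
    t = 0.0
    while t + window <= total_duration:
        starts.append(t)
        t += step
    return starts

# a 30-second file is covered by 21 overlapping 10-second chunks
chunks = sliding_windows(30.0)
```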