Voice Activity Detection
pyannote.audio
PyTorch
pyannote
pyannote-audio-model
audio
voice
speech
speaker
speaker-diarization
speaker-change-detection
speaker-segmentation
overlapped-speech-detection
resegmentation
How to use objects76/speaker-diarization-v1 with pyannote.audio:
from pyannote.audio import Model, Inference

model = Model.from_pretrained("objects76/speaker-diarization-v1")
inference = Inference(model)

# inference on the whole file
inference("file.wav")

# inference on an excerpt
from pyannote.core import Segment
excerpt = Segment(start=2.0, end=5.0)
inference.crop("file.wav", excerpt)
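To make the excerpt in the usage snippet concrete, here is a minimal sketch of which audio samples a 2.0 s to 5.0 s segment covers. It assumes 16 kHz audio (the `sample_rate` given in this repository's config.yaml); `segment_to_samples` is a hypothetical helper written for illustration and is not part of pyannote.audio.

```python
def segment_to_samples(start, end, sample_rate=16000):
    """Map a time segment in seconds to (first, last) sample indices.

    Hypothetical helper for illustration only; the 16000 Hz default
    is an assumption taken from the model's config.yaml.
    """
    first = round(start * sample_rate)
    last = round(end * sample_rate)
    return first, last


first, last = segment_to_samples(2.0, 5.0)
print(f"samples {first}..{last} ({last - first} samples)")
```

Under that assumption the excerpt spans 3 seconds of audio, shorter than the 10-second chunk duration named in the task configuration.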
source: /tmp/config.yaml
config.yaml ADDED (+19 -0)
@@ -0,0 +1,19 @@
+task:
+  _target_: pyannote.audio.tasks.SpeakerDiarization
+  duration: 10.0
+  max_speakers_per_chunk: 3
+  max_speakers_per_frame: 2
+model:
+  _target_: pyannote.audio.models.segmentation.PyanNet
+  sample_rate: 16000
+  num_channels: 1
+  sincnet:
+    stride: 10
+  lstm:
+    hidden_size: 128
+    num_layers: 4
+    bidirectional: true
+    monolithic: true
+  linear:
+    hidden_size: 128
+    num_layers: 2
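A few quantities follow directly from the values in this config, and the sketch below works them out with the standard library only. The config is copied into a plain dict rather than parsed from YAML, and the three helper functions are hypothetical names used for illustration. The class count assumes that the task uses a powerset encoding when `max_speakers_per_frame` is set, with one class for every subset of at most 2 of the 3 speakers; that is an assumption about pyannote.audio's behavior, not something stated in this diff.

```python
from math import comb

# Values copied by hand from config.yaml (only the fields used below).
config = {
    "task": {
        "duration": 10.0,
        "max_speakers_per_chunk": 3,
        "max_speakers_per_frame": 2,
    },
    "model": {
        "sample_rate": 16000,
        "lstm": {"hidden_size": 128, "bidirectional": True},
    },
}


def samples_per_chunk(cfg):
    """Number of audio samples in one training chunk."""
    return int(cfg["task"]["duration"] * cfg["model"]["sample_rate"])


def lstm_output_size(cfg):
    """Feature size per frame leaving the LSTM stack.

    A bidirectional LSTM concatenates the forward and backward states,
    so its output is twice the hidden size.
    """
    lstm = cfg["model"]["lstm"]
    directions = 2 if lstm["bidirectional"] else 1
    return lstm["hidden_size"] * directions


def powerset_classes(cfg):
    """Subsets of the chunk's speakers with at most max_speakers_per_frame
    members (assumed powerset encoding, includes the empty set)."""
    n = cfg["task"]["max_speakers_per_chunk"]
    k_max = cfg["task"]["max_speakers_per_frame"]
    return sum(comb(n, k) for k in range(k_max + 1))


print(samples_per_chunk(config))  # 10.0 s * 16000 Hz
print(lstm_output_size(config))   # 128 * 2 directions
print(powerset_classes(config))   # C(3,0) + C(3,1) + C(3,2)
```

Under these assumptions a chunk holds 160000 samples, the LSTM emits 256 features per frame, and the output layer would distinguish 7 classes: silence, three single speakers, and three speaker pairs.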