Jimmy-test
#3
by
Zhongzhimin
- opened
- README.md +4 -4
- hyperparams.yaml +1 -1
README.md
CHANGED
|
@@ -72,7 +72,7 @@ Please notice that we encourage you to read our tutorials and learn more about
|
|
| 72 |
### Perform Voice Activity Detection
|
| 73 |
|
| 74 |
```
|
| 75 |
-from speechbrain.
|
| 76 |
|
| 77 |
VAD = VAD.from_hparams(source="speechbrain/vad-crdnn-libriparty", savedir="pretrained_models/vad-crdnn-libriparty")
|
| 78 |
boundaries = VAD.get_speech_segments("speechbrain/vad-crdnn-libriparty/example_vad.wav")
|
|
@@ -93,7 +93,7 @@ To do it:
|
|
| 93 |
|
| 94 |
```
|
| 95 |
import torchaudio
|
| 96 |
-upsampled_boundaries = VAD.upsample_boundaries(boundaries, 'example_vad.wav')
|
| 97 |
torchaudio.save('vad_final.wav', upsampled_boundaries.cpu(), 16000)
|
| 98 |
```
|
| 99 |
|
|
@@ -116,11 +116,11 @@ We designed the VAD such that you can have access to all of these steps (this mi
|
|
| 116 |
|
| 117 |
|
| 118 |
```python
|
| 119 |
-from speechbrain.
|
| 120 |
VAD = VAD.from_hparams(source="speechbrain/vad-crdnn-libriparty", savedir="pretrained_models/vad-crdnn-libriparty")
|
| 121 |
|
| 122 |
# 1- Let's compute frame-level posteriors first
|
| 123 |
-audio_file = 'example_vad.wav'
|
| 124 |
prob_chunks = VAD.get_speech_prob_file(audio_file)
|
| 125 |
|
| 126 |
# 2- Let's apply a threshold on top of the posteriors
|
|
|
|
| 72 |
### Perform Voice Activity Detection
|
| 73 |
|
| 74 |
```
|
| 75 |
+from speechbrain.pretrained import VAD
|
| 76 |
|
| 77 |
VAD = VAD.from_hparams(source="speechbrain/vad-crdnn-libriparty", savedir="pretrained_models/vad-crdnn-libriparty")
|
| 78 |
boundaries = VAD.get_speech_segments("speechbrain/vad-crdnn-libriparty/example_vad.wav")
|
|
|
|
| 93 |
|
| 94 |
```
|
| 95 |
import torchaudio
|
| 96 |
+upsampled_boundaries = VAD.upsample_boundaries(boundaries, 'pretrained_model_checkpoints/example_vad.wav')
|
| 97 |
torchaudio.save('vad_final.wav', upsampled_boundaries.cpu(), 16000)
|
| 98 |
```
|
| 99 |
|
|
|
|
| 116 |
|
| 117 |
|
| 118 |
```python
|
| 119 |
+from speechbrain.pretrained import VAD
|
| 120 |
VAD = VAD.from_hparams(source="speechbrain/vad-crdnn-libriparty", savedir="pretrained_models/vad-crdnn-libriparty")
|
| 121 |
|
| 122 |
# 1- Let's compute frame-level posteriors first
|
| 123 |
+audio_file = 'pretrained_model_checkpoints/example_vad.wav'
|
| 124 |
prob_chunks = VAD.get_speech_prob_file(audio_file)
|
| 125 |
|
| 126 |
# 2- Let's apply a threshold on top of the posteriors
|
hyperparams.yaml
CHANGED
|
@@ -21,7 +21,7 @@ rnn_bidirectional: True
|
|
| 21 |
dnn_blocks: 1
|
| 22 |
dnn_neurons: 16
|
| 23 |
output_neurons: 1
|
| 24 |
-device: 'cpu' #
|
| 25 |
|
| 26 |
# Feature/Model objects
|
| 27 |
compute_features: !new:speechbrain.lobes.features.Fbank
|
|
|
|
| 21 |
dnn_blocks: 1
|
| 22 |
dnn_neurons: 16
|
| 23 |
output_neurons: 1
|
| 24 |
+device: 'cpu' # set 'cuda:0' for gpu
|
| 25 |
|
| 26 |
# Feature/Model objects
|
| 27 |
compute_features: !new:speechbrain.lobes.features.Fbank
|