Ziyi Lin committed on
Commit 97e7f89 · 1 Parent(s): 8cbbe9b

Update testset name

README.md CHANGED
@@ -15,11 +15,11 @@ license: apache-2.0
  
  ### **1. High-Performance:**
  
- The precision-recall curves comparing the performance of WebRTC VAD (pitch-based), Silero VAD, and TEN VAD are shown below. The evaluation is conducted on the precisely manually annotated TEN-VAD-TestSet. The audio files are from librispeech, gigaspeech, DNS Challenge etc. As demonstrated, TEN VAD achieves the best performance. Additionally, cross-validation experiments conducted on large internal real-world datasets demonstrate the reproducibility of these findings. The **TEN-VAD-TestSet with annotated labels** is released in directory "TEN-VAD-TestSet" of this repository.
+ The precision-recall curves comparing the performance of WebRTC VAD (pitch-based), Silero VAD, and TEN VAD are shown below. The evaluation is conducted on the precisely manually annotated testset. The audio files are from librispeech, gigaspeech, DNS Challenge etc. As demonstrated, TEN VAD achieves the best performance. Additionally, cross-validation experiments conducted on large internal real-world datasets demonstrate the reproducibility of these findings. The **testset with annotated labels** is released in directory "testset" of this repository.
  
  
  <div style="text-align:">
- <img src="./examples/images/PR_Curves_TEN-VAD-TestSet.png" width="800">
+ <img src="./examples/images/PR_Curves_testset.png" width="800">
  </div>
  
  Note that the default threshold of 0.5 is used to generate binary speech indicators (0 for non-speech signal, 1 for speech signal). This threshold needs to be tuned according to your domain-specific task. The precision-recall curve can be obtained by executing the following script on Linux x64. The output figure will be saved in the same directory as the script.
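Reviewer note: the README text in this hunk describes turning per-frame speech probabilities into binary indicators with a threshold of 0.5 and sweeping that threshold to get a precision-recall curve. The sketch below illustrates that idea only; it is not the logic of `examples/plot_pr_curves.py`. The function names, the `>=` comparison, and the convention of reporting precision as 1.0 when nothing is predicted as speech are all assumptions made for illustration.

```python
def binarize(probs, threshold=0.5):
    """Map per-frame probabilities to 0 (non-speech) or 1 (speech).

    Assumption: a frame counts as speech when prob >= threshold.
    """
    return [1 if p >= threshold else 0 for p in probs]


def precision_recall(probs, labels, threshold=0.5):
    """Return (precision, recall) for one threshold against 0/1 labels."""
    preds = binarize(probs, threshold)
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    # Assumed conventions for empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pr_curve(probs, labels, thresholds):
    """Return one (precision, recall) point per threshold."""
    return [precision_recall(probs, labels, t) for t in thresholds]


if __name__ == "__main__":
    # Toy data, not taken from the testset.
    probs = [0.9, 0.6, 0.4, 0.2]
    labels = [1, 0, 1, 0]
    for t, (p, r) in zip([0.3, 0.5], pr_curve(probs, labels, [0.3, 0.5])):
        print(f"threshold={t}: precision={p:.3f}, recall={r:.3f}")
```

Lowering the threshold in this toy example raises recall at the cost of precision, which is the trade-off the README asks users to tune for their own domain.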
examples/images/{PR_Curves_TEN-VAD-TestSet.png → PR_Curves_testset.png} RENAMED
File without changes
examples/plot_pr_curves.py CHANGED
@@ -114,8 +114,8 @@ if __name__ == "__main__":
  # Get the directory of the script
  script_dir = os.path.dirname(os.path.abspath(__file__))
  
- # TEN-VAD-TestSet dir
- test_dir = f"{script_dir}/../TEN-VAD-TestSet"
+ # testset dir
+ test_dir = f"{script_dir}/../testset"
  
  # Initialization
  hop_size = 256