Voice Activity Detection
ONNX
speech-processing
semantic-vad
multilingual
marcus-daily committed
Commit e6cfb72 · 1 Parent(s): 27ab809

Add initial model card

Files changed (1)
  1. README.md +34 -3
README.md CHANGED
@@ -1,3 +1,34 @@
- ---
- license: bsd-2-clause
- ---
+ ---
+ pipeline_tag: voice-activity-detection
+ license: bsd-2-clause
+ tags:
+ - speech-processing
+ - semantic-vad
+ - multilingual
+ datasets:
+ - pipecat-ai/smart-turn-data-v3-train
+ - pipecat-ai/smart-turn-data-v3-test
+ ---
+
+ # Smart Turn v3
+
+ **Smart Turn v3** is an open‑source semantic Voice Activity Detection (VAD) model that tells you whether a speaker has finished their turn by analysing the raw waveform, not the transcript.
+
+ ## Links
+
+ * [Blog post: Smart Turn v3](https://www.daily.co/blog/)
+ * [GitHub repo](https://github.com/pipecat-ai/smart-turn) with training and inference code
+ * [Datasets](https://github.com/pipecat-ai/datasets) with training and inference code
+
+
+ ## Model architecture
+
+ * Backbone : Whisper Tiny encoder
+ * Head     : shallow linear classifier
+ * Params   : 8 M (int8)
+ * Checkpoint: 8 MB ONNX
+
+
+ ## How to use
+
+ Please see the blog post and GitHub repo for more information on using the model, either standalone or with Pipecat.
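A note on the architecture figures in the added README: the parameter count and the checkpoint size agree with each other if every weight is stored as a single int8 byte. The sketch below checks that arithmetic. It assumes "8 M" is the parameter count, takes 1 MB as 10^6 bytes, and ignores ONNX graph metadata and any tensors kept at higher precision. The helper name `checkpoint_size_mb` is hypothetical and is not part of the Smart Turn repository.

```python
def checkpoint_size_mb(num_params: int, bytes_per_param: int = 1) -> float:
    """Estimate raw weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 10**6


# int8 quantization stores one byte per weight.
int8_mb = checkpoint_size_mb(8_000_000)

# The same weights in float32 (4 bytes each), for comparison only.
fp32_mb = checkpoint_size_mb(8_000_000, bytes_per_param=4)

print(f"int8: {int8_mb:.0f} MB, fp32: {fp32_mb:.0f} MB")
```

Under these assumptions the int8 estimate matches the 8 MB checkpoint listed in the model card, while an unquantized float32 export of the same weights would be roughly four times larger.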