---
pipeline_tag: voice-activity-detection
license: bsd-2-clause
tags:
- speech-processing
- semantic-vad
- multilingual
datasets:
- pipecat-ai/smart-turn-data-v3.1-train
- pipecat-ai/smart-turn-data-v3.1-test
---

# Smart Turn v3.x

**Smart Turn** is an open-source semantic Voice Activity Detection (VAD) model that tells you whether a speaker has finished their turn by analysing the raw waveform, not the transcript.

## Links

* [Blog post: Smart Turn v3](https://www.daily.co/blog/announcing-smart-turn-v3-with-cpu-inference-in-just-12ms/)
* [GitHub repo](https://github.com/pipecat-ai/smart-turn) with training and inference code, and more information
* [Datasets](https://huggingface.co/pipecat-ai/datasets)

## Model architecture

* Backbone: Whisper Tiny encoder
* Head: shallow linear classifier
* Params: 8M
* Checkpoint: 8 MB ONNX (int8 quantized), 32 MB ONNX (unquantized)
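
The two checkpoint sizes follow from the parameter count. The sketch below is a rough sanity check, not a measurement: it assumes the unquantized checkpoint stores 32-bit floats (which the 32 MB figure suggests but this card does not state), treats "8M" as exactly 8,000,000, and ignores ONNX graph overhead. The helper `approx_size_mb` is a hypothetical name introduced only for this illustration.

```python
# Approximate checkpoint size = parameter count x bytes per parameter.
PARAMS = 8_000_000  # "8M" from the list above, treated as an exact figure

# Assumption: the unquantized checkpoint uses float32 weights.
BYTES_PER_PARAM = {"int8": 1, "float32": 4}


def approx_size_mb(params: int, dtype: str) -> float:
    """Approximate weight storage in MB (10^6 bytes), ignoring graph overhead."""
    return params * BYTES_PER_PARAM[dtype] / 1_000_000


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{approx_size_mb(PARAMS, dtype):.0f} MB")
# int8: ~8 MB
# float32: ~32 MB
```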

## How to use

See the blog post and GitHub repo linked above for instructions on running the model, either standalone or with Pipecat.
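
As a rough illustration of how a caller might wrap the model, the sketch below prepares a fixed-length audio window and thresholds a completion probability. None of it is taken from this card. The 16 kHz sample rate, the 8-second window, the left-padding with silence, and the 0.5 threshold are all assumptions to check against the GitHub repo. `prepare_window`, `turn_is_complete`, and the `predict` callable are hypothetical names standing in for the real inference code.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000  # assumed input sample rate in Hz; verify in the repo
WINDOW_SECONDS = 8    # assumed maximum audio context; verify in the repo
THRESHOLD = 0.5       # assumed decision threshold on the model's output


def prepare_window(samples: Sequence[float]) -> List[float]:
    """Keep the most recent WINDOW_SECONDS of audio, left-padded with silence."""
    target = SAMPLE_RATE * WINDOW_SECONDS
    recent = list(samples[-target:])
    return [0.0] * (target - len(recent)) + recent


def turn_is_complete(
    samples: Sequence[float],
    predict: Callable[[List[float]], float],
) -> bool:
    """Return True if the (hypothetical) model call scores the turn as finished."""
    return predict(prepare_window(samples)) > THRESHOLD


# Stand-in for the real model call, which would return P(turn complete).
def fake_predict(window: List[float]) -> float:
    return 0.9


print(turn_is_complete([0.0] * SAMPLE_RATE, fake_predict))  # True
```

In a real pipeline, `predict` would wrap the ONNX checkpoint and its feature extraction as shown in the repo's inference code. Keeping it as a plain callable here makes the windowing and thresholding logic easy to test on its own.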

## Thanks

Thank you to the following organisations for contributing audio datasets:

- [Liva AI](https://www.theliva.ai/)
- [Midcentury](https://www.midcentury.xyz/)
- [MundoAI](https://mundoai.world/)