---
license: mit
tags:
- audio
- voice-activity-detection
- coreml
- silero
- speech
- ios
- macos
- swift
library_name: coreml
pipeline_tag: voice-activity-detection
datasets:
- alexwengg/musan_mini50
- alexwengg/musan_mini100
metrics:
- accuracy
- f1
language:
- en
base_model:
- onnx-community/silero-vad
---


# **<span style="color:#5DAF8D">🧃 CoreML Silero VAD </span>**
[![Discord](https://img.shields.io/badge/Discord-Join%20Chat-7289da.svg)](https://discord.gg/WNsvaCtmDe)
[![GitHub Repo stars](https://img.shields.io/github/stars/FluidInference/FluidAudio?style=flat&logo=github)](https://github.com/FluidInference/FluidAudio)

A CoreML implementation of the Silero Voice Activity
Detection (VAD) model, optimized for Apple platforms
(iOS/macOS). This repository contains pre-converted
CoreML models ready for use in Swift applications.

See the FluidAudio repository (linked above) for more information.

## Model Description

**Developed by:** Silero Team (original), converted by
FluidAudio

**Model type:** Voice Activity Detection

**License:** MIT

**Parent Model:**
[silero-vad](https://github.com/snakers4/silero-vad)


This is how the model performs against the silero-vad v6.0.0 baseline (PyTorch JIT version):

![graphs/yc_standard_comparison_20250915_205721_2c04b81.png](graphs/yc_standard_comparison_20250915_205721_2c04b81.png)
![graphs/yc_256ms_comparison_20250915_205721_2c04b81.png](graphs/yc_256ms_comparison_20250915_205721_2c04b81.png)

Note that we also tested quantized versions; since the model is already tiny, quantization yields no performance improvement.


This is how the different models compare in terms of speed. The 256 ms variant takes in eight 32 ms chunks and processes them as a batch, so it is much faster:
![graphs/yc_performance_20250915_205721_2c04b81.png](graphs/yc_performance_20250915_205721_2c04b81.png)
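As a rough sketch of that chunking scheme (assuming 16 kHz mono input, so one 32 ms chunk is 512 samples; these values follow the standard Silero VAD configuration and are not taken from this repo's exact API):

```swift
// Split a mono audio buffer into fixed-size chunks for batched VAD inference.
// Assumes 16 kHz audio, so 512 samples ≈ 32 ms; a batch of 8 covers 256 ms.
func chunk(_ samples: [Float], chunkSize: Int = 512) -> [[Float]] {
    stride(from: 0, to: samples.count, by: chunkSize).map { start in
        var frame = Array(samples[start..<min(start + chunkSize, samples.count)])
        // Zero-pad the final partial chunk so every frame has a uniform length.
        if frame.count < chunkSize {
            frame += [Float](repeating: 0, count: chunkSize - frame.count)
        }
        return frame
    }
}

let batch = chunk([Float](repeating: 0, count: 4096))  // 256 ms at 16 kHz → 8 chunks
```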


Conversion code is available here: [FluidInference/mobius](https://github.com/FluidInference/mobius)

## Intended Use

### Primary Use Cases
- Real-time voice activity detection in iOS/macOS
applications
- Speech preprocessing for ASR systems
- Audio segmentation and filtering

## How to Use
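This card does not ship usage snippets; see the FluidAudio repository for the real Swift API. The typical post-processing step, turning per-chunk speech probabilities from the model into speech segments, can be sketched like this (the function name and the 0.5 threshold are illustrative assumptions, not this package's API):

```swift
// Convert per-chunk speech probabilities (one per 32 ms chunk) into
// (startChunk, endChunk) speech segments using a simple threshold.
// NOTE: illustrative sketch only; see the FluidAudio repo for the real API.
func speechSegments(probs: [Float], threshold: Float = 0.5) -> [(start: Int, end: Int)] {
    var segments: [(start: Int, end: Int)] = []
    var current: Int? = nil
    for (i, p) in probs.enumerated() {
        if p >= threshold {
            if current == nil { current = i }      // speech begins
        } else if let s = current {
            segments.append((start: s, end: i))    // speech ended before chunk i
            current = nil
        }
    }
    if let s = current {                           // still speaking at end of audio
        segments.append((start: s, end: probs.count))
    }
    return segments
}
```

A real pipeline would usually add hysteresis (separate start/stop thresholds) and a minimum-silence gap before closing a segment; this sketch keeps only the core thresholding logic.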

## Citation

```bibtex
@misc{silero-vad-coreml,
  title={CoreML Silero VAD},
  author={FluidAudio Team},
  year={2024},
  url={https://huggingface.co/alexwengg/coreml-silero-vad}
}

@misc{silero-vad,
  title={Silero VAD},
  author={Silero Team},
  year={2021},
  url={https://github.com/snakers4/silero-vad}
}
```


- GitHub: https://github.com/FluidAudio/FluidAudioSwift