
	---
	license: mit
	tags:
	- audio
	- voice-activity-detection
	- coreml
	- silero
	- speech
	- ios
	- macos
	- swift
	library_name: coreml
	pipeline_tag: voice-activity-detection
	datasets:
	- alexwengg/musan_mini50
	- alexwengg/musan_mini100
	metrics:
	- accuracy
	- f1
	language:
	- en
	base_model:
	- onnx-community/silero-vad
	---


	# <span style="color:#5DAF8D">🧃 CoreML Silero VAD </span>
	[![Discord](https://img.shields.io/badge/Discord-Join%20Chat-7289da.svg)](https://discord.gg/WNsvaCtmDe)
	[![GitHub Repo stars](https://img.shields.io/github/stars/FluidInference/FluidAudio?style=flat&logo=github)](https://github.com/FluidInference/FluidAudio)

	A CoreML implementation of the Silero Voice Activity
	Detection (VAD) model, optimized for Apple platforms
	(iOS/macOS). This repository contains pre-converted
	CoreML models ready for use in Swift applications.

	See the [FluidAudio repository](https://github.com/FluidInference/FluidAudio) for more information.

	## Model Description

	Developed by: Silero Team (original), converted by
	FluidAudio

	Model type: Voice Activity Detection

	License: MIT

	Parent Model:
	[silero-vad](https://github.com/snakers4/silero-vad)


	This is how the model performs against the silero-vad v6.0.0 baseline PyTorch JIT version:

	![graphs/yc_standard_comparison_20250915_205721_2c04b81.png](graphs/yc_standard_comparison_20250915_205721_2c04b81.png)
	![graphs/yc_256ms_comparison_20250915_205721_2c04b81.png](graphs/yc_256ms_comparison_20250915_205721_2c04b81.png)

	Note that we also tested quantized versions. Because the model is already tiny, quantization gave no performance improvement at all.


	This is how the different models compare in terms of speed. The 256 ms variant takes in 8 chunks of 32 ms and processes them as a batch, so it is much faster:
	![graphs/yc_performance_20250915_205721_2c04b81.png](graphs/yc_performance_20250915_205721_2c04b81.png)
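
The batching described above can be sketched in a few lines. This is only an illustration of grouping 8 chunks of 32 ms into one 256 ms window, not the conversion or inference code from this repository. It assumes 16 kHz mono audio (so one 32 ms chunk is 512 samples) and zero-pads the final window; the function name `batch_chunks` and the padding choice are hypothetical.

```python
SAMPLE_RATE = 16_000   # assumption: 16 kHz mono input
CHUNK_MS = 32          # one chunk, as described above
CHUNKS_PER_BATCH = 8   # 8 x 32 ms = 256 ms

CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000     # 512 samples
BATCH_SAMPLES = CHUNK_SAMPLES * CHUNKS_PER_BATCH   # 4096 samples


def batch_chunks(samples):
    """Split samples into 256 ms windows, each a list of 8 chunks of 32 ms.

    The last window is zero-padded (an illustrative choice) so that every
    window has the same shape.
    """
    batches = []
    for start in range(0, len(samples), BATCH_SAMPLES):
        window = list(samples[start:start + BATCH_SAMPLES])
        window += [0.0] * (BATCH_SAMPLES - len(window))
        batches.append([
            window[i:i + CHUNK_SAMPLES]
            for i in range(0, BATCH_SAMPLES, CHUNK_SAMPLES)
        ])
    return batches
```

With one second of audio (16,000 samples) this yields 4 windows, the last of which is partly padding. Processing one window per model call instead of one chunk per call means 8 times fewer calls, which is the likely source of the speedup shown in the graph.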


	Conversion code is available here: [FluidInference/mobius](https://github.com/FluidInference/mobius)

	## Intended Use

	### Primary Use Cases
	- Real-time voice activity detection in iOS/macOS
	applications
	- Speech preprocessing for ASR systems
	- Audio segmentation and filtering
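
For the segmentation use case, a common post-processing step is to turn per-chunk speech probabilities into time spans. The sketch below assumes the model yields one speech probability per 32 ms chunk; the function name `speech_segments`, the 0.5 threshold, and the minimum length of 2 chunks are illustrative assumptions, not values taken from this repository.

```python
CHUNK_MS = 32  # assumption: one probability per 32 ms chunk


def speech_segments(probs, threshold=0.5, min_chunks=2):
    """Return (start_ms, end_ms) spans where probs stay at or above threshold.

    Runs shorter than min_chunks are dropped as likely false positives.
    """
    segments = []
    start = None  # index of the first chunk of the current speech run
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            if i - start >= min_chunks:
                segments.append((start * CHUNK_MS, i * CHUNK_MS))
            start = None
    # Close a run that is still open at the end of the input.
    if start is not None and len(probs) - start >= min_chunks:
        segments.append((start * CHUNK_MS, len(probs) * CHUNK_MS))
    return segments
```

For example, `speech_segments([0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.9, 0.9, 0.9])` returns `[(32, 96), (192, 288)]`: the single-chunk spike at index 4 is discarded, and the run that is still open at the end of the input is closed at the last chunk.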

	## How to Use

	## Citation

	@misc{silero-vad-coreml,
	title={CoreML Silero VAD},
	author={FluidAudio Team},
	year={2024},

	url={https://huggingface.co/alexwengg/coreml-silero-vad}
	}

	@misc{silero-vad,
	title={Silero VAD},
	author={Silero Team},
	year={2021},
	url={https://github.com/snakers4/silero-vad}
	}


	- GitHub: https://github.com/FluidAudio/FluidAudioSwift