---
license: mit
tags:
- audio
- voice-activity-detection
- coreml
- silero
- speech
- ios
- macos
- swift
library_name: coreml
pipeline_tag: voice-activity-detection
datasets:
- alexwengg/musan_mini50
- alexwengg/musan_mini100
metrics:
- accuracy
- f1
language:
- en
base_model:
- onnx-community/silero-vad
---
# CoreML Silero VAD
A CoreML implementation of the Silero Voice Activity
Detection (VAD) model, optimized for Apple platforms
(iOS/macOS). This repository contains pre-converted
CoreML models ready for use in Swift applications.
## Model Description
**Developed by:** Silero Team (original), converted by
FluidAudio
**Model type:** Voice Activity Detection
**License:** MIT
**Parent Model:**
[silero-vad](https://github.com/snakers4/silero-vad)
### Model Details
- **Architecture:** STFT + Encoder + RNN Decoder pipeline
- **Input:** 16kHz mono audio chunks (512 samples / 32ms)
- **Output:** Voice activity probability (0.0-1.0)
- **Memory:** ~2MB total model size
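
The fixed 512-sample input means longer recordings have to be cut into chunks before they reach the model. A minimal sketch of that step follows, assuming the trailing partial chunk is zero-padded with silence; `makeChunks` is a hypothetical helper, not part of FluidAudio, and the library may handle the tail differently.

```swift
/// Splits mono 16 kHz samples into fixed-size chunks.
/// Hypothetical helper: zero-padding the final chunk is an assumption.
func makeChunks(from samples: [Float], chunkSize: Int = 512) -> [[Float]] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var chunks: [[Float]] = []
    var start = 0
    while start < samples.count {
        let end = min(start + chunkSize, samples.count)
        var chunk = Array(samples[start..<end])
        // Pad the trailing chunk with silence so every chunk has the same length.
        if chunk.count < chunkSize {
            chunk.append(contentsOf: [Float](repeating: 0, count: chunkSize - chunk.count))
        }
        chunks.append(chunk)
        start = end
    }
    return chunks
}

// One second of audio at 16 kHz: 31 full chunks plus one padded chunk.
let oneSecond = [Float](repeating: 0, count: 16_000)
let chunks = makeChunks(from: oneSecond)
print(chunks.count)  // 32
```

Each chunk covers 512 / 16000 = 0.032 s, which is where the 32 ms figure above comes from.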
## Intended Use
### Primary Use Cases
- Real-time voice activity detection in iOS/macOS
applications
- Speech preprocessing for ASR systems
- Audio segmentation and filtering
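
For the segmentation use case, per-chunk probabilities can be grouped into speech segments by thresholding. The sketch below is illustrative only: `SpeechSegment` and `speechSegments` are hypothetical names, the 0.3 default mirrors the threshold in the configuration example in this README, and no smoothing or minimum-duration logic is applied.

```swift
/// A contiguous stretch of speech, in seconds from the start of the stream.
struct SpeechSegment: Equatable {
    let start: Double
    let end: Double
}

/// Groups consecutive chunks whose probability meets `threshold` into segments.
func speechSegments(
    probabilities: [Float],
    threshold: Float = 0.3,
    chunkSize: Int = 512,
    sampleRate: Double = 16_000
) -> [SpeechSegment] {
    let chunkDuration = Double(chunkSize) / sampleRate  // 0.032 s with the defaults
    var segments: [SpeechSegment] = []
    var segmentStart: Int? = nil

    for (index, probability) in probabilities.enumerated() {
        let isSpeech = probability >= threshold
        if isSpeech, segmentStart == nil {
            segmentStart = index  // speech begins at this chunk
        } else if !isSpeech, let startIndex = segmentStart {
            // Speech ended at the start of the current chunk.
            segments.append(SpeechSegment(start: Double(startIndex) * chunkDuration,
                                          end: Double(index) * chunkDuration))
            segmentStart = nil
        }
    }
    // Close a segment that is still open when the stream ends.
    if let startIndex = segmentStart {
        segments.append(SpeechSegment(start: Double(startIndex) * chunkDuration,
                                      end: Double(probabilities.count) * chunkDuration))
    }
    return segments
}

// Two segments: chunks 1...2 and chunk 5.
let segments = speechSegments(probabilities: [0.1, 0.8, 0.9, 0.2, 0.1, 0.7])
for segment in segments {
    print(segment.start, segment.end)
}
```

A real application would likely add hangover or minimum-duration rules so that brief dips below the threshold do not split one utterance into several segments.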
## How to Use
### Swift Integration
```swift
import FluidAudio

// Configure the detector for 16 kHz audio in 512-sample (32 ms) chunks.
let config = VADConfig(
    threshold: 0.3,
    chunkSize: 512,  // 512 samples is the optimal chunk size
    sampleRate: 16000
)

let vadManager = VADManager(config: config)
try await vadManager.initialize()

// Process one audio chunk (`audioChunk` holds 512 samples of 16 kHz mono audio)
let result = try await vadManager.processChunk(audioChunk)
print("Voice probability: \(result.probability)")
print("Is voice active: \(result.isVoiceActive)")
```
### Installation
Add FluidAudio to your Swift project by declaring it in `Package.swift`:
```swift
dependencies: [
    .package(url: "https://github.com/FluidAudio/FluidAudioSwift.git", from: "1.0.0")
]
```
## Performance
### Benchmarks on Apple Silicon (M1/M2)
| Metric | Value |
|------------------|---------------------|
| Latency | <2ms per 32ms chunk |
| Real-time Factor | 0.02x |
| Memory Usage | ~15MB |
| CPU Usage | <5% (single core) |
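
The real-time factor is the processing time divided by the duration of the audio processed. The sketch below shows the arithmetic and one simple way to time a call with the wall clock; `realTimeFactor` and `measureSeconds` are hypothetical helpers, and the workload in the usage lines is a stand-in rather than the actual model.

```swift
import Foundation

/// Real-time factor: processing time divided by the duration of the audio.
/// Values below 1.0 mean processing is faster than real time.
func realTimeFactor(processingSeconds: Double,
                    sampleCount: Int,
                    sampleRate: Double = 16_000) -> Double {
    let audioSeconds = Double(sampleCount) / sampleRate
    return processingSeconds / audioSeconds
}

/// Runs `work` once and returns the elapsed wall-clock time in seconds.
func measureSeconds(_ work: () -> Void) -> Double {
    let start = Date()
    work()
    return Date().timeIntervalSince(start)
}

// Upper bound from the table: 2 ms of processing for one 512-sample chunk.
let worstCase = realTimeFactor(processingSeconds: 0.002, sampleCount: 512)
print(worstCase)

// Stand-in workload; replace the closure body with a call into the VAD.
let elapsed = measureSeconds { _ = (0..<512).reduce(0, +) }
print(realTimeFactor(processingSeconds: elapsed, sampleCount: 512))
```

By the same formula, the reported 0.02x corresponds to about 0.64 ms per 32 ms chunk, which is consistent with the <2ms latency bound.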
### Accuracy Metrics
Evaluated on common speech datasets:
- Precision: 94.2%
- Recall: 92.8%
- F1-Score: 93.5%
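
The F1-score is the harmonic mean of precision and recall, so the three figures above can be checked against each other. A small sketch, with `f1Score` as a hypothetical helper name:

```swift
/// Harmonic mean of precision and recall; returns 0 when both are 0.
func f1Score(precision: Double, recall: Double) -> Double {
    let sum = precision + recall
    return sum == 0 ? 0 : 2 * precision * recall / sum
}

// Check the reported figures: precision 94.2%, recall 92.8%.
let f1 = f1Score(precision: 0.942, recall: 0.928)
let f1Percent = (f1 * 1000).rounded() / 10  // round to one decimal place
print(f1Percent)  // 93.5
```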
## Model Files
This repository contains three CoreML models that work together:
- `silero_stft.mlmodel` (650KB) - STFT feature extraction
- `silero_encoder.mlmodel` (254KB) - Feature encoding
- `silero_rnn_decoder.mlmodel` (527KB) - RNN-based classification
## Training Data
The original Silero VAD model was trained on a diverse
dataset including:
- Clean speech audio
- Noisy speech with various background conditions
- Music and non-speech audio for negative samples
## Limitations and Bias
### Known Limitations
- Optimized for 16kHz sample rate (other rates may reduce
accuracy)
- May struggle with very quiet speech (<-30dB SNR)
- Performance varies with microphone quality and
recording conditions
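
Because the model expects 16 kHz input, audio captured at another rate is usually resampled first. The sketch below uses plain linear interpolation to show the idea; `resampleLinear` is a hypothetical helper, it applies no anti-aliasing filter, and a production app would more likely rely on a dedicated converter such as `AVAudioConverter`.

```swift
/// Resamples mono audio with linear interpolation.
/// Illustrative only: without a low-pass filter, content above the new
/// Nyquist frequency will alias when downsampling.
func resampleLinear(_ input: [Float],
                    from sourceRate: Double,
                    to targetRate: Double = 16_000) -> [Float] {
    guard !input.isEmpty, sourceRate > 0, targetRate > 0 else { return [] }
    if sourceRate == targetRate { return input }

    let outputCount = Int((Double(input.count) * targetRate / sourceRate).rounded(.down))
    let step = sourceRate / targetRate  // source samples advanced per output sample
    var output = [Float](repeating: 0, count: outputCount)

    for index in 0..<outputCount {
        let position = Double(index) * step
        let lower = Int(position)
        let upper = min(lower + 1, input.count - 1)
        let fraction = Float(position - Double(lower))
        // Blend the two neighbouring source samples.
        output[index] = input[lower] * (1 - fraction) + input[upper] * fraction
    }
    return output
}

// One second captured at 48 kHz becomes 16000 samples at 16 kHz.
let captured = [Float](repeating: 0.5, count: 48_000)
let resampled = resampleLinear(captured, from: 48_000)
print(resampled.count)  // 16000
```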
## Technical Details
### Model Architecture
1. Audio Input (512 samples, 16kHz)
2. STFT Model (spectral features)
3. Encoder Model (feature compression)
4. RNN Decoder (temporal modeling)
5. Voice Probability Output
## Citation
```bibtex
@misc{silero-vad-coreml,
  title={CoreML Silero VAD},
  author={FluidAudio Team},
  year={2024},
  url={https://huggingface.co/alexwengg/coreml-silero-vad}
}

@misc{silero-vad,
  title={Silero VAD},
  author={Silero Team},
  year={2021},
  url={https://github.com/snakers4/silero-vad}
}
```
## Related Models
Check out other CoreML audio models in this collection: https://huggingface.co/collections/bweng/coreml-685b12fd251f80552c08e2b9
- https://huggingface.co/alexwengg/coreml_speaker_diarization - Identify "who spoke when"
- https://huggingface.co/collections/bweng/coreml-685b12fd251f80552c08e2b9 - Speech-to-text for Apple platforms
## Repository and Support
- GitHub: https://github.com/FluidAudio/FluidAudioSwift
- Documentation: https://github.com/FluidAudio/FluidAudioSwift/wiki
- Issues: https://github.com/FluidAudio/FluidAudioSwift/issues
- Community: https://github.com/FluidAudio/FluidAudioSwift/discussions
## License
This project is licensed under the MIT License; see the LICENSE file for details.

The original Silero VAD model is also under the MIT license. See https://github.com/snakers4/silero-vad/blob/master/LICENSE for details.