alexwengg committed
Commit d409409 · verified · 1 Parent(s): 27995f9

Update README.md

Files changed (1)
  1. README.md +139 -140
README.md CHANGED
@@ -1,188 +1,187 @@

- ---
- license: mit
- tags:
- - audio
- - voice-activity-detection
- - coreml
- - silero
- - speech
- - ios
- - macos
- - swift
- library_name: coreml
- pipeline_tag: audio-classification
- ---

- # CoreML Silero VAD

- A CoreML implementation of the Silero Voice Activity
- Detection (VAD) model, optimized for Apple platforms
- (iOS/macOS). This repository contains pre-converted
- CoreML models ready for use in Swift applications.

- ## Model Description

- **Developed by:** Silero Team (original), converted by
- FluidAudio
- **Model type:** Voice Activity Detection
- **License:** MIT
- **Parent Model:**
- [silero-vad](https://github.com/snakers4/silero-vad)

- ### Model Details

- - **Architecture:** STFT + Encoder + RNN Decoder pipeline
- - **Input:** 16kHz mono audio chunks (512 samples / 32ms)
- - **Output:** Voice activity probability (0.0-1.0)
- - **Memory:** ~2MB total model size

- ## Intended Use

- ### Primary Use Cases
- - Real-time voice activity detection in iOS/macOS
- applications
- - Speech preprocessing for ASR systems
- - Audio segmentation and filtering

- ## How to Use

- ### Swift Integration

- ```swift
- import FluidAudio

- let config = VADConfig(
- threshold: 0.3,
- chunkSize: 512, // 512 being the most optimal
- sampleRate: 16000
- )

- let vadManager = VADManager(config: config)
- try await vadManager.initialize()

- // Process audio chunk
- let result = try await
- vadManager.processChunk(audioChunk)
- print("Voice probability: \(result.probability)")
- print("Is voice active: \(result.isVoiceActive)")

- Installation

- Add FluidAudio to your Swift project:

- dependencies: [
- .package(url:
- "https://github.com/FluidAudio/FluidAudioSwift.git",
- from: "1.0.0")
- ]

- Performance

- Benchmarks on Apple Silicon (M1/M2)

- | Metric | Value |
- |------------------|---------------------|
- | Latency | <2ms per 32ms chunk |
- | Real-time Factor | 0.02x |
- | Memory Usage | ~15MB |
- | CPU Usage | <5% (single core) |

- Accuracy Metrics

- Evaluated on common speech datasets:
- - Precision: 94.2%
- - Recall: 92.8%
- - F1-Score: 93.5%

- Model Files

- This repository contains three CoreML models that work
- together:

- - silero_stft.mlmodel (650KB) - STFT feature extraction
- - silero_encoder.mlmodel (254KB) - Feature encoding
- - silero_rnn_decoder.mlmodel (527KB) - RNN-based
- classification

- Training Data

- The original Silero VAD model was trained on a diverse
- dataset including:
- - Clean speech audio
- - Noisy speech with various background conditions
- - Music and non-speech audio for negative samples

- Limitations and Bias

- Known Limitations

- - Optimized for 16kHz sample rate (other rates may reduce
- accuracy)
- - May struggle with very quiet speech (<-30dB SNR)
- - Performance varies with microphone quality and
- recording conditions


- Technical Details

- Model Architecture

- Audio Input (512 samples, 16kHz)
-
- STFT Model (spectral features)
-
- Encoder Model (feature compression)
-
- RNN Decoder (temporal modeling)
-
- Voice Probability Output


- Citation

- @misc{silero-vad-coreml,
- title={CoreML Silero VAD},
- author={FluidAudio Team},
- year={2024},

- url={https://huggingface.co/alexwengg/coreml-silero-vad}
- }

- @misc{silero-vad,
- title={Silero VAD},
- author={Silero Team},
- year={2021},
- url={https://github.com/snakers4/silero-vad}
- }

- Related Models

- Check out other CoreML audio models in the
- https://huggingface.co/collections/bweng/coreml-685b12fd2
- 51f80552c08e2b9:

- - https://huggingface.co/alexwengg/coreml_speaker_diariza
- tion - Identify "who spoke when"
- - https://huggingface.co/collections/bweng/coreml-685b12f
- d251f80552c08e2b9 - Speech-to-text for Apple platforms

- Repository and Support

- - GitHub: https://github.com/FluidAudio/FluidAudioSwift
- - Documentation:
- https://github.com/FluidAudio/FluidAudioSwift/wiki
- - Issues:
- https://github.com/FluidAudio/FluidAudioSwift/issues
- - Community:
- https://github.com/FluidAudio/FluidAudioSwift/discussions

- License

- This project is licensed under the MIT License - see the
- LICENSE file for details.
-
- The original Silero VAD model is also under MIT license.
- See https://github.com/snakers4/silero-vad/blob/master/LI
- CENSE for details.

+ ---
+ license: mit
+ tags:
+ - audio
+ - voice-activity-detection
+ - coreml
+ - silero
+ - speech
+ - ios
+ - macos
+ - swift
+ library_name: coreml
+ pipeline_tag: audio-classification
+ ---

+ # CoreML Silero VAD

+ A CoreML implementation of the Silero Voice Activity
+ Detection (VAD) model, optimized for Apple platforms
+ (iOS/macOS). This repository contains pre-converted
+ CoreML models ready for use in Swift applications.

+ ## Model Description

+ **Developed by:** Silero Team (original), converted by
+ FluidAudio
+ **Model type:** Voice Activity Detection
+ **License:** MIT
+ **Parent Model:**
+ [silero-vad](https://github.com/snakers4/silero-vad)

+ ### Model Details

+ - **Architecture:** STFT + Encoder + RNN Decoder pipeline
+ - **Input:** 16kHz mono audio chunks (512 samples / 32ms)
+ - **Output:** Voice activity probability (0.0-1.0)
+ - **Memory:** ~2MB total model size

+ ## Intended Use

+ ### Primary Use Cases
+ - Real-time voice activity detection in iOS/macOS
+ applications
+ - Speech preprocessing for ASR systems
+ - Audio segmentation and filtering

+ ## How to Use

+ ### Swift Integration

+ ```swift
+ import FluidAudio

+ let config = VADConfig(
+ threshold: 0.3,
+ chunkSize: 512, // 512 being the most optimal
+ sampleRate: 16000
+ )

+ let vadManager = VADManager(config: config)
+ try await vadManager.initialize()

+ // Process audio chunk
+ let result = try await
+ vadManager.processChunk(audioChunk)
+ print("Voice probability: \(result.probability)")
+ print("Is voice active: \(result.isVoiceActive)")

+ Installation

+ Add FluidAudio to your Swift project:

+ dependencies: [
+ .package(url:
+ "https://github.com/FluidAudio/FluidAudioSwift.git",
+ from: "1.0.0")
+ ]

+ Performance

+ Benchmarks on Apple Silicon (M1/M2)

+ | Metric | Value |
+ |------------------|---------------------|
+ | Latency | <2ms per 32ms chunk |
+ | Real-time Factor | 0.02x |
+ | Memory Usage | ~15MB |
+ | CPU Usage | <5% (single core) |

+ Accuracy Metrics

+ Evaluated on common speech datasets:
+ - Precision: 94.2%
+ - Recall: 92.8%
+ - F1-Score: 93.5%

+ Model Files

+ This repository contains three CoreML models that work
+ together:

+ - silero_stft.mlmodel (650KB) - STFT feature extraction
+ - silero_encoder.mlmodel (254KB) - Feature encoding
+ - silero_rnn_decoder.mlmodel (527KB) - RNN-based
+ classification

+ Training Data

+ The original Silero VAD model was trained on a diverse
+ dataset including:
+ - Clean speech audio
+ - Noisy speech with various background conditions
+ - Music and non-speech audio for negative samples

+ Limitations and Bias

+ Known Limitations

+ - Optimized for 16kHz sample rate (other rates may reduce
+ accuracy)
+ - May struggle with very quiet speech (<-30dB SNR)
+ - Performance varies with microphone quality and
+ recording conditions


+ Technical Details

+ Model Architecture

+ Audio Input (512 samples, 16kHz)
+
+ STFT Model (spectral features)
+
+ Encoder Model (feature compression)
+
+ RNN Decoder (temporal modeling)
+
+ Voice Probability Output


+ Citation

+ @misc{silero-vad-coreml,
+ title={CoreML Silero VAD},
+ author={FluidAudio Team},
+ year={2024},

+ url={https://huggingface.co/alexwengg/coreml-silero-vad}
+ }

+ @misc{silero-vad,
+ title={Silero VAD},
+ author={Silero Team},
+ year={2021},
+ url={https://github.com/snakers4/silero-vad}
+ }

+ Related Models

+ Check out other CoreML audio models in the
+ https://huggingface.co/collections/bweng/coreml-685b12fd2
+ 51f80552c08e2b9:

+ - https://huggingface.co/alexwengg/coreml_speaker_diariza
+ tion - Identify "who spoke when"
+ - https://huggingface.co/collections/bweng/coreml-685b12f
+ d251f80552c08e2b9 - Speech-to-text for Apple platforms

+ Repository and Support

+ - GitHub: https://github.com/FluidAudio/FluidAudioSwift
+ - Documentation:
+ https://github.com/FluidAudio/FluidAudioSwift/wiki
+ - Issues:
+ https://github.com/FluidAudio/FluidAudioSwift/issues
+ - Community:
+ https://github.com/FluidAudio/FluidAudioSwift/discussions

+ License

+ This project is licensed under the MIT License - see the
+ LICENSE file for details.

+ The original Silero VAD model is also under MIT license.
+ See https://github.com/snakers4/silero-vad/blob/master/LI
+ CENSE for details.
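
The README in this diff describes an input contract of 16 kHz mono audio in 512-sample chunks, with the returned probability compared against a threshold (0.3 in its example). A minimal Swift sketch of that contract, independent of FluidAudio, might look like the following. The helpers `chunkDurationMs`, `splitIntoChunks`, and `isVoice` are hypothetical names introduced here for illustration. Zero-padding the final partial chunk and treating a probability equal to the threshold as voice are assumptions, not behavior the README documents.

```swift
// Illustrative only: these helpers are hypothetical and not part of FluidAudio.
let sampleRate = 16_000  // Hz, as stated in the README
let chunkSize = 512      // samples per chunk, as stated in the README

/// Duration of one chunk in milliseconds (512 samples at 16 kHz is 32 ms).
func chunkDurationMs(chunkSize: Int, sampleRate: Int) -> Double {
    return Double(chunkSize) * 1000.0 / Double(sampleRate)
}

/// Splits a sample buffer into fixed-size chunks.
/// Assumption: the last partial chunk is zero-padded to full length.
func splitIntoChunks(_ samples: [Float], chunkSize: Int) -> [[Float]] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var chunks: [[Float]] = []
    var start = 0
    while start < samples.count {
        let end = min(start + chunkSize, samples.count)
        var chunk = Array(samples[start..<end])
        if chunk.count < chunkSize {
            chunk += [Float](repeating: 0, count: chunkSize - chunk.count)
        }
        chunks.append(chunk)
        start += chunkSize
    }
    return chunks
}

/// Assumption: a probability at or above the threshold counts as voice.
func isVoice(probability: Float, threshold: Float = 0.3) -> Bool {
    return probability >= threshold
}

// One second of dummy audio at 16 kHz.
let oneSecond = [Float](repeating: 0.1, count: sampleRate)
let chunks = splitIntoChunks(oneSecond, chunkSize: chunkSize)
print("chunk duration: \(chunkDurationMs(chunkSize: chunkSize, sampleRate: sampleRate)) ms")
print("chunks in one second: \(chunks.count)")
```

One second of audio does not divide evenly into 512-sample chunks (16000 / 512 = 31.25), so the sketch produces 32 chunks and pads the last one. A real integration would need to check how the library expects trailing samples to be handled.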