bweng committed on
Commit 37e45d6 · verified · 1 Parent(s): 6598299

Update README.md

  1. README.md +10 -119
README.md CHANGED
@@ -33,6 +33,8 @@ Detection (VAD) model, optimized for Apple platforms
33
  (iOS/macOS). This repository contains pre-converted
34
  CoreML models ready for use in Swift applications.
35
 
36
  ## Model Description
37
 
38
  **Developed by:** Silero Team (original), converted by
@@ -52,6 +54,14 @@ FluidAudio
52
  - **Output:** Voice activity probability (0.0-1.0)
53
  - **Memory:** ~2MB total model size
54
 
55
  ## Intended Use
56
 
57
  ### Primary Use Cases
@@ -62,99 +72,6 @@ applications
62
 
63
  ## How to Use
64
 
65
- ### Swift Integration
66
-
67
- ```swift
68
- import FluidAudio
69
-
70
- let config = VADConfig(
71
- threshold: 0.3,
72
- chunkSize: 512, // 512 being the most optimal
73
- sampleRate: 16000
74
- )
75
-
76
- let vadManager = VADManager(config: config)
77
- try await vadManager.initialize()
78
-
79
- // Process audio chunk
80
- let result = try await
81
- vadManager.processChunk(audioChunk)
82
- print("Voice probability: \(result.probability)")
83
- print("Is voice active: \(result.isVoiceActive)")
84
- ```
85
-
86
- Installation
87
-
88
- Add FluidAudio to your Swift project:
89
-
90
- dependencies: [
91
- .package(url:
92
- "https://github.com/FluidAudio/FluidAudioSwift.git",
93
- from: "1.0.0")
94
- ]
95
-
96
- Performance
97
-
98
- Benchmarks on Apple Silicon (M1/M2)
99
-
100
- | Metric | Value |
101
- |------------------|---------------------|
102
- | Latency | <2ms per 32ms chunk |
103
- | Real-time Factor | 0.02x |
104
- | Memory Usage | ~15MB |
105
- | CPU Usage | <5% (single core) |
106
-
107
- Accuracy Metrics
108
-
109
- Evaluated on common speech datasets:
110
- - Precision: 94.2%
111
- - Recall: 92.8%
112
- - F1-Score: 93.5%
113
-
114
- Model Files
115
-
116
- This repository contains three CoreML models that work
117
- together:
118
-
119
- - silero_stft.mlmodel (650KB) - STFT feature extraction
120
- - silero_encoder.mlmodel (254KB) - Feature encoding
121
- - silero_rnn_decoder.mlmodel (527KB) - RNN-based
122
- classification
123
-
124
- Training Data
125
-
126
- The original Silero VAD model was trained on a diverse
127
- dataset including:
128
- - Clean speech audio
129
- - Noisy speech with various background conditions
130
- - Music and non-speech audio for negative samples
131
-
132
- Limitations and Bias
133
-
134
- Known Limitations
135
-
136
- - Optimized for 16kHz sample rate (other rates may reduce
137
- accuracy)
138
- - May struggle with very quiet speech (<-30dB SNR)
139
- - Performance varies with microphone quality and
140
- recording conditions
141
-
142
-
143
- Technical Details
144
-
145
- Model Architecture
146
-
147
- Audio Input (512 samples, 16kHz)
148
-
149
- STFT Model (spectral features)
150
-
151
- Encoder Model (feature compression)
152
-
153
- RNN Decoder (temporal modeling)
154
-
155
- Voice Probability Output
156
-
157
-
158
  Citation
159
 
160
  @misc{silero-vad-coreml,
@@ -172,32 +89,6 @@ url={https://huggingface.co/alexwengg/coreml-silero-vad}
172
  url={https://github.com/snakers4/silero-vad}
173
  }
174
 
175
- Related Models
176
-
177
- Check out other CoreML audio models in the
178
- https://huggingface.co/collections/bweng/coreml-685b12fd2
179
- 51f80552c08e2b9:
180
-
181
- - https://huggingface.co/alexwengg/coreml_speaker_diariza
182
- tion - Identify "who spoke when"
183
- - https://huggingface.co/collections/bweng/coreml-685b12f
184
- d251f80552c08e2b9 - Speech-to-text for Apple platforms
185
-
186
- Repository and Support
187
 
188
  - GitHub: https://github.com/FluidAudio/FluidAudioSwift
189
- - Documentation:
190
- https://github.com/FluidAudio/FluidAudioSwift/wiki
191
- - Issues:
192
- https://github.com/FluidAudio/FluidAudioSwift/issues
193
- - Community:
194
- https://github.com/FluidAudio/FluidAudioSwift/discussions
195
-
196
- License
197
-
198
- This project is licensed under the MIT License - see the
199
- LICENSE file for details.
200
 
201
- The original Silero VAD model is also under MIT license.
202
- See https://github.com/snakers4/silero-vad/blob/master/LI
203
- CENSE for details.
 
33
  (iOS/macOS). This repository contains pre-converted
34
  CoreML models ready for use in Swift applications.
35
 
36
+ See FluidAudio Repo link at the top for more information
37
+
38
  ## Model Description
39
 
40
  **Developed by:** Silero Team (original), converted by
 
54
  - **Output:** Voice activity probability (0.0-1.0)
55
  - **Memory:** ~2MB total model size
56
 
57
+ ```
58
+ | Metric | FP16 Baseline | 4-Bit Quantized |
59
+ |---------------------|---------------|-----------------|
60
+ | Correlation | - | 0.9999 |
61
+ | Mean Absolute Error | - | 0.0045 |
62
+ | Model Size | 0.90 MB | 0.21 MB |
63
+ ```
64
+
65
  ## Intended Use
66
 
67
  ### Primary Use Cases
 
72
 
73
  ## How to Use
74
 
75
  Citation
76
 
77
  @misc{silero-vad-coreml,
 
89
  url={https://github.com/snakers4/silero-vad}
90
  }
91
 
92
 
93
  - GitHub: https://github.com/FluidAudio/FluidAudioSwift
94