---
license: mit
tags:
- audio
- voice-activity-detection
- coreml
- silero
- speech
- ios
- macos
- swift
library_name: coreml
pipeline_tag: audio-classification
---

# CoreML Silero VAD

A CoreML implementation of the Silero Voice Activity Detection (VAD) model, optimized for Apple platforms (iOS/macOS). This repository contains pre-converted CoreML models ready for use in Swift applications.

## Model Description

**Developed by:** Silero Team (original), converted by FluidAudio
**Model type:** Voice Activity Detection
**License:** MIT
**Parent Model:** [silero-vad](https://github.com/snakers4/silero-vad)

### Model Details

- **Architecture:** STFT + Encoder + RNN Decoder pipeline
- **Input:** 16kHz mono audio chunks (512 samples / 32ms)
- **Output:** Voice activity probability (0.0-1.0)
- **Memory:** ~2MB total model size
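
Each chunk covers 512 / 16000 = 0.032 s, i.e. 32ms of audio, so longer buffers must be split into fixed 512-sample frames before inference. A minimal sketch of that chunking in plain Swift (no FluidAudio types involved; zero-padding the final partial frame is an illustrative choice, not necessarily what FluidAudio itself does):

```swift
// Split a buffer of 16kHz mono samples into fixed 512-sample frames,
// zero-padding the final partial frame so every frame has the same length.
func chunkSamples(_ samples: [Float], frameSize: Int = 512) -> [[Float]] {
    var frames: [[Float]] = []
    var start = 0
    while start < samples.count {
        let end = min(start + frameSize, samples.count)
        var frame = Array(samples[start..<end])
        if frame.count < frameSize {
            // Pad the tail with silence to reach exactly frameSize samples.
            frame.append(contentsOf: [Float](repeating: 0, count: frameSize - frame.count))
        }
        frames.append(frame)
        start += frameSize
    }
    return frames
}

// One second of 16kHz audio yields ceil(16000 / 512) = 32 frames.
let frames = chunkSamples([Float](repeating: 0.1, count: 16_000))
print(frames.count)        // 32
print(frames.last!.count)  // 512
```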

## Intended Use

### Primary Use Cases

- Real-time voice activity detection in iOS/macOS applications
- Speech preprocessing for ASR systems
- Audio segmentation and filtering

## How to Use

### Swift Integration

```swift
import FluidAudio

let config = VADConfig(
    threshold: 0.3,
    chunkSize: 512,   // 512 is the optimal chunk size for this model
    sampleRate: 16000
)

let vadManager = VADManager(config: config)
try await vadManager.initialize()

// Process an audio chunk and read the result
let result = try await vadManager.processChunk(audioChunk)
print("Voice probability: \(result.probability)")
print("Is voice active: \(result.isVoiceActive)")
```
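
Raw per-frame probabilities can flicker at word boundaries. A common post-processing step is hangover smoothing: once voice is detected, keep the frame flagged active for a few extra frames. A minimal sketch in plain Swift, independent of the FluidAudio API (the threshold and hangover length here are illustrative assumptions):

```swift
// Smooth per-frame VAD probabilities: a frame counts as "active" if its
// probability exceeds the threshold, or if voice was detected within the
// last `hangover` frames.
func smoothVAD(_ probabilities: [Float], threshold: Float = 0.3, hangover: Int = 3) -> [Bool] {
    var active: [Bool] = []
    var framesSinceVoice = Int.max  // "never seen voice" sentinel
    for p in probabilities {
        if p >= threshold {
            framesSinceVoice = 0
        } else if framesSinceVoice != Int.max {
            framesSinceVoice += 1
        }
        active.append(framesSinceVoice <= hangover)
    }
    return active
}

let probs: [Float] = [0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1]
print(smoothVAD(probs))  // [true, true, true, true, true, false, false]
```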

## Installation

Add FluidAudio to your Swift project:

```swift
dependencies: [
    .package(url: "https://github.com/FluidAudio/FluidAudioSwift.git", from: "1.0.0")
]
```
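
In context, a complete Package.swift manifest might look like the following sketch (the package name, platform minimums, and the `FluidAudio` product name are assumptions for illustration, not taken from the FluidAudioSwift repository):

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyVADApp",                       // placeholder package name
    platforms: [.iOS(.v16), .macOS(.v13)],  // assumed minimum platforms
    dependencies: [
        // The dependency shown in the README above.
        .package(url: "https://github.com/FluidAudio/FluidAudioSwift.git", from: "1.0.0")
    ],
    targets: [
        .executableTarget(
            name: "MyVADApp",
            dependencies: [.product(name: "FluidAudio", package: "FluidAudioSwift")]
        )
    ]
)
```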

## Performance

Benchmarks on Apple Silicon (M1/M2):

| Metric           | Value               |
|------------------|---------------------|
| Latency          | <2ms per 32ms chunk |
| Real-time Factor | 0.02x               |
| Memory Usage     | ~15MB               |
| CPU Usage        | <5% (single core)   |
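
Real-time factor is processing time divided by audio duration, so 0.02x means one second of audio is processed in roughly 20ms. The latency row is a worst-case bound, which implies a higher RTF ceiling; a quick sanity check in plain Swift:

```swift
// Real-time factor (RTF) = time spent processing / duration of audio processed.
let chunkDuration = 512.0 / 16_000.0    // each chunk carries 0.032 s of audio
let worstCaseLatency = 0.002            // <2ms per chunk, from the table above
let worstCaseRTF = worstCaseLatency / chunkDuration  // ≈ 0.0625

// The reported 0.02x RTF therefore implies a typical per-chunk latency
// of about 0.02 * 32ms = 0.64ms, well under the 2ms bound.
let typicalLatency = 0.02 * chunkDuration
print(worstCaseRTF, typicalLatency)
```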

### Accuracy Metrics

Evaluated on common speech datasets:

- Precision: 94.2%
- Recall: 92.8%
- F1-Score: 93.5%
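
F1 is the harmonic mean of precision and recall, F1 = 2·P·R / (P + R), and the three figures above are mutually consistent, as a quick check shows:

```swift
// F1 score = harmonic mean of precision and recall.
let precision = 0.942
let recall = 0.928
let f1 = 2 * precision * recall / (precision + recall)
print(f1)  // ≈ 0.9349, i.e. 93.5% after rounding
```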

## Model Files

This repository contains three CoreML models that work together:

- silero_stft.mlmodel (650KB) - STFT feature extraction
- silero_encoder.mlmodel (254KB) - Feature encoding
- silero_rnn_decoder.mlmodel (527KB) - RNN-based classification

## Training Data

The original Silero VAD model was trained on a diverse dataset including:

- Clean speech audio
- Noisy speech with various background conditions
- Music and non-speech audio for negative samples

## Limitations and Bias

### Known Limitations

- Optimized for a 16kHz sample rate (other rates may reduce accuracy)
- May struggle with very quiet speech (below -30dB SNR)
- Performance varies with microphone quality and recording conditions

## Technical Details

### Model Architecture

```
Audio Input (512 samples, 16kHz)
        ↓
STFT Model (spectral features)
        ↓
Encoder Model (feature compression)
        ↓
RNN Decoder (temporal modeling)
        ↓
Voice Probability Output
```

## Citation

```bibtex
@misc{silero-vad-coreml,
  title={CoreML Silero VAD},
  author={FluidAudio Team},
  year={2024},
  url={https://huggingface.co/alexwengg/coreml-silero-vad}
}

@misc{silero-vad,
  title={Silero VAD},
  author={Silero Team},
  year={2021},
  url={https://github.com/snakers4/silero-vad}
}
```

## Related Models

Check out other CoreML audio models in the [CoreML collection](https://huggingface.co/collections/bweng/coreml-685b12fd251f80552c08e2b9):

- [coreml_speaker_diarization](https://huggingface.co/alexwengg/coreml_speaker_diarization) - Identify "who spoke when"
- [CoreML collection](https://huggingface.co/collections/bweng/coreml-685b12fd251f80552c08e2b9) - Speech-to-text for Apple platforms

## Repository and Support

- GitHub: https://github.com/FluidAudio/FluidAudioSwift
- Documentation: https://github.com/FluidAudio/FluidAudioSwift/wiki
- Issues: https://github.com/FluidAudio/FluidAudioSwift/issues
- Community: https://github.com/FluidAudio/FluidAudioSwift/discussions

## License

This project is licensed under the MIT License - see the LICENSE file for details.

The original Silero VAD model is also under the MIT license. See https://github.com/snakers4/silero-vad/blob/master/LICENSE for details.