MarbleNet Frame-VAD — CoreML

NVIDIA MarbleNet multilingual frame-level VAD converted to Apple CoreML (mlprogram, fp32 + int8 weight-quant).

Upstream: https://huggingface.co/nvidia/Frame_VAD_Multilingual_MarbleNet_v2.0

Part of the VAD CoreML collection — CoreML conversions of community VAD models for use in iOS / macOS apps.

Files

File	Format	Notes
`MarbleNetVAD.mlpackage`	CoreML mlprogram (fp32)
`MarbleNetVAD_int8.mlpackage`	CoreML mlprogram (int8 weight-quant)	smaller; outputs stay within 3.3e-02 of the reference (see the parity table below)

I/O

Inputs:

{
  "processed_signal": [
    1,
    64,
    "T(flex 50-4096)"
  ],
  "length": [
    1
  ]
}
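The time axis T is flexible but bounded, so callers have to keep every request between 50 and 4096 frames. The following is a minimal sketch of one way to check a shape and split a long recording into consecutive ranges that fit. The helper names `check_shape` and `plan_chunks` are hypothetical and not part of this repository, and cutting without overlap is an assumption that may affect predictions near chunk boundaries.

```python
from typing import List, Tuple

N_MELS = 64        # mel bands, from the input spec above
MIN_FRAMES = 50    # lower bound of the flexible T axis
MAX_FRAMES = 4096  # upper bound of the flexible T axis


def check_shape(shape: Tuple[int, int, int]) -> None:
    """Raise ValueError if `shape` is not a valid processed_signal shape."""
    batch, mels, frames = shape
    if batch != 1 or mels != N_MELS:
        raise ValueError(f"expected (1, {N_MELS}, T), got {shape}")
    if not MIN_FRAMES <= frames <= MAX_FRAMES:
        raise ValueError(f"T={frames} is outside [{MIN_FRAMES}, {MAX_FRAMES}]")


def plan_chunks(total_frames: int) -> List[Tuple[int, int]]:
    """Split [0, total_frames) into consecutive (start, end) frame ranges.

    Every range fits the flexible T axis, except when the whole recording
    is shorter than MIN_FRAMES: that case yields one short range, which
    the caller has to pad up to MIN_FRAMES before inference.
    """
    if total_frames <= 0:
        return []
    chunks: List[Tuple[int, int]] = []
    start = 0
    while total_frames - start > MAX_FRAMES:
        end = start + MAX_FRAMES
        # Shorten this chunk if the remainder would fall below MIN_FRAMES.
        if total_frames - end < MIN_FRAMES:
            end = total_frames - MIN_FRAMES
        chunks.append((start, end))
        start = end
    chunks.append((start, total_frames))
    return chunks
```

With the 10 ms hop listed under Preprocessing, 50 to 4096 frames corresponds to roughly 0.5 s to 41 s of audio per call.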

Outputs:

(single output: speech probability)

Preprocessing (not included in the CoreML graph; implement it on-device): AudioToMelSpectrogramPreprocessor(n_mels=64, win=25ms, hop=10ms, per_feature norm).
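Because the normalization step has to be reimplemented outside the model, a small reference can help when porting it. The sketch below converts the window and hop to sample counts and applies per-feature normalization (each mel band is standardized over time). Several details are assumptions rather than facts from this card: the 16 kHz sample rate, the n - 1 denominator for the standard deviation, and the `eps` value. Check them against the NeMo preprocessor before relying on the output. The function name `per_feature_normalize` is hypothetical.

```python
import math
from typing import List

SAMPLE_RATE = 16000                       # assumption: not stated in this card
WIN_SAMPLES = SAMPLE_RATE * 25 // 1000    # 25 ms window
HOP_SAMPLES = SAMPLE_RATE * 10 // 1000    # 10 ms hop


def per_feature_normalize(mel: List[List[float]],
                          eps: float = 1e-5) -> List[List[float]]:
    """Standardize each mel band (row) over the time axis.

    `mel` is a list of 64 rows, one per mel band, each holding T frames.
    `eps` (assumed value) guards against division by zero on flat bands.
    """
    out: List[List[float]] = []
    for band in mel:
        n = len(band)
        mean = sum(band) / n
        # Sample variance with an n - 1 denominator (assumed convention).
        var = sum((v - mean) ** 2 for v in band) / (n - 1) if n > 1 else 0.0
        std = math.sqrt(var) + eps
        out.append([(v - mean) / std for v in band])
    return out
```

Comparing this output against features dumped from the NeMo pipeline on the same audio is a quick way to confirm the assumed constants.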

Numerical parity vs reference

Variant	max abs diff	mean abs diff	Pearson
fp32	4.172e-07	1.596e-08	1.000000
int8	3.305e-02	2.567e-03	0.999893

Validated against the NeMo PyTorch reference.
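The three columns in the table above are standard agreement metrics between two output sequences. For anyone who wants to repeat the comparison on their own audio, a minimal sketch follows. It is not the script that produced the numbers above, and the function name `parity_metrics` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def parity_metrics(ref: Sequence[float],
                   test: Sequence[float]) -> Tuple[float, float, float]:
    """Return (max abs diff, mean abs diff, Pearson r) for two sequences."""
    if len(ref) != len(test) or len(ref) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(r - t) for r, t in zip(ref, test)]
    max_abs = max(diffs)
    mean_abs = sum(diffs) / len(diffs)

    mean_r = sum(ref) / len(ref)
    mean_t = sum(test) / len(test)
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(ref, test))
    var_r = sum((r - mean_r) ** 2 for r in ref)
    var_t = sum((t - mean_t) ** 2 for t in test)
    denom = math.sqrt(var_r * var_t)
    # Pearson is undefined when either sequence is constant.
    pearson = cov / denom if denom > 0 else float("nan")
    return max_abs, mean_abs, pearson
```

Here `ref` would hold the per-frame speech probabilities from the reference model and `test` the ones from the CoreML variant, flattened to one value per frame.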

License

This conversion is released under the same license as the upstream model (cc-by-4.0). Original credits remain with the upstream authors.

Conversion

Converted with coremltools 9.0, mlprogram format, minimum_deployment_target=iOS17.
