MarbleNet Frame-VAD — CoreML
NVIDIA MarbleNet multilingual frame-level VAD converted to Apple CoreML (mlprogram, fp32 + int8 weight-quant).
Upstream: https://huggingface.co/nvidia/Frame_VAD_Multilingual_MarbleNet_v2.0
Part of the VAD CoreML collection — CoreML conversions of community VAD models for use in iOS / macOS apps.
Files
| File | Format | Notes |
|---|---|---|
| MarbleNetVAD.mlpackage | CoreML mlprogram (fp32) | |
| MarbleNetVAD_int8.mlpackage | CoreML mlprogram (int8 weight-quant) | smaller, ~same accuracy |
I/O
Inputs:
{
"processed_signal": [
1,
64,
"T(flex 50-4096)"
],
"length": [
1
]
}
Outputs:
(single output: speech probability)
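Because the time axis of `processed_signal` only accepts 50 to 4096 frames, short clips need padding and long recordings need splitting before inference. The sketch below plans those windows. It assumes 16 kHz audio (the sample rate is not stated in this card), the 10 ms hop listed under Preprocessing, and centered STFT framing; `num_frames` and `plan_chunks` are hypothetical helper names, not part of any released API.

```python
SAMPLE_RATE = 16_000               # assumption: not stated in this card
HOP_SECONDS = 0.010                # 10 ms hop from the preprocessing spec
MIN_FRAMES, MAX_FRAMES = 50, 4096  # flexible range of the T axis


def num_frames(num_samples: int) -> int:
    """Approximate mel frame count, assuming centered STFT framing."""
    hop = round(SAMPLE_RATE * HOP_SECONDS)  # 160 samples at 16 kHz
    return num_samples // hop + 1


def plan_chunks(total_frames: int) -> list:
    """Split a frame sequence into (start, end, pad) windows.

    Each window covers frames [start, end) and needs `pad` extra frames
    appended so that its length falls inside [MIN_FRAMES, MAX_FRAMES].
    """
    chunks = []
    start = 0
    while start < total_frames:
        end = min(start + MAX_FRAMES, total_frames)
        pad = max(0, MIN_FRAMES - (end - start))
        chunks.append((start, end, pad))
        start = end
    return chunks


# One second of 16 kHz audio, then a recording slightly over the upper bound.
print(num_frames(16_000))
print(plan_chunks(4100))
```

At a 10 ms hop the range corresponds to roughly 0.5 s to 41 s of audio per call. The windows here do not overlap, so predictions near chunk boundaries see less context; overlapping windows would be a reasonable refinement. The value used for padding frames is left to the caller.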
Preprocessing (NOT included in the CoreML graph; implement on-device): AudioToMelSpectrogramPreprocessor(n_mels=64, win=25ms, hop=10ms, per_feature norm).
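The `per_feature` normalization step is the easiest part of the preprocessor to get subtly wrong on-device, so here is a minimal sketch of it in plain Python. It standardizes each mel bin across time. The unbiased (n-1) standard deviation and the `eps` value are assumptions about the reference implementation and should be checked against the NeMo source; `normalize_per_feature` is a hypothetical name. The log-mel computation itself is not shown.

```python
import math


def normalize_per_feature(mel, eps=1e-5):
    """Standardize each mel bin over the time axis.

    mel: list of n_mels rows, each a non-empty list of T log-mel values.
    Returns a new list of rows with roughly zero mean and unit variance.
    """
    normalized = []
    for row in mel:
        n = len(row)
        mean = sum(row) / n
        # Unbiased variance (divide by n - 1); assumption, see lead-in.
        var = sum((x - mean) ** 2 for x in row) / max(n - 1, 1)
        std = math.sqrt(var) + eps  # eps keeps constant rows finite
        normalized.append([(x - mean) / std for x in row])
    return normalized


# Two mel bins, three frames; the second bin is constant and maps to zeros.
features = normalize_per_feature([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]])
print(features)
```

In an app the same arithmetic would typically run over a 64 x T buffer with a vectorized library; the loop form is only meant to make the per-bin statistics explicit.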
Numerical parity vs reference
| Variant | max abs diff | mean abs diff | Pearson |
|---|---|---|---|
| fp32 | 4.172e-07 | 1.596e-08 | 1.000000 |
| int8 | 3.305e-02 | 2.567e-03 | 0.999893 |
Validated against the NeMo PyTorch reference.
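For anyone re-running the comparison, the three columns in the table can be computed as sketched below from two flattened lists of frame probabilities. This is an illustration of the metrics, not the script that produced the numbers above; `parity_metrics` is a hypothetical name.

```python
import math


def parity_metrics(ref, test):
    """Return max abs diff, mean abs diff and Pearson r for two sequences."""
    if len(ref) != len(test) or not ref:
        raise ValueError("ref and test must be non-empty and equally long")
    diffs = [abs(r - t) for r, t in zip(ref, test)]
    mean_ref = sum(ref) / len(ref)
    mean_test = sum(test) / len(test)
    cov = sum((r - mean_ref) * (t - mean_test) for r, t in zip(ref, test))
    var_ref = sum((r - mean_ref) ** 2 for r in ref)
    var_test = sum((t - mean_test) ** 2 for t in test)
    denom = math.sqrt(var_ref * var_test)
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        # Pearson is undefined when either sequence is constant.
        "pearson": cov / denom if denom > 0 else float("nan"),
    }


# Toy values standing in for reference and CoreML outputs.
print(parity_metrics([0.0, 0.5, 1.0], [0.1, 0.5, 0.9]))
```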
License
This conversion is released under the same license as the upstream model (cc-by-4.0). Original credits remain with the upstream authors.
Conversion
Converted with coremltools 9.0, mlprogram format, minimum_deployment_target=iOS17.
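The exact conversion script is not part of this card. The following is a rough sketch of how a conversion with these settings could look in coremltools, assuming a TorchScript-traced encoder/decoder (`traced_model`, hypothetical) that takes `processed_signal` and `length`. The `int32` dtype for `length` and the `linear_symmetric` quantization mode are assumptions, as are the function names.

```python
N_MELS = 64
FLEX_T = (50, 4096)  # flexible range of the time axis, from the I/O section


def convert_to_coreml(traced_model, out_path="MarbleNetVAD.mlpackage"):
    """Convert a traced PyTorch module to an fp32 mlprogram (sketch)."""
    import numpy as np
    import coremltools as ct

    # Flexible time dimension matching the documented 50-4096 range.
    t_dim = ct.RangeDim(lower_bound=FLEX_T[0], upper_bound=FLEX_T[1],
                        default=FLEX_T[0])
    mlmodel = ct.convert(
        traced_model,
        convert_to="mlprogram",
        inputs=[
            ct.TensorType(name="processed_signal",
                          shape=(1, N_MELS, t_dim), dtype=np.float32),
            # dtype of `length` is an assumption.
            ct.TensorType(name="length", shape=(1,), dtype=np.int32),
        ],
        compute_precision=ct.precision.FLOAT32,
        minimum_deployment_target=ct.target.iOS17,
    )
    mlmodel.save(out_path)
    return mlmodel


def quantize_weights_int8(mlmodel, out_path="MarbleNetVAD_int8.mlpackage"):
    """Apply int8 weight-only quantization to a converted model (sketch)."""
    import coremltools.optimize.coreml as cto

    # Quantization mode is an assumption; the card only says "int8 weight-quant".
    op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric")
    config = cto.OptimizationConfig(global_config=op_config)
    quantized = cto.linear_quantize_weights(mlmodel, config=config)
    quantized.save(out_path)
    return quantized
```

The imports sit inside the functions so the module can be loaded on machines without coremltools installed; the conversion itself requires coremltools and PyTorch.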