FunASR FSMN-VAD — CoreML

Alibaba FunASR FSMN-VAD (streaming cache I/O) converted to Apple CoreML (mlprogram, fp32 + int8 weight-quant).

Upstream: https://huggingface.co/funasr/fsmn-vad-onnx

Part of the VAD CoreML collection — CoreML conversions of community VAD models for use in iOS / macOS apps.

Files

| File | Format | Notes |
| --- | --- | --- |
| FsmnVAD.mlpackage | CoreML mlprogram (fp32) | |
| FsmnVAD_int8.mlpackage | CoreML mlprogram (int8 weight-quant) | smaller, ~same accuracy |

I/O

Inputs:

{
  "speech": [
    1,
    "T(flex 20-8192)",
    400
  ],
  "in_cache0": [
    1,
    128,
    19,
    1
  ],
  "in_cache1": [
    1,
    128,
    19,
    1
  ],
  "in_cache2": [
    1,
    128,
    19,
    1
  ],
  "in_cache3": [
    1,
    128,
    19,
    1
  ]
}

Outputs:

{
  "logits": [
    1,
    "T",
    248
  ]
}

Preprocessing (NOT included in the CoreML graph; implement on-device): Kaldi 80-d fbank (25 ms window, 10 ms shift) -> LFR (m=5, n=1) -> CMVN (vad.mvn), yielding 400-d frames.
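The LFR step can be sketched as follows. This is a minimal pure-Python illustration, not the upstream implementation: the function name `apply_lfr` is hypothetical, and the padding convention (repeat the first frame on the left, the last frame on the right) is an assumption that should be checked against the FunASR frontend before relying on it for parity.

```python
import math

def apply_lfr(frames, m=5, n=1):
    """Stack m consecutive frames with a hop of n frames.

    frames: list of T feature vectors (each a list of floats).
    Returns ceil(T / n) vectors of length m * dim.
    """
    if not frames:
        return []
    t_out = math.ceil(len(frames) / n)
    left = (m - 1) // 2
    # Assumed padding: repeat the first frame (m - 1) // 2 times on the left.
    padded = [frames[0]] * left + list(frames)
    out = []
    for i in range(t_out):
        window = padded[i * n : i * n + m]
        # Assumed padding: repeat the last frame when the window runs off the end.
        while len(window) < m:
            window.append(padded[-1])
        out.append([x for frame in window for x in frame])
    return out

# Six dummy 80-d frames; frame i is filled with the value i.
fbank = [[float(i)] * 80 for i in range(6)]
lfr = apply_lfr(fbank)
print(len(lfr), len(lfr[0]))  # 6 400
```

With m=5 and 80-d fbank input, each stacked frame has 5 * 80 = 400 values, which matches the last axis of the `speech` input.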

Notes: Streaming caches are exposed as I/O; pass all-zero caches for non-streaming (single-shot) use. Because LFR uses n=1, the frame stride after LFR stays at the 10 ms fbank shift.
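A rough sketch of the bookkeeping this implies: building the four zero caches and splitting a long feature sequence into chunks whose length stays inside the flexible range of `speech` (20 to 8192 frames). The helper names and the default chunk size of 100 frames are assumptions for illustration. The names of the output cache tensors are not listed in this card, so the step that feeds them back in is left as a comment.

```python
MIN_T, MAX_T = 20, 8192        # flexible range of the "speech" time axis
CACHE_SHAPE = (1, 128, 19, 1)  # shape of each in_cache* tensor

def zero_cache():
    """Nested-list zero tensor with shape CACHE_SHAPE."""
    b, c, h, w = CACHE_SHAPE
    return [[[[0.0] * w for _ in range(h)] for _ in range(c)] for _ in range(b)]

def initial_caches():
    """Zero caches keyed by the model's input names."""
    return {f"in_cache{i}": zero_cache() for i in range(4)}

def chunk_bounds(total_frames, chunk=100):
    """Return (start, end) ranges whose lengths all lie in [MIN_T, MAX_T]."""
    if not 2 * MIN_T <= chunk <= MAX_T:
        raise ValueError("chunk must be in [2 * MIN_T, MAX_T]")
    if total_frames < MIN_T:
        raise ValueError("fewer than MIN_T frames; pad the input first")
    bounds = [(s, min(s + chunk, total_frames))
              for s in range(0, total_frames, chunk)]
    tail = bounds[-1][1] - bounds[-1][0]
    if tail < MIN_T:
        # Borrow frames from the previous chunk so the tail reaches MIN_T.
        new_start = total_frames - MIN_T
        bounds[-2] = (bounds[-2][0], new_start)
        bounds[-1] = (new_start, total_frames)
    return bounds

caches = initial_caches()
for start, end in chunk_bounds(205):
    # Run the model on frames[start:end] together with `caches` here, then
    # replace `caches` with the cache tensors the model returns.
    pass

print(chunk_bounds(205))  # [(0, 100), (100, 185), (185, 205)]
```

For single-shot use, one call with `initial_caches()` and the whole sequence is enough, provided the sequence length is within the flexible range.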

Numerical parity vs reference

| Variant | max abs diff | mean abs diff | Pearson |
| --- | --- | --- | --- |
| fp32 | 1.937e-06 | 4.423e-09 | 1.000000 |
| int8 | 6.652e-02 | 1.102e-04 | 0.999802 |

Validated against ONNX Runtime (ORT) reference logits.
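For anyone re-checking these figures on their own inputs, the three metrics can be computed over flattened logits as sketched below. The card does not say how its numbers were produced, so this uses the standard definitions; `parity_metrics` is a hypothetical helper name.

```python
import math

def parity_metrics(ref, test):
    """Return (max abs diff, mean abs diff, Pearson r) for two flat sequences."""
    if len(ref) != len(test) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(ref)
    diffs = [abs(a - b) for a, b in zip(ref, test)]
    mean_r = sum(ref) / n
    mean_t = sum(test) / n
    cov = sum((a - mean_r) * (b - mean_t) for a, b in zip(ref, test))
    var_r = sum((a - mean_r) ** 2 for a in ref)
    var_t = sum((b - mean_t) ** 2 for b in test)
    # Pearson r is undefined when either input is constant (zero variance).
    pearson = cov / math.sqrt(var_r * var_t)
    return max(diffs), sum(diffs) / n, pearson

ref_logits = [0.0, 1.0, 2.0, 3.0]
test_logits = [0.0, 1.0, 2.0, 3.5]
max_abs, mean_abs, pearson = parity_metrics(ref_logits, test_logits)
print(max_abs, mean_abs)  # 0.5 0.125
```

A high Pearson value alone can hide a constant offset or scale error, which is why the absolute differences are reported alongside it.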

CMVN normalization

Required preprocessing — apply to the 400-d LFR features (80-d Kaldi fbank stacked with m=5, n=1) before feeding to the model:

lfr_normalized[t, d] = (lfr[t, d] + add[d]) * scale[d]

Two forms of the same statistics are provided:

| File | Format | Layout |
| --- | --- | --- |
| vad.mvn | Kaldi-style text (<AddShift> / <Rescale>) | The original FunASR FSMN-VAD normalization config. |
| fsmn_cmvn.bin | Raw little-endian fp32 | 400 floats add[0..399] followed by 400 floats scale[0..399]. Trivially mmap-able from Swift / C. |

Both encode the same 400-d additive and multiplicative coefficients. The .bin is derived from vad.mvn and exists for iOS / Swift consumers that don't want to parse the Kaldi text format.
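As an illustration of the binary layout and the normalization formula, here is a small Python sketch that reads the 800 floats and applies them to LFR frames. The function names are hypothetical, and the demo writes a synthetic file with made-up coefficients to a temporary directory, since the real `fsmn_cmvn.bin` is not assumed to be present.

```python
import os
import struct
import tempfile

DIM = 400  # LFR feature dimension

def load_cmvn(path, dim=DIM):
    """Read `dim` add values followed by `dim` scale values (little-endian fp32)."""
    with open(path, "rb") as f:
        raw = f.read()
    if len(raw) != 2 * dim * 4:
        raise ValueError(f"expected {2 * dim * 4} bytes, got {len(raw)}")
    values = struct.unpack(f"<{2 * dim}f", raw)
    return list(values[:dim]), list(values[dim:])

def apply_cmvn(lfr, add, scale):
    """lfr_normalized[t][d] = (lfr[t][d] + add[d]) * scale[d]"""
    return [[(x + a) * s for x, a, s in zip(frame, add, scale)] for frame in lfr]

# Synthetic stand-in for fsmn_cmvn.bin (illustrative coefficients only).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_cmvn.bin")
    with open(demo_path, "wb") as f:
        f.write(struct.pack(f"<{2 * DIM}f", *([-2.0] * DIM + [0.5] * DIM)))
    add, scale = load_cmvn(demo_path)

normalized = apply_cmvn([[4.0] * DIM], add, scale)
print(normalized[0][0])  # 1.0
```

The size check catches the most likely mistakes early: a truncated download or a file that holds a different feature dimension.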

License

This conversion is released under the same license as the upstream model (apache-2.0). Original credits remain with the upstream authors.

Conversion

Converted with coremltools 9.0, mlprogram format, minimum_deployment_target=iOS17.
