FunASR FSMN-VAD — CoreML
Alibaba FunASR FSMN-VAD (streaming cache I/O) converted to Apple CoreML (mlprogram, fp32 + int8 weight-quant).
Upstream: https://huggingface.co/funasr/fsmn-vad-onnx
Part of the VAD CoreML collection — CoreML conversions of community VAD models for use in iOS / macOS apps.
Files
| File | Format | Notes |
|---|---|---|
| FsmnVAD.mlpackage | CoreML mlprogram (fp32) | |
| FsmnVAD_int8.mlpackage | CoreML mlprogram (int8 weight-quant) | smaller, ~same accuracy |
I/O
Inputs:
{
  "speech": [1, "T(flex 20-8192)", 400],
  "in_cache0": [1, 128, 19, 1],
  "in_cache1": [1, 128, 19, 1],
  "in_cache2": [1, 128, 19, 1],
  "in_cache3": [1, 128, 19, 1]
}
Outputs:
{
  "logits": [1, "T", 248]
}
Preprocessing (NOT included in the CoreML graph; implement on-device): Kaldi 80-d fbank (25 ms window / 10 ms shift) -> LFR(m=5, n=1) -> CMVN(vad.mvn), yielding 400-d frames.
Notes: Streaming caches are exposed as I/O; pass all-zero caches for non-streaming use. With LFR n=1 the frame stride stays at the 10 ms fbank shift.
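The LFR step in the preprocessing chain above can be sketched as follows. This is a minimal, dependency-free illustration, not the converter's own code: the function name `apply_lfr` is hypothetical, and the edge handling (replicating the first frame `(m - 1) // 2` times on the left and the last frame on the right) is an assumption modelled on the FunASR reference frontend. Check it against upstream before relying on it.

```python
from typing import List, Sequence


def apply_lfr(fbank: Sequence[Sequence[float]], m: int = 5, n: int = 1) -> List[List[float]]:
    """Stack m consecutive fbank frames with stride n (low frame rate features)."""
    if not fbank:
        return []
    frames = [list(f) for f in fbank]
    # Assumption: left context is filled by repeating the first frame.
    left = (m - 1) // 2
    padded = [frames[0]] * left + frames
    t_out = -(-len(frames) // n)  # ceil(T / n)
    out: List[List[float]] = []
    for i in range(t_out):
        window = padded[i * n : i * n + m]
        # Assumption: at the right edge, repeat the last frame until the
        # window holds m frames.
        window = window + [frames[-1]] * (m - len(window))
        # Concatenate the m frames into one (m * feature_dim)-d vector.
        out.append([x for frame in window for x in frame])
    return out
```

With 80-d input frames and m=5, each returned row has 400 values, matching the `speech` input's last dimension. With n=1 the function returns one row per input fbank frame.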
Numerical parity vs reference
| Variant | max abs diff | mean abs diff | Pearson |
|---|---|---|---|
| fp32 | 1.937e-06 | 4.423e-09 | 1.000000 |
| int8 | 6.652e-02 | 1.102e-04 | 0.999802 |
Validated against ONNX Runtime (ORT) reference logits.
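For readers who want to reproduce this kind of comparison, the three columns can be computed as in the sketch below. It assumes both logit tensors have been flattened to equal-length sequences of floats; the exact procedure used to produce the table is not stated here, and `parity_metrics` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple


def parity_metrics(ref: Sequence[float], test: Sequence[float]) -> Tuple[float, float, float]:
    """Return (max abs diff, mean abs diff, Pearson correlation)."""
    if len(ref) != len(test) or len(ref) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(a - b) for a, b in zip(ref, test)]
    max_abs = max(diffs)
    mean_abs = sum(diffs) / len(diffs)

    mean_r = sum(ref) / len(ref)
    mean_t = sum(test) / len(test)
    cov = sum((a - mean_r) * (b - mean_t) for a, b in zip(ref, test))
    var_r = sum((a - mean_r) ** 2 for a in ref)
    var_t = sum((b - mean_t) ** 2 for b in test)
    if var_r == 0.0 or var_t == 0.0:
        raise ValueError("Pearson correlation is undefined for constant input")
    pearson = cov / math.sqrt(var_r * var_t)
    return max_abs, mean_abs, pearson
```

Calling `parity_metrics(ref_logits, coreml_logits)` on flattened outputs from the same input features gives one row of the table.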
CMVN normalization
Required preprocessing — apply to the 400-d LFR features (80-d Kaldi fbank stacked with m=5, n=1) before feeding to the model:
lfr_normalized[t, d] = (lfr[t, d] + add[d]) * scale[d]
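As a minimal sketch, the formula maps directly onto a per-frame loop. The helper name `apply_cmvn` is hypothetical; `add` and `scale` are the 400-d coefficient vectors described in this section.

```python
from typing import List, Sequence


def apply_cmvn(
    lfr: Sequence[Sequence[float]],
    add: Sequence[float],
    scale: Sequence[float],
) -> List[List[float]]:
    """Compute (lfr[t][d] + add[d]) * scale[d] for every frame t and dimension d."""
    if len(add) != len(scale):
        raise ValueError("add and scale must have the same length")
    out: List[List[float]] = []
    for row in lfr:
        if len(row) != len(add):
            raise ValueError("frame dimension does not match the CMVN statistics")
        # Shift first, then rescale, in that order.
        out.append([(x + a) * s for x, a, s in zip(row, add, scale)])
    return out
```

Note the order of operations: the shift is added before the multiplication, so `add` is expected to hold negated means and `scale` the inverse standard deviations.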
Two forms of the same statistics are provided:
| File | Format | Layout |
|---|---|---|
| vad.mvn | Kaldi-style text (<AddShift> / <Rescale>) | The original FunASR FSMN-VAD normalization config. |
| fsmn_cmvn.bin | Raw little-endian fp32 | 400 floats add[0..399] followed by 400 floats scale[0..399]. Trivially mmap-able from Swift / C. |
Both encode the same 400-d additive and multiplicative coefficients. The .bin is derived from vad.mvn and exists for iOS / Swift consumers that don't want to parse the Kaldi text format.
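The `.bin` layout described above can be read with a few lines of standard-library Python, which is handy for checking a Swift or C loader against a reference. This is a sketch under the stated layout (400 little-endian fp32 `add` values followed by 400 `scale` values); `load_cmvn_bin` is a hypothetical helper name, and the `dim` parameter exists only to make the function easy to test.

```python
import struct
from typing import List, Tuple


def load_cmvn_bin(path: str, dim: int = 400) -> Tuple[List[float], List[float]]:
    """Read `dim` add values followed by `dim` scale values (little-endian fp32)."""
    with open(path, "rb") as fh:
        data = fh.read()
    expected = 2 * dim * 4  # two vectors, 4 bytes per float
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    # "<" forces little-endian regardless of the host byte order.
    values = struct.unpack(f"<{2 * dim}f", data)
    return list(values[:dim]), list(values[dim:])
```

For the shipped file, `add, scale = load_cmvn_bin("fsmn_cmvn.bin")` should return two 400-element lists; the size check fails early if the file is truncated or uses a different layout.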
License
This conversion is released under the same license as the upstream model (apache-2.0). Original credits remain with the upstream authors.
Conversion
Converted with coremltools 9.0, mlprogram format, minimum_deployment_target=iOS17.