FireRedVAD — CoreML
Xiaohongshu FireRedVAD (DFSMN, non-streaming) converted to Apple CoreML (mlprogram, fp32 + int8 weight-quant).
Upstream: https://huggingface.co/FireRedTeam/FireRedVAD
Part of the VAD CoreML collection — CoreML conversions of community VAD models for use in iOS / macOS apps.
Files
| File | Format | Notes |
|---|---|---|
| FireRedVAD.mlpackage | CoreML mlprogram (fp32) | |
| FireRedVAD_int8.mlpackage | CoreML mlprogram (int8 weight-quant) | smaller; near-identical output (Pearson 0.99988 vs PyTorch reference) |
I/O
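Inputs: fbank, shape [1, T, 80]: batch size 1, T fbank frames (flexible, 20 to 8192), 80 mel bins. At a 10 ms frame shift this covers roughly 0.2 s to 82 s of audio per call.
Outputs: a single output, the speech probability.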
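A minimal Python sketch of feeding features to the model with coremltools. It assumes the input is fp32 (the card states the shape but not the dtype) and that the features are already CMVN-normalized as described in this card. `make_model_input` and `predict` are hypothetical helper names. The output's name is not documented here, so `predict` returns the whole result dict; inspect its keys (or `model.get_spec().description.output`) to locate the probability tensor.

```python
import numpy as np

T_MIN, T_MAX, N_MELS = 20, 8192, 80  # flexible T range and feature dim from the I/O spec


def make_model_input(feats: np.ndarray) -> dict:
    """Wrap a CMVN-normalized (T, 80) fbank matrix as the model's `fbank` input."""
    feats = np.asarray(feats, dtype=np.float32)  # assumption: model expects fp32 input
    if feats.ndim != 2 or feats.shape[1] != N_MELS:
        raise ValueError(f"expected (T, {N_MELS}) features, got {feats.shape}")
    if not T_MIN <= feats.shape[0] <= T_MAX:
        raise ValueError(f"T={feats.shape[0]} outside flexible range [{T_MIN}, {T_MAX}]")
    return {"fbank": feats[np.newaxis, :, :]}  # add batch dim -> (1, T, 80)


def predict(model_path: str, feats: np.ndarray) -> dict:
    """Run one prediction. coremltools can only execute models on macOS."""
    import coremltools as ct  # imported lazily so make_model_input works without it

    model = ct.models.MLModel(model_path)  # e.g. "FireRedVAD.mlpackage"
    return model.predict(make_model_input(feats))
```

Validating T on the host side gives a clearer error than a Core ML shape mismatch when a clip is too short or too long.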
Preprocessing is NOT part of the CoreML graph and must be implemented on-device: compute Kaldi-style 80-d log-mel fbank features (25 ms window, 10 ms shift), then apply CMVN using the statistics from cmvn.ark (see CMVN normalization below).
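Because T is bounded to 20 to 8192 frames, it helps to know how many frames a clip produces before running the model. The sketch below assumes 16 kHz mono audio and Kaldi's default `snip_edges=True` framing; neither is stated on this card. Features themselves can be computed with a Kaldi-compatible extractor, for example `torchaudio.compliance.kaldi.fbank(waveform, num_mel_bins=80, frame_length=25.0, frame_shift=10.0)`, though whether its other options (dither, energy floor, etc.) match upstream is not documented here. `num_frames` and `fits_model` are hypothetical helpers.

```python
SAMPLE_RATE = 16_000                 # assumption: 16 kHz mono input
WIN = SAMPLE_RATE * 25 // 1000       # 25 ms window -> 400 samples
HOP = SAMPLE_RATE * 10 // 1000       # 10 ms shift  -> 160 samples


def num_frames(num_samples: int) -> int:
    """Frame count for Kaldi framing with snip_edges=True (only full windows are kept)."""
    if num_samples < WIN:
        return 0
    return 1 + (num_samples - WIN) // HOP


def fits_model(num_samples: int, t_min: int = 20, t_max: int = 8192) -> bool:
    """True if the clip yields a T inside the model's flexible input range."""
    return t_min <= num_frames(num_samples) <= t_max
```

Under these assumptions, 1 s of audio (16000 samples) yields 98 frames, and the shortest clip that fits is 3440 samples (about 0.215 s).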
Numerical parity vs reference
| Variant | max abs diff | mean abs diff | Pearson |
|---|---|---|---|
| fp32 | 6.557e-07 | 3.455e-08 | 1.000000 |
| int8 | 3.855e-02 | 3.428e-03 | 0.999879 |
Validated against the PyTorch reference implementation.
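The three parity columns are standard comparisons between two output tensors. A sketch of how they can be computed with numpy, assuming `ref` holds the PyTorch reference output and `out` the CoreML output for the same input; this is not the validation script used for the table above, and `parity` is a hypothetical helper.

```python
import numpy as np


def parity(ref, out) -> dict:
    """Compare reference and converted outputs element-wise."""
    ref = np.asarray(ref, dtype=np.float64).ravel()
    out = np.asarray(out, dtype=np.float64).ravel()
    if ref.shape != out.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {out.shape}")
    diff = np.abs(ref - out)
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        # Pearson correlation; undefined (nan) if either output is constant
        "pearson": float(np.corrcoef(ref, out)[0, 1]),
    }
```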
CMVN normalization
Required preprocessing: apply the following to the 80-d Kaldi fbank features before feeding them to the model:
fbank_normalized[t, d] = (fbank[t, d] - mean[d]) * inverse_std[d]
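The formula is a per-dimension affine transform broadcast over all frames. A minimal numpy sketch, assuming fp32 features of shape (T, 80); `apply_cmvn` is a hypothetical helper name.

```python
import numpy as np


def apply_cmvn(fbank, mean, inverse_std) -> np.ndarray:
    """fbank_normalized[t, d] = (fbank[t, d] - mean[d]) * inverse_std[d]."""
    fbank = np.asarray(fbank, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32).reshape(-1)
    inverse_std = np.asarray(inverse_std, dtype=np.float32).reshape(-1)
    if fbank.ndim != 2 or fbank.shape[1] != mean.size or mean.size != inverse_std.size:
        raise ValueError("expected fbank (T, D) with D-length mean and inverse_std")
    # (T, D) - (D,) broadcasts the per-dimension statistics across all T frames
    return (fbank - mean) * inverse_std
```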
Two forms of the same statistics are provided:
| File | Format | Layout |
|---|---|---|
| cmvn.ark | Kaldi binary double-precision (DM) matrix | shape (2, 81): first row holds per-dimension sums with the frame count in its last column; second row holds per-dimension sums of squares. Load with kaldiio.load_mat. |
| firered_cmvn.bin | Raw little-endian fp32 | 80 floats mean[0..79] followed by 80 floats inverse_std[0..79] (640 bytes total). Trivially mmap-able from Swift / C. |
Both encode the same 80-d mean and inverse-std. The .bin is derived from cmvn.ark and exists for iOS / Swift consumers that don't want to link Kaldi.
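The two files are related by the usual Kaldi CMVN arithmetic: mean = sum / count, variance = sum_sq / count - mean², inverse_std = 1 / sqrt(variance). The sketch below assumes the (2, 81) matrix has already been loaded (for example with kaldiio.load_mat, as noted above). The variance floor and the helper names `stats_to_mean_istd`, `write_bin` and `read_bin` are assumptions; the card does not say exactly how the .bin was produced, so compare against the shipped file rather than treating this as the reference derivation.

```python
import numpy as np

DIM = 80
VAR_FLOOR = 1e-20  # assumption: Kaldi-style variance floor, not stated on this card


def stats_to_mean_istd(stats):
    """Convert a (2, DIM + 1) Kaldi CMVN stats matrix to fp32 mean and inverse_std."""
    stats = np.asarray(stats, dtype=np.float64)
    if stats.shape != (2, DIM + 1):
        raise ValueError(f"expected shape (2, {DIM + 1}), got {stats.shape}")
    count = stats[0, DIM]                        # frame count: last column of row 0
    mean = stats[0, :DIM] / count
    var = stats[1, :DIM] / count - mean ** 2     # E[x^2] - E[x]^2
    var = np.maximum(var, VAR_FLOOR)
    return mean.astype(np.float32), (1.0 / np.sqrt(var)).astype(np.float32)


def write_bin(path, mean, inverse_std):
    """Write mean[0..79] then inverse_std[0..79] as raw little-endian fp32."""
    np.concatenate([mean, inverse_std]).astype("<f4").tofile(path)


def read_bin(path):
    """Read the 160-float layout back into (mean, inverse_std)."""
    data = np.fromfile(path, dtype="<f4")
    if data.size != 2 * DIM:
        raise ValueError(f"expected {2 * DIM} floats, got {data.size}")
    return data[:DIM], data[DIM:]
```

On iOS the same 640-byte layout can be memory-mapped and read as two consecutive 80-element Float arrays, with no Kaldi dependency.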
License
This conversion is released under the same license as the upstream model (apache-2.0). Original credits remain with the upstream authors.
Conversion
Converted with coremltools 9.0, mlprogram format, minimum_deployment_target=iOS17.