FireRedVAD — CoreML

Xiaohongshu FireRedVAD (DFSMN, non-streaming) converted to Apple CoreML (mlprogram, fp32 + int8 weight-quant).

Upstream: https://huggingface.co/FireRedTeam/FireRedVAD

Part of the VAD CoreML collection — CoreML conversions of community VAD models for use in iOS / macOS apps.

Files

| File | Format | Notes |
|---|---|---|
| FireRedVAD.mlpackage | CoreML mlprogram (fp32) | |
| FireRedVAD_int8.mlpackage | CoreML mlprogram (int8 weight-quant) | smaller, ~same accuracy |

I/O

Inputs:

{
  "fbank": [
    1,
    "T(flex 20-8192)",
    80
  ]
}

Outputs:

(single output: speech probability)

Preprocessing (not included in the CoreML graph; implement it on-device): Kaldi 80-d fbank (25 ms window, 10 ms shift), followed by CMVN using the statistics in cmvn.ark.
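The time axis of `fbank` only accepts 20 to 8192 frames, so it can help to check the frame count before calling the model. The sketch below is a rough illustration, not part of the upstream code: it assumes 16 kHz mono audio and Kaldi's default `snip_edges=True` framing, neither of which is stated above, and the helper names are hypothetical.

```python
SAMPLE_RATE = 16000   # assumption: 16 kHz mono input (not stated in this card)
FRAME_LENGTH_MS = 25  # window length from the preprocessing note
FRAME_SHIFT_MS = 10   # frame shift from the preprocessing note
MIN_FRAMES, MAX_FRAMES = 20, 8192  # flexible range of the "fbank" time axis


def num_fbank_frames(num_samples: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Frame count under Kaldi-style snip_edges=True framing (assumed)."""
    frame_length = sample_rate * FRAME_LENGTH_MS // 1000
    frame_shift = sample_rate * FRAME_SHIFT_MS // 1000
    if num_samples < frame_length:
        return 0  # not enough audio for a single full window
    return 1 + (num_samples - frame_length) // frame_shift


def fits_model(num_samples: int, sample_rate: int = SAMPLE_RATE) -> bool:
    """True if the resulting T lies inside the model's flexible input range."""
    return MIN_FRAMES <= num_fbank_frames(num_samples, sample_rate) <= MAX_FRAMES
```

Under these assumptions, one second of audio yields 98 frames, and the 20 to 8192 frame range corresponds to roughly 0.2 s to 82 s of audio; longer recordings would need to be split before inference.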

Numerical parity vs reference

| Variant | max abs diff | mean abs diff | Pearson |
|---|---|---|---|
| fp32 | 6.557e-07 | 3.455e-08 | 1.000000 |
| int8 | 3.855e-02 | 3.428e-03 | 0.999879 |

Validated against the PyTorch reference.
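For anyone re-checking parity after their own conversion, the three reported metrics can be computed as in the minimal sketch below. It assumes the reference and CoreML outputs have already been flattened into equal-length sequences of per-frame probabilities; `parity_metrics` is a hypothetical helper, not the script used to produce the table.

```python
import math


def parity_metrics(ref, out):
    """Return (max abs diff, mean abs diff, Pearson r) for two equal-length sequences."""
    if len(ref) != len(out) or len(ref) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(ref)
    diffs = [abs(r - o) for r, o in zip(ref, out)]
    max_abs = max(diffs)
    mean_abs = sum(diffs) / n

    mean_r = sum(ref) / n
    mean_o = sum(out) / n
    cov = sum((r - mean_r) * (o - mean_o) for r, o in zip(ref, out))
    var_r = sum((r - mean_r) ** 2 for r in ref)
    var_o = sum((o - mean_o) ** 2 for o in out)
    denom = math.sqrt(var_r * var_o)
    # Pearson is undefined when either sequence is constant.
    pearson = cov / denom if denom > 0 else float("nan")
    return max_abs, mean_abs, pearson
```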

CMVN normalization

Required preprocessing — apply to the 80-d Kaldi fbank before feeding to the model:

fbank_normalized[t, d] = (fbank[t, d] - mean[d]) * inverse_std[d]
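A minimal sketch of that formula in plain Python, assuming `fbank` is a list of T frames with one value per feature dimension and that `mean` and `inverse_std` have already been loaded; `apply_cmvn` is a hypothetical helper name. An on-device implementation would do the same arithmetic in Swift or C.

```python
def apply_cmvn(fbank, mean, inverse_std):
    """Normalize each frame: (fbank[t][d] - mean[d]) * inverse_std[d]."""
    dim = len(mean)
    if len(inverse_std) != dim:
        raise ValueError("mean and inverse_std must have the same length")
    normalized = []
    for frame in fbank:
        if len(frame) != dim:
            raise ValueError("frame dimension does not match the CMVN statistics")
        normalized.append(
            [(x - m) * s for x, m, s in zip(frame, mean, inverse_std)]
        )
    return normalized
```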

Two forms of the same statistics are provided:

| File | Format | Layout |
|---|---|---|
| cmvn.ark | Kaldi binary DM matrix | Shape (2, 81): first row sum stats, second row sum-of-squares, last column is count. Use kaldiio.load_mat. |
| firered_cmvn.bin | Raw little-endian fp32 | 80 floats mean[0..79] followed by 80 floats inverse_std[0..79]. Trivially mmap-able from Swift / C. |

Both encode the same 80-d mean and inverse-std. The .bin is derived from cmvn.ark and exists for iOS / Swift consumers that don't want to link Kaldi.
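The relationship between the two files can be sketched as follows. This is an illustration under assumptions, not the script that produced firered_cmvn.bin: it uses the usual Kaldi convention (mean = sum / count, variance = sum-of-squares / count minus mean squared), the variance floor is an assumed safeguard, and the function names are hypothetical. The `stats` argument stands for the (2, 81) matrix after it has been loaded and converted to nested lists.

```python
import math
import struct

DIM = 80  # feature dimension of the fbank input


def stats_to_mean_inv_std(stats, var_floor=1e-20):
    """Convert [sums + count, sums-of-squares + unused] rows to (mean, inverse_std).

    var_floor is an assumption that guards against a zero variance.
    """
    sums, sumsq = stats
    count = sums[-1]  # last column of the first row holds the frame count
    if count <= 0:
        raise ValueError("frame count must be positive")
    dim = len(sums) - 1
    mean = [sums[d] / count for d in range(dim)]
    inverse_std = []
    for d in range(dim):
        var = sumsq[d] / count - mean[d] ** 2
        inverse_std.append(1.0 / math.sqrt(max(var, var_floor)))
    return mean, inverse_std


def pack_cmvn_bin(mean, inverse_std):
    """Serialize as little-endian fp32: all means, then all inverse stds."""
    values = list(mean) + list(inverse_std)
    return struct.pack(f"<{len(values)}f", *values)


def unpack_cmvn_bin(blob, dim=DIM):
    """Parse the raw layout back into (mean, inverse_std) lists."""
    if len(blob) != 2 * dim * 4:
        raise ValueError(f"expected {2 * dim * 4} bytes, got {len(blob)}")
    values = struct.unpack(f"<{2 * dim}f", blob)
    return list(values[:dim]), list(values[dim:])
```

With 80 + 80 fp32 values, a file in this layout is 640 bytes, which makes a quick sanity check before parsing.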

License

This conversion is released under the same license as the upstream model (Apache-2.0). Original credits remain with the upstream authors.

Conversion

Converted with coremltools 9.0, mlprogram format, minimum_deployment_target=iOS17.
