TEN VAD — CoreML

The TEN Framework streaming VAD, converted to Apple CoreML (mlprogram format; fp32 and int8 weight-quantized variants). The feature extractor (STFT + pitch + biquad) is NOT included; see src/ in the upstream repository.

Upstream: https://github.com/TEN-framework/ten-vad

Part of the VAD CoreML collection — CoreML conversions of community VAD models for use in iOS / macOS apps.

Files

| File | Format | Notes |
|---|---|---|
| TenVAD.mlpackage | CoreML mlprogram (fp32) | |
| TenVAD_int8.mlpackage | CoreML mlprogram (int8 weight-quant) | smaller, ~same accuracy |

I/O

Inputs:

{
  "input_1": [1, 3, 41],
  "input_2": [1, 64],
  "input_3": [1, 64],
  "input_6": [1, 64],
  "input_7": [1, 64]
}

Outputs:

{
  "output_1": "(1,T,1) prob",
  "output_2": [1, 64],
  "output_3": [1, 64],
  "output_6": [1, 64],
  "output_7": [1, 64]
}
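
The four [1, 64] tensors on each side look like recurrent state that a streaming caller carries from one frame to the next. A minimal sketch of that loop follows, assuming each output_N feeds the same-numbered input_N on the next call and that states start at zero; neither is stated here, so check both against upstream. `predict` is a hypothetical stand-in for any callable that maps an input dict to an output dict. On macOS it could be the `predict` method of a model loaded with coremltools, in which case float32 arrays would replace the nested lists used below.

```python
# Assumed pairing of recurrent state tensors: output_N feeds input_N on the next frame.
STATE_PAIRS = {
    "input_2": "output_2",
    "input_3": "output_3",
    "input_6": "output_6",
    "input_7": "output_7",
}
STATE_SIZE = 64  # each state tensor has shape [1, 64]


def zero_states():
    """Initial recurrent states, assumed (not confirmed) to be all zeros."""
    return {name: [[0.0] * STATE_SIZE] for name in STATE_PAIRS}


def run_stream(predict, feature_frames):
    """Run the model call by call, carrying states between calls.

    predict: callable taking a dict of inputs and returning a dict of outputs.
    feature_frames: iterable of [1, 3, 41] feature tensors produced by the
        separately ported feature extractor.
    Returns the raw output_1 value from each call, in order.
    """
    states = zero_states()
    probs = []
    for features in feature_frames:
        outputs = predict({"input_1": features, **states})
        probs.append(outputs["output_1"])
        # Feed each state output back into its paired input for the next call.
        states = {inp: outputs[out] for inp, out in STATE_PAIRS.items()}
    return probs
```

Keeping the state handling in one place means the same loop can be exercised with a stub `predict` on any platform before the CoreML model is wired in.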

Preprocessing (NOT included in the CoreML graph; implement on-device): the STFT + pitch + biquad feature stack lives in ten-vad-src/src/ (C++) and has NOT been ported. For iOS integration, either port it to Swift/Accelerate or wrap the upstream sources via a thin Objective-C++ bridge.
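
As a reference point for the biquad stage of such a port, here is a minimal sketch of a generic second-order IIR section that keeps its state across chunks, as a streaming front end needs. It is not the upstream implementation: the class name, the transposed Direct Form II structure, and any coefficients passed in are illustrative assumptions, and the real filter design must be taken from the upstream sources.

```python
class Biquad:
    """Generic second-order IIR section (transposed Direct Form II).

    Coefficients are assumed normalized so that a0 == 1. The values used
    upstream are NOT reproduced here; pass in the ones from the real port.
    """

    def __init__(self, b0, b1, b2, a1, a2):
        self.b0, self.b1, self.b2 = b0, b1, b2
        self.a1, self.a2 = a1, a2
        # Filter memory persists across process() calls for streaming use.
        self.z1 = 0.0
        self.z2 = 0.0

    def process(self, samples):
        """Filter one chunk of samples and return the filtered chunk."""
        out = []
        for x in samples:
            y = self.b0 * x + self.z1
            self.z1 = self.b1 * x - self.a1 * y + self.z2
            self.z2 = self.b2 * x - self.a2 * y
            out.append(y)
        return out
```

Because the state lives on the instance, feeding audio in small hops gives the same result as filtering the whole signal at once, which is a convenient property to check when comparing a port against a reference implementation.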

Notes: Parity was validated against ONNX Runtime on 5 random input vectors only; no real audio was run through the model. This graph-level parity is sufficient to trust the CoreML conversion itself, but end-to-end ATC validation requires the feature extractor port.

Numerical parity vs reference

| Variant | max abs diff | mean abs diff | Pearson |
|---|---|---|---|
| fp32 | 5.960e-08 | 2.384e-08 | 1.000000 |
| int8 | 8.149e-04 | 4.537e-04 | 0.999990 |

Validated against the ONNX Runtime (ORT) probability output on 5 random inputs.
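
For anyone re-running the comparison, the three parity figures can be computed from two equal-length lists of probabilities. The sketch below is one plausible way to do so, not the script that produced the numbers in this section; the function name and the returned keys are assumptions.

```python
import math


def parity_metrics(reference, candidate):
    """Compare two equal-length sequences of floats.

    Returns max absolute difference, mean absolute difference, and the
    Pearson correlation coefficient.
    """
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")

    diffs = [abs(r - c) for r, c in zip(reference, candidate)]

    n = len(reference)
    mean_r = sum(reference) / n
    mean_c = sum(candidate) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in zip(reference, candidate))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_c = sum((c - mean_c) ** 2 for c in candidate)
    if var_r == 0.0 or var_c == 0.0:
        # Pearson is undefined when either sequence is constant.
        raise ValueError("Pearson correlation is undefined for constant input")

    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / n,
        "pearson": cov / math.sqrt(var_r * var_c),
    }
```

A typical call passes the ORT probabilities as `reference` and the CoreML probabilities for the same inputs as `candidate`.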

License

This conversion is released under the same license as the upstream model (apache-2.0). Original credits remain with the upstream authors.

Conversion

Converted with coremltools 9.0, mlprogram format, minimum_deployment_target=iOS17.
