# TEN VAD: CoreML
The TEN Framework streaming voice activity detection (VAD) model, converted to Apple CoreML (mlprogram format) in two variants: fp32 and int8 weight-quantized. The feature extractor (STFT + pitch + biquad) is NOT included; see the upstream `src/` directory.
Upstream: https://github.com/TEN-framework/ten-vad
Part of the VAD CoreML collection: CoreML conversions of community VAD models for use in iOS and macOS apps.
## Files

| File | Format | Notes |
|---|---|---|
| `TenVAD.mlpackage` | CoreML mlprogram (fp32) | |
| `TenVAD_int8.mlpackage` | CoreML mlprogram (int8 weight-quant) | Smaller; near-identical outputs to fp32 (see parity table) |
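Weight-only int8 quantization stores each weight as an 8-bit integer plus a floating-point scale (about a quarter of the fp32 storage for the weights), which is why the int8 package is smaller while its outputs stay close to fp32. The NumPy sketch below shows generic symmetric per-channel int8 quantization to illustrate the idea; the per-channel granularity and the function names are assumptions, not necessarily the exact coremltools configuration used to produce `TenVAD_int8.mlpackage`.

```python
import numpy as np


def quantize_int8_per_channel(w: np.ndarray, axis: int = 0):
    """Symmetric int8 quantization with one scale per slice along `axis`.

    Returns (q, scale) such that w ≈ q * scale (scale broadcasts over w).
    Illustrative sketch only; not the coremltools implementation.
    """
    # Reduce over every axis except the channel axis to get per-channel max |w|.
    reduce_axes = tuple(i for i in range(w.ndim) if i != axis)
    max_abs = np.max(np.abs(w), axis=reduce_axes, keepdims=True)
    # Map the largest magnitude to 127; all-zero channels get scale 1.0.
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float32 weights from int8 values and scales."""
    return q.astype(np.float32) * scale


# Usage on a hypothetical [64, 41] weight matrix (random, for illustration).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 41)).astype(np.float32)
q, s = quantize_int8_per_channel(w, axis=0)
max_err = float(np.max(np.abs(dequantize(q, s) - w)))
```

Rounding bounds the per-weight error by half of that channel's scale, which is consistent with the small int8 differences in the parity table.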
## I/O

Inputs:

| Input | Shape |
|---|---|
| `input_1` | `[1, 3, 41]` |
| `input_2` | `[1, 64]` |
| `input_3` | `[1, 64]` |
| `input_6` | `[1, 64]` |
| `input_7` | `[1, 64]` |
Outputs:

| Output | Shape |
|---|---|
| `output_1` | `(1, T, 1)`, speech probability |
| `output_2` | `[1, 64]` |
| `output_3` | `[1, 64]` |
| `output_6` | `[1, 64]` |
| `output_7` | `[1, 64]` |
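The four `[1, 64]` tensors look like recurrent state that the caller carries from one call to the next. The sketch below assumes, from the naming alone, that each `output_k` is fed back as `input_k` on the next call and that state starts at zeros; verify both against the upstream ONNX graph before relying on them. `TenVADStream` is a hypothetical helper, and `predict` is any callable mapping a dict of NumPy arrays to a dict of arrays.

```python
from typing import Callable, Dict

import numpy as np

Arrays = Dict[str, np.ndarray]

# ASSUMPTION: output_k is the next-step value of input_k (k = 2, 3, 6, 7).
STATE_PAIRS = {"input_2": "output_2", "input_3": "output_3",
               "input_6": "output_6", "input_7": "output_7"}


class TenVADStream:
    """Hypothetical helper that carries the [1, 64] state tensors across calls."""

    def __init__(self, predict: Callable[[Arrays], Arrays]):
        self.predict = predict
        self.reset()

    def reset(self) -> None:
        # ASSUMPTION: zero-initialised state at the start of each stream.
        self.state = {name: np.zeros((1, 64), dtype=np.float32)
                      for name in STATE_PAIRS}

    def step(self, features: np.ndarray) -> float:
        """Run one step on a [1, 3, 41] feature window; return the speech prob."""
        if features.shape != (1, 3, 41):
            raise ValueError(f"expected (1, 3, 41), got {features.shape}")
        inputs = {"input_1": features.astype(np.float32), **self.state}
        out = self.predict(inputs)
        # Feed each returned state tensor back in on the next call.
        self.state = {inp: np.asarray(out[o], dtype=np.float32).reshape(1, 64)
                      for inp, o in STATE_PAIRS.items()}
        # output_1 is (1, T, 1); take the most recent probability.
        return float(np.asarray(out["output_1"]).reshape(-1)[-1])
```

On macOS, passing `coremltools.models.MLModel("TenVAD.mlpackage").predict` as `predict` should wire this to the fp32 model (coremltools' Python prediction only runs on macOS). On iOS the same state-threading pattern applies in Swift around `MLModel.prediction(from:)`.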
Preprocessing (NOT included in the CoreML graph; implement on-device): the STFT + pitch + biquad feature stack lives in the upstream C++ sources under `ten-vad-src/src/` and has NOT been ported. For iOS integration, either port it to Swift/Accelerate or wrap the upstream sources in a thin Objective-C++ bridge.
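Whatever form the ported extractor takes, its per-frame output must be stacked into the `[1, 3, 41]` `input_1` tensor. A minimal sketch, assuming (from the shape alone) that `input_1` holds the three most recent 41-dimensional feature frames, oldest first, and that the window is zero-padded at stream start; check both against the upstream C++ before use. `FeatureWindow` is a hypothetical name.

```python
from collections import deque

import numpy as np


class FeatureWindow:
    """Hypothetical sliding window over the last `context` feature frames."""

    def __init__(self, context: int = 3, dim: int = 41):
        self.dim = dim
        # ASSUMPTION: zero frames fill the window before real audio arrives.
        self.frames = deque(
            (np.zeros(dim, dtype=np.float32) for _ in range(context)),
            maxlen=context,
        )

    def push(self, frame: np.ndarray) -> np.ndarray:
        """Append one frame and return the window shaped [1, context, dim]."""
        frame = np.asarray(frame, dtype=np.float32).reshape(self.dim)
        self.frames.append(frame)  # maxlen makes the deque drop the oldest frame
        return np.stack(list(self.frames))[np.newaxis, ...]  # oldest frame first
```

Keeping the window separate from the model call means the same buffer logic carries over directly to a Swift/Accelerate port of the extractor.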
Notes: Parity was validated against ONNX Runtime on 5 random input vectors only (no real-audio path). Graph-level parity is sufficient to trust the CoreML conversion itself; end-to-end ATC validation requires the feature extractor port.
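A random-input parity run of this kind can be sketched as: build seeded random inputs with the declared shapes, run them through both backends, and collect the `output_1` probabilities. The standard-normal distribution, the seed, and the `predict_ref`/`predict_test` callables are assumptions for illustration; in practice the reference side would wrap onnxruntime's `InferenceSession.run` and the test side would be coremltools' `MLModel.predict`.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np

# Input shapes as listed in the I/O section of this card.
INPUT_SHAPES = {"input_1": (1, 3, 41), "input_2": (1, 64), "input_3": (1, 64),
                "input_6": (1, 64), "input_7": (1, 64)}

Predict = Callable[[Dict[str, np.ndarray]], Dict[str, np.ndarray]]


def random_inputs(rng: np.random.Generator) -> Dict[str, np.ndarray]:
    # ASSUMPTION: standard-normal values; the original distribution is not stated.
    return {name: rng.standard_normal(shape).astype(np.float32)
            for name, shape in INPUT_SHAPES.items()}


def collect_probs(predict_ref: Predict, predict_test: Predict,
                  n: int = 5, seed: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    """Feed n identical random inputs to both backends; return flat output_1s."""
    rng = np.random.default_rng(seed)
    ref: List[np.ndarray] = []
    test: List[np.ndarray] = []
    for _ in range(n):
        feed = random_inputs(rng)
        ref.append(np.asarray(predict_ref(feed)["output_1"]).ravel())
        test.append(np.asarray(predict_test(feed)["output_1"]).ravel())
    return np.concatenate(ref), np.concatenate(test)
```

As the note says, this exercises only the graph; it says nothing about how the model behaves on features from real audio.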
## Numerical parity vs reference

| Variant | Max abs diff | Mean abs diff | Pearson r |
|---|---|---|---|
| fp32 | 5.960e-08 | 2.384e-08 | 1.000000 |
| int8 | 8.149e-04 | 4.537e-04 | 0.999990 |

Validated against the ONNX Runtime (ORT) `output_1` probability on 5 random inputs.
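The three columns follow standard definitions and can be computed from paired reference and CoreML probabilities as below. This is a sketch (Pearson r via `np.corrcoef`), not necessarily the exact script that produced the table.

```python
import numpy as np


def parity_metrics(ref: np.ndarray, test: np.ndarray) -> dict:
    """Max/mean absolute difference and Pearson r between two output arrays."""
    ref = np.asarray(ref, dtype=np.float64).ravel()
    test = np.asarray(test, dtype=np.float64).ravel()
    diff = np.abs(ref - test)
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        # Off-diagonal entry of the 2x2 correlation matrix.
        "pearson": float(np.corrcoef(ref, test)[0, 1]),
    }
```

Pearson r is undefined (NaN) if either array is constant, so it is only meaningful when the random inputs produce varying probabilities.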
## License
This conversion is released under the same license as the upstream model (apache-2.0). Original credits remain with the upstream authors.
## Conversion

Converted with coremltools 9.0 to the mlprogram format, with `minimum_deployment_target=iOS17`.