campplus-coreml / README.md

alexwengg

docs: AISHELL EER 0.48%

daa02e7 verified 9 days ago

metadata

license: other
license_name: campplus-upstream
license_link: https://github.com/modelscope/FunASR
language:
  - zh
library_name: coreml
tags:
  - coreml
  - ane
  - speaker-verification
  - speaker-diarization
  - campplus
  - funasr
  - fluidaudio
pipeline_tag: audio-classification

CAM++ — CoreML (Apple Neural Engine)

CoreML conversion of FunASR's CAM++ speaker-embedding model (~7.2M params), for on-device speaker verification / diarization on Apple Silicon. Upstream: iic/speech_campplus_sv_zh-cn_16k-common.

Files

File	Precision	Compute unit	Role
`CamPlusPreprocessor.mlmodelc`	FP32	CPU	waveform → 80-d fbank features
`CamPlusPlus.mlmodelc`	FP16	ANE	fbank → 192-d speaker embedding

Pipeline

waveform → [Preprocessor fp32/CPU] → fbank [1,T,80]
        → [CAM++ fp16/ANE] → embedding [1,192]  (L2-normalize, then cosine for verification/clustering)
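
The scoring step at the end of the pipeline can be sketched in plain Python. This is a minimal illustration, not code from this repository: the function names and the `threshold=0.5` default are assumptions chosen for the example, and a real threshold should be tuned on your own trials.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def cosine_score(emb_a, emb_b):
    """Cosine similarity, computed as the dot product of L2-normalized embeddings."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    return sum(x * y for x, y in zip(a, b))

def same_speaker(emb_a, emb_b, threshold=0.5):
    """Verification decision. The threshold is a hypothetical placeholder."""
    return cosine_score(emb_a, emb_b) >= threshold
```

In practice `emb_a` and `emb_b` would be the two 192-d outputs of the CAM++ model, flattened to lists of floats.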

CAM++ normalizes the fbank features internally, so the preprocessor output is passed to it unchanged. The resulting 192-d embedding is compared by cosine similarity for both speaker verification and diarization clustering.
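
For the diarization side, one simple way to group segment embeddings by cosine similarity is a greedy online clustering, sketched below. This is an illustrative stand-in, not the clustering method used by this repository or by FunASR: `greedy_cluster` and its `threshold=0.5` default are assumptions made for the example.

```python
import math

def _unit(vec):
    """Return vec scaled to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else list(vec)

def greedy_cluster(embeddings, threshold=0.5):
    """Assign each embedding to its most similar cluster centroid,
    or open a new cluster when no centroid reaches the threshold."""
    sums = []    # per-cluster running sum of unit-length embeddings
    labels = []  # cluster index assigned to each input embedding
    for emb in embeddings:
        u = _unit(emb)
        best_idx, best_sim = -1, threshold
        for idx, total in enumerate(sums):
            centroid = _unit(total)
            sim = sum(a * b for a, b in zip(u, centroid))
            if sim >= best_sim:
                best_idx, best_sim = idx, sim
        if best_idx < 0:
            sums.append(list(u))
            labels.append(len(sums) - 1)
        else:
            sums[best_idx] = [a + b for a, b in zip(sums[best_idx], u)]
            labels.append(best_idx)
    return labels
```

Each label is a provisional speaker index. The result depends on input order and on the threshold, so treat it as a starting point rather than a tuned diarizer.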

Benchmark — AISHELL-1 speaker verification

Metric	Value
EER	0.48% (20 speakers, 6000 same-speaker / 6000 different-speaker trials)
same-speaker cosine	0.805
different-speaker cosine	0.256

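AISHELL-1 (clean, read Mandarin speech) is an easier test set than CN-Celeb, where the official EER is about 6-7%, so the 0.48% figure should not be compared directly with that number. The CoreML and PyTorch embeddings agree closely: their cosine similarity ranges from 0.9997 to 0.99999.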
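
For readers who want to reproduce this kind of number on their own trial lists, the sketch below shows one common way to estimate EER from cosine scores. It is an assumption about how such a figure can be computed, not the evaluation script behind the table above. `compute_eer` is a hypothetical helper that sweeps every observed score as a threshold and reports the point where the false-accept and false-reject rates are closest.

```python
from bisect import bisect_left

def compute_eer(same_scores, diff_scores):
    """Return (eer, threshold) at the point where FAR and FRR are closest."""
    if not same_scores or not diff_scores:
        raise ValueError("need at least one trial of each kind")
    same = sorted(same_scores)
    diff = sorted(diff_scores)
    best = None
    for t in sorted(set(same) | set(diff)):
        # Same-speaker trials scoring below t are falsely rejected.
        frr = bisect_left(same, t) / len(same)
        # Different-speaker trials scoring at or above t are falsely accepted.
        far = (len(diff) - bisect_left(diff, t)) / len(diff)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]
```

With finite trial lists the two error rates rarely cross exactly, so this version reports their average at the closest threshold. Other tools interpolate instead and may give slightly different values.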

License

The weights derive from FunASR's CAM++ model, so the upstream license applies. This repository only converts the model format.