---
license: other
license_name: campplus-upstream
license_link: https://github.com/modelscope/FunASR
language: [zh]
library_name: coreml
tags: [coreml, ane, speaker-verification, speaker-diarization, campplus, funasr, fluidaudio]
pipeline_tag: audio-classification
---

# CAM++ — CoreML (Apple Neural Engine)

CoreML conversion of FunASR's **CAM++** speaker-embedding model (~7.2M params), for
on-device speaker verification / diarization on Apple Silicon. Upstream:
[iic/speech_campplus_sv_zh-cn_16k-common](https://www.modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common).

## Files

| File | Precision | Compute unit | Role |
|------|-----------|--------------|------|
| `CamPlusPreprocessor.mlmodelc` | FP32 | CPU | waveform → 80-d fbank features |
| `CamPlusPlus.mlmodelc` | FP16 | ANE | fbank → 192-d speaker embedding |

## Pipeline

```
waveform → [Preprocessor fp32/CPU] → fbank [1,T,80]
         → [CAM++ fp16/ANE] → embedding [1,192]  (L2-normalize, then cosine for verification/clustering)
```

CAM++ normalizes the fbank features internally, so no separate feature normalization
is needed between the two models. The 192-d embedding is compared with cosine
similarity for speaker verification and diarization clustering.
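
The scoring step at the end of the pipeline (L2-normalize, then cosine) can be sketched in plain Python as follows. This is a minimal illustration, not code shipped with the model: the function names are hypothetical, and the `threshold=0.5` default is a placeholder assumption that should be tuned on your own data.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_score(emb_a, emb_b):
    """Cosine similarity, computed as the dot product of two unit vectors."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    return sum(x * y for x, y in zip(a, b))


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Accept the pair as one speaker if the score reaches the threshold.

    The default threshold is a hypothetical placeholder, not a value
    published with this model.
    """
    return cosine_score(emb_a, emb_b) >= threshold
```

In practice `emb_a` and `emb_b` would be the 192-d outputs of `CamPlusPlus.mlmodelc` flattened to lists of floats. Normalizing once and caching the unit vectors avoids repeated work when many pairs are compared, as in diarization clustering.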

## Benchmark — AISHELL-1 speaker verification

| Metric | Value |
|--------|-------|
| **EER** | **0.48%** (20 speakers, 6000 same / 6000 diff trials) |
| same-speaker cosine | 0.805 |
| different-speaker cosine | 0.256 |

AISHELL-1 (clean, read Mandarin speech) is an easier test set than CN-Celeb, the benchmark behind the official EER of roughly 6-7%, so the two figures are not directly comparable. Embeddings from the CoreML model match the PyTorch reference with a cosine similarity between 0.9997 and 0.99999.
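
For readers who want to run a similar check on their own trial lists, the sketch below shows one common way to estimate an equal error rate from same-speaker and different-speaker cosine scores. It is an illustration under assumptions, not the script used to produce the table: the function name is hypothetical, and the EER is approximated at the observed threshold where false-accept and false-reject rates are closest.

```python
def equal_error_rate(same_scores, diff_scores):
    """Return (eer, threshold) by trying every observed score as a threshold.

    A trial is accepted when its score is >= threshold. This brute-force
    sweep is quadratic in the number of trials; it favors clarity over speed.
    """
    if not same_scores or not diff_scores:
        raise ValueError("need at least one trial of each kind")

    best_gap = None
    best_eer = None
    best_thr = None
    for thr in sorted(set(same_scores) | set(diff_scores)):
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < thr for s in same_scores) / len(same_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= thr for s in diff_scores) / len(diff_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
            best_thr = thr
    return best_eer, best_thr
```

With balanced lists such as the 6000 same / 6000 different trials above, the returned threshold is also a reasonable starting point for an accept/reject decision, though it should be revalidated on data from the target domain.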

## License

The weights derive from FunASR's CAM++, so the upstream license applies. This release is a format conversion only.