CAM++ Speaker Embedding (CoreML)

CoreML-converted CAM++ (Context-Aware Masking++) speaker embedding model for Apple Silicon.

Produces 192-dimensional speaker embeddings compatible with CosyVoice3 voice cloning.

Model Details

  • Architecture: D-TDNN (Densely-connected Time Delay Neural Network) with context-aware masking and multi-granularity pooling
  • Parameters: 6.9M
  • Input: 80-dim log-mel features, variable length
  • Output: 192-dim speaker embedding
  • Format: CoreML .mlmodelc (compiled, FP16)
  • Size: ~14 MB
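The ~14 MB figure is consistent with the parameter count: a quick back-of-the-envelope check (assuming 2 bytes per FP16 weight, with a small overhead for the compiled .mlmodelc container) gives:

```python
# Sanity check: 6.9M parameters stored as FP16 (2 bytes each).
params = 6_900_000
size_mb = params * 2 / 1_000_000  # decimal megabytes

print(size_mb)  # 13.8 — matches the ~14 MB on-disk size
```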

Input/Output

Tensor        Shape        Description
mel_features  [1, T, 80]   80-dim log-mel spectrogram (T = 10-3000 frames)
embedding     [1, 192]     L2-normalizable speaker embedding
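"L2-normalizable" means the raw embedding can be scaled to unit length before cosine comparison, which is the usual way to compare speaker embeddings. A minimal sketch in plain Python (using a toy 4-dim vector in place of the real 192-dim embedding):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm, as done before cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Toy 4-dim stand-in for the 192-dim speaker embedding.
emb = [3.0, 4.0, 0.0, 0.0]
unit = l2_normalize(emb)

print(unit)                       # [0.6, 0.8, 0.0, 0.0]
print(sum(x * x for x in unit))   # 1.0 (unit length)
```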

Conversion

Converted from the official campplus.onnx shipped with Fun-CosyVoice3-0.5B-2512:

ONNX → onnx2torch (PyTorch) → torch.jit.trace → coremltools → CoreML FP16

One ONNX op was patched: ReduceProd was replaced with ReduceSum in the stats-pooling layer. The reduction there operates on a single-element tensor, so the two ops are mathematically equivalent.
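The equivalence is easy to see: over a single-element tensor, a product reduction and a sum reduction both return that one element. A minimal illustration:

```python
# For a single-element tensor, product and sum over the axis coincide,
# so swapping ReduceProd for ReduceSum changes nothing numerically.
x = [7.0]  # single-element tensor

prod = 1.0
for v in x:
    prod *= v
total = sum(x)

print(prod == total)  # True
```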

Verified: the maximum absolute difference between CoreML and ONNX outputs is 0.015, consistent with FP16 precision.

Usage

Used by speech-swift for CosyVoice3 voice cloning:

// Extract 192-dim speaker embedding for CosyVoice3 voice cloning
let embedding = try camPlusPlus.embed(audio: samples, sampleRate: 16000)
let audio = model.synthesize(text: "Hello", speakerEmbedding: embedding)

Original Model

License

Apache-2.0 (same as the original 3D-Speaker release)
