# CAM++ Speaker Recognition Model (MLX)

Converted from: `iic/speech_campplus_sv_zh_en_16k-common_advanced`

## Model Details

- **Architecture**: CAM++ (Context-Aware Masking++)
- **Framework**: MLX (Apple Silicon optimized)
- **Input**: Mel-spectrogram features (320 dimensions)
- **Output**: Speaker embedding (192 dimensions)
- **Quantized**: False

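As a rough illustration of how these dimensions fit together, the sketch below derives the expected input and output shapes from a config dictionary. The helper `expected_shapes` is hypothetical. The `(batch, input_dim, frames)` input layout is inferred from the Usage example in this README, and the `(batch, embedding_dim)` output shape is an assumption, not something stated here.

```python
def expected_shapes(config: dict, batch: int = 1, frames: int = 200):
    """Return (input_shape, output_shape) implied by a config dict.

    Assumes a (batch, input_dim, frames) input layout and a
    (batch, embedding_dim) output; both are assumptions.
    """
    input_shape = (batch, config["input_dim"], frames)
    output_shape = (batch, config["embedding_dim"])
    return input_shape, output_shape


# Values taken from the Model Details list
config = {"input_dim": 320, "embedding_dim": 192}
input_shape, output_shape = expected_shapes(config)
print(input_shape, output_shape)
```

Comparing `model(audio_features).shape` against `output_shape` is a cheap way to confirm that the downloaded config and weights match.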
## Usage

```python
import json
import sys

import mlx.core as mx
from huggingface_hub import snapshot_download

# Download the model repository and make its model.py importable
model_path = snapshot_download("mlx-community/campp-mlx")
sys.path.append(model_path)
from model import CAMPPModel  # defined in the downloaded repository

# Build the model from config.json
with open(f"{model_path}/config.json") as f:
    config = json.load(f)

model = CAMPPModel(
    input_dim=config["input_dim"],
    embedding_dim=config["embedding_dim"],
    input_channels=config.get("input_channels", 64),
)

# Load the converted weights
weights = mx.load(f"{model_path}/weights.npz")
model.load_weights(weights)

# Run inference; random values stand in for your real audio features
audio_features = mx.random.normal((1, 320, 200))
embedding = model(audio_features)
```

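Speaker embeddings are typically compared with cosine similarity to decide whether two utterances come from the same speaker. The following is a minimal sketch of that comparison using only the standard library. The function names and the `0.5` threshold are illustrative assumptions, not part of this model's API, and a real threshold would need tuning on held-out data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Accept the pair if similarity reaches the threshold.

    The default threshold is a placeholder, not a tuned value.
    """
    return cosine_similarity(a, b) >= threshold


# Toy 3-dimensional vectors; real embeddings would have 192 values
enrolled = [1.0, 0.0, 0.0]
probe = [1.0, 0.0, 0.0]
other = [0.0, 1.0, 0.0]
print(same_speaker(enrolled, probe), same_speaker(enrolled, other))
```

Assuming the model returns an array of shape `(1, 192)`, an MLX embedding can be turned into a plain list with `embedding[0].tolist()` before it is passed to these helpers.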
## Performance

- Optimized for Apple Silicon (M1/M2/M3/M4)
- Faster inference than PyTorch on Apple Silicon Macs
- Lower memory usage, since MLX's unified memory avoids copying arrays between CPU and GPU

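To check the speed claim on your own hardware, a small timing harness is enough. The sketch below is a generic helper, not something shipped with this model: `benchmark` is a hypothetical name, and the warm-up and iteration counts are arbitrary defaults.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 3, iters: int = 20) -> float:
    """Return the median wall-clock time of fn() in seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload; replace with a call that runs the model
median_s = benchmark(lambda: sum(range(10_000)))
print(f"median latency: {median_s * 1e3:.3f} ms")
```

Because MLX evaluates arrays lazily, the timed callable should force the computation, for example `lambda: mx.eval(model(audio_features))`. Otherwise only graph construction is measured.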
## Original Paper

CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking

https://arxiv.org/abs/2303.00332