	# CAM++ Speaker Recognition Model (MLX)

	Converted from: `iic/speech_campplus_sv_zh_en_16k-common_advanced`

	## Model Details

	- Architecture: CAM++ (Context-Aware Masking++)
	- Framework: MLX (Apple Silicon optimized)
	- Input: Mel-spectrogram features (320 dimensions)
	- Output: Speaker embedding (192 dimensions)
	- Quantized: False

	## Usage

	```python
	from huggingface_hub import snapshot_download
	import mlx.core as mx
	import sys

	# Download model
	model_path = snapshot_download("mlx-community/campp-mlx")
	sys.path.append(model_path)

	from model import CAMPPModel
	import json

	# Load model
	with open(f"{model_path}/config.json") as f:
	    config = json.load(f)

	model = CAMPPModel(
	    input_dim=config["input_dim"],
	    embedding_dim=config["embedding_dim"],
	    input_channels=config.get("input_channels", 64),
	)
	weights = mx.load(f"{model_path}/weights.npz")
	model.load_weights(weights)

	# Use model
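	# Random placeholder input; replace with your own mel-spectrogram features.
	# The shape here is presumably (batch, 320 feature dimensions, frames).
	audio_features = mx.random.normal((1, 320, 200))
	embedding = model(audio_features)  # speaker embedding (192 dimensions)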
	```
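
Speaker embeddings are usually compared with cosine similarity, so the following is a minimal sketch of how two embeddings from the model might be scored against each other. It is not part of this repository: the helper names `cosine_similarity` and `same_speaker` and the `0.5` threshold are hypothetical and chosen only for illustration. The sketch uses plain Python lists so it runs without MLX.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.5):
    """Decide whether two embeddings likely belong to the same speaker.

    The default threshold is a placeholder assumption, not a value from
    the model card; tune it on your own verification data.
    """
    return cosine_similarity(a, b) >= threshold
```

Assuming the model returns an array of shape `(1, 192)`, an embedding could be converted to a list with `embedding[0].tolist()` before being passed to these helpers.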

	## Performance

	- Optimized for Apple Silicon (M1/M2/M3/M4)
	- Faster inference than PyTorch on Mac
	- Lower memory overhead: MLX uses unified memory, so arrays are shared between CPU and GPU without copies
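
The claims above come without figures, so it may be worth measuring latency on your own machine. The following is a minimal timing sketch, not a benchmark shipped with this model: the `benchmark` helper and its `warmup` and `runs` parameters are hypothetical, and the workload is a stand-in so the snippet runs anywhere.

```python
import statistics
import time


def benchmark(fn, warmup=3, runs=20):
    """Call fn() repeatedly and return latency statistics in milliseconds."""
    # Warm-up calls are not timed; they absorb one-off costs such as
    # compilation or cache population.
    for _ in range(warmup):
        fn()

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "runs": runs,
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "min_ms": min(timings_ms),
    }


# Stand-in workload (an assumption for illustration, not the real model).
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
print(stats)
```

MLX evaluates lazily, so when timing the real model the callable should force computation, for example `lambda: mx.eval(model(audio_features))`. A PyTorch comparison needs its own device synchronization inside the timed callable to be fair.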

	## Original Paper

	CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking
	https://arxiv.org/abs/2303.00332