# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a PyTorch-to-MLX model converter for CAM++ (Context-Aware Masking++) speaker recognition models. It provides a Gradio web interface for converting speaker verification models from ModelScope to Apple's MLX format, optimized for Apple Silicon (M1/M2/M3/M4).
**Core Purpose:** Convert PyTorch CAM++ models (D-TDNN architecture) from ModelScope to MLX format with optional quantization (Q2/Q4/Q8), then upload them to HuggingFace's mlx-community organization.
## Development Commands

### Running the Application

```bash
# Start Gradio interface (default port 7865)
python app.py
```

### Testing

```bash
# Test conversion utilities and parameter mapping
python conversion_utils.py

# Test specific parameter mapping logic
python test_mapping.py
```

### Environment Setup

```bash
# Activate virtual environment (if it exists)
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```
## Architecture Overview

### Three-Layer System

1. **Web Interface (`app.py`)**: Gradio UI for the model conversion workflow
   - Orchestrates the download → conversion → testing → upload pipeline
   - Critical: only uploads models that pass verification with a 100% success rate and no errors

2. **Model Architecture (`mlx_campp.py`)**: MLX implementation of the CAM++ model
   - Key components:
     - `DenseBlock`: D-TDNN backbone with dense connections
     - `ContextAwareMasking`: multi-scale (1x1, 3x3, 5x5) context extraction and masking
     - `ChannelContextGating`: channel-wise attention mechanism
     - `MultiGranularityPooling`: statistical pooling with learnable attention

3. **Conversion Engine (`conversion_utils.py`)**: maps PyTorch xvector parameter names to the MLX architecture
   - Handles weight format conversions (Conv1d, Linear, BatchNorm)
   - Does NOT add fake/random weights; only maps existing parameters
   - Provides comprehensive verification and status checking
### Parameter Mapping Logic

The converter maps from PyTorch xvector naming to MLX `CAMPPModel` naming.

Example mappings:

- `xvector.tdnn.linear.weight` → `input_conv.weight`
- `xvector.block1.tdnnd{i}.linear1.weight` → `dense_blocks.0.layers.{i-1}.conv.weight`
- `xvector.cam_layer.linear1.weight` → `cam.context_conv1.weight`
- `xvector.transit1.linear.weight` → `transitions.0.layers.2.weight`
- `xvector.dense.linear.weight` → `channel_gating.fc.layers.0.weight`
- `xvector.output.linear.weight` → `pooling.attention_weights.weight`
MLX model structure:
- Block 0: 4 dense layers (maps from PyTorch block1, layers 1-4)
- Block 1: 6 dense layers (maps from PyTorch block2, layers 1-6)
- Block 2: 8 dense layers (maps from PyTorch block3, layers 1-8)
- 2 transition layers between blocks
- CAM layer with 3 parallel context paths (1x1, 3x3, 5x5 convolutions)
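The block layout above can be captured in a small configuration sketch. The constant names here are illustrative, not the repository's actual identifiers:

```python
# Illustrative constants only; the repository may name these differently.
BLOCK_LAYERS = [4, 6, 8]                 # dense layers in blocks 0, 1, 2
NUM_TRANSITIONS = len(BLOCK_LAYERS) - 1  # transition layers between blocks
CAM_KERNEL_SIZES = [1, 3, 5]             # parallel context paths in the CAM layer

def total_dense_layers() -> int:
    """Total dense layers across all D-TDNN blocks."""
    return sum(BLOCK_LAYERS)
```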
## Conversion Safety Checks

The conversion process includes multi-stage verification (app.py:199-227):

- **Pre-upload testing**: `_test_converted_model()` loads the converted weights and runs a forward pass
- **Parameter verification**: checks for missing/extra parameters and shape mismatches
- **Upload gating**: uploads ONLY if no warnings or errors are detected
- **Status checking**: uses `check_conversion_status()` to verify:
  - 100% verification rate required
  - No NaN/Inf values in weights
  - All parameters successfully mapped
  - Shape consistency maintained
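The NaN/Inf check can be sketched as a simple scan over the weight dictionary. This is a minimal illustration (using flat float lists for simplicity), not the repository's actual implementation:

```python
import math

def find_bad_values(weights: dict) -> list[str]:
    """Flag parameters containing NaN or Inf values (illustrative sketch)."""
    problems = []
    for name, values in weights.items():
        if any(math.isnan(v) for v in values):
            problems.append(f"NaN in {name}")
        if any(math.isinf(v) for v in values):
            problems.append(f"Inf in {name}")
    return problems
```

A conversion would be rejected whenever this returns a non-empty list.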
## Weight Format Notes

- **Conv1d**: MLX uses the same format as PyTorch, `(out_channels, in_channels, kernel_size)`; no transpose needed
- **Linear**: same format, `(out_features, in_features)`; no transpose needed
- **BatchNorm**: includes `running_mean`/`running_var` for inference mode
- **Quantization**: applied via MLX quantization utilities; skips bias, batchnorm, and small tensors
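The "skip bias/batchnorm/small tensors" rule can be expressed as a name-and-size filter. The substring patterns and size threshold below are assumptions for illustration, not the repository's actual values:

```python
def should_quantize(name: str, num_elements: int, min_size: int = 1024) -> bool:
    """Decide whether a tensor is eligible for quantization.

    Skips biases, batchnorm parameters, and small tensors, mirroring the
    rules described above. Patterns and threshold are illustrative.
    """
    lname = name.lower()
    if "bias" in lname or "bn" in lname or "batchnorm" in lname:
        return False
    return num_elements >= min_size
```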
## Key Implementation Details

### Conversion Flow (app.py:55-160)

1. Download the model from ModelScope using `modelscope.snapshot_download`
2. Find the PyTorch model file (prioritizes files with 'campplus' in the name)
3. Load weights (supports `.bin`, `.pt`, `.safetensors`)
4. Validate the CAM++ architecture (checks for conv + dense/tdnn patterns)
5. Convert weights to MLX via `ConversionUtils.convert_weights_to_mlx()`
6. Create versions: regular plus optional Q2/Q4/Q8 quantized
7. Test each version; upload only if tests pass
8. Upload to HuggingFace as `mlx-community/{output_name}`
### File Generation

For each converted model, the tool generates:

- `weights.npz`: MLX weight arrays
- `config.json`: model metadata (architecture, dimensions, quantization info)
- `model.py`: copy of `mlx_campp.py` for loading
- `usage_example.py`: code example for loading and inference
- `README.md`: model card with usage instructions
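A hypothetical sketch of the metadata `config.json` might carry; every field name and value here is an assumption for illustration, not the file's actual schema:

```python
import json

# Hypothetical metadata layout; the actual config.json schema may differ.
config = {
    "model_type": "campplus",     # architecture identifier (assumed name)
    "framework": "mlx",
    "embedding_dim": 192,         # assumed embedding size
    "quantization": None,         # e.g. {"bits": 4, "group_size": 64} for Q4
}
print(json.dumps(config, indent=2))
```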
### Important Constants (app.py:22-28)

- Error messages prefixed with `ERROR_`
- Success message template: `SUCCESS_CONVERSION`
- Default target organization: `mlx-community`
- Default server port: `7865`
## Common Patterns

### Adding New Parameter Mappings

Modify `conversion_utils.py:_xvector_to_mlx_name()`:

1. Identify the PyTorch parameter pattern (e.g., `xvector.block1.tdnnd3.*`)
2. Determine the corresponding MLX parameter (e.g., `dense_blocks.0.layers.2.*`)
3. Add a conditional mapping with exact string matching
4. Return `None` to skip parameters without MLX equivalents
### Updating Model Architecture

When modifying `mlx_campp.py`:

1. Update `CAMPPModel.__init__()` to add new layers
2. Update `__call__()` to integrate them into the forward pass
3. Update the conversion mapping in `conversion_utils.py`
4. Test with `python conversion_utils.py` to verify shapes
### Debugging Conversion Issues

- Check logs for "Filtered out" messages showing skipped parameters
- Run `test_mapping.py` to verify parameter name transformations
- Use `verify_conversion()` to compare PyTorch vs MLX shapes/values
- Check `check_conversion_status()` output for detailed diagnostics
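A shape comparison of the kind `verify_conversion()` performs can be sketched with plain dictionaries of shape tuples. The function name and report format here are illustrative assumptions:

```python
def compare_shapes(torch_shapes: dict, mlx_shapes: dict, name_map: dict) -> list[str]:
    """Report unmapped parameters and shape mismatches (illustrative sketch).

    torch_shapes: PyTorch parameter name -> shape tuple
    mlx_shapes:   MLX parameter name -> shape tuple
    name_map:     PyTorch name -> MLX name (None/absent means intentionally skipped)
    """
    issues = []
    for tname, shape in torch_shapes.items():
        mname = name_map.get(tname)
        if mname is None:
            issues.append(f"unmapped: {tname}")
        elif mlx_shapes.get(mname) != shape:
            issues.append(f"shape mismatch: {tname} -> {mname}")
    return issues
```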
## Model Sources

Primary models (app.py:503-510):

- Chinese (Basic): `iic/speech_campplus_sv_zh-cn_16k-common`
- Chinese-English (Advanced): `iic/speech_campplus_sv_zh_en_16k-common_advanced`

These are downloaded from ModelScope, not HuggingFace.
## Testing Requirements

The conversion is conservative by design:

- Will NOT upload if any parameter verification fails
- Will NOT upload if NaN/Inf values are detected
- Will NOT upload if the test produces warnings
- Requires a 100% verification rate for deployment

This ensures only correctly converted models reach production.