# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a PyTorch-to-MLX model converter for CAM++ (Context-Aware Masking++) speaker recognition models. It provides a Gradio web interface for converting speaker verification models from ModelScope to Apple's MLX format, optimized for Apple Silicon (M1/M2/M3/M4).
**Core Purpose:** Convert PyTorch CAM++ models (D-TDNN architecture) from ModelScope to MLX format with optional quantization (Q2/Q4/Q8), then upload them to HuggingFace's mlx-community organization.
## Development Commands

### Running the Application

```bash
# Start Gradio interface (default port 7865)
python app.py
```

### Testing

```bash
# Test conversion utilities and parameter mapping
python conversion_utils.py

# Test specific parameter mapping logic
python test_mapping.py
```

### Environment Setup

```bash
# Activate virtual environment (if it exists)
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```
## Architecture Overview

### Three-Layer System

1. **Web Interface (`app.py`)**: Gradio UI for the model conversion workflow
   - Orchestrates the download → conversion → testing → upload pipeline
   - Critical: only uploads models that pass verification with a 100% success rate and no errors

2. **Model Architecture (`mlx_campp.py`)**: MLX implementation of the CAM++ model
   - Key components:
     - `DenseBlock`: D-TDNN backbone with dense connections
     - `ContextAwareMasking`: multi-scale (1x1, 3x3, 5x5) context extraction and masking
     - `ChannelContextGating`: channel-wise attention mechanism
     - `MultiGranularityPooling`: statistical pooling with learnable attention

3. **Conversion Engine (`conversion_utils.py`)**: maps PyTorch xvector parameter names to the MLX architecture
   - Handles weight format conversions (Conv1d, Linear, BatchNorm)
   - Does NOT add fake/random weights; only maps existing parameters
   - Provides comprehensive verification and status checking
### Parameter Mapping Logic

The converter maps from PyTorch xvector naming to MLX `CAMPPModel` naming.

Example mappings:

- `xvector.tdnn.linear.weight` → `input_conv.weight`
- `xvector.block1.tdnnd{i}.linear1.weight` → `dense_blocks.0.layers.{i-1}.conv.weight`
- `xvector.cam_layer.linear1.weight` → `cam.context_conv1.weight`
- `xvector.transit1.linear.weight` → `transitions.0.layers.2.weight`
- `xvector.dense.linear.weight` → `channel_gating.fc.layers.0.weight`
- `xvector.output.linear.weight` → `pooling.attention_weights.weight`
MLX model structure:
- Block 0: 4 dense layers (maps from PyTorch block1, layers 1-4)
- Block 1: 6 dense layers (maps from PyTorch block2, layers 1-6)
- Block 2: 8 dense layers (maps from PyTorch block3, layers 1-8)
- 2 transition layers between blocks
- CAM layer with 3 parallel context paths (1x1, 3x3, 5x5 convolutions)
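The block layout above can be captured in a small configuration sketch. The constant names here are illustrative, not the repository's actual identifiers:

```python
# Illustrative constants only; the repository may name these differently.
BLOCK_LAYERS = [4, 6, 8]                 # dense layers in blocks 0, 1, 2
NUM_TRANSITIONS = len(BLOCK_LAYERS) - 1  # transition layers between blocks
CAM_KERNEL_SIZES = [1, 3, 5]             # parallel context paths in the CAM layer

def total_dense_layers() -> int:
    """Total dense layers across all D-TDNN blocks."""
    return sum(BLOCK_LAYERS)
```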
## Conversion Safety Checks

The conversion process includes multi-stage verification (app.py:199-227):

- **Pre-upload testing**: `_test_converted_model()` loads the converted weights and runs a forward pass
- **Parameter verification**: checks for missing/extra parameters and shape mismatches
- **Upload gating**: uploads ONLY if no warnings or errors are detected
- **Status checking**: uses `check_conversion_status()` to verify:
  - 100% verification rate required
  - No NaN/Inf values in weights
  - All parameters successfully mapped
  - Shape consistency maintained
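The NaN/Inf check can be sketched as a simple scan over the weight dictionary. This is a minimal illustration (using flat float lists for simplicity), not the repository's actual implementation:

```python
import math

def find_bad_values(weights: dict) -> list[str]:
    """Flag parameters containing NaN or Inf values (illustrative sketch)."""
    problems = []
    for name, values in weights.items():
        if any(math.isnan(v) for v in values):
            problems.append(f"NaN in {name}")
        if any(math.isinf(v) for v in values):
            problems.append(f"Inf in {name}")
    return problems
```

A conversion would be rejected whenever this returns a non-empty list.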
## Weight Format Notes

- **Conv1d**: MLX uses the same format as PyTorch, `(out_channels, in_channels, kernel_size)`; no transpose needed
- **Linear**: same format, `(out_features, in_features)`; no transpose needed
- **BatchNorm**: includes `running_mean`/`running_var` for inference mode
- **Quantization**: applied via MLX quantization utilities; skips bias, batchnorm, and small tensors
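The "skip bias/batchnorm/small tensors" rule can be expressed as a name-and-size filter. The substring patterns and size threshold below are assumptions for illustration, not the repository's actual values:

```python
def should_quantize(name: str, num_elements: int, min_size: int = 1024) -> bool:
    """Decide whether a tensor is eligible for quantization.

    Skips biases, batchnorm parameters, and small tensors, mirroring the
    rules described above. Patterns and threshold are illustrative.
    """
    lname = name.lower()
    if "bias" in lname or "bn" in lname or "batchnorm" in lname:
        return False
    return num_elements >= min_size
```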
## Key Implementation Details

### Conversion Flow (app.py:55-160)

1. Download the model from ModelScope using `modelscope.snapshot_download`
2. Find the PyTorch model file (prioritizes files with 'campplus' in the name)
3. Load weights (supports `.bin`, `.pt`, `.safetensors`)
4. Validate the CAM++ architecture (checks for conv + dense/tdnn patterns)
5. Convert weights to MLX via `ConversionUtils.convert_weights_to_mlx()`
6. Create versions: regular plus optional Q2/Q4/Q8 quantized
7. Test each version; upload only if tests pass
8. Upload to HuggingFace as `mlx-community/{output_name}`
### File Generation

For each converted model, the tool generates:

- `weights.npz`: MLX weight arrays
- `config.json`: model metadata (architecture, dimensions, quantization info)
- `model.py`: copy of `mlx_campp.py` for loading
- `usage_example.py`: code example for loading and inference
- `README.md`: model card with usage instructions
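A hypothetical sketch of the metadata `config.json` might carry; every field name and value here is an assumption for illustration, not the file's actual schema:

```python
import json

# Hypothetical metadata layout; the actual config.json schema may differ.
config = {
    "model_type": "campplus",     # architecture identifier (assumed name)
    "framework": "mlx",
    "embedding_dim": 192,         # assumed embedding size
    "quantization": None,         # e.g. {"bits": 4, "group_size": 64} for Q4
}
print(json.dumps(config, indent=2))
```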
### Important Constants (app.py:22-28)

- Error messages prefixed with `ERROR_`
- Success message template: `SUCCESS_CONVERSION`
- Default target organization: `mlx-community`
- Default server port: `7865`
## Common Patterns

### Adding New Parameter Mappings

Modify `conversion_utils.py:_xvector_to_mlx_name()`:

1. Identify the PyTorch parameter pattern (e.g., `xvector.block1.tdnnd3.*`)
2. Determine the corresponding MLX parameter (e.g., `dense_blocks.0.layers.2.*`)
3. Add a conditional mapping with exact string matching
4. Return `None` to skip parameters without MLX equivalents
### Updating Model Architecture

When modifying `mlx_campp.py`:

1. Update `CAMPPModel.__init__()` to add new layers
2. Update `__call__()` to integrate them into the forward pass
3. Update the conversion mapping in `conversion_utils.py`
4. Test with `python conversion_utils.py` to verify shapes
### Debugging Conversion Issues

- Check logs for "Filtered out" messages showing skipped parameters
- Run `test_mapping.py` to verify parameter name transformations
- Use `verify_conversion()` to compare PyTorch vs MLX shapes/values
- Check `check_conversion_status()` output for detailed diagnostics
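A shape comparison of the kind `verify_conversion()` performs can be sketched with plain dictionaries of shape tuples. The function name and report format here are illustrative assumptions:

```python
def compare_shapes(torch_shapes: dict, mlx_shapes: dict, name_map: dict) -> list[str]:
    """Report unmapped parameters and shape mismatches (illustrative sketch).

    torch_shapes: PyTorch parameter name -> shape tuple
    mlx_shapes:   MLX parameter name -> shape tuple
    name_map:     PyTorch name -> MLX name (None/absent means intentionally skipped)
    """
    issues = []
    for tname, shape in torch_shapes.items():
        mname = name_map.get(tname)
        if mname is None:
            issues.append(f"unmapped: {tname}")
        elif mlx_shapes.get(mname) != shape:
            issues.append(f"shape mismatch: {tname} -> {mname}")
    return issues
```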
## Model Sources

Primary models (app.py:503-510):

- Chinese (Basic): `iic/speech_campplus_sv_zh-cn_16k-common`
- Chinese-English (Advanced): `iic/speech_campplus_sv_zh_en_16k-common_advanced`

These are downloaded from ModelScope, not HuggingFace.
## Testing Requirements

The conversion is conservative by design:

- Will NOT upload if any parameter verification fails
- Will NOT upload if NaN/Inf values are detected
- Will NOT upload if the test produces warnings
- Requires a 100% verification rate for deployment

This ensures only correctly converted models reach production.