CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a PyTorch-to-MLX model converter for CAM++ (Context-Aware Masking++) speaker recognition models. It provides a Gradio web interface for converting speaker verification models from ModelScope to Apple's MLX format, optimized for Apple Silicon (M1/M2/M3/M4).

Core Purpose: Convert PyTorch CAM++ models (D-TDNN architecture) from ModelScope to MLX format with optional quantization (Q2/Q4/Q8), then upload to HuggingFace's mlx-community organization.

Development Commands

Running the Application

# Start Gradio interface (default port 7865)
python app.py

Testing

# Test conversion utilities and parameter mapping
python conversion_utils.py

# Test specific parameter mapping logic
python test_mapping.py

Environment Setup

# Activate the virtual environment (if it exists)
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Architecture Overview

Three-Layer System

  1. Web Interface (app.py)

    • Gradio UI for model conversion workflow
    • Orchestrates download → conversion → testing → upload pipeline
    • Critical: Only uploads models that pass verification with 100% success rate and no errors
  2. Model Architecture (mlx_campp.py)

    • MLX implementation of CAM++ model
    • Key components:
      • DenseBlock: D-TDNN backbone with dense connections
      • ContextAwareMasking: Multi-scale (1x1, 3x3, 5x5) context extraction + masking
      • ChannelContextGating: Channel-wise attention mechanism
      • MultiGranularityPooling: Statistical pooling with learnable attention
  3. Conversion Engine (conversion_utils.py)

    • Maps PyTorch xvector parameter names to MLX architecture
    • Handles weight format conversions (Conv1d, Linear, BatchNorm)
    • Does NOT add fake/random weights - only maps existing parameters
    • Provides comprehensive verification and status checking

Parameter Mapping Logic

The converter maps from PyTorch xvector naming to MLX CAMPPModel naming (the mapping rules are sketched after the lists below):

Example mappings:

  • xvector.tdnn.linear.weight → input_conv.weight
  • xvector.block1.tdnnd{i}.linear1.weight → dense_blocks.0.layers.{i-1}.conv.weight
  • xvector.cam_layer.linear1.weight → cam.context_conv1.weight
  • xvector.transit1.linear.weight → transitions.0.layers.2.weight
  • xvector.dense.linear.weight → channel_gating.fc.layers.0.weight
  • xvector.output.linear.weight → pooling.attention_weights.weight

MLX model structure:

  • Block 0: 4 dense layers (maps from PyTorch block1, layers 1-4)
  • Block 1: 6 dense layers (maps from PyTorch block2, layers 1-6)
  • Block 2: 8 dense layers (maps from PyTorch block3, layers 1-8)
  • 2 transition layers between blocks
  • CAM layer with 3 parallel context paths (1x1, 3x3, 5x5 convolutions)
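Since the mapping is a pure string transform, two of the rules above can be sketched directly (illustrative only; the authoritative logic is _xvector_to_mlx_name() in conversion_utils.py):

import re

def xvector_to_mlx_name(name: str):
    """Illustrative mapping of two documented rules; returns None to skip."""
    # Stem: xvector.tdnn.linear.* -> input_conv.*
    if name.startswith("xvector.tdnn.linear."):
        return name.replace("xvector.tdnn.linear.", "input_conv.", 1)
    # Dense blocks: xvector.block{b}.tdnnd{i}.linear1.* ->
    #               dense_blocks.{b-1}.layers.{i-1}.conv.*
    m = re.match(r"xvector\.block(\d+)\.tdnnd(\d+)\.linear1\.(.+)", name)
    if m:
        block, layer, rest = m.groups()
        return f"dense_blocks.{int(block) - 1}.layers.{int(layer) - 1}.conv.{rest}"
    return None  # no MLX equivalent: parameter is skipped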

Conversion Safety Checks

The conversion process includes multi-stage verification (app.py:199-227); a sketch of the upload gate follows the list:

  1. Pre-upload testing: _test_converted_model() loads converted weights and runs forward pass
  2. Parameter verification: Checks for missing/extra parameters, shape mismatches
  3. Upload gating: ONLY uploads if no warnings or errors detected
  4. Status checking: Uses check_conversion_status() to verify:
    • 100% verification rate required
    • No NaN/Inf values in weights
    • All parameters successfully mapped
    • Shape consistency maintained
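A minimal sketch of that gating logic, assuming the verifier reports missing/extra parameter lists (the real checks live in check_conversion_status() and app.py:199-227):

import numpy as np

def passes_upload_gate(weights: dict, missing: list, extra: list) -> bool:
    """Hypothetical gate: 100% mapping, no leftovers, no NaN/Inf."""
    if missing or extra:  # every parameter must map cleanly
        return False
    for name, w in weights.items():
        if not np.isfinite(np.asarray(w)).all():  # reject NaN/Inf weights
            return False
    return True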

Weight Format Notes

  • Conv1d: MLX uses same format as PyTorch (out_channels, in_channels, kernel_size) - no transpose needed
  • Linear: Same format (out_features, in_features) - no transpose needed
  • BatchNorm: Includes running_mean/running_var for inference mode
  • Quantization: Applied via MLX quantization utils; skips bias, batchnorm, and small tensors (skip rule sketched below)
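A sketch of the skip rule, with an assumed size threshold and name patterns (the converter's actual cutoff may differ):

import math

def should_quantize(name: str, shape: tuple, min_size: int = 1024) -> bool:
    """Hypothetical predicate: keep bias/batchnorm/small tensors unquantized."""
    if name.endswith(".bias") or "bn" in name or "batch_norm" in name:
        return False
    return math.prod(shape) >= min_size  # skip tensors below the threshold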

Key Implementation Details

Conversion Flow (app.py:55-160)

  1. Download model from ModelScope using modelscope.snapshot_download
  2. Find PyTorch model file (prioritizes files with 'campplus' in name)
  3. Load weights (supports .bin, .pt, .safetensors)
  4. Validate CAM++ architecture (checks for conv + dense/tdnn patterns)
  5. Convert weights to MLX via ConversionUtils.convert_weights_to_mlx()
  6. Create versions: regular + optional Q2/Q4/Q8 quantized
  7. Test each version - only upload if tests pass
  8. Upload to HuggingFace mlx-community/{output_name}
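The eight steps condense to roughly the following; snapshot_download, torch.load, ConversionUtils.convert_weights_to_mlx(), and _test_converted_model() are named in this document, while the remaining helpers are hypothetical placeholders:

import torch
from modelscope import snapshot_download

def convert_one(model_id: str, quant_bits=(None, 2, 4, 8)):
    """Condensed sketch of app.py:55-160, not the actual implementation."""
    model_dir = snapshot_download(model_id)                  # 1. download
    ckpt_path = find_campplus_checkpoint(model_dir)          # 2. hypothetical helper
    state_dict = torch.load(ckpt_path, map_location="cpu")   # 3. (.bin/.pt case)
    validate_campp_architecture(state_dict)                  # 4. hypothetical check
    weights = ConversionUtils.convert_weights_to_mlx(state_dict)  # 5. convert
    for bits in quant_bits:                                  # 6. fp + Q2/Q4/Q8
        version = quantize(weights, bits) if bits else weights
        if _test_converted_model(version):                   # 7. gate on tests
            upload_to_mlx_community(version, bits)           # 8. upload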

File Generation

For each converted model, the app generates (see the sketch after this list):

  • weights.npz: MLX weight arrays
  • config.json: Model metadata (architecture, dimensions, quantization info)
  • model.py: Copy of mlx_campp.py for loading
  • usage_example.py: Code example for loading and inference
  • README.md: Model card with usage instructions
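Writing the first two artifacts is straightforward with mlx.core.savez and json (a sketch; model.py, usage_example.py, and README.md are copied or templated by the app):

import json
import mlx.core as mx

def write_artifacts(out_dir: str, weights: dict, config: dict):
    """Sketch of the artifact layout above (file names per the list)."""
    mx.savez(f"{out_dir}/weights.npz", **weights)   # MLX weight arrays
    with open(f"{out_dir}/config.json", "w") as f:  # model metadata
        json.dump(config, f, indent=2)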

Important Constants (app.py:22-28)

  • Error messages prefixed with ERROR_
  • Success message template: SUCCESS_CONVERSION
  • Default target organization: mlx-community
  • Default server port: 7865

Common Patterns

Adding New Parameter Mappings

Modify conversion_utils.py:_xvector_to_mlx_name() (example after this list):

  1. Identify PyTorch parameter pattern (e.g., xvector.block1.tdnnd3.*)
  2. Determine corresponding MLX parameter (e.g., dense_blocks.0.layers.2.*)
  3. Add conditional mapping with exact string matching
  4. Return None to skip parameters without MLX equivalents
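For example, a hypothetical branch for the xvector.block1.tdnnd3.* pattern named in step 1:

def map_new_parameter(name: str):
    """Hypothetical branch following steps 1-4 above."""
    if name.startswith("xvector.block1.tdnnd3."):
        # block1 -> dense_blocks.0, tdnnd3 -> layers.2 (1-based -> 0-based)
        return name.replace("xvector.block1.tdnnd3.", "dense_blocks.0.layers.2.", 1)
    return None  # step 4: skip parameters without an MLX equivalent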

Updating Model Architecture

When modifying mlx_campp.py (see the sketch after this list):

  1. Update CAMPPModel.__init__() to add new layers
  2. Update __call__() to integrate in forward pass
  3. Update conversion mapping in conversion_utils.py
  4. Test with python conversion_utils.py to verify shapes
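The first two steps follow the standard MLX module pattern; an illustrative shape of the edit, not the real CAMPPModel:

import mlx.core as mx
import mlx.nn as nn

class ExampleBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.new_layer = nn.Linear(512, 512)  # step 1: register the new layer

    def __call__(self, x: mx.array) -> mx.array:
        return self.new_layer(x)  # step 2: wire it into the forward pass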

Debugging Conversion Issues

  1. Check logs for "Filtered out" messages showing skipped parameters
  2. Run test_mapping.py to verify parameter name transformations
  3. Use verify_conversion() to compare PyTorch vs MLX shapes/values (an analogous check is sketched below)
  4. Check check_conversion_status() output for detailed diagnostics
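An illustrative analogue of the shape comparison in step 3, assuming the name-mapping function is available (the real helper is verify_conversion() in conversion_utils.py):

import numpy as np

def diff_shapes(torch_state: dict, mlx_weights: dict, name_map) -> list:
    """Hypothetical diagnostic listing unmapped or mismatched parameters."""
    problems = []
    for pt_name, pt_tensor in torch_state.items():
        mlx_name = name_map(pt_name)
        if mlx_name is None:
            continue  # intentionally skipped parameter
        if mlx_name not in mlx_weights:
            problems.append(f"missing: {mlx_name}")
        elif tuple(pt_tensor.shape) != tuple(np.asarray(mlx_weights[mlx_name]).shape):
            problems.append(f"shape mismatch: {pt_name} -> {mlx_name}")
    return problems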

Model Sources

Primary models (app.py:503-510):

  • Chinese (Basic): iic/speech_campplus_sv_zh-cn_16k-common
  • Chinese-English (Advanced): iic/speech_campplus_sv_zh_en_16k-common_advanced

These are downloaded from ModelScope, not HuggingFace.

Testing Requirements

The conversion is conservative by design:

  • Will NOT upload if any parameter verification fails
  • Will NOT upload if NaN/Inf values detected
  • Will NOT upload if test produces warnings
  • Requires 100% verification rate for deployment

This ensures only correctly-converted models reach production.
