WavLM-Large for MLX-Swift

Pre-converted weights of microsoft/wavlm-large for the WavLMLargeMLX Swift package.

Looking for the matching vocoder? See gwenn-ha-dev/hifigan-wavlm-mlx — the companion bshall/knn-vc HiFi-GAN port. Together they implement the full kNN-VC pipeline (SSL features → audio) in pure Swift, no Python at runtime.

  • Format: safetensors, float32
  • Size: 1.26 GB, 487 entries, 315.45 M parameters
  • SHA-256: 21867dd6b9755b970676199c88a50b72010fae8b1a87467aa6977825aea9b2b3
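After downloading, the checksum above can be checked with a short script. This is a minimal sketch (the filename and expected digest are taken from this card; the path is whatever you downloaded to):

```python
import hashlib

# Expected digest of wavlm-large.safetensors, copied from this model card.
EXPECTED = "21867dd6b9755b970676199c88a50b72010fae8b1a87467aa6977825aea9b2b3"

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so the 1.26 GB checkpoint
    never has to sit in memory all at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Then compare `sha256_of("wavlm-large.safetensors") == EXPECTED` before loading the weights.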

What's in this repo

| File | Purpose |
| --- | --- |
| `wavlm-large.safetensors` | Pretrained weights for WavLM-Large. |
| `fixtures/wavlm/*.safetensors` | Reference forward outputs (sine / noise / mix) for parity testing. |
| `fixtures/wavlm/*.wav` | The corresponding test waveforms (16 kHz mono). |

Difference vs upstream microsoft/wavlm-large

The upstream HF state dict has 488 entries; this one has 487. The single difference: pos_conv_embed.conv ships with weight_norm already materialized. Instead of storing weight_g (the norm) and weight_v (the direction) as separate tensors, we store one fused tensor, weight = weight_g · weight_v / ‖weight_v‖. This is mathematically equivalent at inference (no numerical change), and it lets MLX-Swift use a plain Conv1d, since MLX-Swift does not ship a weight_norm parametrization.

The materialization is a single PyTorch call:

from torch.nn.utils.parametrize import remove_parametrizations

# Replace the weight_g/weight_v parametrization with the fused weight tensor,
# keeping the current (materialized) value of the weight.
remove_parametrizations(
    model.encoder.pos_conv_embed.conv, "weight", leave_parametrized=True
)
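The same fusion can be reproduced outside PyTorch. A minimal numpy sketch, assuming weight_norm's usual semantics (the norm of weight_v is taken over every axis except the `dim` argument; the `dim` used for this layer follows the upstream model code):

```python
import numpy as np

def fuse_weight_norm(weight_g, weight_v, dim=0):
    """Materialize weight = weight_g * weight_v / ||weight_v||,
    with the norm taken over every axis except `dim`."""
    axes = tuple(i for i in range(weight_v.ndim) if i != dim)
    norm = np.sqrt((weight_v ** 2).sum(axis=axes, keepdims=True))
    return weight_g * weight_v / norm
```

A quick sanity check: if weight_g is set to exactly ‖weight_v‖, the fused result reproduces weight_v unchanged.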

Numerical parity vs PyTorch reference

Max absolute difference, float32, on three deterministic fixtures:

| Stage | sine_440_1s | noise_2s | mixed_3s |
| --- | --- | --- | --- |
| Feature extractor | 2.6e-7 | 3.6e-7 | 3.6e-7 |
| Layer 6 | 3.2e-4 | 1.7e-4 | 1.7e-4 |
| Layer 24 (post final-LN) | 9.2e-5 | 6.1e-5 | 7.8e-5 |
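The metric in the table is a plain elementwise maximum absolute difference between the Swift output and the PyTorch reference. A sketch of the comparison (how each stage's arrays are loaded from the fixtures follows the WavLMLargeMLX test suite and is not shown here):

```python
import numpy as np

def max_abs_diff(reference, candidate):
    """Max |a - b| in float32 -- the parity metric reported above."""
    a = np.asarray(reference, dtype=np.float32)
    b = np.asarray(candidate, dtype=np.float32)
    return float(np.abs(a - b).max())
```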

Usage

See the WavLMLargeMLX Swift package for full integration. Three lines:

let arrays = try SafetensorsLoader.load(url: url)
let model = WavLM()
try model.loadWeights(from: arrays)

To download from this repo:

huggingface-cli download gwenn-ha-dev/wavlm-large-mlx \
    wavlm-large.safetensors --local-dir ./weights

License

Code (the Swift package): MIT.

Weights derive from microsoft/wavlm-large — refer to the upstream model card for the model's own terms.

Citation

@article{Chen2022WavLM,
  title={WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing},
  author={Chen, Sanyuan and Wang, Chengyi and Chen, Zhengyang and Wu, Yu and Liu, Shujie and Chen, Zhuo and Li, Jinyu and Kanda, Naoyuki and Yoshioka, Takuya and Xiao, Xiong and Wu, Jian and Zhou, Long and Ren, Shuo and Qian, Yanmin and Qian, Yao and Zeng, Michael and Wei, Furu},
  journal={IEEE Journal of Selected Topics in Signal Processing},
  year={2022}
}