WavLM-Large for MLX-Swift

Pre-converted weights of microsoft/wavlm-large for the WavLMLargeMLX Swift package.

Looking for the matching vocoder? See gwenn-ha-dev/hifigan-wavlm-mlx — the companion bshall/knn-vc HiFi-GAN port. Together they implement the full kNN-VC pipeline (SSL features → audio) in pure Swift, no Python at runtime.

  • Format: safetensors, float32
  • Size: 1.26 GB, 487 entries, 315.45 M parameters
  • SHA-256: 21867dd6b9755b970676199c88a50b72010fae8b1a87467aa6977825aea9b2b3
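After downloading, the checksum above can be checked with a short script. This is a minimal sketch (the filename and expected digest are taken from this card; the path is whatever you downloaded to):

```python
import hashlib

# Expected digest of wavlm-large.safetensors, copied from this model card.
EXPECTED = "21867dd6b9755b970676199c88a50b72010fae8b1a87467aa6977825aea9b2b3"

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so the 1.26 GB checkpoint
    never has to sit in memory all at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Then compare `sha256_of("wavlm-large.safetensors") == EXPECTED` before loading the weights.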

What's in this repo

| File | Purpose |
| --- | --- |
| `wavlm-large.safetensors` | Pretrained weights for WavLM-Large. |
| `fixtures/wavlm/*.safetensors` | Reference forward outputs (sine / noise / mix) for parity testing. |
| `fixtures/wavlm/*.wav` | The corresponding test waveforms (16 kHz mono). |

Difference vs upstream microsoft/wavlm-large

The upstream HF state dict has 488 entries; this one has 487. The single difference: pos_conv_embed.conv ships with weight_norm already materialized. Instead of storing weight_g (the norm) and weight_v (the direction) as separate tensors, we store one fused tensor, weight = weight_g · weight_v / ‖weight_v‖. This is mathematically equivalent at inference (no numerical change), and it lets MLX-Swift use a plain Conv1d, since MLX-Swift does not ship a weight_norm parametrization.

The materialization is a single PyTorch call:

from torch.nn.utils.parametrize import remove_parametrizations

# Replace the weight_g/weight_v parametrization with the fused weight tensor,
# keeping the current (materialized) value of the weight.
remove_parametrizations(
    model.encoder.pos_conv_embed.conv, "weight", leave_parametrized=True
)
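The same fusion can be reproduced outside PyTorch. A minimal numpy sketch, assuming weight_norm's usual semantics (the norm of weight_v is taken over every axis except the `dim` argument; the `dim` used for this layer follows the upstream model code):

```python
import numpy as np

def fuse_weight_norm(weight_g, weight_v, dim=0):
    """Materialize weight = weight_g * weight_v / ||weight_v||,
    with the norm taken over every axis except `dim`."""
    axes = tuple(i for i in range(weight_v.ndim) if i != dim)
    norm = np.sqrt((weight_v ** 2).sum(axis=axes, keepdims=True))
    return weight_g * weight_v / norm
```

A quick sanity check: if weight_g is set to exactly ‖weight_v‖, the fused result reproduces weight_v unchanged.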

Numerical parity vs PyTorch reference

Max absolute difference, float32, on three deterministic fixtures:

| Stage | sine_440_1s | noise_2s | mixed_3s |
| --- | --- | --- | --- |
| Feature extractor | 2.6e-7 | 3.6e-7 | 3.6e-7 |
| Layer 6 | 3.2e-4 | 1.7e-4 | 1.7e-4 |
| Layer 24 (post final-LN) | 9.2e-5 | 6.1e-5 | 7.8e-5 |
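The metric in the table is a plain elementwise maximum absolute difference between the Swift output and the PyTorch reference. A sketch of the comparison (how each stage's arrays are loaded from the fixtures follows the WavLMLargeMLX test suite and is not shown here):

```python
import numpy as np

def max_abs_diff(reference, candidate):
    """Max |a - b| in float32 -- the parity metric reported above."""
    a = np.asarray(reference, dtype=np.float32)
    b = np.asarray(candidate, dtype=np.float32)
    return float(np.abs(a - b).max())
```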

Usage

See the WavLMLargeMLX Swift package for full integration. Three lines:

let arrays = try SafetensorsLoader.load(url: url)
let model = WavLM()
try model.loadWeights(from: arrays)

To download from this repo:

huggingface-cli download gwenn-ha-dev/wavlm-large-mlx \
    wavlm-large.safetensors --local-dir ./weights

License

Code (the Swift package): MIT.

Weights derive from microsoft/wavlm-large — refer to the upstream model card for the model's own terms.

Citation

@article{Chen2022WavLM,
  title={WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing},
  author={Chen, Sanyuan and Wang, Chengyi and Chen, Zhengyang and Wu, Yu and Liu, Shujie and Chen, Zhuo and Li, Jinyu and Kanda, Naoyuki and Yoshioka, Takuya and Xiao, Xiong and Wu, Jian and Zhou, Long and Ren, Shuo and Qian, Yanmin and Qian, Yao and Zeng, Michael and Wei, Furu},
  journal={IEEE Journal of Selected Topics in Signal Processing},
  year={2022}
}